Research Showcase March 21, 2018 (11:30 AM PDT | 18:30 UTC)

Sarah Rodlund
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, March 21,
2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream:  https://www.youtube.com/watch?v=ACevHs0sMMw

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2018>.


Over the past few years, the Research team at the Wikimedia Foundation and
some of our formal collaborators have focused on research and technologies
that help editors across Wikimedia languages find tasks to contribute to.
While the early effort focused heavily on recommending articles for
creation (horizontal expansion), in 2016 we started a new direction of
research focused on the vertical expansion of Wikipedia articles. The two
talks in the March 2018 Research Showcase will share some of what we have
learned from this research. More specifically, we will talk about the
Wikipedia category network as a great signal for creating
templates/structures for Wikipedia articles, as well as ongoing research to
learn which content (sections) is missing from Wikipedia across its many
languages. The two corresponding abstracts with more details are below.
Join us! :)


Using Wikipedia categories for research: opportunities, challenges, and
solutions
By *Tiziano Piccardi, EPFL*

The category network in Wikipedia is used by editors as a way to label
articles and organize them in a hierarchical structure. This manually
created and curated network of 1.6 million nodes in English Wikipedia,
generated by arranging the categories in a child-parent relation (e.g.,
Scientists-People, Cities-Human Settlement), allows researchers to infer
valuable relations between concepts. A clean structure in this format would
be a valuable resource for a variety of tools and applications, including
automatic reasoning tools. Unfortunately, the Wikipedia category network
contains some "noise", since in many cases the association as a subcategory
does not define an is-a relation (Scientists is-a People vs. Billionaires
is-a Wealth). Motivated by the goal of building a model that recommends
sections to add to existing Wikipedia articles, we developed a method to
clean this network and keep only the categories that have a high chance of
being associated with their children by an is-a relation. The strategy is
based on the concept of "pure" categories, and the algorithm uses the types
of the attached articles to determine how homogeneous the category is. The
approach does not rely on any linguistic feature and is therefore suitable
for all Wikipedia languages. In this talk, we will give a high-level
overview of the algorithm and discuss some of the possible applications of
the generated network beyond article section recommendations.
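
As a rough illustration of the "pure" category idea, the sketch below
scores a category by how homogeneous the types of its attached articles
are. The purity measure (share of the most common type), the 0.8 threshold,
the function names, and the toy data are all assumptions made for this
example; the abstract does not specify the actual algorithm.

```python
from collections import Counter


def purity(article_types):
    """Share of a category's articles that have its most common type.

    Hypothetical measure: 1.0 means every attached article has the same type.
    """
    if not article_types:
        return 0.0
    counts = Counter(article_types)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(article_types)


def pure_categories(category_articles, threshold=0.8):
    """Return the categories whose purity reaches the (assumed) threshold."""
    return {
        category
        for category, types in category_articles.items()
        if purity(types) >= threshold
    }


# Toy data, invented for illustration: category -> types of attached articles.
toy = {
    "Scientists": ["human", "human", "human", "human"],
    "Wealth": ["human", "concept", "organization", "list"],
}

print(pure_categories(toy))
```

With this toy input only "Scientists" is kept, because all of its articles
share one type, while the articles under "Wealth" are of mixed types.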


Beyond Automatic Translation: Aligning Wikipedia sections across multiple
languages
By *Diego Saez-Trumper*

Sections are the building blocks of Wikipedia articles. For editors, they
can be used as an entry point for creating and expanding articles. For
readers, they enhance the readability of Wikipedia content. In this talk,
we present ongoing research on aligning article sections across Wikipedia
languages. We show that the available technology for automatic translation
is not good enough for translating section titles. We then show a
complementary approach to section alignment that uses Wikidata and
cross-lingual word embeddings. We will present some use cases of a
methodology for aligning sections across languages, including improved
section recommendation, especially in medium-sized and smaller languages,
where the language itself may not contain enough signal about the structure
of the articles and signals can be inferred from other, larger Wikipedia
languages.
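
To make the embedding part of this idea concrete, here is a minimal sketch
that matches section titles across two languages by cosine similarity in a
shared vector space. It assumes each title already has a vector in a common
cross-lingual space; the three-dimensional vectors, the function names, and
the nearest-neighbour matching rule are hypothetical, and the Wikidata
signal mentioned in the abstract is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_sections(source, target):
    """Map each source title to the most similar target title."""
    return {
        s_title: max(target, key=lambda t_title: cosine(s_vec, target[t_title]))
        for s_title, s_vec in source.items()
    }


# Toy vectors, invented for illustration; real embeddings have far more
# dimensions and would come from a trained cross-lingual model.
en = {"History": [0.9, 0.1, 0.0], "Geography": [0.1, 0.9, 0.1]}
es = {"Historia": [0.8, 0.2, 0.1], "Geografía": [0.0, 1.0, 0.2]}

print(align_sections(en, es))
```

In this toy setup "History" is paired with "Historia" and "Geography" with
"Geografía", without translating either title.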

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
[hidden email]
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Research Showcase March 21, 2018 (11:30 AM PDT | 18:30 UTC)

Sarah Rodlund
Hi Everyone,

Just a reminder -- this is beginning in a half hour. Hope to see you there!

On Mon, Mar 19, 2018 at 1:54 PM, Sarah R <[hidden email]> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, March 21,
> 2018 at 11:30 AM (PDT) 18:30 UTC.


--
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
[hidden email]


*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*