> Hi Everyone,
> The next Wikimedia Research Showcase will be live-streamed Wednesday,
> July 11, 2018 at 11:30 AM (PDT) 18:30 UTC.
> YouTube stream: https://www.youtube.com/watch?v=uK7AvNKq0sg
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> Hope to see you there!
> This month's presentations:
> Mind the (Language) Gap: Neural Generation of Multilingual Wikipedia
> Summaries from Wikidata for ArticlePlaceholders
> By *Lucie-Aimée Kaffee*
> While Wikipedia exists in 287 languages, its content is unevenly
> distributed among them. It is therefore of the utmost social and cultural
> interest to address languages whose native speakers have access only to an
> impoverished Wikipedia. In this work, we investigate the generation of
> summaries for Wikipedia articles in underserved languages, given
> structured data as input.
> In order to address the information bias towards widely spoken languages,
> we focus on an important support for such summaries: ArticlePlaceholders,
> which are dynamically generated content pages in underserved Wikipedia
> versions. They enable native speakers to access existing information in
> Wikidata, a structured knowledge base (KB). Our system provides a
> generative neural network architecture, which processes the triples of
> the KB as they are dynamically provided by the ArticlePlaceholder and
> generates a comprehensible textual summary. This data-driven approach is
> tested with the goal of understanding how well it matches the
> communities' needs in two underserved languages on the Web: Arabic, a
> language with a large community but disproportionately limited access to
> knowledge online, and Esperanto.
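For a concrete picture of the input, an ArticlePlaceholder supplies an entity's Wikidata triples, which reduce to (property, value) pairs for a given subject. The sketch below only illustrates that input shape with a naive template baseline; it is not the neural architecture from the talk, and the property labels and the `verbalize` helper are invented for illustration:

```python
# Toy illustration of the triple input an ArticlePlaceholder provides.
# This is a naive template baseline, NOT the neural model from the talk;
# the property labels and templates below are made-up examples.

def verbalize(subject, triples):
    """Turn (property, value) pairs into one plain-English summary."""
    templates = {
        "instance of": "{s} is a {v}",
        "country": "{s} is located in {v}",
        "inception": "{s} was founded in {v}",
    }
    clauses = [templates[p].format(s=subject, v=v)
               for p, v in triples if p in templates]
    return ". ".join(clauses) + "." if clauses else ""

triples = [
    ("instance of", "city"),
    ("country", "Kenya"),
    ("inception", "1899"),
]
print(verbalize("Nairobi", triples))
# -> Nairobi is a city. Nairobi is located in Kenya. Nairobi was founded in 1899.
```

A learned model replaces these hand-written templates, which is what makes the approach portable to any underserved language.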
> With the help of the Arabic and Esperanto Wikipedians, we conduct an
> extended evaluation which exhibits not only the quality of the generated
> text but also the applicability of our end-to-end system to any
> underserved Wikipedia version.
> Token-level change tracking: data, tools and insights
> By *Fabian Flöck*
> This talk first gives an overview of the WikiWho
> infrastructure, which provides tracking of changes to single tokens
> (~words) in articles of different Wikipedia language versions. It exposes
> APIs for accessing this data in near-real time, and is complemented by a
> published static dataset. Several insights are presented regarding
> provenance, partial reverts, token-level conflict and other metrics that
> only become available with such data. Lastly, the talk covers several
> tools and scripts that already use the API and discusses their
> application scenarios, such as investigating authorship, conflicted
> content, and editor productivity.
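As a sketch of what consuming such an API might look like, the snippet below builds a request URL and reads token provenance from a response-shaped dict. The host `api.wikiwho.net`, the `all_content` path, and the response fields (`all_tokens`, `str`, `o_rev_id`) are assumptions modeled on the publicly deployed WikiWho service; check the live API documentation before relying on them. No network call is made here.

```python
# Sketch of a WikiWho API client.  The host, endpoint path, and response
# field names below are assumptions modeled on the public WikiWho service;
# verify them against the current API documentation before use.
from urllib.parse import quote

def all_content_url(lang, title, base="https://api.wikiwho.net"):
    """Build the (assumed) URL for per-token content of one article."""
    return f"{base}/{lang}/api/v1.0.0-beta/all_content/{quote(title)}/"

def first_revisions(tokens):
    """Map each token string to the revision that first introduced it --
    the per-token provenance signal the talk describes."""
    return {t["str"]: t["o_rev_id"] for t in tokens}

# A made-up fragment shaped like an all_content response:
sample = {"all_tokens": [
    {"str": "neural", "o_rev_id": 123},
    {"str": "network", "o_rev_id": 123},
]}

print(all_content_url("en", "Wikipedia"))
print(first_revisions(sample["all_tokens"]))
```

Because every token is tied to its originating revision, metrics like partial reverts and token-level conflict fall out of simple aggregations over such responses.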