Updating Wikipedia based on Wikidata changes

Updating Wikipedia based on Wikidata changes

Denny Vrandečić
Hi,

Sorry for another long email today.

Currently, when you change a Wikidata item, its associated Wikipedia
articles get told to update, too. So your change to the IMDB ID of a movie
in Wikidata will be pushed to all language versions of that article on
Wikipedia. Yay!

There are two use cases that currently are not possible:

* A Wikipedia article on a city might display the mayor. If someone then
changes the label of the mayor on Wikidata, the Wikipedia article will only
get updated the next time the page is rendered; no update of the page is
actively triggered.

* A Wikipedia article might want to include data about an item other than
its associated item - most importantly for references, where I might be
interested in the author of a book, its year of publication, etc. This
feature is currently disabled (even though it would be trivial to switch it
on) because this information would only get updated when the page is
actively rerendered.

In order to enable these use cases we need to track on which pages (on
Wikipedia) an item (from Wikidata) is used. We are thinking of doing this
in two tables:

* EntityUsage: one table per client. It has two columns, one with the
pageId and one with the entityId, indexed on both columns (and one column
with a pk, I guess, for OSC).

* Subscriptions: one table on the client. It has two columns, one with the
pageId and one with the siteId, indexed on both columns (and one column
with a pk, I guess, for OSC).

EntityUsage is a potentially big table (something like pagelinks-size).
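
As a rough illustration of the two tables, here is a minimal sketch in
Python using SQLite as a stand-in database. All table, column and index
names are assumptions made up for this example, "indexed on both columns" is
read as one index per column, and the Subscriptions table follows the
corrections given later in this thread (entityId -> siteId, kept on the
repo rather than on the client).

```python
import sqlite3


def create_entity_usage(client_db: sqlite3.Connection) -> None:
    """Create the per-client EntityUsage table (all names are hypothetical)."""
    client_db.executescript(
        """
        CREATE TABLE entity_usage (
            eu_id        INTEGER PRIMARY KEY,  -- surrogate pk
            eu_page_id   INTEGER NOT NULL,     -- page on the client wiki
            eu_entity_id TEXT    NOT NULL      -- Wikidata entity, e.g. 'Q64'
        );
        CREATE INDEX eu_page_idx   ON entity_usage (eu_page_id);
        CREATE INDEX eu_entity_idx ON entity_usage (eu_entity_id);
        """
    )


def create_subscriptions(repo_db: sqlite3.Connection) -> None:
    """Create the Subscriptions table on the repo (all names are hypothetical)."""
    repo_db.executescript(
        """
        CREATE TABLE subscriptions (
            sub_id        INTEGER PRIMARY KEY,  -- surrogate pk
            sub_entity_id TEXT NOT NULL,        -- Wikidata entity, e.g. 'Q64'
            sub_site_id   TEXT NOT NULL         -- subscribing client, e.g. 'enwiki'
        );
        CREATE INDEX sub_entity_idx ON subscriptions (sub_entity_id);
        CREATE INDEX sub_site_idx   ON subscriptions (sub_site_id);
        """
    )
```

Each client would call something like create_entity_usage() on its own
database, while create_subscriptions() would run once on the repo.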

On a change on Wikidata, Wikidata consults the Subscriptions table, and
based on that it dispatches the changes to all clients listed there for a
given change. Then the client receives the changes and based on the
EntityUsage table performs the necessary updates.
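
The flow described above could be sketched roughly as follows. This is an
in-memory toy model, not actual Wikibase code: the Repo and Client classes,
their method names and the "rerendered" list are all hypothetical, and the
Subscriptions mapping uses entityId -> siteId on the repo, as corrected
later in this thread.

```python
from collections import defaultdict


class Client:
    """A client wiki holding its own EntityUsage data (entity id -> page ids)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.entity_usage = defaultdict(set)
        self.rerendered = []  # page ids this client has been asked to update

    def track_usage(self, page_id, entity_id):
        """Record that a page on this client uses the given entity."""
        self.entity_usage[entity_id].add(page_id)

    def receive_change(self, entity_id):
        """Look up EntityUsage and update every page that uses the entity."""
        pages = sorted(self.entity_usage.get(entity_id, ()))
        self.rerendered.extend(pages)
        return pages


class Repo:
    """The repo (Wikidata) holding Subscriptions (entity id -> site ids)."""

    def __init__(self):
        self.subscriptions = defaultdict(set)
        self.clients = {}

    def register(self, client):
        self.clients[client.site_id] = client

    def subscribe(self, entity_id, site_id):
        self.subscriptions[entity_id].add(site_id)

    def dispatch(self, entity_id):
        """Send a change to every subscribed client; return pages updated per site."""
        updated = {}
        for site_id in sorted(self.subscriptions.get(entity_id, ())):
            updated[site_id] = self.clients[site_id].receive_change(entity_id)
        return updated
```

In this model a client that does not use the changed entity is never
contacted, which is the point of keeping the Subscriptions lookup separate
from the (much larger) EntityUsage lookup.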

We would like your input on this approach: do you see any problems, or
improvements that we should make?

Cheers,
Denny


--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Updating Wikipedia based on Wikidata changes

Denny Vrandečić
Small correction.


2013/7/22 Denny Vrandečić <[hidden email]>

> * Subscriptions: one table on the client. It has two columns, one with the
> pageId and one with the siteId, indexed on both columns (and one column
> with a pk, I guess, for OSC).
>
>
That's entityId -> siteId, not pageId to siteId.



Re: Updating Wikipedia based on Wikidata changes

Denny Vrandečić
Another correction, same line. Gosh, it's hot here. Brain not working. Me
off home.



2013/7/22 Denny Vrandečić <[hidden email]>

> 2013/7/22 Denny Vrandečić <[hidden email]>
>
>> * Subscriptions: one table on the client. It has two columns, one with
>> the pageId and one with the siteId, indexed on both columns (and one column
>> with a pk, I guess, for OSC).
>>
>>
> That's entityId -> siteId, not pageId to siteId.
>
>
And that's repo. Not client.

Re: Updating Wikipedia based on Wikidata changes

Sean Pringle
On Tue, Jul 23, 2013 at 1:42 AM, Denny Vrandečić <[hidden email]> wrote:

>
> * EntityUsage: one table per client. It has two columns, one with the
> pageId and one with the entityId, indexed on both columns (and one column
> with a pk, I guess, for OSC).


> * Subscriptions: one table on the client. It has two columns, one with the
> pageId and one with the siteId, indexed on both columns (and one column
> with a pk, I guess, for OSC).
>
> EntityUsage is a potentially big table (something like pagelinks-size).
>
> On a change on Wikidata, Wikidata consults the Subscriptions table, and
> based on that it dispatches the changes to all clients listed there for a
> given change. Then the client receives the changes and based on the
> EntityUsage table performs the necessary updates.
>
> We wanted to ask for input on this approach, and if you see problems or
> improvements that we should put in.
>

Sounds OK to me.

Will (or could) pageId/entityId and pageId/siteId have unique constraints?

BR
Sean

Re: Updating Wikipedia based on Wikidata changes

Denny Vrandečić
No, neither table would have uniqueness constraints (besides the primary
keys).
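
One consequence, sketched here under assumptions: without a unique
constraint, the same (pageId, entityId) pair can be stored more than once,
so a lookup may want to deduplicate. The example uses SQLite as a stand-in,
and the table, column and function names are hypothetical.

```python
import sqlite3


def make_usage_db() -> sqlite3.Connection:
    """Create an in-memory EntityUsage table with a pk but no unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_usage ("
        " eu_id INTEGER PRIMARY KEY,"
        " eu_page_id INTEGER NOT NULL,"
        " eu_entity_id TEXT NOT NULL)"
    )
    return conn


def add_usage(conn: sqlite3.Connection, page_id: int, entity_id: str) -> None:
    """Insert a usage row; nothing prevents inserting the same pair twice."""
    conn.execute(
        "INSERT INTO entity_usage (eu_page_id, eu_entity_id) VALUES (?, ?)",
        (page_id, entity_id),
    )


def pages_using(conn: sqlite3.Connection, entity_id: str) -> list:
    """Return each page id once, even if duplicate rows exist."""
    rows = conn.execute(
        "SELECT DISTINCT eu_page_id FROM entity_usage"
        " WHERE eu_entity_id = ? ORDER BY eu_page_id",
        (entity_id,),
    )
    return [row[0] for row in rows]
```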

