Use cases for Sites handling change (Re: Wikidata blockers weekly update)


Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Rob Lanphier-4
Hi everyone,

I'm starting a separate thread, because this is an important topic and
I don't think it's well served as a subtopic of a "Wikidata blockers"
thread.

To recap, Jeroen submitted changeset 14295 in Gerrit
<https://gerrit.wikimedia.org/r/#/c/14295/> with the following
summary:
> This commit introduces a new table to hold site data and configuration,
> objects to represent the table, site objects and lists of sites and
> associated tests.

> The sites code is a more generalized and less contrived version of the
> interwiki code we currently have and is meant to replace it eventually.
> This commit does not do away with the existing interwiki code in any way yet.

> The reasons for this change were outlined and discussed on wikitech here:
> http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/060992.html

Thanks Brian for summarizing an important point:

On Fri, Aug 10, 2012 at 6:33 AM, bawolff <[hidden email]> wrote:

> First and foremost, I'm a little confused as to what the actual use
> cases here are. Could we get a short summary, for those who aren't
> entirely following how Wikidata will work, of why the current interwiki
> situation is insufficient? I've read I0a96e585 and
> http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/060992.html,
> but everything seems very vague ("It doesn't work for our situation"),
> without any detailed explanation of what that situation is. At most
> the messages kind of hint at wanting to be able to access the list of
> interwiki types of the wikidata "server" from a wikidata "client" (and
> keep them in sync, or at least have them replicated from
> server->client). But there's no explanation given to why one needs to
> do that (are we doing some form of interwiki transclusion and need to
> render foreign interwiki links correctly? Want to be able to do global
> whatlinkshere and need unique global ids for various wikis? Something
> else?)

I've included the rest of Brian's mail below because I think his other
points are worth responding to as well, but included the above because
I wanted to reiterate his core set of questions.

I don't mean to jerk y'all around.  I'm pushing the Platform devs
(Tim, Aaron, Chad, and Sam in particular) to be responsive here, and
based on the conversations that I've had with them, they have these
questions too.

Rob
[1] http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/thread.html#60992

---------- Forwarded message ----------
From: bawolff <[hidden email]>
Date: Fri, Aug 10, 2012 at 6:33 AM
Subject: [Wikitech-l] Wikidata blockers weekly update
To: wikitech-l <[hidden email]>


> Hey,
>
> You mean site_config?
> > You're suggesting the interwiki system should look for a site by
> > site_local_key, when it finds one parse out the site_config, check if it's
> > disabled and if so ignore the fact it found a site with that local key?
> > Instead of just not having a site_local_key for that row in the first place?
> >
>
> No. Since the interwiki system is not specific to any type of site, this
> approach would be making it needlessly hard. The site_link_inline field
> determines if the site should be usable as interwiki link, as you can see
> in the patchset:
>
>   -- If the site should be linkable inline as an "interwiki link" using
>   -- [[site_local_key:pageTitle]].
>   site_link_inline           bool                NOT NULL,
>
> So queries would be _very_ simple.
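[Editor's note: a minimal sketch of that query, using Python's sqlite3 purely for illustration. The column set is reduced to the two fields discussed here (site_local_key, site_link_inline, both from the patchset); site_id and the sample rows are invented for the example.]

```python
import sqlite3

# Hypothetical, much-reduced version of the proposed sites table; the real
# patchset defines many more columns. site_link_inline marks whether a row
# may be used as an inline [[prefix:Page]] interwiki link.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sites (
        site_id          INTEGER PRIMARY KEY,
        site_local_key   TEXT NOT NULL,
        site_link_inline INTEGER NOT NULL  -- bool: usable as interwiki link?
    )
""")
conn.executemany(
    "INSERT INTO sites (site_local_key, site_link_inline) VALUES (?, ?)",
    [("wikipedia", 1), ("gerrit", 1), ("statsbox", 0)],
)

def inline_linkable_site(local_key):
    """Look up a site usable as an inline interwiki link: a single flag
    test in the WHERE clause, no parsing of a config blob."""
    row = conn.execute(
        "SELECT site_id FROM sites"
        " WHERE site_local_key = ? AND site_link_inline = 1",
        (local_key,),
    ).fetchone()
    return row[0] if row else None
```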
>
> > So data duplication, simply because one wiki needs a second local name,
> > will mean that one URL now has two different global IDs. This sounds
> > precisely like something that is going to get in the way of the whole
> > reason you wanted this rewrite.
>
> * It does not get in our way at all, and is completely disjoint from why
> we want the rewrite
> * It's currently done like this
> * The changes we do need and are proposing to make will make such a rewrite
> at a later point easier than it is now
>
> > Doing it this way frees us from creating any restrictions on whatever
> source we get sites from that we shouldn't be placing on them.
>
> * We don't need this for Wikidata
> * It's a new feature that might or might not be nice to have that currently
> does not exist
> * The changes we do need and are proposing to make will make such a rewrite
> at a later point easier than it is now
>
> > So you might as well drop the 3 url related columns and just use the data
> blob that you already have.
>
> I don't see what this would gain us at all. It would just make things more
> complicated.
>
> > The $1 pattern may not even work for some sites.
>
> * We don't need this for Wikidata
> * It's a new feature that might or might not be nice to have that currently
> does not exist
> * The changes we do need and are proposing to make will make such a rewrite
> at a later point easier than it is now
>
> And in fact we are making this more flexible by having the type system. The
> MediaWiki site type could for instance be able to form both "nice" urls and
> index.php ones. Or a gerrit type could have the logic to distinguish
> between the gerrit commit number and a sha1 hash.
>
> Cheers

[Just to clarify, I'm doing inline replies to things various people
said, not just Jeroen]

First and foremost, I'm a little confused as to what the actual use
cases here are. Could we get a short summary, for those who aren't
entirely following how Wikidata will work, of why the current interwiki
situation is insufficient? I've read I0a96e585 and
http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/060992.html,
but everything seems very vague ("It doesn't work for our situation"),
without any detailed explanation of what that situation is. At most
the messages kind of hint at wanting to be able to access the list of
interwiki types of the wikidata "server" from a wikidata "client" (and
keep them in sync, or at least have them replicated from
server->client). But there's no explanation given to why one needs to
do that (are we doing some form of interwiki transclusion and need to
render foreign interwiki links correctly? Want to be able to do global
whatlinkshere and need unique global ids for various wikis? Something
else?)

>* Site definitions can exist that are not used as "interlanguage link" and
>not used as "interwiki link"

And if we put one of those on a talk page, what would happen? Or, if
foo were one such prefix, what would [[:foo:some page]] do? (Current
behaviour is that it becomes an interwiki link.)

Although to be fair, I do see how the current way we distinguish
between interwiki and interlang links is a bit hacky.

>And in fact we are making this more flexible by having the type system. The
>MediaWiki site type could for instance be able to form both "nice" urls and
>index.php ones. Or a gerrit type could have the logic to distinguish
>between the gerrit commit number and a sha1 hash.

I must admit I do like this idea. In particular, the current
situation where we treat the value of an interwiki link as a title
(i.e. spaces -> underscores, etc.) even for sites that do not use such
conventions has always bothered me. Having interwikis that support
URL rewriting based on the value does sound cool, but I certainly
wouldn't want said code in a db blob (and just using an integer
site_type identifier is quite far away from giving us that, but it's
still a step in a positive direction), which raises the question of
where such rewriting code would go.


> The issue I was trying to deal with was storage. Currently we 100% assume
>that the interwiki list is a table and there will only ever be one of them.

Do we really assume that? Certainly that's the default config, but I
don't think that is the config used on WMF. As far as I'm aware,
Wikimedia uses a cdb database file (via $wgInterwikiCache), which
contains all the interwikis for all sites. From what I understand, it
supports doing various "scope" levels of interwikis, including per db,
per site (Wikipedia, Wiktionary, etc), or global interwikis that act
on all sites.

The feature is a bit wmf specific, but it does seem to support
different levels of interwiki lists.
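[Editor's note: the scope-fallback behaviour described above could be sketched as below. Plain dicts stand in for the cdb file; the actual key scheme used by $wgInterwikiCache is not specified in this thread, so the layout and the sample data are invented for illustration.]

```python
# Invented stand-in for the interwiki cdb: three scope levels, checked
# from most specific (a single wiki database) to least (global).
INTERWIKI_SCOPES = {
    # per wiki database
    "db:frwiki": {"en": "https://en.wikipedia.org/wiki/$1"},
    # per site family (Wikipedia, Wiktionary, ...)
    "site:wiktionary": {"en": "https://en.wiktionary.org/wiki/$1"},
    # global: applies to every wiki
    "global": {"meta": "https://meta.wikimedia.org/wiki/$1"},
}

def resolve_interwiki(dbname, family, prefix):
    """Resolve a prefix by falling through per-db, per-family, and
    global scope, returning the first match."""
    for scope in ("db:" + dbname, "site:" + family, "global"):
        table = INTERWIKI_SCOPES.get(scope, {})
        if prefix in table:
            return table[prefix]
    return None
```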

Furthermore, I imagine (but don't know, so let's see how fast I get
corrected ;)) that the cdb database was introduced not just as a
convenience measure for easier administration of the interwiki tables,
but also for better performance.  If so, one should also take into
account any performance hit that may come with switching to the
proposed "sites" facility.

Cheers,
-bawolff

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Denny Vrandečić
Hi everyone,

2012/8/11 Rob Lanphier <[hidden email]>:

> To recap, Jeroen submitted changeset 14295 in Gerrit
> <https://gerrit.wikimedia.org/r/#/c/14295/> with the following
> summary:
>> This commit introduces a new table to hold site data and configuration,
>> objects to represent the table, site objects and lists of sites and
>> associated tests.
>
>> The sites code is a more generalized and less contrived version of the
>> interwiki code we currently have and is meant to replace it eventually.
>> This commit does not do away with the existing interwiki code in any way yet.
>
>> The reasons for this change were outlined and discussed on wikitech here:
>> http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/060992.html
>
> Thanks Brian for summarizing an important point:
>
> On Fri, Aug 10, 2012 at 6:33 AM, bawolff <[hidden email]> wrote:
>> First and foremost, I'm a little confused as to what the actual use
>> cases here are. Could we get a short summary, for those who aren't
>> entirely following how Wikidata will work, of why the current interwiki
>> situation is insufficient?

The use case is the following: in order for Wikidata to be able to
provide language links for the wikis using Wikidata, we need to use
consistent global IDs when communicating about the involved wikis.
That is, if a "client wiki" such as fr.wp asks Wikidata for the
language links for an article X, the client and the repo need to
agree that e.g. "enwiki" refers to en.wp. Right now the table does
not have any such field -- the local prefix "en" might be defined
differently on fr.wp and fr.wikinews, for example, and we obviously do
not want to break that.
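[Editor's note: the distinction can be illustrated with a small sketch. The data below is invented; the point is only that a local prefix is scoped per wiki, while a global ID must be unambiguous everywhere.]

```python
# Local interwiki prefixes are defined per wiki, so the same prefix can
# point at different targets on different wikis.
LOCAL_PREFIXES = {
    "frwiki":     {"en": "en.wikipedia.org"},
    "frwikinews": {"en": "en.wikinews.org"},   # same prefix, different site
}

# A global ID must resolve identically no matter which client wiki asks.
GLOBAL_IDS = {
    "enwiki":     "en.wikipedia.org",
    "enwikinews": "en.wikinews.org",
}

def resolve_local(wiki, prefix):
    """Per-wiki resolution: the answer depends on who is asking."""
    return LOCAL_PREFIXES[wiki][prefix]

def resolve_global(global_id):
    """Global resolution: one ID, one site, everywhere."""
    return GLOBAL_IDS[global_id]
```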

We further made some configurations explicit that are as of now
embedded in the code using the current interwiki table.

The change also facilitates synchronizing that data, but this is part
of another changeset and of other code.


I am a bit confused here. As far as I can see everyone agrees that
this changeset goes in the right direction. I also did not see
contentions about how the changeset is working that have not been
resolved yet. The reservations that are raised are that the changeset
does not go *far enough*. Considering that we want to keep changesets
small, and that this changeset keeps the old system in place and thus
should not break anything, wouldn't that be a good first step?

If this is the case, why do we not take this step now and continue to
discuss how to iterate further from there to an even better and more
comprehensive solution?

Cheers,
Denny


--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Rob Lanphier-4
Hi Denny,

I think we may be talking past each other.  Comments inline...

On Mon, Aug 13, 2012 at 9:47 AM, Denny Vrandečić
<[hidden email]> wrote:
> I am a bit confused here. As far as I can see everyone agrees that
> this changeset goes in the right direction.

I don't think enough people actually understand the patch well enough
to say that.  The fear is that it's a step sideways, trading crufty
but well-tested code for something larger, more confusing, and less
stable.

> I also did not see
> contentions about how the changeset is working that have not been
> resolved yet. The reservations that are raised are that the changeset
> does not go *far enough*. Considering that we want to keep changesets
> small, and that this changeset keeps the old system in place and thus
> should not break anything, wouldn't that be a good first step?

It depends.  Every time someone asks for specifics ("where is this
code used?", "what exactly is this needed for?"), they get very meta
answers ("it's used in Wikidata").

If you want to expedite this review, give specific answers.  Point to
line numbers in files, and show how the code there would be far more
complicated without this change.  Point to specific functionality we
can see in a running instance.  Use this as an opportunity to educate
everyone on Wikidata internals.

Thanks
Rob


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Jeroen De Dauw-2
Hey,

Every time someone asks for specifics ("where is this
> code used?", "what exactly is this needed for?"), they get very meta
> answers ("it's used in Wikidata").
>

Can you be specific and point to the questions we've answered too vaguely?
Then I'll try to answer them in more detail.

If you want to expedite this review, give specific answers.  Point to
> line numbers in files, and show how the code there would be far more
> complicated without this change.  Point to specific functionality we
> can see in a running instance.  Use this as an opportunity to educate
> everyone on Wikidata internals.
>

We need the generalizations provided by this patch. Yes, that's not specific
at all as to why and where we need them. You'd need to know that to verify
we're not doing stupid stuff in Wikidata. However, these generalizations make
sense on their own, and can be judged entirely independently of Wikidata.
Educating people on Wikidata internals really seems to be out of scope to
me.

I don't think enough people actually understand the patch well enough
> to say that.
>

The code is well documented and I've been answering questions both on the
list here and gerrit. If you want to understand the patch, look at it, and
if you're still not clear on anything, ask about it. I don't see how we can
do much more from our end - got any suggestions?

The fear is that it's a step sideways, trading crufty
> but well-tested code for something larger, more confusing, and less
> stable.
>

How do you figure this? My interpretation of the thread is similar to
that of Denny - we're basically all agreeing that this change improves on
the current system in various ways, but some think it should tackle some
issues it's not currently dealing with as well.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Rob Lanphier-4
On Mon, Aug 13, 2012 at 11:03 AM, Jeroen De Dauw <[hidden email]> wrote:
> Can you be specific and point to the questions we've answered too vaguely?
> Then I'll try to answer them in more detail.

Two places to start off with:
1.  In response to Brian Wolff's email.  Many interesting questions
were redacted in Denny's response.
2.  In response to Tim's July 18 comment here:
https://gerrit.wikimedia.org/r/#/c/14295/

> We need the generalizations provided by this patch. Yes, that's not specific
> at all as to why and where we need them. You'd need to know that to verify
> we're not doing stupid stuff in Wikidata. However, these generalizations make
> sense on their own, and can be judged entirely independently of Wikidata.

Not really. Basically, what you're proposing is that these changes are
necessary for Wikidata, that you don't have time to implement the full
solution, and that's why we have to settle for a halfway solution
instead of finishing the job.

I can understand not wanting the scope creep of "finishing the job",
since there's not consensus on what that means.  What Daniel suggested
(which seems to also have the support of Chad and Aaron, at least) is
that this is RfC material.  If avoiding scope creep is the goal, then
it becomes more important to understand exactly what Wikidata needs
out of this patch, and that involves understanding the parts of
Wikidata that use this.

> Educating people on Wikidata internals really seems to be out of scope to
> me.

Given that the Wikidata code needs a full review by many of the same
people that are asking about this particular change, doesn't that seem
largely academic?

> How do you figure this? My interpretation of the thread is similar to
> that of Denny - we're basically all agreeing that this change improves on
> the current system in various ways, but some think it should tackle some
> issues it's not currently dealing with as well.

My reading is that folks like Daniel and Chad are conceding that the
current system needs to be improved, and that this change *might* be a
step in the right direction, but is probably not far enough to be
worth dealing with the problems of doing this halfway.

Rob


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Daniel Friesen-4
On Mon, 13 Aug 2012 17:56:49 -0700, Rob Lanphier <[hidden email]>  
wrote:

> On Mon, Aug 13, 2012 at 11:03 AM, Jeroen De Dauw  
> <[hidden email]> wrote:
>> Can you be specific and point to the questions we've answered too
>> vaguely? Then I'll try to answer them in more detail.
>
> Two places to start off with:
> 1.  In response to Brian Wolff's email.  Many interesting questions
> were redacted in Denny's response.
> 2.  In response to Tim's July 18 comment here:
> https://gerrit.wikimedia.org/r/#/c/14295/
>
>> We need the generalizations provided by this patch. Yes, that's not
>> specific at all as to why and where we need them. You'd need to know
>> that to verify we're not doing stupid stuff in Wikidata. However, these
>> generalizations make sense on their own, and can be judged entirely
>> independently of Wikidata.
>
> Not really. Basically, what you're proposing is that these changes are
> necessary for Wikidata, that you don't have time to implement the full
> solution, and that's why we have to settle for a halfway solution
> instead of finishing the job.
>
> I can understand not wanting the scope creep of "finishing the job",
> since there's not consensus on what that means.  What Daniel suggested
> (which seems to also have the support of Chad and Aaron, at least) is
> that this is RfC material.  If avoiding scope creep is the goal, then
> it becomes more important to understand exactly what Wikidata needs
> out of this patch, and that involves understanding the parts of
> Wikidata that use this.
>
>> Educating people on Wikidata internals really seems to be out of scope  
>> to
>> me.
>
> Given that the Wikidata code needs a full review by many of the same
> people that are asking about this particular change, doesn't that seem
> largely academic?
>
>> How do you figure this? My interpretation of the thread is similar to
>> that of Denny - we're basically all agreeing that this change improves
>> on the current system in various ways, but some think it should tackle
>> some issues it's not currently dealing with as well.
>
> My reading is that folks like Daniel and Chad are conceding that the
> current system needs to be improved, and that this change *might* be a
> step in the right direction, but is probably not far enough to be
> worth dealing with the problems of doing this halfway.
>
> Rob

I also feel that some of the changes that don't go far enough, or don't
look like the ideal I would have aimed for if I had written this code, are
in areas such as the database schema and potentially the overall API --
areas which, if this is committed now, will require anyone who tries to
finish the project to add in migrations, etc., just to fix a schema that
should have been done right from the start.

Also, there is a key question undecided: will the sites table be a
first-class edited table, or act like an index? Not deciding how we treat
this table right now will make it practically impossible to change that
perception later on, and if we do decide that it should be more of an
index after people have started writing editing interfaces on top of the
table, then we would practically have to rewrite it yet again.


Frankly some of the code sets off my rewrite nerves. And if I had the  
time/backing I'd collect all the requirements on an RfC page and write the  
new system myself.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Daniel Friesen-4
In reply to this post by Denny Vrandečić
On Mon, 13 Aug 2012 09:47:21 -0700, Denny Vrandečić  
<[hidden email]> wrote:

> Hi everyone,
>
> The use case is the following: in order for Wikidata to be able to
> provide language links for the wikis using Wikidata, we need to use
> consistent global IDs when communicating about the involved wikis
> (i.e. if a "client wiki", i.e. a Wikipedia like fr.wp, asks Wikidata
> for the language links for an article X, the client and the repo need
> to know that e.g. "enwiki" refers to en.wp. Right now the table does
> not sport any such field -- the local prefix "en" might be differently
> defined on fr.wp and fr.wikinews, for example, and we obviously do not
> want to break that).
>
> We further made some configurations explicit that are as of now
> embedded in the code using the current interwiki table.
>
> The change also facilitates synchronizing that data, but this is part
> of another changeset and of other code.
>
> Cheers,
> Denny

I actually have a side question in this area.

You mention using a global ID to refer to sites when making links, and
the synchronization of the sites table.

So you're saying that this part of Wikidata only works within Wikimedia
projects, right?

Does Wikidata overall only function within Wikimedia projects? Or is there
a different mechanism to deal with clients from external wikis?

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Jeroen De Dauw-2
Hey,

You mention using a global id to refer to sites for making links. And
> synchronization of the sites table.
>
> So you're saying that this part of Wikidata only works within Wikimedia
> projects right?
>
> Does Wikidata overall only function within Wikimedia projects. Or is there
> a different mechanism to deal with clients from external wikis?
>

The software we're writing is completely Wikimedia agnostic, and the actual
Wikidata project will obviously be usable outside of Wikimedia projects. We
will allow for links to non-Wikimedia sites (although we have not agreed on
how open this will be), and for non-Wikimedia sites to access all data
stored within Wikidata (including our "equivalent links" using the sites
table). Does that answer your question, or am I missing something?

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Daniel Friesen-4
On Tue, 14 Aug 2012 07:32:07 -0700, Jeroen De Dauw  
<[hidden email]> wrote:

> Hey,
>
> You mention using a global id to refer to sites for making links. And
>> synchronization of the sites table.
>>
>> So you're saying that this part of Wikidata only works within Wikimedia
>> projects right?
>>
>> Does Wikidata overall only function within Wikimedia projects. Or is  
>> there
>> a different mechanism to deal with clients from external wikis?
>>
>
> The software we're writing is completely Wikimedia agnostic, and the
> actual
> Wikidata project will obviously be usable outside of Wikimedia projects. We
> will allow for links to non Wikimedia sites (although we have not agreed  
> on
> how open this will be), and for non-Wikimedia sites to access all data
> stored within Wikidata (including our "equivalent links" using the sites
> table). Does that answer your question or am I missing something?
>
> Cheers
>
> --
> Jeroen De Dauw
> http://www.bn2vs.com
> Don't panic. Don't be evil.
> --

Ok, so the data is available to 3rd party wikis.

I was asking how you planned to handle sites in 3rd party wikis.
Do you have a separate mechanism to handle links from 3rd party clients?  
Or are they supposed to sync their sites from Wikimedia's Wikidata?

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Jeroen De Dauw-2
Hey,

I was asking how you planned to handle sites in 3rd party wikis.
> Do you have a separate mechanism to handle links from 3rd party clients?
> Or are they supposed to sync their sites from Wikimedia's Wikidata?
>

AFAIK we're providing full URLs in our export formats; I'm not sure what
our current status on this is or what our exact plans are. We're not
exporting site data ourselves (that's really not our job), but third
parties can obtain it via the sites API (which has not been created yet,
but would be very similar to the existing interwiki API). We _could_
include site data in our export formats as well, but that really is a
different discussion altogether :)

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

Re: Use cases for Sites handling change (Re: Wikidata blockers weekly update)

Denny Vrandečić
In reply to this post by Daniel Friesen-4
Hi all,

thanks to Daniel (F.) for structuring the discussion. The discussion
is currently ongoing here:

<https://www.mediawiki.org/wiki/Requests_for_comment/New_sites_system>

I hope that the requirements and use cases section is complete. If
not, please tune in now. We will build on the use cases and their
discussion there.

I also created a first draft for a schema, which was very
quickly completely ripped apart and replaced by a much better one on
the discussion page. There are also other discussions going on there.
Please tune in if you are interested in the sites table, in order to
achieve consensus on the topic.

<https://www.mediawiki.org/wiki/Talk:Requests_for_comment/New_sites_system#Database_schema_proposal_18334>

Furthermore, I want to address the unanswered questions Rob raised:

* Re Tim's July 18th comment and Rob's following comment: where is the
calling code?

The code calling the site tables is in the Wikibase library, basically
all the files starting with Site*:

<https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/Wikibase.git;a=tree;f=lib/includes;h=7debe083be74ad42028f37e17e26ce9a419bf7ab;hb=HEAD>

But since they are part of the patchset, you have probably seen them. The
Sites info is being used in:

* most importantly Wikibase/lib/includes/SiteLink.php, where the site
link (e.g. the link from a Wikidata item to a Wikipedia article) is
defined using the Sites data. The site links are the most prominent
objects depending on the data, and are used basically everywhere on the
repository. Wikibase/repo/includes/api/ApiSetSiteLink.php offers a
good example of that.

* some utils in Wikibase/lib/includes/Utils.php
* further, a few places on the client, like LangLinkHandler and the hooks



* Questions by bawolff that I redacted from my answer (because I was
focusing on other stuff):

> First and foremost, I'm a little confused as to what the actual use
> cases here are. Could we get a short summary, for those who aren't
> entirely following how Wikidata will work, of why the current interwiki
> situation is insufficient?

Most of all, we need global identifiers for the different wikis. We
could add a table which only contains a mapping of the local prefixes to
global identifiers, but we think that the current interwiki table
could use some love anyway, and thus we decided to restructure it as a
whole. This has now led to the above-mentioned RFC, but the original
blocker is: to provide language links from a central source --
Wikidata -- we need global wiki identifiers.

>>* Site definitions can exist that are not used as "interlanguage link" and
>>not used as "interwiki link"

> And if we put one of those on a talk page, what would happen? Or if
> foo was one such link, doing [[:foo:some page]]  (Current behaviour is
> it becomes an interwiki).

I probably misunderstand. If something is currently set up neither as an
interlanguage link nor as an interwiki link, it will become a
normal link, not an interwiki link (i.e. it will point to the local
page foo:some page in the main namespace). Did you mean something
else?

> Although to be fair, I do see how the current way we distinguish
> between interwiki and interlang links is a bit hacky.

Agreed, the way it is currently done in core is a bit hacky.

>>And in fact we are making this more flexible by having the type system. The
>>MediaWiki site type could for instance be able to form both "nice" urls and
>>index.php ones. Or a gerrit type could have the logic to distinguish
>>between the gerrit commit number and a sha1 hash.

> I must admit I do like this idea. In particular, the current
> situation where we treat the value of an interwiki link as a title
> (i.e. spaces -> underscores, etc.) even for sites that do not use such
> conventions has always bothered me. Having interwikis that support
> URL rewriting based on the value does sound cool, but I certainly
> wouldn't want said code in a db blob (and just using an integer
> site_type identifier is quite far away from giving us that, but it's
> still a step in a positive direction), which raises the question of
> where such rewriting code would go.

A handler class for each type of site, which would construct links to
that type of site based on the data about the site.
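[Editor's note: a rough sketch of such a handler hierarchy. The class names, method names, and URL patterns below are invented for illustration, not taken from the patchset; the gerrit example follows the number-vs-hash distinction mentioned earlier in the thread.]

```python
class SiteHandler:
    """Base class: per-type link-building logic lives in code, selected
    by a type identifier stored with the site row."""
    def __init__(self, base_url):
        self.base_url = base_url

    def page_url(self, value):
        raise NotImplementedError

class MediaWikiHandler(SiteHandler):
    def page_url(self, value):
        # MediaWiki titles use underscores in place of spaces.
        return self.base_url + "/wiki/" + value.replace(" ", "_")

class GerritHandler(SiteHandler):
    def page_url(self, value):
        # Distinguish a change number from a sha1 hash (invented URL forms).
        if value.isdigit():
            return self.base_url + "/r/" + value
        return self.base_url + "/r/q/" + value

HANDLERS = {"mediawiki": MediaWikiHandler, "gerrit": GerritHandler}

def link_for(site_type, base_url, value):
    """Dispatch on the site's stored type identifier."""
    return HANDLERS[site_type](base_url).page_url(value)
```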

>> The issue I was trying to deal with was storage. Currently we 100% assume
>>that the interwiki list is a table and there will only ever be one of them.

> Do we really assume that? Certainly that's the default config, but I
> don't think that is the config used on WMF. As far as I'm aware,
> Wikimedia uses a cdb database file (via $wgInterwikiCache), which
> contains all the interwikis for all sites. From what I understand, it
> supports doing various "scope" levels of interwikis, including per db,
> per site (Wikipedia, Wiktionary, etc), or global interwikis that act
> on all sites.

We did not know about that database. Who can tell us more about it?
This would be very interesting for optimizing our syncing code.

It still wouldn't help us with the global identifiers, but it
would be good to know more about it.

Cheers,
Denny

--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


2012/8/14 Daniel Friesen <[hidden email]>:

> On Tue, 14 Aug 2012 07:32:07 -0700, Jeroen De Dauw <[hidden email]>
> wrote:
>
>> Hey,
>>
>> You mention using a global id to refer to sites for making links. And
>>>
>>> synchronization of the sites table.
>>>
>>> So you're saying that this part of Wikidata only works within Wikimedia
>>> projects right?
>>>
>>> Does Wikidata overall only function within Wikimedia projects. Or is
>>> there
>>> a different mechanism to deal with clients from external wikis?
>>>
>>
>> The software we're writing is completely Wikimedia agnostic, and the actual
>> Wikidata project will obviously be usable outside of Wikimedia projects. We
>> will allow for links to non Wikimedia sites (although we have not agreed
>> on
>> how open this will be), and for non-Wikimedia sites to access all data
>> stored within Wikidata (including our "equivalent links" using the sites
>> table). Does that answer your question or am I missing something?
>>
>> Cheers
>>
>> --
>> Jeroen De Dauw
>> http://www.bn2vs.com
>> Don't panic. Don't be evil.
>> --
>
>
> Ok, so the data is available to 3rd party wikis.
>
> I was asking how you planned to handle sites in 3rd party wikis.
> Do you have a separate mechanism to handle links from 3rd party clients? Or
> are they supposed to sync their sites from Wikimedia's Wikidata?
>
>
> --
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



