Ifexists across wikis

Ifexists across wikis

Lars Aronsson
If I write a [[link]] it will be blue if the page exists and red otherwise.
But if I write [[:sw:link]] that will be an external or cross-wiki link,
that is never red, as if it were impossible to know whether that page
existed in Swahili Wikipedia.

But determining the existence of a page is just a quick database table
lookup, and all databases run on WMF's servers, so it shouldn't be more
expensive to look up a cross-wiki link, as long as it is one of WMF's wikis.

In Wiktionary, it is common to link to entries in foreign languages both
on the local wiki and to the native wiki for that language. For example,
in English Wiktionary the entry for "blue" links to the Swahili word "bluu"
both on en.wiktionary and on sw.wiktionary, using the template
{{t+|sw|bluu}}.

https://en.wiktionary.org/wiki/blue#Translations

But since the Afrikaans translation "blou" doesn't have an entry on the
Afrikaans Wiktionary, another template is used: {{t|af|blou}}. And it is
a pain to know which one of these two templates to use. If it were possible
in {{#ifexist}} to determine the existence of a page in another wiki,
only one template would be needed, and the bot job to change to the right
template would not be needed.

#ifexist already works across namespaces (well, of course), so is there any
good reason it shouldn't work across wikis?

Oddly, the documentation says #ifexist is an "expensive" parser function.
That doesn't make much sense to me. It's as if red/blue links were
expensive, and most of our list pages should be banned.
https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions#.23ifexist


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Ifexists across wikis

Alex Monk
I don't think there is a way to get a database name from an interwiki
prefix.

Also, whether a page is known or not does not just depend on a simple
database lookup. Extensions can add arbitrary rules about which titles
should be considered known or not. EducationProgram, GlobalUserPage, and
WikimediaIncubator all do this.
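
A rough sketch of the point about extensions, in Python rather than MediaWiki's PHP. Every name in it (`PAGE_TABLE`, `known_hooks`, `global_user_page_hook`, `is_known`) is hypothetical and not a MediaWiki API. It only illustrates that a title can count as known without a local row, so a raw lookup in another wiki's database might disagree with what that wiki itself would report.

```python
from typing import Callable, List, Optional, Set

# Hypothetical stand-in for the local page table: the set of existing titles.
PAGE_TABLE: Set[str] = {"Blue", "User:Alice"}

# Each hook returns True/False to override the answer, or None to abstain.
KnownHook = Callable[[str], Optional[bool]]
known_hooks: List[KnownHook] = []


def global_user_page_hook(title: str) -> Optional[bool]:
    """Pretend every user page is backed by a central wiki (an assumption)."""
    if title.startswith("User:"):
        return True
    return None


def is_known(title: str) -> bool:
    """Ask the registered hooks first, then fall back to the table lookup."""
    for hook in known_hooks:
        verdict = hook(title)
        if verdict is not None:
            return verdict
    return title in PAGE_TABLE


known_hooks.append(global_user_page_hook)
```

Here `is_known("User:Bob")` is true although "User:Bob" is not in `PAGE_TABLE`, which is the case a plain cross-wiki table lookup would get wrong.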

On 6 December 2015 at 16:26, Lars Aronsson <[hidden email]> wrote:

> (...)

Re: Ifexists across wikis

Florian Schmidt
In reply to this post by Lars Aronsson
I'm not very familiar with this, but wouldn't this need a bigger change in
LinksUpdate? Or, put as a question: how would a wiki know when a page gets
created after it was linked, so that it can mark the link blue instead of red?


Re: Ifexists across wikis

Bartosz Dziewoński
In reply to this post by Lars Aronsson
On 2015-12-06 17:26, Lars Aronsson wrote:

> If I write a [[link]] it will be blue if the page exists and red otherwise.
> But if I write [[:sw:link]] that will be an external or cross-wiki link,
> that is never red, as if it were impossible to know whether that page
> existed in Swahili Wikipedia.
>
> But determining the existence of a page is just a quick database table
> lookup, and all databases run on WMF's servers, so it shouldn't be more
> expensive to look up a cross-wiki link, as long as it is one of WMF's
> wikis.
>
 > (...)
>
> #ifexist already works across namespaces (well, of course), so is there any
> good reason it shouldn't work across wikis?
>
> Oddly, the documentation says #ifexist is an "expensive" parser function.
> That doesn't make much sense to me. It's as if red/blue links were
> expensive, and most of our list pages should be banned.
> https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions#.23ifexist

To add to what Alex and Florian said, the simple database lookup to
check page existence is not actually that simple. When parsing a page,
the query to determine link color (and to mark links to non-existent,
redirect or disambig pages) is done in batches of 1000 links, after the
whole page has been parsed and we know all the pages it links to.
Special pages that have lists of links use a similar method.
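
The batching described above can be sketched roughly as follows. This is an illustration in Python, not MediaWiki code: `resolve_link_colors`, `FakePageTable` and `fetch_existing` are made-up names, and the toy table only counts how many queries it receives. The batch size of 1000 is the figure from the paragraph above.

```python
from typing import Callable, Dict, Iterable, List, Set

BATCH_SIZE = 1000  # figure taken from the message above


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def resolve_link_colors(
    titles: List[str],
    fetch_existing: Callable[[List[str]], Set[str]],
    batch_size: int = BATCH_SIZE,
) -> Dict[str, str]:
    """Return 'blue' or 'red' per title, using one query per batch."""
    unique = list(dict.fromkeys(titles))  # de-duplicate, keep order
    existing: Set[str] = set()
    for batch in chunked(unique, batch_size):
        existing |= fetch_existing(batch)
    return {t: ("blue" if t in existing else "red") for t in unique}


class FakePageTable:
    """Toy 'database' that counts how many queries it receives."""

    def __init__(self, pages: Set[str]) -> None:
        self.pages = pages
        self.queries = 0

    def fetch_existing(self, batch: List[str]) -> Set[str]:
        self.queries += 1
        return self.pages.intersection(batch)


# Every second page exists; the "parsed page" links to all 2500 titles.
db = FakePageTable({f"Page {i}" for i in range(0, 2500, 2)})
colors = resolve_link_colors([f"Page {i}" for i in range(2500)], db.fetch_existing)
```

With 2500 distinct links the toy table sees three queries instead of 2500, which only works because all titles are collected before any lookup happens.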

This wouldn't be possible if we needed to query a different database for
each link (at best, perhaps we could batch them per-database, which
doesn't help the Wiktionary use case of links to various sites).

It's also why #ifexist is expensive: it needs a separate database query
for each time it's used, to check for a single page, because it's
impossible to determine the list of pages to check in advance.
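
To make the contrast concrete, here is a toy model in Python. `ToyParser` and its fields are hypothetical, and the limit value is an assumption chosen for the example. Each call performs its own lookup, and the title checked by a later call can depend on the result of an earlier one, so the titles cannot simply be collected up front.

```python
from typing import Set


class ToyParser:
    """Counts one lookup per #ifexist call, as described above."""

    def __init__(self, pages: Set[str], expensive_limit: int = 100) -> None:
        self.pages = pages
        self.expensive_limit = expensive_limit  # assumed, configurable value
        self.expensive_calls = 0
        self.queries = 0

    def ifexist(self, title: str, then: str, otherwise: str) -> str:
        """Evaluate one call; past the limit, skip the lookup entirely."""
        self.expensive_calls += 1
        if self.expensive_calls > self.expensive_limit:
            return otherwise  # treated as "does not exist" in this sketch
        self.queries += 1  # one separate query for this single title
        return then if title in self.pages else otherwise


parser = ToyParser({"Template:A", "Template:B"}, expensive_limit=2)

# The title checked by the second call is the output of the first call,
# so the full list of titles is not known before evaluation starts.
first = parser.ifexist("Template:A", "Template:B", "Template:Missing")
second = parser.ifexist(first, "yes", "no")
third = parser.ifexist("Template:A", "yes", "no")  # over the limit
```

The third call returns its else branch even though the page exists, because the sketch stops querying once the limit is reached.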

--
Bartosz Dziewoński


Re: Ifexists across wikis

Purodha Blissenbach
In reply to this post by Alex Monk
How about using the API on the target side?
Purodha
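
For illustration, a minimal Python sketch of what such an API check could look like, using the standard `action=query` module with `formatversion=2`, where pages that do not exist carry a `missing` flag. The helper names `build_query_url` and `parse_existence` are made up for this example, the sample reply is hand-written rather than real data, and the sketch performs no network request.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def build_query_url(api_base: str, titles: List[str]) -> str:
    """Build an action=query URL asking about several titles at once."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return api_base + "?" + urlencode(params)


def parse_existence(response: dict) -> Dict[str, bool]:
    """Map each returned title to True if the page exists."""
    pages = response.get("query", {}).get("pages", [])
    return {page["title"]: not page.get("missing", False) for page in pages}


url = build_query_url("https://sw.wiktionary.org/w/api.php", ["bluu", "blou"])

# Hand-written sample in the shape of a formatversion=2 reply, not real data.
sample = json.loads(
    '{"query": {"pages": ['
    '{"pageid": 1, "ns": 0, "title": "bluu"},'
    '{"ns": 0, "title": "blou", "missing": true}]}}'
)
result = parse_existence(sample)
```

Going through the API lets the target wiki answer for itself, but it is an HTTP round trip per request rather than a local table lookup, and the API may normalize the titles it reports back.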

On 06.12.2015 18:04, Alex Monk wrote:

> I don't think there is a way to get a database name from an interwiki
> prefix.
>
> Also, whether a page is known or not does not just depend on a simple
> database lookup. Extensions can add arbitrary rules about which titles
> should be considered known or not. EducationProgram, GlobalUserPage, and
> WikimediaIncubator all do this.

Re: Ifexists across wikis

John Erling Blad
Use Q-ids and get the links from Wikidata.

On Sun, Dec 6, 2015 at 10:49 PM, Purodha Blissenbach <[hidden email]> wrote:

> How about using the API on the target side?
> Purodha

Re: Ifexists across wikis

Tim Starling
In reply to this post by Bartosz Dziewoński
On 07/12/15 06:29, Bartosz Dziewoński wrote:
> To add to what Alex and Florian said, the simple database lookup to
> check page existence is not actually that simple. When parsing a page,
> the query to determine link color (and to mark links to non-existent,
> redirect or disambig pages) is done in batches of 1000 links, after
> the whole page has been parsed and we know all the pages it links to.
> Special pages that have lists of links use a similar method.

Also, when you make a red link, and then someone creates the page,
people expect the link to turn blue straight away. That's implemented
using the pagelinks table -- when a page is created, we use pagelinks
to find all pages with red links to that page, update all their
page_touched fields, and purge them from Varnish, so that all the
links will turn blue in under a second.

It's possible to do that for interwiki links, but it increases the
amount of time it would take to implement such a feature. We currently
don't have a way to efficiently find all interwiki links to a page, so
one would have to be added.
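
The red-to-blue mechanism described above can be modelled roughly like this. It is a toy in Python, not MediaWiki code: `ToyWiki`, `links_to` and `purged` are illustrative names, `links_to` plays the role the message assigns to the pagelinks table, and `purged` stands in for cache purge requests. The sketch never removes stale links when a page is re-saved.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyWiki:
    def __init__(self) -> None:
        self.pages: Set[str] = set()
        # target title -> pages that link to it (the role pagelinks plays)
        self.links_to: Dict[str, Set[str]] = defaultdict(set)
        self.page_touched: Dict[str, int] = {}
        self.purged: List[str] = []  # stand-in for cache purge requests
        self.clock = 0

    def save_page(self, title: str, links: List[str]) -> None:
        """Create or update a page, then invalidate pages linking to it."""
        is_new = title not in self.pages
        self.pages.add(title)
        self._touch(title)
        for target in links:
            self.links_to[target].add(title)
        if is_new:
            # Reverse lookup: which pages have a (so far red) link here?
            for source in sorted(self.links_to.get(title, ())):
                self._touch(source)
                self.purged.append(source)

    def _touch(self, title: str) -> None:
        """Bump a logical timestamp so cached renderings count as stale."""
        self.clock += 1
        self.page_touched[title] = self.clock


wiki = ToyWiki()
wiki.save_page("Colors", ["Blue", "Red"])  # both links are red for now
wiki.save_page("Blue", [])                 # creation triggers invalidation
```

The step that has no cross-wiki equivalent yet is the reverse lookup in `links_to`: the wiki where the page is created would need an efficient index of interwiki links pointing at it from other wikis.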

-- Tim Starling




Re: Ifexists across wikis

Stas Malyshev
In reply to this post by Alex Monk
Hi!

> I don't think there is a way to get a database name from an interwiki
> prefix.

Not a good/easy way, AFAIK. I looked into it recently, and the current
code does it with a lot of ad-hoc stuff, external configs,
hard-coded configs and special cases. I think this ticket:
https://phabricator.wikimedia.org/T113034
aims to improve it.

--
Stas Malyshev
[hidden email]


Re: Ifexists across wikis

MZMcBride
In reply to this post by Bartosz Dziewoński
Bartosz Dziewoński wrote:
>It's also why #ifexist is expensive: it needs a separate database query
>for each time it's used, to check for a single page, because it's
>impossible to determine the list of pages to check in advance.

I'm not sure I understand the impossibility here.

When the expensive parser function count feature was added, I remember
this issue being discussed and my memory is that it seemed possible to
batch the ifexist lookups in a similar way to how we batch regular
internal link lookups against the pagelinks table, but nobody was
interested in implementing it at the time.

If the wikitext is parsed/evaluated on page save, I don't see why ifexist
lookups would be impossible to batch. We're already using the pagelinks
table for the ifexist functionality to work properly, as I understand it
(cf. <https://phabricator.wikimedia.org/T14019>).

MZMcBride




Re: Ifexists across wikis

Federico Leva (Nemo)
In reply to this post by Lars Aronsson
Italian projects would also like such a feature, especially for
(semi)automatic creation of interproject links.
https://it.wikipedia.org/wiki/Discussioni_template:Interprogetto#Interprogetto_a_wikt:_quando_metterlo.3F

(By the way, the lack of Wiktionary on Wikidata even for interwiki links
is extremely detrimental for a huge pile of things.)

Nemo
