The dark side of automatic interwiki linking

The dark side of automatic interwiki linking

Anders Wegge Keller

 Consider a random article about the 1190s, like
[[da:1190'erne]]. Follow the interwiki links from that one. Follow the
interwiki links from the newly found articles. Wonder how you ended up
back at dawiki, but this time at the year 1190.

 There is something rotten, not just in the state of Denmark, but in a
lot of the Wikipedia articles about years and decades. I've made a
simplified interwiki graph, <http://wegge.dk/interwiki-1190.png>. Be
warned that it's a 13522 x 309 PNG image. The graph clearly shows that
a lot of Wikipedias have a path from their respective decade article,
via [[he:1190]], back to the same Wikipedia's year article. This is not
the only decade showing the problem, so it's not easy to fix. Given
the large number of Wikipedias involved, it's too large a task to
perform by hand, and I'm also afraid that before one such loop has
been fixed, one or more iw bots will start spreading the problem
again.
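
 To make the loop concrete, here is a minimal sketch of the walk
described above. It is not taken from any bot; the function name
find_conflicts and the toy link table are made up for illustration,
and the [[de:...]] titles are assumed, not checked against the wiki.
It follows interwiki links breadth-first and reports every language
that is reached under more than one title.

```python
from collections import deque


def find_conflicts(links, start):
    """Walk interwiki links from `start` and report conflicting languages.

    `links` maps a (language, title) pair to the list of pairs it links to.
    Returns {language: set of titles} for each language reached under
    more than one title.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)

    # Group every reachable page by its language code.
    by_lang = {}
    for lang, title in seen:
        by_lang.setdefault(lang, set()).add(title)

    # A language with two or more titles means decade and year are mixed.
    return {lang: titles for lang, titles in by_lang.items() if len(titles) > 1}


# Hypothetical toy data: a decade article leaks into year articles via he:1190.
links = {
    ("da", "1190'erne"): [("de", "1190er")],
    ("de", "1190er"): [("da", "1190'erne"), ("he", "1190")],
    ("he", "1190"): [("da", "1190"), ("de", "1190")],
}
conflicts = find_conflicts(links, ("da", "1190'erne"))
```

 With this toy table, both da and de come back with two titles each,
which is the situation the graph shows for the real articles.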

 So does anyone have an idea about how to solve this mess by bot?

--
// Wegge
<http://geowiki.wegge.dk/wiki/Forside> - Everything about geocaching
If you use the free SPAMfighter, I will only see your posts *ONCE*.


_______________________________________________
Wikibots-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wikibots-l

Re: The dark side of automatic interwiki linking

Andre Engels
2008/1/28, Anders Wegge Jakobsen <[hidden email]>:

>  Consider a random article about the 1190s, like
> [[da:1190'erne]]. Follow the interwiki links from that one. Follow the
> interwiki links from the newly found articles. Wonder how you ended up
> back at dawiki, but this time at the year 1190.
>
>  There is something rotten, not just in the state of Denmark, but in a
> lot of the Wikipedia articles about years and decades. I've made a
> simplified interwiki graph, <http://wegge.dk/interwiki-1190.png>. Be
> warned that it's a 13522 x 309 PNG image. The graph clearly shows that
> a lot of Wikipedias have a path from their respective decade article,
> via [[he:1190]], back to the same Wikipedia's year article. This is not
> the only decade showing the problem, so it's not easy to fix. Given
> the large number of Wikipedias involved, it's too large a task to
> perform by hand, and I'm also afraid that before one such loop has
> been fixed, one or more iw bots will start spreading the problem
> again.
>
>  So does anyone have an idea about how to solve this mess by bot?

Just do

python interwiki.py "1190'erne" -ignore:he:1190 -force

with a bot that is registered at all languages that have a page on the
decade (or, if it is not, do the remaining ones by hand). It will
remove the incorrect link, and get the thing working correctly again.

As for your fear that "before one such loop has been fixed, one or
more iw bots will start spreading the problem again." - this will not
happen unless their operators are doing a really bad job. A bot
working on the 1190s will find that there are languages for which it
gets two links - one for the 1190s and one for 1190. In such a case, it
will not make a decision on its own, or in fact make any changes at
all. Instead,
* if the bot is running autonomously, it will skip the page
* if the bot is running interactively, it will ask the operator which
pages to include and which ones not

The only bot that will re-create the mess is an interactive bot whose
operator makes the wrong choice about what to include. Bots do risk
copying mistakes, but once any loop to a different page in the same
language has been found, the bots will stop and just ignore the pages
involved.
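
The rule above can be sketched roughly as follows. This is only an
illustration of the behaviour described here, not the actual code in
interwiki.py; the function decide and the callback ask are
hypothetical names.

```python
def decide(found, interactive, ask=None):
    """Pick one title per language, or skip the page entirely.

    `found` maps a language code to the set of titles discovered for it.
    Returns {language: title}, or None when the page is skipped.
    `ask(lang, titles)` is a hypothetical callback that stands in for
    prompting the operator; it is only used in interactive mode.
    """
    ambiguous = [lang for lang, titles in found.items() if len(titles) > 1]

    if ambiguous and not interactive:
        # Autonomous run: never guess, leave the page untouched.
        return None

    chosen = {}
    for lang, titles in found.items():
        if len(titles) == 1:
            chosen[lang] = next(iter(titles))
        else:
            # Interactive run: the operator decides which page belongs.
            chosen[lang] = ask(lang, sorted(titles))
    return chosen


found = {"da": {"1190'erne", "1190"}, "he": {"1190"}}

skipped = decide(found, interactive=False)
picked = decide(found, interactive=True,
                ask=lambda lang, titles: "1190'erne")
```

Here the autonomous call returns None, while the interactive call
keeps whatever the operator picks, so a wrong pick is the only way
the mixed-up links get written back.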

--
Andre Engels, [hidden email]
ICQ: 6260644  --  Skype: a_engels


Re: The dark side of automatic interwiki linking

Anders Wegge Keller
"Andre Engels" <[hidden email]> writes:

> 2008/1/28, Anders Wegge Jakobsen <[hidden email]>:

>>  So does anyone have an idea about how to solve this mess by bot?
>
> Just do
>
> python interwiki.py "1190'erne" -ignore:he:1190 -force
>
> with a bot that is registered at all languages that have a page on the
> decade (or, if it is not, do the remaining ones by hand). It will
> remove the incorrect link, and get the thing working correctly again.

 Oh, nice :)

> As for your fear that "before one such loop has been fixed, one or
> more iw bots will start spreading the problem again." - this will not
> happen unless their operators are doing a really bad job.

 Well, the problem did not create itself, but if bots in general are
well behaved, it should be fixable.

 I'll fix the wikis I have an account on, but quite a large number
remain:

WARNING: wikipedia: [[ro:Anii 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[jv:210-an]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ja:210年代]] links to incorrect [[he:210]]
WARNING: wikipedia: [[la:Decennium 22]] links to incorrect [[he:210]]
WARNING: wikipedia: [[pt:Década de 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[tt:210. yıllar]] links to incorrect [[he:210]]
WARNING: wikipedia: [[gd:210an]] links to incorrect [[he:210]]
WARNING: wikipedia: [[id:210-an]] links to incorrect [[he:210]]
WARNING: wikipedia: [[zh:210年代]] links to incorrect [[he:210]]
WARNING: wikipedia: [[es:Años 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[et:210. aastad]] links to incorrect [[he:210]]
WARNING: wikipedia: [[eu:210eko hamarkada]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ca:Dècada del 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[it:Anni 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ast:Años 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[sk:10. roky 3. storočia]] links to incorrect [[he:210]]
WARNING: wikipedia: [[sl:210.]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ksh:210-er Joohre]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ms:210-an]] links to incorrect [[he:210]]
WARNING: wikipedia: [[sq:Vitet 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[su:210-an]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ko:210년대]] links to incorrect [[he:210]]
WARNING: wikipedia: [[bs:210te]] links to incorrect [[he:210]]
WARNING: wikipedia: [[br:Bloavezhioù 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[fi:210-luku]] links to incorrect [[he:210]]
WARNING: wikipedia: [[uz:210-lar]] links to incorrect [[he:210]]
WARNING: wikipedia: [[fr:Années 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[scn:210ini]] links to incorrect [[he:210]]
WARNING: wikipedia: [[de:210er]] links to incorrect [[he:210]]
WARNING: wikipedia: [[uk:210-ті]] links to incorrect [[he:210]]
WARNING: wikipedia: [[hr:210-ih]] links to incorrect [[he:210]]
WARNING: wikipedia: [[hu:210-es évek]] links to incorrect [[he:210]]

 It looks like all decades from 210 to 1240 (inclusive) have this
problem.
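
 To see which wikis still need a visit, warnings in the format shown
above can be collected with a small script. This is only a sketch; the
regular expression is my own reading of the line layout in this log,
and parse_warnings is a made-up helper, not part of the bot framework.

```python
import re

# Matches lines such as:
# WARNING: wikipedia: [[ro:Anii 210]] links to incorrect [[he:210]]
WARNING_RE = re.compile(
    r"^WARNING: (?P<family>\w+): "
    r"\[\[(?P<src_lang>[a-z-]+):(?P<src_title>[^\]]+)\]\] "
    r"links to incorrect "
    r"\[\[(?P<dst_lang>[a-z-]+):(?P<dst_title>[^\]]+)\]\]$"
)


def parse_warnings(text):
    """Return (source language, source title, bad target) per warning line."""
    rows = []
    for line in text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            target = f'{match["dst_lang"]}:{match["dst_title"]}'
            rows.append((match["src_lang"], match["src_title"], target))
    return rows


sample = """\
WARNING: wikipedia: [[ro:Anii 210]] links to incorrect [[he:210]]
WARNING: wikipedia: [[ja:210年代]] links to incorrect [[he:210]]
WARNING: wikipedia: [[sk:10. roky 3. storočia]] links to incorrect [[he:210]]
"""
rows = parse_warnings(sample)
langs = sorted(lang for lang, _, _ in rows)
```

 The sorted language codes in langs then give a checklist of wikis
where the link to the Hebrew year article still has to be removed.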

--
// Wegge

Re: The dark side of automatic interwiki linking

Jan Dudík
There was a problem on he:wiki: they merged the articles about years
into decade articles, but kept them under the year titles. When that
was corrected, they created new articles about some of the years.

Also, some small wikis create automated stubs for years with interwiki
links to all languages, so there are sometimes 20 or more incorrect
links. Most of them are dead, but if one of them is a redirect...

My bot is registered on all wikis, so I can do something...

cs:User:JAn Dudík
