Tools for repointing reference dead links to archive.org?

David Gerard-2
Heise just took down the H Online archive (the English-language
version of Heise.de, a computer news site). This has broken a *huge*
pile of reference links.

[[Special:LinkSearch]] only shows links in the wikitext - not links
inside reference citation templates.

https://www.google.co.uk/search?q=site:en.wikipedia.org+link:h-online.com
shows hundreds of links. Argh.

What I need to do is (a) find all the links (b) add archiveurl=
(something on archive.org, which seems to have captured the whole
site) and archivedate= .
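Step (a) can be roughed out offline once the wikitext is in hand. The sketch below is illustrative only. It assumes the page text has already been fetched (for example from the API or a dump) and that the citations are flat {{cite ...}} templates with no templates nested inside them. The name find_dead_citations is hypothetical, and a regular expression is a crude stand-in for a real wikitext parser.

```python
import re

# A {{cite ...}} template containing no braces. Assumption: citations are
# flat; a citation with a nested template inside it is simply not matched.
CITE_RE = re.compile(r"\{\{\s*[Cc]ite[^{}]*\}\}")
# The value of a url= parameter, up to the next pipe or closing brace.
URL_RE = re.compile(r"\|\s*url\s*=\s*([^|}]+)")
# Either spelling of the archive parameter (archiveurl= or archive-url=).
ARCHIVE_RE = re.compile(r"\|\s*archive-?url\s*=")


def find_dead_citations(wikitext: str, domain: str) -> list[str]:
    """Return cite templates whose url= mentions `domain` and that have
    no archive URL yet (hypothetical helper, regex-based)."""
    hits = []
    for match in CITE_RE.finditer(wikitext):
        template = match.group(0)
        url = URL_RE.search(template)
        if url and domain in url.group(1) and not ARCHIVE_RE.search(template):
            hits.append(template)
    return hits
```

Run over each page's wikitext, find_dead_citations(text, "h-online.com") would list the citations that still need archiveurl= and archivedate=.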

Are there tools that do any of this job?


- d.

_______________________________________________
WikiEN-l mailing list
[hidden email]
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l

Re: Tools for repointing reference dead links to archive.org?

Kevin Gorman
Hi David -

Funny you ask... there are not currently any solid ones afaik, but I've
been talking with the Internet Archive about building out a bot and trying
to achieve community consensus on ENWP to autoreplace deadlinks with
archive.org ones.  The IA has been crawling all new external links on all
Wikimedia projects at least once every couple of hours for months, and has
a strong interest in killing off literally all of our dead links.  Unless
something falls through, I should be bringing a more detailed plan up
within maybe five or six weeks.
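For reference, the Wayback Machine offers a public JSON availability endpoint (https://archive.org/wayback/available) that reports the snapshot closest to a requested timestamp. A minimal sketch of querying it follows. The helper names are hypothetical, and the response fields read here (archived_snapshots, closest, available, url, timestamp) are assumed from the endpoint's published format, so check the current documentation before relying on them.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str) -> str:
    """Build a request for the snapshot of `url` closest to `timestamp`
    (YYYYMMDD, or up to the full 14-digit YYYYMMDDhhmmss)."""
    params = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    return f"{AVAILABILITY_API}?{params}"


def parse_closest(body: str) -> tuple[str, str] | None:
    """Extract (snapshot_url, timestamp) from the JSON reply, or None
    when the archive reports no usable snapshot."""
    closest = json.loads(body).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def closest_snapshot(url: str, timestamp: str) -> tuple[str, str] | None:
    """Query the live endpoint (needs network access; no retries or
    rate limiting, which a real bot would want)."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return parse_closest(resp.read().decode("utf-8"))
```

Keeping the JSON parsing separate from the network call makes it possible to check the parsing against saved replies without hitting archive.org.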

Best,
Kevin Gorman


On Sun, Jan 26, 2014 at 4:10 PM, David Gerard <[hidden email]> wrote:

> Heise just took down the H Online archive (the English-language
> version of Heise.de, a computer news site). This has broken a *huge*
> pile of reference links.
>
> [[Special:LinkSearch]] only shows links in the wikitext - not links
> inside reference citation templates.
>
> https://www.google.co.uk/search?q=site:en.wikipedia.org+link:h-online.com
> shows hundreds of links. Argh.
>
> What I need to do is (a) find all the links (b) add archiveurl=
> (something on archive.org, which seems to have captured the whole
> site) and archivedate= .
>
> Are there tools that do any of this job?
>
>
> - d.

Re: Tools for repointing reference dead links to archive.org?

David Gerard-2
On 27 January 2014 00:17, Kevin Gorman <[hidden email]> wrote:

> Funny you ask... there are not currently any solid ones afaik, but I've
> been talking with the Internet Archive about building out a bot and trying
> to achieve community consensus on ENWP to autoreplace deadlinks with
> archive.org ones.  The IA has been crawling all new external links on all
> Wikimedia projects at least once every couple of hours for months, and has
> a strong interest in killing off literally all of our dead links.  Unless
> something falls through, I should be bringing a more detailed plan up
> within maybe five or six weeks.


Yes, I knew you were cooking up something :-) I was just surprised it
wasn't the sort of task that people had already automated, or written
a nice toolserver bot for, or something.

The ones that use {{cite web}} and variants are pretty simple: you
just whack in archiveurl= and archivedate= (preferably as close as
possible to any cited accessdate=) ... then double-check by eye, of
course. It just gets very tedious and error-prone doing it by hand,
cut'n'pasting URLs into the middle of the computer guacamole we
lovingly euphemise as "wikitext". VE isn't a much happier method.
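The {{cite web}} edit described above can be sketched roughly as follows. This assumes the candidate snapshot timestamps (14-digit YYYYMMDDhhmmss strings, the form used in web.archive.org/web/<timestamp>/<url> links) are already known, that accessdate= is written as YYYY-MM-DD, and that the template is a single flat {{cite web}}. The function names are hypothetical, and the output would still need the by-eye check mentioned above.

```python
from __future__ import annotations

import re
from datetime import datetime

WAYBACK_FMT = "%Y%m%d%H%M%S"  # 14-digit timestamps used in Wayback URLs


def closest_timestamp(snapshots: list[str], accessdate: str) -> str:
    """Pick the snapshot timestamp nearest to an ISO (YYYY-MM-DD) date."""
    target = datetime.strptime(accessdate, "%Y-%m-%d")
    return min(snapshots,
               key=lambda ts: abs(datetime.strptime(ts, WAYBACK_FMT) - target))


def add_archive_params(template: str, snapshots: list[str]) -> str:
    """Append archiveurl= and archivedate= to one flat {{cite web}} template."""
    if not template.endswith("}}") or not snapshots:
        raise ValueError("expected a closed template and at least one snapshot")
    url = re.search(r"\|\s*url\s*=\s*([^|}]+)", template)
    if url is None:
        raise ValueError("template has no url= parameter")
    access = re.search(r"\|\s*accessdate\s*=\s*(\d{4}-\d{2}-\d{2})", template)
    # Prefer the snapshot nearest accessdate=; with no ISO accessdate,
    # fall back to the latest snapshot (an arbitrary choice for this sketch).
    ts = closest_timestamp(snapshots, access.group(1)) if access else max(snapshots)
    archive_url = f"https://web.archive.org/web/{ts}/{url.group(1).strip()}"
    archive_date = datetime.strptime(ts, WAYBACK_FMT).strftime("%Y-%m-%d")
    # Insert the new parameters just before the closing braces.
    return (template[:-2].rstrip()
            + " |archiveurl=" + archive_url
            + " |archivedate=" + archive_date + "}}")
```

Choosing the snapshot nearest accessdate= follows the preference stated above: it is the capture most likely to match what the citing editor actually read.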


- d.


Re: Tools for repointing reference dead links to archive.org?

Risker
On 26 January 2014 19:38, David Gerard <[hidden email]> wrote:

> On 27 January 2014 00:17, Kevin Gorman <[hidden email]> wrote:
>
> > Funny you ask... there are not currently any solid ones afaik, but I've
> > been talking with the Internet Archive about building out a bot and
> > trying to achieve community consensus on ENWP to autoreplace deadlinks
> > with archive.org ones.  The IA has been crawling all new external links
> > on all Wikimedia projects at least once every couple of hours for months,
> > and has a strong interest in killing off literally all of our dead links.
> > Unless something falls through, I should be bringing a more detailed plan
> > up within maybe five or six weeks.
>
>
> Yes, I knew you were cooking up something :-) I was just surprised it
> wasn't the sort of task that people had already automated, or written
> a nice toolserver bot for, or something.
>
> The ones that use {{cite web}} and variants are pretty simple: you
> just whack in archiveurl= and archivedate= (preferably as close as
> possible to any cited accessdate=) ... then double-check by eye, of
> course. It just gets very tedious and error-prone doing it by hand,
> cut'n'pasting URLs into the middle of the computer guacamole we
> lovingly euphemise as "wikitext". VE isn't a much happier method.
>
>
Concur that it's a great idea... but perhaps a WMF Tool Labs tool, instead
of toolserver?  Running battle, I know - but so many of the tools I have
greatly valued over the years are now pretty much useless, or at least
unreliable.

In any case - it would be great to have a bot that did a fair bit of that,
but it should probably be manually run to ensure proper matching, kind of
like AWB.
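The "manually run, like AWB (AutoWikiBrowser)" workflow amounts to showing a human each proposed change and saving it only on confirmation. A minimal sketch of that review step using Python's standard difflib module follows; show_diff and review_edit are hypothetical names, and loading and saving pages is deliberately left out.

```python
import difflib
from typing import Callable


def show_diff(title: str, old: str, new: str) -> str:
    """Render a proposed edit as a unified diff for a human to read."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"{title} (current)", tofile=f"{title} (proposed)",
        lineterm=""))


def review_edit(title: str, old: str, new: str,
                confirm: Callable[[str], str] = input) -> str:
    """Return the text to save: `new` only if the operator answers 'y',
    otherwise the unchanged `old` text."""
    if old == new:
        return old  # nothing to review
    print(show_diff(title, old, new))
    answer = confirm(f"Save this change to {title}? [y/N] ")
    return new if answer.strip().lower() == "y" else old
```

Defaulting to "no" means a distracted operator who just presses Enter leaves the page untouched, which suits the "ensure proper matching" concern.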

Risker/Anne

Re: Tools for repointing reference dead links to archive.org?

Mark
On 1/27/14, 1:10 AM, David Gerard wrote:
> What I need to do is (a) find all the links (b) add archiveurl=
> (something on archive.org, which seems to have captured the whole
> site) and archivedate= .
>
This bot used to do something along those lines on en.wiki, but hasn't
been active in some months:
https://en.wikipedia.org/wiki/User:DASHBot/Dead_Links

Perhaps it or something similar could be revived?

-Mark


Re: Tools for repointing reference dead links to archive.org?

David Gerard-2
On 3 February 2014 14:31, Delirium <[hidden email]> wrote:
> On 1/27/14, 1:10 AM, David Gerard wrote:

>> What I need to do is (a) find all the links (b) add archiveurl=
>> (something on archive.org, which seems to have captured the whole
>> site) and archivedate= .

> This bot used to do something along those lines on en.wiki, but hasn't
> been active in some months:
> https://en.wikipedia.org/wiki/User:DASHBot/Dead_Links
> Perhaps it or something similar could be revived?

That looks like pretty much what I was after.

Though I ended up fixing a hundred-odd pages by hand for the case of
h-online.com :-)


- d.