Old Wikipedia backups discovered


Tim Starling-2
I was looking through some old files in our SourceForge project. I
opened a file called wiki.tar.gz, and inside were three complete
backups of the text of Wikipedia, from February, March and August 2001!

This is exciting, because there is lots of article history in here
which was assumed to be lost forever.

I've long been interested in Wikipedia's history, and I've tried in
the past to locate such backups. I asked various people who might have
had one. I had given up hope.

The history of particularly old Wikipedia articles, as seen in the
present Wikipedia database, is incomplete, due to UseMod's policy of
deleting old revisions of pages after about a month. The script which
Brion wrote to import the article histories from UseMod to MediaWiki
only fetched those revisions which hadn't been purged yet.

I didn't want to believe that those revisions had been lost forever,
and I even opened the UseMod source code and stared forlornly at the
unlink() call. What I (and Brion before) missed is that UseMod appends
a record of every change made to two files, called diff_log and rclog.
In these two files is a record of every change made to Wikipedia from
January 15 to August 17, 2001.

I've put the two log files up on the web, at:

http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z

The 7-zip archive is only 8.4MB -- much more manageable than today's
backups.

rclog contains IP addresses. The UseMod software made IP addresses of
logged-in users public, so the people who made these edits had no
expectation that their IP address would be kept private. That, coupled
with the passage of time, makes me think that no harm to user privacy
can come from releasing these files.

-- Tim Starling

_______________________________________________
WikiEN-l mailing list
[hidden email]
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l

Re: [Foundation-l] Old Wikipedia backups discovered

Chad
On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <[hidden email]> wrote:

> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to Usemod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The Usemod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>

I have to say this is super cool. It's like digging up a time capsule
right before the 10th anniversary. One of my favorite early edits:

"This is the new WikiPedia!  The idea here is to write a complete
encyclopedia from scratch, without peer review process, etc.
Some people think that this may be a hopeless endeavor, that
the result will necessarily suck.  We aren't so sure.  So, let's get
to work!"

-Chad


Re: [Foundation-l] Old Wikipedia backups discovered

Michael Snow-5
In reply to this post by Tim Starling-2
On 12/14/2010 7:54 AM, Tim Starling wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
I guess producing database dumps was easier in those days. Seriously
though, this is absolutely fantastic news!

--Michael Snow


Re: [Foundation-l] Old Wikipedia backups discovered

WereSpielChequers-2
Can these edits be imported into wikipedia in time for the tenth anniversary?

I'm assuming some will relate to pages that have since been moved or
deleted so I appreciate this won't be an easy project.

WereSpielChequers

On 14 December 2010 16:16, Michael Snow <[hidden email]> wrote:

> On 12/14/2010 7:54 AM, Tim Starling wrote:
>> I was looking through some old files in our SourceForge project. I
>> opened a file called wiki.tar.gz, and inside were three complete
>> backups of the text of Wikipedia, from February, March and August 2001!
> I guess producing database dumps was easier in those days. Seriously
> though, this is absolutely fantastic news!
>
> --Michael Snow

Re: [Foundation-l] Old Wikipedia backups discovered

Emilio J. Rodríguez-Posada
In reply to this post by Tim Starling-2
Hi;

Thanks Tim. Congratulations.

Is Wikipedia:UuU[1] now out-of-date?

Regards,
emijrp

[1] http://en.wikipedia.org/wiki/Wikipedia:UuU


2010/12/14 Tim Starling <[hidden email]>

> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to Usemod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The Usemod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling

Re: [Foundation-l] Old Wikipedia backups discovered

FT2
In reply to this post by WereSpielChequers-2
Deferring to tech views but I'd have thought "almost certainly not". There
may well be gaps after August 2001 for one thing; importing earlier records
would incorrectly imply a complete history was shown of user and page edits.

We probably could make a museum piece of them by creating
"January2001.wikipedia.org" though.

FT2



On Tue, Dec 14, 2010 at 5:08 PM, WereSpielChequers <
[hidden email]> wrote:

> Can these edits be imported into wikipedia in time for the tenth
> anniversary?
>
> I'm assuming some will relate to pages that have since been moved or
> deleted so I appreciate this won't be an easy project.
>
> WereSpielChequers
>
> On 14 December 2010 16:16, Michael Snow <[hidden email]> wrote:
> > On 12/14/2010 7:54 AM, Tim Starling wrote:
> >> I was looking through some old files in our SourceForge project. I
> >> opened a file called wiki.tar.gz, and inside were three complete
> >> backups of the text of Wikipedia, from February, March and August 2001!
> > I guess producing database dumps was easier in those days. Seriously
> > though, this is absolutely fantastic news!
> >
> > --Michael Snow

Re: [Foundation-l] Old Wikipedia backups discovered

phoebe ayers-3
In reply to this post by Tim Starling-2
On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <[hidden email]> wrote:

> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to Usemod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The Usemod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling

AWESOME. This is so cool. I've copied the research list too, since
there are many Wikipedia historians who will be eager to see the older
versions.

I hope we can get them up in a browsable way, like nostalgia.wikipedia.org!

-- phoebe


Re: [Foundation-l] Old Wikipedia backups discovered

Rob Lanphier
In reply to this post by Tim Starling-2
On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <[hidden email]> wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.

Wow, this is really, really amazing!  I'm not sure just how you
avoided having a heart attack after seeing this:
> ------
> HomePage|979586833
> 1c1
> < Describe the new page here.
> ---
> > This is the new WikiPedia!

Great work!

Rob
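
The excerpt quoted in the message above suggests how diff_log is laid out. Below is a minimal Python sketch of a reader for it, assuming (from that one excerpt only) that each entry starts with a line of six dashes, followed by a `PageName|timestamp` header whose number is a Unix timestamp, followed by the body of a normal diff. The names `parse_diff_log`, `SEPARATOR` and `HEADER` are hypothetical and are not taken from UseMod or from any script mentioned in this thread.

```python
import re
from datetime import datetime, timezone

SEPARATOR = "------"  # assumed entry delimiter, taken from the quoted excerpt
HEADER = re.compile(r"^(?P<page>.+)\|(?P<ts>\d+)$")  # e.g. HomePage|979586833


def parse_diff_log(text):
    """Return a list of dicts with keys 'page', 'time' and 'diff'."""
    entries = []
    previous = None
    for line in text.splitlines():
        # A header only counts when it directly follows a separator line.
        header = HEADER.match(line) if previous == SEPARATOR else None
        if header:
            entries.append({
                "page": header.group("page"),
                "time": datetime.fromtimestamp(int(header.group("ts")),
                                               tz=timezone.utc),
                "diff": [],
            })
        elif line != SEPARATOR and entries:
            # Everything else belongs to the diff body of the current entry.
            entries[-1]["diff"].append(line)
        previous = line
    return entries


sample = """------
HomePage|979586833
1c1
< Describe the new page here.
---
> This is the new WikiPedia!
"""

first = parse_diff_log(sample)[0]
print(first["page"], first["time"].isoformat())
```

Read as a Unix timestamp, 979586833 falls on 15 January 2001 (UTC), which matches the start date Tim gives for the logs.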


Re: [Foundation-l] Old Wikipedia backups discovered

FT2
In reply to this post by phoebe ayers-3
Would prefer on its own wiki as this is comprehensive up to a given date.
Maybe January2001.wikipedia.org -- immediate impact.

(DNS software cannot handle 2001.wikipedia.org)

FT2

On Tue, Dec 14, 2010 at 6:04 PM, phoebe ayers <[hidden email]> wrote:

>  On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <[hidden email]>
> wrote:
> > I was looking through some old files in our SourceForge project. I
> > opened a file called wiki.tar.gz, and inside were three complete
> > backups of the text of Wikipedia, from February, March and August 2001!
> >
> > This is exciting, because there is lots of article history in here
> > which was assumed to be lost forever.
> >
> > I've long been interested in Wikipedia's history, and I've tried in
> > the past to locate such backups. I asked various people who might have
> > had one. I had given up hope.
> >
> > The history of particularly old Wikipedia articles, as seen in the
> > present Wikipedia database, is incomplete, due to Usemod's policy of
> > deleting old revisions of pages after about a month. The script which
> > Brion wrote to import the article histories from UseMod to MediaWiki
> > only fetched those revisions which hadn't been purged yet.
> >
> > I didn't want to believe that those revisions had been lost forever,
> > and I even opened the UseMod source code and stared forlornly at the
> > unlink() call. What I (and Brion before) missed is that UseMod appends
> > a record of every change made to two files, called diff_log and rclog.
> > In these two files is a record of every change made to Wikipedia from
> > January 15 to August 17, 2001.
> >
> > I've put the two log files up on the web, at:
> >
> > http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
> >
> > The 7-zip archive is only 8.4MB -- much more manageable than today's
> > backups.
> >
> > rclog contains IP addresses. The Usemod software made IP addresses of
> > logged-in users public, so the people who made these edits had no
> > expectation that their IP address would be kept private. That, coupled
> > with the passage of time, makes me think that no harm to user privacy
> > can come from releasing these files.
> >
> > -- Tim Starling
>
> AWESOME. This is so cool. I've copied the research list too, since
> there's many Wikipedia historians that will be eager to see the older
> versions.
>
> I hope we can get them up in a browsable way, like nostalgia.wikipedia.org!
>
> -- phoebe

Re: [Foundation-l] Old Wikipedia backups discovered

Tim Starling-2
In reply to this post by Emilio J. Rodríguez-Posada
On 15/12/10 04:17, emijrp wrote:
> Hi;
>
> Thanks Tim. Congratulations.
>
> Is Wikipedia:UuU[1] now out-of-date?

Yes, the earliest surviving edit is now "This is the new WikiPedia!",
made to HomePage by office.bomis.com, presumably Jimmy. Larry signed a
comment a short time later from a different IP address, so it wasn't
him. Articles were created in the following order:

* HomePage
* WikiPedia
* PhilosophyAndLogic
* UnitedStates
* PopularMusic
* SportS
* MathematicsAndStatistics
* CountriesOfTheWorld
* AaA
* AfghanistaN
* UuU
* TechnologY
* ComputinG
* ComputerSoftware
* TransporT
* NamingConventions

-- Tim Starling



Re: [Foundation-l] Old Wikipedia backups discovered

Charles Matthews
I appreciate the challenge in getting old versions posted again. But I'm
also interested in the folks, rather more than in CamelCase and UseMod.

As I asked somewhere else recently, where are they now? I don't mean
outing people; just what do we really know about the Old Bolsheviks,
shot or not? (I was rather saddened, talking of Old Bolsheviks, at
Stevertigo's recent ban and departure, not because I agreed with him,
but he was apparently editing on 13 June 2002, i.e. a year before me,
and despite our conflict on this list offered and played with me a
couple of games of online go.)

Où sont les Wiks d'antan?

Charles


Re: [Foundation-l] Old Wikipedia backups discovered

Mike DuPont
In reply to this post by Tim Starling-2
On Tue, Dec 14, 2010 at 11:02 PM, Tim Starling <[hidden email]> wrote:

> * HomePage
> * WikiPedia
> * PhilosophyAndLogic
> * UnitedStates
> * PopularMusic
> * SportS
> * MathematicsAndStatistics
> * CountriesOfTheWorld
> * AaA
> * AfghanistaN
> * UuU
> * TechnologY
> * ComputinG
> * ComputerSoftware
> * TransporT
> * NamingConventions

Nice, I have added this as a userpage
http://en.wikipedia.org/wiki/User:Mdupont/FirstPages

All of them work except for three: AfghanistaN, TechnologY and
NamingConventions. They have been deleted as meaningless with no
relevant historical value:

20:12, 18 April 2006 RexNL (talk | contribs) deleted "AfghanistaN"
(content was: '{{db|R3:Redirects as a result of an implausible
typo}}#REDIRECT Afghanistan')
09:19, 24 May 2005 Thue (talk | contribs) deleted "TechnologY"
(content was: '#REDIRECT Technology')
04:48, 8 March 2007 Raul654 (talk | contribs) deleted
"NamingConventions" (content was: '#REDIRECT wikipedia:Naming
conventions')

They should all be restored under the category Museum of Wikipedia!

mike

--
James Michael DuPont
Member of Free Libre Open Source Software Kosova and Albania
flossk.org flossal.org


Re: [Foundation-l] Old Wikipedia backups discovered

Andrew Gray-3
In reply to this post by Tim Starling-2
On 14 December 2010 22:02, Tim Starling <[hidden email]> wrote:

> him. Articles were created in the following order:
>
> * HomePage
> * WikiPedia
> * PhilosophyAndLogic

It's interesting to note our early priorities!

http://grey.colorado.edu/wikipedia_2001/979602227.txt

Two months later...

http://en.wikipedia.org/w/index.php?title=PhilosophyAndLogic&oldid=272836

...we'd only done the last one of those.

--
- Andrew Gray
  [hidden email]


Re: Old Wikipedia backups discovered

Joseph Reagle
In reply to this post by Tim Starling-2
On Tuesday, December 14, 2010, Tim Starling wrote:
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.

Unfortunately, it doesn't look like versions of the articles beyond the first ~10 are automatically recoverable. I wrote a Python script to reconstruct the early WP, but it fails because of apparent weaknesses in "normal diffs", which is the format UseMod apparently uses. To reconstruct any particular version in time, I iteratively apply all diffs up to that point via `patch`. It doesn't take long before patch chokes on a diff. In fact, I've discovered there are simple cases in which normal diff and patch are incapable of round-tripping.

I hope someone will eventually prove me wrong, or that some log is found that is actually capable of recreating the state. (I wonder what the point of providing a diff_log export is if it isn't usable; perhaps the UseMod folks could speak to that.)
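
For readers unfamiliar with the format, here is a rough Python sketch of what applying one normal diff involves. It is not the script described above and does not call `patch`; the function name `apply_normal_diff` is hypothetical. It assumes the standard normal-diff hunk headers (`NaM`, `N,MdK`, `N,McK,L`) and raises an error whenever the lines a hunk expects to remove do not match the current text, which is roughly the situation in which patch gives up.

```python
import re

# Hunk header of a normal diff: 1c1, 3a4,5, 7,9d6 and so on.
HUNK = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old, diff):
    """Apply a normal diff (list of lines) to old (list of lines)."""
    out, pos, i = [], 0, 0
    while i < len(diff):
        m = HUNK.match(diff[i])
        if not m:
            raise ValueError(f"expected a hunk header, got {diff[i]!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        i += 1
        removed, added = [], []
        # Collect the "< old" and "> new" lines that belong to this hunk.
        while i < len(diff) and not HUNK.match(diff[i]):
            line = diff[i]
            if line[:1] == "<":
                removed.append(line[2:])
            elif line[:1] == ">":
                added.append(line[2:])
            # "---" dividers and "\ No newline" markers are skipped.
            i += 1
        # An append keeps old lines up to and including `start`;
        # change and delete replace old lines start..end.
        keep_until = start if op == "a" else start - 1
        skip_to = start if op == "a" else end
        if old[keep_until:skip_to] != removed:
            raise ValueError(f"hunk at old line {start} does not match the text")
        out.extend(old[pos:keep_until])
        out.extend(added)
        pos = skip_to
    out.extend(old[pos:])
    return out


page = ["Describe the new page here."]
edit = ["1c1", "< Describe the new page here.", "---", "> This is the new WikiPedia!"]
print(apply_normal_diff(page, edit))
```

Because a normal diff carries no surrounding context lines, the only consistency check available is the comparison of the removed lines, so a single mismatch (for example a stray carriage return) stops the whole replay for that page.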


Re: [Foundation-l] Old Wikipedia backups discovered

Federico Leva (Nemo)
In reply to this post by Tim Starling-2
Good news from Wiki-research-l in case you're not subscribed to it...

Nemo

-------- Original Message --------
Subject: Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
Date: Thu, 16 Dec 2010 13:53:14 -0500
From: Joseph Reagle

I have the first 10K edits up reconstructed in their various pages at:
   http://cyber.law.harvard.edu/~reagle/wp-redux/

-------- Original Message --------
Subject: Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
Date: Fri, 17 Dec 2010 00:03:00 +1100
From: Tim Starling

On 16/12/10 23:10, Joseph Reagle wrote:
 > On Wednesday, December 15, 2010, Tim Starling wrote:
 >> There were some changes made to the page text that weren't represented
 >> in diff_log, specifically changing certain camel-case links to free
 >> links.
 > It appears my problems were related to some CR/LF issues not
 > round-tripping between diff and patch, but I hope to be able to address
 > that. And yes, in addition to some of the CamelCase issues, I expect
 > another problem is that if a page is blanked "Describe the new page
 > here." will reappear outside of the diff_log.

I don't think that will be a problem. But there are other problems
that I've encountered.

UseMod had a deletion feature. It turns out to be easy enough to skip
deleted pages, since they don't have a corresponding entry in rclog.

It also had an admin-only rename feature, which optionally fixed links
in all pages. This accounts for the free link changes I was seeing
earlier. And it had a link replacement feature which could be invoked
without a page move. These features were rarely used, due to the
arcane interface; usually people just moved pages by copying and
pasting. But during the free-link conversion, a lot of pages were
renamed using the admin-only feature.

All these admin-only features were unlogged, but it turns out to be
possible to reconstruct page moves, because when a page was moved, its
name was updated in rclog but not in diff_log. By finding the first
diff_log entry with the new name, you can roughly work out when the
page moves were done.
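
The two heuristics described above (skipping pages that never appear in rclog, and dating a move by the first diff_log entry under the new name) could be sketched roughly as follows. This is only an illustration under stated assumptions: the rclog format is not shown in this thread, so the set of page names found in rclog is taken as already extracted, and all function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[str, int]  # (page name, Unix timestamp), as read from diff_log


def first_seen(entries: List[Entry]) -> Dict[str, int]:
    """Map each page name to the timestamp of its earliest diff_log entry."""
    firsts: Dict[str, int] = {}
    for page, ts in entries:
        if page not in firsts or ts < firsts[page]:
            firsts[page] = ts
    return firsts


def names_missing_from_rclog(entries: List[Entry], rclog_names: Set[str]) -> Set[str]:
    """Names that occur in diff_log but never in rclog (candidates to skip)."""
    return {page for page, _ in entries} - set(rclog_names)


def estimate_move_time(entries: List[Entry], new_name: str) -> Optional[int]:
    """Rough move time: the first diff_log entry recorded under the new name."""
    return first_seen(entries).get(new_name)


# Made-up data: a page edited as "OldName", later moved to "NewName",
# plus a page "Gone" that has no rclog entry at all.
log = [("OldName", 100), ("Gone", 150), ("OldName", 200),
       ("NewName", 300), ("NewName", 400)]
print(names_missing_from_rclog(log, {"NewName"}))
print(estimate_move_time(log, "NewName"))
```

Note that in this sketch the names missing from rclog may include pre-move names such as "OldName" as well as deleted pages, so telling the two apart would need extra logic that is not attempted here.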

Anyway, I'm developing a script which will import the dump into a
modified MediaWiki instance, the idea being that I can then export XML
from it. Once it works, I'll upload the XML to somewhere. I'm not sure
when that will be.

-- Tim Starling


Re: [Foundation-l] Old Wikipedia backups discovered

Charles Matthews
On 16/12/2010 20:01, Federico Leva (Nemo) wrote:

> Good news from Wiki-research-l in case you're not subscribed to it...
>
> Nemo
>
> -------- Original Message --------
> Subject: Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
> Date: Thu, 16 Dec 2010 13:53:14 -0500
> From: Joseph Reagle
>
> I have the first 10K edits up reconstructed in their various pages at:
>     http://cyber.law.harvard.edu/~reagle/wp-redux/
>
Amazingly, AfghanistanTransportations still exists as a redirect. I
thought there were too many people with time on their hands persecuting
such dinosaur tracks. Of course it is now doomed ...

Charles



Re: [Foundation-l] Old Wikipedia backups discovered

Nathan Awrich
From the [hidden email] log posted the other day, written by
Larry Sanger:

"Second, a little bit of history will help to explain this as well.  I was
more or less offered the job of editing Nupedia when I was, as an ABD
philosophy graduate student, soliciting Jimbo's (and other friends')
advice on a website I was thinking of starting.  It was the first I had
heard of Jimbo's idea of an open content encyclopedia, and I was delighted
to take the job."

Nathan


Re: [Foundation-l] Old Wikipedia backups discovered

Joseph Reagle
In reply to this post by Federico Leva (Nemo)
On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
> I have the first 10K edits up reconstructed in their various pages at:
>    http://cyber.law.harvard.edu/~reagle/wp-redux/

I fixed some of the encoding issues. The DB dump contained several different encodings, so the encoding of each diff in the dump is now guessed independently using Python's CharDet (Universal Encoding Detector) library.

So now you can read up on the few "accented" topics in the early Wikipedia, including Göteborg, Köpenhamn, and Křbenhavn. (Nothing very exciting.) But it means articles such as ASCII are much improved as well. Interestingly, the ASCII page isn't so much about ASCII itself as about how to type non-ASCII characters in the early Wikipedia.

  http://cyber.law.harvard.edu/~reagle/wp-redux/ASCII/983670583.html
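
A per-diff decoding step of the kind described above might look roughly like the sketch below. It is not the actual script: the function name `decode_diff`, the choice to try strict UTF-8 first, and the ISO-8859-1 fallback are all assumptions made for illustration. The only chardet call used is `chardet.detect()`, and the import is optional so that the sketch still runs where the third-party library is not installed.

```python
FALLBACK = "iso-8859-1"  # assumed last resort; every byte value decodes in it


def decode_diff(raw: bytes) -> str:
    """Decode the bytes of one diff, guessing the encoding if it is not UTF-8."""
    try:
        # Strict UTF-8 rejects most legacy 8-bit text, so a success is trusted.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        import chardet  # third-party Universal Encoding Detector (optional here)
        guess = chardet.detect(raw)["encoding"]  # may be None
    except ImportError:
        guess = None
    try:
        return raw.decode(guess or FALLBACK, errors="replace")
    except LookupError:
        # The detector named an encoding this Python build does not know.
        return raw.decode(FALLBACK, errors="replace")


print(decode_diff("Göteborg".encode("utf-8")))
```

Guessing per diff rather than once per file matters because, as noted above, different edits in the dump were saved in different encodings.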


Re: [Foundation-l] Old Wikipedia backups discovered

Martin Møller Skarbiniks Pedersen
On 17 December 2010 21:18, Joseph Reagle <[hidden email]> wrote:
> On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
>> I have the first 10K edits up reconstructed in their various pages at:
>>    http://cyber.law.harvard.edu/~reagle/wp-redux/
>
> I fixed some of the encoding issues. The DB dump contained different encodings. So, the encoding of each diff in the dump is independently now guessed using Python's CharDet (Universal Encoding Detector) library.
>
> So now you can read up on the few "accented" topics in the early Wikipedia including: Göteborg, Köpenhamn, and Křbenhavn.

Should probably be København and not Křbenhavn

/Martin


Re: [Foundation-l] Old Wikipedia backups discovered

Joseph Reagle
On Sunday, December 19, 2010, Martin Møller Skarbiniks Pedersen wrote:
> Should probably be København and not Křbenhavn

Thanks Martin, that's evidence that there are still bugs, and that Python's Universal Encoding Detector is probabilistic!
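
One plausible explanation for this particular mix-up, offered as an assumption and not confirmed anywhere in the thread: the byte 0xF8 is "ø" in ISO-8859-1 but "ř" in ISO-8859-2, so a detector that guesses Latin-2 for a Latin-1 page title would produce exactly this result. The short Python snippet below uses only the standard codecs to show the difference.

```python
raw = b"K\xf8benhavn"  # "København" as saved by a Latin-1 (ISO-8859-1) editor

as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")
print(as_latin1)
print(as_latin2)

# 0xF6 is "ö" in both encodings, so a wrong guess between these two
# would go unnoticed for a title such as Göteborg.
same = b"G\xf6teborg".decode("iso-8859-1") == b"G\xf6teborg".decode("iso-8859-2")
print(same)
```

With so few distinguishing bytes in a short diff, a statistical detector has very little evidence to choose between two closely related 8-bit encodings.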
