Old Wikipedia backups discovered


Old Wikipedia backups discovered

Tim Starling-2
I was looking through some old files in our SourceForge project. I
opened a file called wiki.tar.gz, and inside were three complete
backups of the text of Wikipedia, from February, March and August 2001!

This is exciting, because there is lots of article history in here
which was assumed to be lost forever.

I've long been interested in Wikipedia's history, and I've tried in
the past to locate such backups. I asked various people who might have
had one. I had given up hope.

The history of particularly old Wikipedia articles, as seen in the
present Wikipedia database, is incomplete, due to UseMod's policy of
deleting old revisions of pages after about a month. The script which
Brion wrote to import the article histories from UseMod to MediaWiki
only fetched those revisions which hadn't been purged yet.

I didn't want to believe that those revisions had been lost forever,
and I even opened the UseMod source code and stared forlornly at the
unlink() call. What I (and Brion before) missed is that UseMod appends
a record of every change made to two files, called diff_log and rclog.
In these two files is a record of every change made to Wikipedia from
January 15 to August 17, 2001.

I've put the two log files up on the web, at:

http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z

The 7-zip archive is only 8.4MB -- much more manageable than today's
backups.

rclog contains IP addresses. The UseMod software made IP addresses of
logged-in users public, so the people who made these edits had no
expectation that their IP address would be kept private. That, coupled
with the passage of time, makes me think that no harm to user privacy
can come from releasing these files.

-- Tim Starling

_______________________________________________
foundation-l mailing list
[hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Old Wikipedia backups discovered

Peter Coombe
That's fantastic news, and just in time for the 10th anniversary too,
when I'm sure the early days of Wikipedia will be in the limelight.
Great find Tim!

Would it be at all possible to import these into the current system? I
know someone was importing edits from the Nostalgia wiki. It would be
wonderful to finally have a complete article history.

Pete / the wub



Re: Old Wikipedia backups discovered

Chad
In reply to this post by Tim Starling-2

I have to say this is super cool. It's like digging up a time capsule
right before the 10th anniversary. One of my favorite early edits:

"This is the new WikiPedia!  The idea here is to write a complete
encyclopedia from scratch, without peer review process, etc.
Some people think that this may be a hopeless endeavor, that
the result will necessarily suck.  We aren't so sure.  So, let's get
to work!"

-Chad


Re: Old Wikipedia backups discovered

Magnus Manske-2
In reply to this post by Tim Starling-2
Great news indeed!

Now I can finally figure out when my first edit was :-)

Magnus




Re: Old Wikipedia backups discovered

Teun Spaans
In reply to this post by Tim Starling-2
Tim,

Wonderful news!
Thank you for making them publicly available!

Of course I immediately downloaded them, and I must have a look at them
later this week. Though they are from before I became active (2003), I am
very curious whether the articles in these files still exist, and how much
they have changed.

teun spaans





Re: Old Wikipedia backups discovered

Michael Snow-5
In reply to this post by Tim Starling-2
On 12/14/2010 7:54 AM, Tim Starling wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
I guess producing database dumps was easier in those days. Seriously
though, this is absolutely fantastic news!

--Michael Snow


Re: Old Wikipedia backups discovered

Steven Walling
In reply to this post by Chad
This is fantastic, and the timing could not be better.

If anyone finds anything noteworthy, please add it to the timeline of
Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
the other tools for cataloging interesting tidbits from our history.[2]

1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
2. http://ten.wikipedia.org/wiki/Share


Re: Old Wikipedia backups discovered

FT2
In reply to this post by Tim Starling-2
Wow, Tim. Just wow!

Is it just me who sees NYT carrying a headline, "On eve of 10th anniversary,
Wikipedia developers turn up earliest records"?

FT2




Re: Old Wikipedia backups discovered

Emilio J. Rodríguez-Posada
In reply to this post by Tim Starling-2
Hi;

Thanks Tim. Congratulations.

Is Wikipedia:UuU[1] now out-of-date?

Regards,
emijrp

[1] http://en.wikipedia.org/wiki/Wikipedia:UuU



Re: Old Wikipedia backups discovered

WJhonson
In reply to this post by Tim Starling-2
In a message dated 12/14/2010 8:21:09 AM Pacific Standard Time,
[hidden email] writes:


> This is fantastic, and the timing could not be better.
>
> If anyone finds anything noteworthy, please add it to the timeline of
> Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
> the other tools for cataloging interesting tidbits from our history.[2]
>
> 1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
> 2. http://ten.wikipedia.org/wiki/Share
>

Hmm I wonder if some things can be added there.... (sound of feathers
ruffling)

Btw, how does one *open* this tarball thing (on Windows)?

Re: Old Wikipedia backups discovered

phoebe ayers-3
In reply to this post by Tim Starling-2

AWESOME. This is so cool. I've copied the research list too, since
there are many Wikipedia historians who will be eager to see the older
versions.

I hope we can get them up in a browsable way, like nostalgia.wikipedia.org!

-- phoebe


Re: Old Wikipedia backups discovered

Jay Walsh
This is definitely a tremendous asset leading up to our big bday in January. I hope we can extract and post some of the real gems.  

Thanks for the resourcefulness and the sharing, Tim.


--
Jay Walsh
Head of Communications
WikimediaFoundation.org
blog.wikimedia.org
+1 (415) 839 6885 x 609, @jansonw



Wikis analysed

Olaf Simons
In reply to this post by Peter Coombe
Hi,

I am thinking of recommending a wiki database to a research project
planned at Erfurt University. The group I have to advise is planning to
edit late 17th and early 18th century letters of the "republic of
letters" with the aim to reconstruct the flow of ideas and the personal
networks that generated this flow. A wiki should be a superb tool for
the editing process the project will have to get through. Yet I am more
interested in tools we would later use to analyse our data (we will
probably create pages for individual letters, other pages on authors and
topics, and, of course, categories etc.).

My question is this: I have seen efforts (yet never took any notes)
that analysed wikis and produced network structures of the interrelated
pages and category trees. One such thing was shown here only recently:

http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-11-29/News_and_notes

...yet the digest given here would be too vague for our purposes. We
would probably have to plan the entire wiki so that we could get
definite pictures of the development of 17th century intellectual
networks (how do they spread on the European map? Who is communicating
with whom? Who is playing what role in the process?), and of the flow of
topics within these networks.

Ideas about who could provide technical solutions and give advice on how to
create such a wiki in a manner that allows it to be analysed fruitfully would
be most welcome.

regards
Olaf Simons


Gotha Research Centre, Germany
...and Germany's wikipedia


Re: Old Wikipedia backups discovered

Rob Lanphier
In reply to this post by Tim Starling-2
On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <[hidden email]> wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.

Wow, this is really, really amazing!  I'm not sure just how you
avoided having a heart attack after seeing this:
> ------
> HomePage|979586833
> 1c1
> < Describe the new page here.
> ---
> > This is the new WikiPedia!
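
Judging only from the excerpt above, each diff_log record seems to begin
with a line of six dashes, followed by a "Title|timestamp" header and then
an ordinary diff body. Below is a minimal Python sketch for splitting the log
into records under that assumption. The name parse_diff_log and the
dictionary keys are hypothetical, the timestamp is assumed to be Unix time,
and the real file may contain variations this does not handle.

```python
from datetime import datetime, timezone

# Assumption: a line of exactly six dashes starts each record.
RECORD_SEPARATOR = "------"


def parse_diff_log(text):
    """Split diff_log text into records (hypothetical helper)."""
    records = []
    current = None
    expect_header = False
    for line in text.splitlines():
        if line == RECORD_SEPARATOR:
            expect_header = True
            continue
        if expect_header:
            # Header looks like "HomePage|979586833": title, then Unix time.
            title, _, stamp = line.rpartition("|")
            current = {
                "title": title,
                "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
                "diff": [],
            }
            records.append(current)
            expect_header = False
        elif current is not None:
            # Everything up to the next separator is the diff body.
            current["diff"].append(line)
    return records


# The record quoted above, with the mail quoting ("> ") removed.
sample = """------
HomePage|979586833
1c1
< Describe the new page here.
---
> This is the new WikiPedia!
"""

for record in parse_diff_log(sample):
    print(record["title"], record["time"].isoformat(), len(record["diff"]))
```

On the sample record this reports the HomePage edit at 19:27:13 UTC on
15 January 2001, which agrees with the start date Tim gives for the logs.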

Great work!

Rob


Re: Old Wikipedia backups discovered

Moka Pantages
In reply to this post by Tim Starling-2
This is so exciting!  To Steven's point: we've also started a page
where folks can add bits of interesting information as they excavate
the files [1].   Can't wait to dig in!

Congrats, Tim!

[1] http://ten.wikipedia.org/wiki/Wikipedia_in_the_Beginning




Re: Old Wikipedia backups discovered

phoebe ayers-3
FYI, there is an existing timeline at:

http://meta.wikimedia.org/wiki/Wikipedia_timeline

And lots of other Wikipedia history pages on the English Wikipedia, too.

:)
Phoebe

On Tue, Dec 14, 2010 at 10:23 AM, Moka Pantages <[hidden email]> wrote:

> This is so exciting!  To Steven's point: we've also started a page
> where folks can add bits of interesting information as they excavate
> the files [1].   Can't wait to dig in!
>
> Congrats, Tim!
>
> [1] http://ten.wikipedia.org/wiki/Wikipedia_in_the_Beginning
>
>
> On Tue, 14 Dec 2010 08:20:10 -0800, Steven Walling <[hidden email]> wrote:
>
> This is fantastic, and the timing could not be better.
>
> If anyone finds anything noteworthy, please add it to the timeline of
> Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
> the other tools for cataloging interesting tidbits from our history.[2]
>
> 1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
> 2. http://ten.wikipedia.org/wiki/Share
>
> On Tue, Dec 14, 2010 at 8:11 AM, Chad <[hidden email]> wrote:
>
>> On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <[hidden email]>
>> wrote:
>> > I was looking through some old files in our SourceForge project. I
>> > opened a file called wiki.tar.gz, and inside were three complete
>> > backups of the text of Wikipedia, from February, March and August 2001!


Re: Old Wikipedia backups discovered

FT2
In reply to this post by WJhonson
WinRAR's your best bet. Other archivers may be equally good.

FT2
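
If the file is a gzip-compressed tarball like the wiki.tar.gz from the original post, Python's standard tarfile module can also read it on Windows without a separate archiver. A minimal sketch; the function names and paths are placeholders, and note that the published log archive is a .7z file, which the standard library does not read:

```python
import tarfile


def list_tarball(path):
    """Return the member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def extract_tarball(path, dest):
    """Unpack a .tar.gz archive into the directory dest."""
    with tarfile.open(path, mode="r:gz") as archive:
        # Only do this with archives from a source you trust.
        archive.extractall(dest)
```

Calling list_tarball("wiki.tar.gz") shows what is inside before anything is written to disk; extract_tarball("wiki.tar.gz", "wiki") then unpacks it.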

On Tue, Dec 14, 2010 at 5:53 PM, <[hidden email]> wrote:

> In a message dated 12/14/2010 8:21:09 AM Pacific Standard Time,
> [hidden email] writes:
>
>
> > This is fantastic, and the timing could not be better.
> >
> > If anyone finds anything noteworthy, please add it to the timeline of
> > Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
> > the other tools for cataloging interesting tidbits from our history.[2]
> >
> > 1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
> > 2. http://ten.wikipedia.org/wiki/Share
> >
>
> Hmm I wonder if some things can be added there.... (sound of feathers
> ruffling)
>
> Btw how does one *open* this tarball thing (on Windows) ?
>  _______________________________________________
> foundation-l mailing list
> [hidden email]
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>

Re: Old Wikipedia backups discovered

FT2
In reply to this post by phoebe ayers-3
I would prefer to see this on its own wiki, since it is comprehensive up
to a given date. Maybe January2001.wikipedia.org, for immediate impact.

(DNS software cannot handle 2001.wikipedia.org)

FT2

On Tue, Dec 14, 2010 at 6:04 PM, phoebe ayers <[hidden email]> wrote:

>  On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <[hidden email]>
> wrote:
> > I was looking through some old files in our SourceForge project. I
> > opened a file called wiki.tar.gz, and inside were three complete
> > backups of the text of Wikipedia, from February, March and August 2001!
> >
> > This is exciting, because there is lots of article history in here
> > which was assumed to be lost forever.
> >
> > I've long been interested in Wikipedia's history, and I've tried in
> > the past to locate such backups. I asked various people who might have
> > had one. I had given up hope.
> >
> > The history of particularly old Wikipedia articles, as seen in the
> > present Wikipedia database, is incomplete, due to Usemod's policy of
> > deleting old revisions of pages after about a month. The script which
> > Brion wrote to import the article histories from UseMod to MediaWiki
> > only fetched those revisions which hadn't been purged yet.
> >
> > I didn't want to believe that those revisions had been lost forever,
> > and I even opened the UseMod source code and stared forlornly at the
> > unlink() call. What I (and Brion before) missed is that UseMod appends
> > a record of every change made to two files, called diff_log and rclog.
> > In these two files is a record of every change made to Wikipedia from
> > January 15 to August 17, 2001.
> >
> > I've put the two log files up on the web, at:
> >
> > http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
> >
> > The 7-zip archive is only 8.4MB -- much more manageable than today's
> > backups.
> >
> > rclog contains IP addresses. The Usemod software made IP addresses of
> > logged-in users public, so the people who made these edits had no
> > expectation that their IP address would be kept private. That, coupled
> > with the passage of time, makes me think that no harm to user privacy
> > can come from releasing these files.
> >
> > -- Tim Starling
>
> AWESOME. This is so cool. I've copied the research list too, since
> there are many Wikipedia historians who will be eager to see the older
> versions.
>
> I hope we can get them up in a browsable way, like
> nostalgia.wikipedia.org!
>
> -- phoebe
>
> _______________________________________________
> foundation-l mailing list
> [hidden email]
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>

Re: Old Wikipedia backups discovered

FT2
In reply to this post by phoebe ayers-3
See the "See also" section, etc., in [[History of Wikipedia]].

FT2

On Tue, Dec 14, 2010 at 7:27 PM, phoebe ayers <[hidden email]> wrote:

> FYI, there is an existing timeline at:
>
> http://meta.wikimedia.org/wiki/Wikipedia_timeline
>
> And lots of other wikipedia history pages on English, too.
>
> :)
> Phoebe
>
> On Tue, Dec 14, 2010 at 10:23 AM, Moka Pantages <[hidden email]>
> wrote:
> > This is so exciting!  To Steven's point: we've also started a page
> > where folks can add bits of interesting information as they excavate
> > the files [1].   Can't wait to dig in!
> >
> > Congrats, Tim!
> >
> > [1] http://ten.wikipedia.org/wiki/Wikipedia_in_the_Beginning
> >
> >
> > On Tue, 14 Dec 2010 08:20:10 -0800, Steven Walling <[hidden email]> wrote:
> >
> > This is fantastic, and the timing could not be better.
> >
> > If anyone finds anything noteworthy, please add it to the timeline of
> > Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
> > the other tools for cataloging interesting tidbits from our history.[2]
> >
> > 1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
> > 2. http://ten.wikipedia.org/wiki/Share
> >
> > On Tue, Dec 14, 2010 at 8:11 AM, Chad <[hidden email]> wrote:
> >
> >> On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <[hidden email]>
> >> wrote:
> >> > I was looking through some old files in our SourceForge project. I
> >> > opened a file called wiki.tar.gz, and inside were three complete
> >> > backups of the text of Wikipedia, from February, March and August 2001!
>
>  _______________________________________________
> foundation-l mailing list
> [hidden email]
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>

Re: Wikis analysed

John Mark Vandenberg
In reply to this post by Olaf Simons
Hi Olaf,

This would be a good WikiProject within Wikisource, or on top of Wikisource.

Do you have scans of the letters?

http://en.wikipedia.org/wiki/Wikisource

Wikisource is already set up to manage the transcription and
presentation of the letters, pages about authors, etc., and the
community will pitch in with setting up your data.

You can focus on the linking between texts, analysis, etc.

The wiki-research-l list may be of interest to you.

https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
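
As a rough illustration of the kind of analysis that becomes possible once letters, authors and topics are wiki pages that link to each other: a minimal sketch in Python that pulls [[wiki links]] out of page text and counts how often each page is linked to. The page titles and text here are invented placeholders, and the regular expression only covers plain [[Target]], [[Target|label]] and [[Target#section]] links, not templates or categories:

```python
import re
from collections import Counter

# Capture the link target: everything after '[[' up to ']', '|' or '#'.
LINK = re.compile(r"\[\[([^\]|#]+)")


def extract_links(wikitext):
    """Return the link targets found in one page's wikitext, in order."""
    return [target.strip() for target in LINK.findall(wikitext)]


def build_graph(pages):
    """Map each page title to the list of titles it links to."""
    return {title: extract_links(text) for title, text in pages.items()}


def inbound_counts(graph):
    """Count how many distinct pages link to each target."""
    counts = Counter()
    for targets in graph.values():
        counts.update(set(targets))
    return counts


# Hypothetical sample data, not taken from any real wiki.
pages = {
    "Letter 1": "From [[Author A]] to [[Author B|B.]] on [[Topic X]].",
    "Letter 2": "From [[Author B]] to [[Author A]] on [[Topic X#Background]].",
}
graph = build_graph(pages)
print(inbound_counts(graph).most_common())
```

The resulting dictionary of page-to-page links is the sort of structure a graph library or visualisation tool could take as input; which tool suits the project is a separate question.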

On Wed, Dec 15, 2010 at 5:15 AM, Olaf Simons
<[hidden email]> wrote:

> Hi,
>
> I am thinking of recommending a wiki database to a research project
> planned at Erfurt University. The group I have to advise is planning to
> edit late 17th and early 18th century letters of the "republic of
> letters" with the aim to reconstruct the flow of ideas and the personal
> networks that generated this flow. A wiki should be a superb tool for
> the editing process the project will have to get through. Yet I am more
> interested in tools we would later on use to analyse our data (we will
> probably create pages of individual letters, other pages on authors and
> topics, and, of course, categories etc.).
>
> My question is now: I have seen efforts (though I never took any notes)
> that analysed wikis and produced network structures of the interrelated
> pages and category trees. One such thing was shown here only recently:
>
> http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-11-29/News_and_notes
>
> ...yet the digest given here would be too vague for our purposes. We
> would probably have to plan the entire wiki in a way that we could get
> definite pictures of the development of 17th century intellectual
> networks (how do they spread on the European map? Who is communicating
> with whom? Who is playing what role in the process?), and of the flow of
> topics within these networks.
>
> Ideas of who would provide technical solutions and give advice on how to
> create such a wiki in a manner that it can be analysed fruitfully, would
> be most welcome,
>
> regards
> Olaf Simons
>
>
> Gotha Research Centre, Germany
> ...and Germany's wikipedia
>
> _______________________________________________
> foundation-l mailing list
> [hidden email]
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
