Fwd: [Wikipedia-l] New static HTML dumps available


Casey Brown
This is a frequently asked question on this list...

---------- Forwarded message ----------
From: Tim Starling <[hidden email]>
Date: Tue, Jul 1, 2008 at 10:18 PM
Subject: [Wikipedia-l] New static HTML dumps available
To: [hidden email]
Cc: [hidden email]


New static HTML dumps of all Wikipedia editions are now available:

http://static.wikipedia.org/

Altogether, the dumps are 650GB uncompressed, 40GB compressed.

I think a reasonable next step for this project would be to write filter
scripts that take a compressed dump, reduce the article count in some way,
and then recompress it, possibly in a different format. For instance, we
could have a "most popular 4GB" of the English Wikipedia, based on page
view statistics, recompressed as an SQLite database.
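The proposed pipeline could be sketched roughly as follows. This is a minimal illustration, not an actual filter script: the `articles` list stands in for records streamed out of a real compressed dump, the page-view counts would come from the separate page view statistics logs, and the table schema is invented for the example.

```python
import sqlite3

# Hypothetical input: (title, html, page_views) records that a real filter
# script would stream out of the compressed HTML dump. The view counts would
# come from the page view statistics Tim mentions.
articles = [
    ("Earth", "<html>...</html>", 500000),
    ("Moon", "<html>...</html>", 300000),
    ("Obscure topic", "<html>...</html>", 12),
]

def build_popular_subset(articles, budget_bytes):
    """Keep the most-viewed articles until the size budget is spent."""
    kept, used = [], 0
    for title, html, views in sorted(articles, key=lambda a: -a[2]):
        size = len(html.encode("utf-8"))
        if used + size > budget_bytes:
            break
        kept.append((title, html, views))
        used += size
    return kept

def write_sqlite(path, subset):
    """Recompress the reduced article set as an SQLite database."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS page "
        "(title TEXT PRIMARY KEY, html TEXT, views INTEGER)"
    )
    con.executemany("INSERT OR REPLACE INTO page VALUES (?, ?, ?)", subset)
    con.commit()
    con.close()

# A real run would target ~4 GB; a tiny budget is used here for illustration.
subset = build_popular_subset(articles, budget_bytes=1000)
write_sqlite("popular_subset.db", subset)
```

The same selection step could feed any other target format; SQLite is just one convenient single-file container for offline readers.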

-- Tim Starling


_______________________________________________
Wikipedia-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l



--
Casey Brown
Cbrown1023

---
Note: This e-mail address is used for mailing lists. Personal emails sent to
this address will probably get lost.

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l