[DEPRECATED] datasets.wikimedia.org

Dan Andreescu
Hi all,

*Who:* This mostly applies to people who have access to the stat1002 and
stat1003 statistics machines on the production cluster, and publish
datasets as static files.

*What:* We are no longer using datasets.wikimedia.org to serve static
datasets.  We have set up a redirect, so requests like
https://datasets.wikimedia.org/$1 will be sent to
https://analytics.wikimedia.org/datasets/archive/$1.  Most importantly,
publishing datasets is now much easier.  Any files you put in
published-datasets on either machine:


Are going to be merged together and served together on:


One request as we all enjoy this much simpler process: let's use README
files in these directories to let future versions of us know what the
datasets are all about.  That will make the repository more fun for others
to browse and ease future cleanups.  Thank you!
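
If you have old datasets.wikimedia.org links hard-coded in scripts or
notebooks, the redirect described above amounts to a prefix rewrite.  Here
is a minimal sketch of that mapping in Python, in case you want to update
links rather than rely on the redirect.  The function name and the example
path are made up for illustration, and forcing https on the result is my
assumption; only the two URL prefixes come from this announcement.

```python
from urllib.parse import urlsplit, urlunsplit

# Prefixes taken from the redirect described in this message.
OLD_HOST = "datasets.wikimedia.org"
NEW_HOST = "analytics.wikimedia.org"
NEW_PREFIX = "/datasets/archive"


def rewrite_dataset_url(url: str) -> str:
    """Map an old datasets.wikimedia.org URL to its archive location.

    URLs on any other host are returned unchanged.  The result always
    uses https (an assumption of this sketch, not a documented rule).
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    # Everything after the old host (the "$1" part) is kept as-is.
    new_path = NEW_PREFIX + (parts.path or "/")
    return urlunsplit(("https", NEW_HOST, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    # "some/dataset.tsv" is a hypothetical path, not a real file.
    print(rewrite_dataset_url("https://datasets.wikimedia.org/some/dataset.tsv"))
```

This only mirrors the redirect as stated here; it does not check that the
target file actually exists in the archive.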


If something of yours got lost, let us know; we have backups.  If you had
files that we might have cleaned up, we put them in
/srv/otto-to-delete-datasets-cleanup and
/a/otto-to-delete-datasets-cleanup.  Take a look there and move
files as you see fit into published-datasets.


For a long time, publishing files from stat1002 and stat1003 was quite
painful.  There were three folders, some on both boxes and some only on one
box, plus symlinks and rsyncs; it was bad.  We talked to everyone who had
files in these folders and gathered consensus for this deprecation.  If this
message catches you by surprise, please let us know which channel we should
use to reach you next time and we'll add it to our communication plan.

This work is tracked in T159409 <https://phabricator.wikimedia.org/T159409>
Wiki-research-l mailing list