[MediaWiki-l] How to anonymise MW database?

Adam Nielsen
Hi all,

Just as a safety net I'd like to publish copies of the databases
powering my MediaWiki installations, so that if I lose all my data and
backups, I might still be able to get something back (as well as
reassuring my users that their contributions won't be lost forever!)

I don't want to publish the raw database dump because it contains
people's e-mail addresses and password hashes, but I'm thinking that
perhaps if I take a copy of the database and erase the values in those
fields, I might be able to export and publish the 'anonymised' copy.
I'd rather not omit the user table entirely, because if I do ever need
to restore the wiki from this copy I'll need it to associate the right
account with each edit.

Does anyone know which fields contain sensitive data that I should
remove? I can see these obvious candidates:

  user.user_password
  user.user_newpassword
  user.user_email
  user.user_token
  user.user_email_token

Are there any others that are potentially confidential and should not
be made public?
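
As a rough illustration of the blanking step described above, here is a
minimal Python sketch. It uses an in-memory SQLite table as a stand-in for a
copy of the wiki database (a real installation would more likely be on MySQL,
with its own driver and identifier quoting). The helper name
`blank_sensitive_columns` and the choice of an empty string as the replacement
value are assumptions for illustration, and the column list is only the five
candidates named above, so it may be incomplete.

```python
import sqlite3

# The five candidate columns named in the post; the list may be incomplete.
SENSITIVE_COLUMNS = (
    "user_password",
    "user_newpassword",
    "user_email",
    "user_token",
    "user_email_token",
)


def blank_sensitive_columns(conn, table="user", columns=SENSITIVE_COLUMNS):
    """Overwrite every listed column with an empty string in all rows.

    Returns the number of rows updated. Table and column names are
    interpolated into the SQL text, so they must come from trusted code,
    never from user input.
    """
    assignments = ", ".join(f'"{col}" = ?' for col in columns)
    cur = conn.execute(
        f'UPDATE "{table}" SET {assignments}', [""] * len(columns)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Hypothetical stand-in for a *copy* of the wiki database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, user_name TEXT, '
        "user_password TEXT, user_newpassword TEXT, user_email TEXT, "
        "user_token TEXT, user_email_token TEXT)"
    )
    conn.execute(
        'INSERT INTO "user" VALUES (?, ?, ?, ?, ?, ?, ?)',
        (1, "Alice", "hash", "", "alice@example.org", "tok", "etok"),
    )
    print(blank_sensitive_columns(conn), "row(s) blanked")
    print(conn.execute('SELECT * FROM "user"').fetchall())
```

Because `user_id` and `user_name` are left untouched, edits can still be
associated with the right account after a restore. This should only ever be
run against a copy, never the live database.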

Many thanks,
Adam.



_______________________________________________
MediaWiki-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Re: How to anonymise MW database?

Arcane 21
I can't think of anything offhand. You might want to run a database search to make sure none of those values in those user fields is available elsewhere on the database in cleartext, but otherwise, you should be fine.
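
The database search suggested here could be sketched roughly as follows. This
is only an illustration under stated assumptions: it uses SQLite's catalog
(`sqlite_master` and `PRAGMA table_info`) to enumerate tables and columns,
whereas a MySQL-backed wiki would need `information_schema` instead, and the
function name `find_leaks` is made up for this example. It would have to be
run against the original data, before the sensitive fields are blanked, with
the values to look for (e.g. the e-mail addresses) collected up front.

```python
import sqlite3


def find_leaks(conn, needles, skip=()):
    """Return (table, column, needle) triples where a needle occurs.

    Every column of every table is cast to text and checked for each
    needle as a substring. `skip` holds (table, column) pairs to ignore,
    typically the columns the needles were taken from.
    """
    needles = [n for n in needles if n]  # an empty string matches everything
    hits = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    for table in tables:
        # Second field of each table_info row is the column name.
        columns = [
            row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        for column in columns:
            if (table, column) in skip:
                continue
            for needle in needles:
                found = conn.execute(
                    f'SELECT 1 FROM "{table}" '
                    f'WHERE instr(CAST("{column}" AS TEXT), ?) > 0 LIMIT 1',
                    (needle,),
                ).fetchone()
                if found:
                    hits.append((table, column, needle))
    return hits
```

A call such as `find_leaks(conn, emails, skip={("user", "user_email")})`
would then report any other place an address shows up in cleartext. On a
large wiki this brute-force scan will be slow, so it is better suited to a
one-off check than to a routine job.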

> To: [hidden email]
> From: [hidden email]
> Date: Sat, 28 Sep 2013 22:51:38 +1000
> Subject: [MediaWiki-l] How to anonymise MW database?

Re: How to anonymise MW database?

Adam Nielsen
> I can't think of anything offhand. You might want to run a database
> search to make sure none of those values in those user fields is
> available elsewhere on the database in cleartext, but otherwise, you
> should be fine.

Great!  Thanks for the quick reply!

Cheers,
Adam.




Re: How to anonymise MW database?

YiFei
In reply to this post by Adam Nielsen
I suggest you ask Coren for this. He's in charge of the database of WMF Labs.

> On Sep 28, 2013, at 8:51 PM, Adam Nielsen <[hidden email]> wrote:
