You too can clean out the tons of database Default Messages

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

You too can clean out the tons of database Default Messages

jidanni
Gentlemen, if your personal Mediawiki wiki has been around since early
2007, you might want to clean out the thousands of Mediawiki:
namespace rows that were left in the database by
maintenance/deleteDefaultMessages.php . Wouldn't it make you feel good
to clean out thousands of wasted rows, leaving behind e.g., on a small
wiki, perhaps just a few hundred rows that are actually related to us?

I don't know why the design decision was made to just leave those
Mediawiki: namespace items sitting in the archive and text tables. But
OK, we proceed to clean them out by hand. I hope I got this right:

$ mysqlshow --count myDatabase > before.txt
$ mysql myDatabase
SELECT COUNT(*) FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default';
COUNT(*)
1518
DELETE FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default';
$ php purgeOldText.php --purge
Purge Old Text

Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
1518 inactive items found.
Deleting...done.
$ mysql myDatabase
SELECT COUNT(*) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
COUNT(*)
1510
SELECT MIN(log_timestamp),MAX(log_timestamp) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
MIN(log_timestamp) MAX(log_timestamp)
20070226185326 20070226194040
DELETE FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
$ mysqlshow --count myDatabase|diff before.txt -|sed '/|/!d'
< | archive           |       15 |       2206 |
> | archive           |       15 |        688 |
< | logging           |       10 |       2597 |
> | logging           |       10 |       1087 |
< | text              |        3 |       4466 |
> | text              |        3 |       2948 |

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Domas Mituzas

On Mar 24, 2009, at 1:02 PM, [hidden email] wrote:

> Gentlemen, if your personal Mediawiki wiki has been around since early

mediawiki-l@, please

--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Bugzilla from andrew@epstone.net
In reply to this post by jidanni
On Tue, Mar 24, 2009 at 10:02 PM,  <[hidden email]> wrote:
> I don't know why the design decision was made to just leave those
> Mediawiki: namespace items sitting in the archive and text tables. But

Just run update.php. It does all that crap for you.

--
Andrew Garrett

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Eugene Zelenko
Hi!

I asked similar question week ago. Can anybody from Wikimedia stuff
answer this question?

Eugene.

PS

We have new sysadmin there. May be it's good task to start?

On Tue, Mar 24, 2009 at 4:34 AM, Andrew Garrett <[hidden email]> wrote:

> On Tue, Mar 24, 2009 at 10:02 PM,  <[hidden email]> wrote:
>> I don't know why the design decision was made to just leave those
>> Mediawiki: namespace items sitting in the archive and text tables. But
>
> Just run update.php. It does all that crap for you.
>
> --
> Andrew Garrett
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
> mediawiki-l@, please
OK, sorry. Next time...
>> I don't know why the design decision was made to just leave those
>> Mediawiki: namespace items sitting in the archive and text tables. But
> Just run update.php. It does all that crap for you.

I am sorry but update.php just calls deleteDefaultMessages.php which
just leaves them sitting in the text and archive tables forever!

It just does a courteous delete, leaving all 1500 yes, deleted, but
not expunged, forever.

If your wiki has been around since early 2007, have a look in your
archive and text database tables. The 1500 hundred message strings
should be very obvious there in the text table, and their
corresponding items in the logging and archive tables.

Years and years will go by and one day you get curious and look in the
database and surprise, for a small wiki perhaps 90% of the database is
occupied by useless bloat.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Steve Sanbeg
On Wed, 25 Mar 2009 04:38:52 +0800, jidanni wrote:

>> mediawiki-l@, please
> OK, sorry. Next time...
>>> I don't know why the design decision was made to just leave those
>>> Mediawiki: namespace items sitting in the archive and text tables. But
>> Just run update.php. It does all that crap for you.
>
> I am sorry but update.php just calls deleteDefaultMessages.php which
> just leaves them sitting in the text and archive tables forever!
>
> It just does a courteous delete, leaving all 1500 yes, deleted, but
> not expunged, forever.
>
> If your wiki has been around since early 2007, have a look in your
> archive and text database tables. The 1500 hundred message strings
> should be very obvious there in the text table, and their
> corresponding items in the logging and archive tables.
>
> Years and years will go by and one day you get curious and look in the
> database and surprise, for a small wiki perhaps 90% of the database is
> occupied by useless bloat.

maintenance/nukeNS.php was written for pretty much that purpose; to purge
all the obsolete stuff from that namespace without flooding delete logs,
etc.



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
>>>>> "SS" == Steve Sanbeg <[hidden email]> writes:

SS> maintenance/nukeNS.php was written for pretty much that purpose; to purge
SS> all the obsolete stuff from that namespace without flooding delete logs,
SS> etc.

Alas, looking in

 * Remove pages with only 1 revision from the MediaWiki namespace, without
 * flooding recent changes, delete logs, etc.
 * Irreversible (can't use standard undelete) and does not update link tables
 *
 * This is mainly useful to run before maintenance/update.php when upgrading
 * to 1.9, to prevent flooding recent changes/deletion logs.  It's intended
 * to be conservative, so it's possible that a few entries will be left for
 * deletion by the upgrade script.  It's also possible that it hasn't been
 * tested thouroughly enough, and will delete something it shouldn't; so
 * back up your DB if there's anything in the MediaWiki that is important to
                                                       ^namespace
 * you.

The problem is nukeNS.php was and is never called by update.php!
Check with ls -ltu (if your access times are tracked in mount(8).)
It is never referred to by any other file but itself:
$ find|LC_ALL=C xargs grep -il nukens
./maintenance/nukeNS.php

Hence its goal of
 * run before maintenance/update.php when upgrading to 1.9,
was never achieved!

And, calling it now years later won't help, as update.php invocation
of deleteDefaultMessages.php has already long ago put the messages
etc. in the areas of the database where nukeNS.php doesn't look.

Hence I have proved that wikis that have been around since 1.8 will
have about 1500 rows of useless messages still in the text table, and
corresponding entries in the archive table.

Sure, on Wikipedia that is a speck of sand, but e.g., on a small quiet
wiki the vast majority of the rows in the database will be useless
trash. Repent now!

Another problem for the reader of this thread is "well gosh, has my
wiki been around since 1.8? How can I tell for sure if my memory is
hazy?" Well, maybe somebody can mention how to detect that. Maybe
version increments are logged or should be, for those tracking SVN or
not. Anyways, you can always just look for those useless entries in
your database, that ought to tell you if your wiki has been around
since 1.8.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Gerard Meijssen-3
Hoi,
I admire your wish for cleaning up. My question is what are we talking
about. Is this about cluttering up disk space or are these messages in
memory.
Thanks,
        GerardM

2009/3/26 <[hidden email]>

> >>>>> "SS" == Steve Sanbeg <[hidden email]> writes:
>
> SS> maintenance/nukeNS.php was written for pretty much that purpose; to
> purge
> SS> all the obsolete stuff from that namespace without flooding delete
> logs,
> SS> etc.
>
> Alas, looking in
>
>  * Remove pages with only 1 revision from the MediaWiki namespace, without
>  * flooding recent changes, delete logs, etc.
>  * Irreversible (can't use standard undelete) and does not update link
> tables
>  *
>  * This is mainly useful to run before maintenance/update.php when
> upgrading
>  * to 1.9, to prevent flooding recent changes/deletion logs.  It's intended
>  * to be conservative, so it's possible that a few entries will be left for
>  * deletion by the upgrade script.  It's also possible that it hasn't been
>  * tested thouroughly enough, and will delete something it shouldn't; so
>  * back up your DB if there's anything in the MediaWiki that is important
> to
>                                                       ^namespace
>  * you.
>
> The problem is nukeNS.php was and is never called by update.php!
> Check with ls -ltu (if your access times are tracked in mount(8).)
> It is never referred to by any other file but itself:
> $ find|LC_ALL=C xargs grep -il nukens
> ./maintenance/nukeNS.php
>
> Hence its goal of
>  * run before maintenance/update.php when upgrading to 1.9,
> was never achieved!
>
> And, calling it now years later won't help, as update.php invocation
> of deleteDefaultMessages.php has already long ago put the messages
> etc. in the areas of the database where nukeNS.php doesn't look.
>
> Hence I have proved that wikis that have been around since 1.8 will
> have about 1500 rows of useless messages still in the text table, and
> corresponding entries in the archive table.
>
> Sure, on Wikipedia that is a speck of sand, but e.g., on a small quiet
> wiki the vast majority of the rows in the database will be useless
> trash. Repent now!
>
> Another problem for the reader of this thread is "well gosh, has my
> wiki been around since 1.8? How can I tell for sure if my memory is
> hazy?" Well, maybe somebody can mention how to detect that. Maybe
> version increments are logged or should be, for those tracking SVN or
> not. Anyways, you can always just look for those useless entries in
> your database, that ought to tell you if your wiki has been around
> since 1.8.
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Marco Schuster-2
On Thu, Mar 26, 2009 at 7:02 AM, Gerard Meijssen
<[hidden email]> wrote:
> Hoi,
> I admire your wish for cleaning up. My question is what are we talking
> about. Is this about cluttering up disk space or are these messages in
> memory.
Apparently they are in the MySQL database... and the less data is
present in it, the better, I think.

Marco
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Tim Landscheidt
Marco Schuster <[hidden email]> wrote:

>> I admire your wish for cleaning up. My question is what are we talking
>> about. Is this about cluttering up disk space or are these messages in
>> memory.
> Apparently they are in the MySQL database... and the less data is
> present in it, the better, I think.

First rule of optimization: Don't think, measure.

Tim


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Aryeh Gregor
In reply to this post by Gerard Meijssen-3
On Thu, Mar 26, 2009 at 2:02 AM, Gerard Meijssen
<[hidden email]> wrote:
> I admire your wish for cleaning up. My question is what are we talking
> about. Is this about cluttering up disk space or are these messages in
> memory.

They use some disk space, which should be negligible for most people.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Platonides
Aryeh Gregor wrote:
> On Thu, Mar 26, 2009 at 2:02 AM, Gerard Meijssen
> <[hidden email]> wrote:
>> I admire your wish for cleaning up. My question is what are we talking
>> about. Is this about cluttering up disk space or are these messages in
>> memory.
>
> They use some disk space, which should be negligible for most people.

All entries at NS_MEDIAWIKI are loaded from db (or memcached) on each
page request. It's a tiny amount, but I'd be wary of things moved so often.


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Aryeh Gregor
On Thu, Mar 26, 2009 at 7:05 PM, Platonides <[hidden email]> wrote:
> All entries at NS_MEDIAWIKI are loaded from db (or memcached) on each
> page request. It's a tiny amount, but I'd be wary of things moved so often.

Only pages that exist will be loaded.  The pages in question have been
deleted and therefore will not be loaded.  The only question is
whether the rows are kept in the text table.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
>>>>> "AG" == Aryeh Gregor <[hidden email]> writes:

AG> The only question is whether the rows are kept in the text table.

Yes, for wikis that have been around since 1.8, run this simple test
$ mysql MyDatabase <<\EOF
SELECT COUNT(*) FROM archive WHERE ar_namespace  = 8 AND ar_user_text = 'MediaWiki default';
SELECT COUNT(*) FROM logging WHERE log_namespace = 8 AND log_comment  = 'No longer required';
EOF
COUNT(*)
1518
COUNT(*)
1510

(Note you may have to say e.g., FROM wiki_archive depending on how your
tables are named.)

Now allow us to examine a small, but "healthy" wiki,
http://transgender-taiwan.org/ , which was born after the 1.8/1.9
boundary in the Mediawiki stratigraphic record (
http://en.wikipedia.org/wiki/Cretaceous%E2%80%93Tertiary_extinction_event
).

$ mysqlshow --count transgender|egrep text\|\ archive\|Rows
|      Tables       | Columns  | Total Rows |
| archive           |       15 |          7 |
| text              |        3 |         62 |

This small wiki escaped having 1500 dinosaur corpse rows being added
to each of those two tables.

Sure, the sanitary landfill that our database has now become will
still function fine, but doesn't it feel bad to know that at least for
a smaller wiki the majority of the rows may very well just be junk
from a chance mishap at the 1.8-1.9 boundary? Yes, counting bytes
instead of rows might help one sleep at night.

Anyway, how would a WikiSysop who feels this is bad correct it for his
own wiki? I hope the method I used at the beginning of this thread
is safe and correct.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Tim Starling-2
In reply to this post by jidanni
[hidden email] wrote:
> I don't know why the design decision was made to just leave those
> Mediawiki: namespace items sitting in the archive and text tables.

The backup feature was added after initial testing, due to the
potential for loss of important data. I considered that it was worth
using a few hundred KB of disk space in exchange for theoretical
disaster recovery.

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
>>>>> "TS" == Tim Starling <[hidden email]> writes:

TS> I considered that it was worth using a few hundred KB of disk
TS> space in exchange for theoretical disaster recovery.

OK, I just hope the method mentioned at the beginning of this thread
is a safe way to clean up the 1500 entries from the three affected
tables, a total of 4500 rows, for those who now wish to.

A quick slob check of how many rows my small unaffected (post 1.8-1.9
boundary) wiki has:
$ mysqlshow --count transgender|
perl -nlwe 'next if / set\./;if(/(\d+)\D+$/){$tot+=$1};END{print $tot}'
328

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
In reply to this post by jidanni
I asked my pal about his small wiki
http://www.hserus.net/wiki/index.php/Main_Page .
He has even more of those rows, revolving uselessly on his disks...

>>>>> "S" == Suresh Ramasubramanian <[hidden email]> writes:

S> Interesting

mysql> SELECT COUNT(*) FROM archive WHERE ar_namespace  = 8 AND ar_user_text ='MediaWiki default';
S> +----------+
S> | COUNT(*) |
S> +----------+
S> |     1796 |
S> +----------+

mysql> SELECT COUNT(*) FROM logging WHERE log_namespace = 8 AND log_comment = 'No longer required';
S> +----------+
S> | COUNT(*) |
S> +----------+
S> |     1638 |
S> +----------+

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

Bugzilla from andrew@epstone.net
On Sat, Mar 28, 2009 at 3:04 PM,  <[hidden email]> wrote:
> I asked my pal about his small wiki
> http://www.hserus.net/wiki/index.php/Main_Page .
> He has even more of those rows, revolving uselessly on his disks...

Is he short on disk space? Each of those rows is a kB or two at most,
so we're wasting... under ten megabytes. If a megabyte or two is a
significant amount of disk space for you, you need a new web host.


--
Andrew Garrett

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
>>>>> "AG" == Andrew Garrett <[hidden email]> writes:

AG> Is he short on disk space? Each of those rows is a kB or two at most,
AG> so we're wasting... under ten megabytes. If a megabyte or two is a
AG> significant amount of disk space for you, you need a new web host.

It's not the megabytes, its the idea that we're walking around with
out diapers full of ____, and we don't know how to control ourselves.

Some folks think its OK to flush any old thing down the toilet, "just
buy a new one" one day if it breaks.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: You too can clean out the tons of database Default Messages

jidanni
In reply to this post by jidanni
Allow me to forward to the list non-subscriber Suresh's reply:
>>>>> "S" == Suresh Ramasubramanian <[hidden email]> writes:

S> I'm not particularly short of disk space or memory, thanks. But as
S> Dan mentions, it does sound like a needless waste - and the volume
S> of dud entries is certainly going to scale far higher up when you
S> try it on, say, wikipedia.org or mediawiki.org

S> srs

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l