MessageCache and MediaWiki namespace redesign

MessageCache and MediaWiki namespace redesign

Tim Starling-2
The current localisation system has a number of undesirable properties:

* Starting from a cold cache is extremely slow, taking from 20 seconds to
several minutes.
* The database is preloaded with hundreds of default messages, causing:
   * Slow installation, to the point where web installation is entirely
impossible on some resource-limited shared web hosts without commenting
out the message cache section
   * Excessive disk usage and slow backups on sites with large numbers of
near-empty wikis
* The message cache can exceed the 1MB limit of MemCached, causing total
failure
* The performance of the message cache degrades when some of the keys are
large

I spent a fair bit of time pondering how to fix this, but I think it was
Rotem who finally suggested the obvious solution: don't have pages for
default messages.

The only reason for preloading the MediaWiki namespace was to provide
admins with model text upon which they could base their translations. This
justification has long since disappeared, since action=edit, action=view
and Special:Allmessages are now all capable of drawing default message
text from the message files if the articles do not exist.

So here's what I've done in my working copy, soon to be committed:
* Removed InitialiseMessages.inc and rebuildMessages.php
* During upgrade, delete all pages in the MediaWiki namespace which were
last modified by "MediaWiki default".
* Reoptimised the message cache for the sparse MediaWiki namespace.

The main message cache (i.e. the $wgDBname:messages key) will now be a
faithful representation of the contents of the MediaWiki namespace,
instead of (as it previously was) a representation of the contents of all
messages. If a page does not exist, it will not have a message cache key.

To solve the performance problems of having a small number of large items,
any page which is larger than some threshold (10KB by default) will only
have a placeholder stored in the main message cache, instead of the
complete page text. The full contents of these items are stored separately
in the cache.
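
A minimal sketch of this layout, written in Python rather than MediaWiki's PHP, might look like the following. The function names, the `DEFAULT_MESSAGES` table, and the use of the '!TOO BIG' string (mentioned later in this thread) as the placeholder are assumptions for illustration, not the committed code.

```python
# Illustrative sketch only; not MediaWiki's actual implementation.
MAX_ENTRY_SIZE = 10 * 1024   # threshold from the post: 10KB by default
TOO_BIG = "!TOO BIG"         # placeholder marker (assumed encoding)

# Stands in for the default text shipped in the message files (hypothetical).
DEFAULT_MESSAGES = {"mainpage": "Main Page"}


def build_main_cache(namespace_pages):
    """Mirror the MediaWiki namespace: one entry per existing page only.

    Pages above the threshold get a placeholder in the main cache and
    their full text goes into a separate store.
    """
    main, big = {}, {}
    for title, text in namespace_pages.items():
        if len(text.encode("utf-8")) > MAX_ENTRY_SIZE:
            main[title] = TOO_BIG
            big[title] = text
        else:
            main[title] = text
    return main, big


def get_message(key, main, big):
    """Look up a message, falling back to the default for missing pages."""
    if key not in main:
        # No page exists, so there is no cache key; use the message files.
        return DEFAULT_MESSAGES.get(key)
    value = main[key]
    if value == TOO_BIG:
        # Large page: fetch the full text from the separate store.
        return big[key]
    return value
```

With this shape, the main cache stays small on a wiki with few customised messages, and one oversized page costs only a short placeholder in it.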

-- Tim Starling

_______________________________________________
Wikitech-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Re: MessageCache and MediaWiki namespace redesign

Rob Church
On 05/01/07, Tim Starling <[hidden email]> wrote:

> So here's what I've done in my working copy, soon to be committed:
> * Removed InitialiseMessages.inc and rebuildMessages.php
> * During upgrade, delete all pages in the MediaWiki namespace which were
> last modified by "MediaWiki default".
> * Reoptimised the message cache for the sparse MediaWiki namespace.
>
> The main message cache (i.e. the $wgDBname:messages key) will now be a
> faithful representation of the contents of the MediaWiki namespace,
> instead of (as it previously was) a representation of the contents of all
> messages. If a page does not exist, it will not have a message cache key.
>
> To solve the performance problems of having a small number of large items,
> any page which is larger than some threshold (10KB by default) will only
> have a placeholder stored in the main message cache, instead of the
> complete page text. The full contents of these items are stored separately
> in the cache.

Sounds good, but...

...will this in any way affect the means by which extensions have to
add messages to the Message Cache? Will the existing interfaces still
work, or do we now have to update all the code, 'cause I'm concerned
about people breaking backwards compatibility again.


Rob Church

Re: MessageCache and MediaWiki namespace redesign

Tim Starling-2
Rob Church wrote:

> On 05/01/07, Tim Starling <[hidden email]> wrote:
>> So here's what I've done in my working copy, soon to be committed:
>> * Removed InitialiseMessages.inc and rebuildMessages.php
>> * During upgrade, delete all pages in the MediaWiki namespace which were
>> last modified by "MediaWiki default".
>> * Reoptimised the message cache for the sparse MediaWiki namespace.
>>
>> The main message cache (i.e. the $wgDBname:messages key) will now be a
>> faithful representation of the contents of the MediaWiki namespace,
>> instead of (as it previously was) a representation of the contents of all
>> messages. If a page does not exist, it will not have a message cache key.
>>
>> To solve the performance problems of having a small number of large items,
>> any page which is larger than some threshold (10KB by default) will only
>> have a placeholder stored in the main message cache, instead of the
>> complete page text. The full contents of these items are stored separately
>> in the cache.
>
> Sounds good, but...
>
> ...will this in any way affect the means by which extensions have to
> add messages to the Message Cache? Will the existing interfaces still
> work, or do we now have to update all the code, 'cause I'm concerned
> about people breaking backwards compatibility again.

The only interface change is that I've renamed getFromCache() to the more
accurate getMsgFromNamespace(), but that function was implicitly private.
Anything that accesses $wgMessageCache->mCache directly will be broken.
But the usual public interfaces such as addMessages() and get() are preserved.

I'm usually pretty careful these days to maintain interface compatibility,
but it was no challenge this time around, since the code changes are
fairly minimal.

-- Tim Starling

Re: MessageCache and MediaWiki namespace redesign

Rob Church
On 05/01/07, Tim Starling <[hidden email]> wrote:
> The only interface change is that I've renamed getFromCache() to the more
> accurate getMsgFromNamespace(), but that function was implicitly private.
> Anything that accesses $wgMessageCache->mCache directly will be broken.
> But the usual public interfaces such as addMessages() and get() are preserved.
>
> I'm usually pretty careful these days to maintain interface compatibility,
> but it was no challenge this time around, since the code changes are
> fairly minimal.

Excellent, thanks very much.


Rob Church

Re: MessageCache and MediaWiki namespace redesign

Evan Martin-2
In reply to this post by Tim Starling-2
On 1/5/07, Tim Starling <[hidden email]> wrote:
> * The message cache can exceed the 1MB limit of MemCached, causing total
> failure

For what it's worth, this can be adjusted with a compile-time constant:
http://lists.danga.com/pipermail/memcached/2006-January/001879.html

(Of course, you have many other good points beyond this...)

Re: MessageCache and MediaWiki namespace redesign

Mark Clements (HappyDog)
In reply to this post by Tim Starling-2
"Tim Starling" <[hidden email]> wrote in
message news:enlnmo$69l$[hidden email]...

> * During upgrade, delete all pages in the MediaWiki namespace which were
> last modified by "MediaWiki default".

Perhaps a second check if it wasn't, to see if the contents are identical to
the expected contents.  I'm sure there are lots of cases where a message was
changed and subsequently reverted, or where spelling errors were fixed
locally before upgrading to a version where they were fixed by default.
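
The two checks could be combined along these lines. This is a Python sketch under stated assumptions: the page record layout and the `default_messages` table are hypothetical, and the upgrade script described above applies only the first check.

```python
# Illustrative sketch only; field names are assumptions.
DEFAULT_USER = "MediaWiki default"


def should_delete(page, default_messages):
    """Decide whether a MediaWiki-namespace page is redundant.

    page: dict with 'key', 'last_editor' and 'text' (hypothetical layout).
    default_messages: message key -> default text from the message files.
    """
    # First check, as in the original proposal: untouched default page.
    if page["last_editor"] == DEFAULT_USER:
        return True
    # Second check, as suggested here: edited locally, but the current
    # text is identical to the default anyway.
    default = default_messages.get(page["key"])
    return default is not None and page["text"] == default
```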

- Mark Clements (HappyDog)



Re: [Wikitech-l] MessageCache and MediaWiki namespace redesign

Brion Vibber
In reply to this post by Tim Starling-2
Tim Starling wrote:
> So here's what I've done in my working copy, soon to be committed:
> * Removed InitialiseMessages.inc and rebuildMessages.php
> * During upgrade, delete all pages in the MediaWiki namespace which were
> last modified by "MediaWiki default".
> * Reoptimised the message cache for the sparse MediaWiki namespace.

I've gone ahead and taken this live; the batch deletions are running in
the background.

A couple tweaks:

* The deleteDefaultMessages script now ensures that the 'MediaWiki
default' user is set up as a bot, so the flood of deletions is hidden
from recent changes.

* It turns out some messages try to transclude other messages. French
Wikipedia's MediaWiki:Copyrightwarning for instance transcluded
MediaWiki:Copyrightpage to get the default local page name for the
copyright information page; also some of the default messages for the
Special:Export page in various languages fetch the 'Main Page' name this
way for the example text. I've changed the transclusion logic to fetch
from the message cache when pulling a {{MediaWiki:}} page that doesn't
exist in the database; that should better match the 'expected' behavior
from these pages seeming to exist for viewing purposes.
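
The fallback in the second tweak can be sketched as follows, in Python rather than MediaWiki's PHP. The function name, the dict-based stand-ins for the database and the message cache, and the lowercase-first-letter key mapping are assumptions for illustration.

```python
# Illustrative sketch only; not the actual transclusion code.
MESSAGE_PREFIX = "MediaWiki:"


def fetch_transcluded_text(title, database_pages, message_cache):
    """Return the text to transclude for `title`, or None if unavailable.

    database_pages: page title -> text for pages that exist in the database.
    message_cache: message key -> text, including default messages.
    """
    if title in database_pages:
        return database_pages[title]
    if title.startswith(MESSAGE_PREFIX):
        name = title[len(MESSAGE_PREFIX):]
        # Assumed mapping: page "Copyrightpage" -> message key "copyrightpage".
        key = name[:1].lower() + name[1:]
        return message_cache.get(key)
    return None
```

A page that exists in the database still wins, so local customisations are unaffected; only missing {{MediaWiki:}} pages take the new path.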


Some wikis also experienced a temporary problem with the '!TOO BIG'
message being shown in place of all UI messages.

I think this could have been due to funny updating behavior; several
machines which were recently reinstalled didn't have the 'sudo'
configuration set up correctly, so parts of the update scripts didn't
run correctly. This may have led to inconsistent behavior, though I'm
not sure that's the cause; or it may have just been partial updates
where MessageCache had new code but DefaultSettings didn't have the
configuration variable yet, so the maximum message size triggered on
everything.

-- brion vibber (brion @ pobox.com)

Re: [Wikitech-l] MessageCache and MediaWiki namespace redesign

Ligulem
On 07.01.2007 13:44, Brion Vibber wrote:
> I've gone ahead and taken this live; the batch deletions are running in
> the background.

On en.wikipedia I currently (14:56, 7 January 2007 UTC) see

&lt;main page&gt;

in the sidebar (html entities appearing on the rendered page?). I'm
using monobook.

I've looked through recent changes in MediaWiki namespace on en but
can't find anything that could explain this.

So could that problem have something to do with this change here?

Apologies for possibly asking stupid things in the wrong place.


Re: [Wikitech-l] MessageCache and MediaWiki namespace redesign

Simetrical
On 1/7/07, Ligulem <[hidden email]> wrote:
> On en.wikipedia I currently (14:56, 7 January 2007 UTC) see
>
> &lt;main page&gt;
>
> in the sidebar (html entities appearing on the rendered page?). I'm
> using monobook.

For future reference, that's the secret code for "the message
'main_page' is supposed to be here, but it doesn't exist".  Seems to
be fixed now, anyway.

Re: [Wikitech-l] MessageCache and MediaWiki namespace redesign

Tim Starling-2
In reply to this post by Ligulem
Ligulem wrote:

> On 07.01.2007 13:44, Brion Vibber wrote:
>> I've gone ahead and taken this live; the batch deletions are running in
>> the background.
>
> On en.wikipedia I currently (14:56, 7 January 2007 UTC) see
>
> &lt;main page&gt;
>
> in the sidebar (html entities appearing on the rendered page?). I'm
> using monobook.
>
> I've looked through recent changes in MediaWiki namespace on en but
> can't find anything that could explain this.
>
> So could that problem have something to do with this change here?
>
> Apologies for possibly asking stupid things in the wrong place.

It was reported on #wikimedia-tech at 15:08 UTC and I fixed it in under 5
minutes. That would have been the best place to report it -- the sooner we
hear about it, the sooner we can fix it.

-- Tim Starling

