Multi-server distributed MediaWiki

Multi-server distributed MediaWiki

kellyterryjones
It's "easy" to mirror a MediaWiki from one primary server to a number
of secondary servers, but is it possible to have multiple primary servers?

Example: 10 servers and users can make changes on ANY of the 10
servers. Every night, the servers rsync to each other as follows:

1. If server X's version hasn't changed all day and server Y's version
HAS changed, server X accepts server Y's version.

2. If both server X's and server Y's versions have changed, automatic
CVS-style merging is used to resolve the changes.

3. If CVS-style merging yields a conflict, the site maintainer is
notified and must merge the two files manually (I'm thinking of
creating a small site, so this shouldn't be too painful).

I realize the rules above only work for 2 servers -- is there a clever
version of this for n servers (n>2)?
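Those three rules amount to a small decision table. Here's a rough sketch in Python (all names here are mine, purely hypothetical) of what one server decides for one page during the nightly sync:

```python
from enum import Enum

class Action(Enum):
    KEEP = "keep local version"
    ACCEPT = "accept remote version"
    MERGE = "auto-merge both versions"
    MANUAL = "notify the maintainer"

def reconcile(x_changed: bool, y_changed: bool,
              merge_conflicts: bool = False) -> Action:
    """Decide what server X does with server Y's copy of one page."""
    if not x_changed and y_changed:
        return Action.ACCEPT        # rule 1: take Y's version
    if x_changed and y_changed:
        if merge_conflicts:
            return Action.MANUAL    # rule 3: a human resolves the conflict
        return Action.MERGE         # rule 2: CVS-style automatic merge
    return Action.KEEP              # X unchanged, or only X changed
```

The n>2 question is exactly the problem: running this pairwise over 10 servers gives no guarantee the results converge to a single version.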

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l

Re: Multi-server distributed MediaWiki

George William Herbert
On 12/31/06, Kelly Jones <[hidden email]> wrote:


What are you trying to accomplish by doing that?

Given the way the data is stored in the databases, it's a little harder than that.

How well do you understand clustering theory?


--
-george william herbert
[hidden email]

Re: Multi-server distributed MediaWiki

KlinT
Hi,

Maybe you can have a look at the MySQL Cluster feature...

But I guess you would have to adapt or change the MediaWiki source code
in order to make its SQL queries MySQL Cluster compliant... :)

It would be a hard job... but a fun one.

Best regards,

Arnaud.

On 2 Jan 2007, at 01:31, George Herbert wrote:



Re: Multi-server distributed MediaWiki

Kasimir Gabert
Hello Kelly Jones,

It seems to me that what you are trying to accomplish is already part
of MediaWiki: just have all ten MediaWikis use the same database
server, or set up each section of articles in a different
interwiki-linked database. This would remove the need to rsync
between the servers, and it would remove the need for you to merge
anything manually. Conflicts would not happen, because everything
would be done in real time.

What are you trying to do?

Kasimir

On 1/1/07, KlinT <[hidden email]> wrote:



--
Kasimir Gabert

Re: Multi-server distributed MediaWiki

kellyterryjones
I'm trying to run MediaWiki on a free server, but I'm worried about
bandwidth limits.

I'd like to create mirror sites or let other people create mirror
sites (the content will be 100% open source) and have a "master site"
that "randomly" redirects people to a mirror.

The mirror sites should not be "read-only". People should be able to
edit content on the mirror sites and have their changes pushed to the
other mirror sites.

The mirror servers should be as independent as possible:
geographically diverse, on different backbones, each using their own
MySQL server, owned by different people, etc. Some of the mirror
servers may be free, others may be paid hosting, others may be
dedicated servers, etc.

The only thing they'd have in common is that they'd all be running
MediaWiki and some cron script that merges in changed data from all
the mirrors.

Thoughts?

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

On 1/1/07, Kasimir Gabert <[hidden email]> wrote:


Re: Multi-server distributed MediaWiki

Brion Vibber

Kelly Jones wrote:
> I'm trying to run mediawiki on a free server, but I'm worried about
> bandwidth limits.

Ah, you want something for nothing. :)

- -- brion vibber (brion @ pobox.com)

Re: Multi-server distributed MediaWiki

Jérémie Bouillon
In reply to this post by kellyterryjones
Kelly Jones wrote:
> I'm trying to run mediawiki on a free server, but I'm worried about
> bandwidth limits.

Hosting is cheap. Unless you are working on a huge project (in which
case you'll need money, a revenue source, and/or sponsors, but that's
the way it should be), it will cost you only a few bucks a month, tops.



Re: Multi-server distributed MediaWiki

Sy Ali (sy1234@gmail.com)
In reply to this post by kellyterryjones
On 1/2/07, Kelly Jones <[hidden email]> wrote:
> The mirror servers should be as independent as possible:
> geographically diverse, on different backbones, each using their own
> MySQL server, owned by different people, etc. Some of the mirror
> servers may be free, others may be paid hosting, others may be
> dedicated servers, etc.

You're looking for the holy grail of hosting.

Some of this has been implemented via ideas like MySQL Cluster
(mentioned earlier), but I gather that MediaWiki itself would need to
be modified to support that technology.

Load-balancing a web server is already something that's fairly well understood.


Basically you'd have to have some front-end computer which understands
the bandwidth usage of all of its attached hosts, and it would
intelligently balance the load over to those mirrors which have
bandwidth left.
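A rough sketch of that selection step in Python, assuming each mirror somehow reports its remaining bandwidth quota (the names and numbers below are entirely made up; the reporting mechanism is left out):

```python
import random

# Hypothetical remaining-bandwidth figures (say, GB left this month),
# as the front end might collect them from each mirror's reports.
remaining = {
    "mirror1": 900,
    "mirror2": 300,
    "mirror3": 0,   # quota exhausted; should receive no traffic
}

def choose_mirror(remaining: dict[str, int]) -> str:
    """Pick a mirror with probability proportional to its spare bandwidth."""
    candidates = [(name, quota) for name, quota in remaining.items() if quota > 0]
    if not candidates:
        raise RuntimeError("every mirror is out of bandwidth")
    names, weights = zip(*candidates)
    return random.choices(names, weights=weights)[0]

print(choose_mirror(remaining))  # one of "mirror1" or "mirror2", never "mirror3"
```

The front end would then issue an HTTP redirect to the chosen mirror.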

And then those secondary computers would each have an installation
with some kind of synchronized database cache. That would require a
bit of MySQL magic, which MySQL does support but MediaWiki does not,
so edit collisions and the like would be impossible to sort out if
you have cached reads from, or slow writes to, the mirror sites.

But here's the problem: what bandwidth is being saved when the same
copy of a page has to get synced to all of the other mirrors? You
could modify MediaWiki to only update its local copy when a page edit
request comes in and an old copy is stored locally, but where is the
disaster recovery when one of your mirrors goes down?

This is a complex issue, and probably not worth the spare change
which basic hosting would cost. If the site is wildly popular, then
become self-supporting via memberships, ads, donations, etc.

Re: Multi-server distributed MediaWiki

kellyterryjones
On 1/4/07, Sy Ali <[hidden email]> wrote:
> On 1/2/07, Kelly Jones <[hidden email]> wrote:
> > The mirror servers should be as independent as possible:
> > geographically diverse, on different backbones, each using their own
> > MySQL server, owned by different people, etc. Some of the mirror
> > servers may be free, others may be paid hosting, others may be
> > dedicated servers, etc.
>
> You're looking for the holy grail of hosting.

Here's my grail-shaped beacon for small-medium MediaWikis:

All the mirrors publish sha1sums of the most recent versions of all
their pages. Since MySQL has SHA1() built in, this should be doable
(maybe future versions of MediaWiki will store the sha1sum as a field
in the 'text' table, making this even faster).

A central server pulls the sha1s from all mirrors hourly (or whatever)
and finds pages that aren't identical on all mirrors (including
newly created pages).

The central server runs the Unix 'merge' command (several times if
needed) to create the 'new' version of the page, which may or may not
match the version on some of the servers. Irreconcilable differences
are handled by the WikiSysop (or Drew Barrymore <G>).

The central server pushes the new version to all servers that don't
already have it.

For larger MediaWikis, perhaps only publish the sha1s of pages
changed in the last 4 hours (if the central server checks hourly, this
gives plenty of overlap/redundancy).

The *only* change a mirrored MediaWiki has to make is to install a
PHP script that runs a MySQL query to report the sha1sums of the
latest versions of all its pages. The central server handles
everything else. This works almost out of the box.
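To make the comparison step concrete, here's a rough Python sketch of what the central server would do with the published sha1 maps (the function names are made up; each mirror is assumed to expose a title-to-sha1 map):

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Same digest MySQL's SHA1() would produce for the page text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def divergent_pages(mirror_hashes: dict[str, dict[str, str]]) -> set[str]:
    """Titles whose latest sha1 is not identical on every mirror.

    mirror_hashes maps mirror name -> {page title: sha1 of latest text}.
    A page missing from some mirror (e.g. newly created elsewhere) also
    counts as divergent, since .get() yields None for it.
    """
    all_titles = set()
    for hashes in mirror_hashes.values():
        all_titles.update(hashes)
    return {
        title
        for title in all_titles
        if len({h.get(title) for h in mirror_hashes.values()}) > 1
    }
```

Pages in the returned set would then go through the merge step; everything else is already identical everywhere.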

Disaster recovery: if server X dies, just copy the db from server Y.

I don't really care about load-balancing or anything like that. Just
tell end users: go to mirror1.mywiki.com; if that fails, go to
mirror2.mywiki.com, and so on. Or create a metapage that just lists all
the mirror URLs for your MediaWiki and tell users to try them in
order. The nice thing is that you can edit from any mirror, not just
the original. Of course, the DNS/metapage for mywiki.com has to be
more reliable than any of the mirrors.

I'm not sure I even care about saving money (though the pseudo-anonymity
of free hosting is nice); creating a semi-robust, mirror-able MediaWiki
has a philosophical interest as well.

This has lots of problems (some listed below), but might be a good start?

Problems (which can be resolved long-term with some work + a more
complex process):

Page version numbers will be different on the mirrors.

Not all previous edits will be available on all mirrors (if something
gets edited several times in an hour)

For large sites, there'll be a large number of irreconcilable differences

Reverting an edit that just adds a small piece of text will be
impossible (Unix merge will always favor the version with the added
text)

The edit comments attached to page changes will be mostly lost

Many changes will appear to come from the central server, not the IP
address of the person who actually edited

At least 317 other problems I haven't thought about <G>

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.


Re: [Mediawiki-l] Multi-server distributed MediaWiki

Domas Mituzas
In reply to this post by Sy Ali
Hi!

> Some of this has been implemented via ideas like mysql-cluster
> (mentioned earlier) but I gather that mediawiki itself would need to
> be modified to support that technology.

MySQL Cluster is designed to run in a low-latency environment. Though
local gigabit Ethernet provides quite good round-trip times, specialized
server interconnects (like the Dolphin one) help even more. Geo-balancing
a cluster is painful (unless you have a really good interconnect,
sub-10ms). And you may want to run one of the late 5.1 snapshots... ;-)

> Load-balancing a webserver is already something that's fairly well known.

We do lots of database load balancing, with replication. It still has a
single master for write requests, though.

> intelligently balance the load over to those mirrors which have
> bandwidth left.

Just run a caching proxy... ;) It is probably possible to implement that
as a PHP script too, though it would suck.

> This is a complex issue.. and probably not worth the spare change
> which basic hosting would cost.  If the site is wildly popular.. then
> become self-supporting via memberships, ads, donations, etc..

Dedicated servers nowadays are in the 50 EUR/month range. With unlimited
traffic \o/

-- Domas
http://dammit.lt/