[Mediawiki-l] Request for comments: Schema change to add Special:Contributors

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[Mediawiki-l] Request for comments: Schema change to add Special:Contributors

Titoxd@Wikimedia
Last night, after I asked Brion whether he could turn on Rob's
Special:Contributors extension on Wikimedia sites, and Domas colorfully
indicated that it would be a Very Bad Idea(TM). The data for the extension
can be retrieved from the revision table, but due to the existing topology
of the table and the current load on the revision table itself, it would
cause a very expensive query. He indicated that another option would be to
create a new database table with page, user and editcount data. I've kept
thinking about the idea, and have modified Domas's suggestion slightly:

 

mysql> describe contributors;

+--------------------+-----------------+------+-----+---------+-------------
---+

| Field              | Type            | Null | Key | Default | Extra
|

+--------------------+-----------------+------+-----+---------+-------------
---+

| co_id              | int(8) unsigned | NO   | PRI | NULL    |
auto_increment |

| co_page            | int(8) unsigned | NO   | MUL | NULL    |
|

| co_user            | int(5) unsigned | NO   | MUL | 0       |
|

| co_user_text       | varchar(255)    | NO   |     | NULL    |
|

| co_editcount       | int(11)         | NO   |     | NULL    |
|

| co_editcount_minor | int(11)         | NO   |     | NULL    |
|

| co_timestamp       | char(14)        | NO   |     | NULL    |
|

+--------------------+-----------------+------+-----+---------+-------------
---+

 

Each row in the database would represent each row in the output of
http://vs.aka-online.de/cgi-bin/wppagehiststat.pl, to pick an example. The
co_id is the primary key of the table, and co_page links to the usual
suspect, page.page_id. Co_user and co_user_text link to user_id and
user_name, as well. Co_editcount is the number of edits said user has made
to the page, and co_editcount_minor represents the number of minor edits
done to the page (added because someone is going to ask for it sooner or
later). Co_timestamp is the timestamp of generation of the row, added in
case a fancy algorithm to detect old queries is made in the future.

 

As to how to initially populate the table: well, a bulk initialization
script is one option, but Domas didn't like it either. ;) :P What I
suggested is to populate the table on demand, and maintain it later, and
after a night's sleep, I thought about how to refine it more:

 

* On page creation, generate a row for the page and the user.

* On page edit, one of two options:

** If a row for the page and user exists already, then UPDATE the
co_editcount and co_editcount_minor fields as necessary.

** If rows for the page exist, but none points to the user, generate a new
row.

** If there are no rows pointing to the page, create the necessary rows from
the data in the revision table.

* On a query from [[Special:Contributors]], if there are no rows pointing to
the row, generate them from revision, then return the expected results,
Otherwise, just return the results.

* On page deletion/undeletion/oversight, SELECT all the rows pointing to a
particular page and DELETE them from the table, essentially invalidating the
cache for the page.

 

As to why the change is desired, it was described in detail at
http://bugzilla.wikimedia.org/show_bug.cgi?id=7988. Also, this could help to
implement some long-lasting suggestions to enhance Wikimedia projects'
credibility and citability. For example, Roy Rosenzweig of George Mason
University made a few recommendations in his article "Can History be Open
Source? Wikipedia and the Future of the Past"
(http://chnm.gmu.edu/resources/essays/d/42, and a good read, by the way),
such as putting the total number of edits made to a page into every page's
footer (next to [[MediaWiki:Lastmodifiedat]]). Also, with a JOIN with the
user table, a percentage of edits made by active editors can be displayed to
the viewer, which is another suggestion by Rosenzweig.  

 

There are probably more efficient ways to do this, but I just wanted to get
the ball rolling... any comments? Should it be done?

 

Titoxd.

_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to add Special:Contributors

Rob Church
On 10/01/07, Titoxd@Wikimedia <[hidden email]> wrote:
> Last night, after I asked Brion whether he could turn on Rob's
> Special:Contributors extension on Wikimedia sites, and Domas colorfully
> indicated that it would be a Very Bad Idea(TM). The data for the extension

Domas loves to hate.

> can be retrieved from the revision table, but due to the existing topology
> of the table and the current load on the revision table itself, it would
> cause a very expensive query. He indicated that another option would be to

Yes. There's an index that we don't have applied that would help this
a little, but not a huge amount.

> create a new database table with page, user and editcount data. I've kept
> thinking about the idea, and have modified Domas's suggestion slightly:

Most of this seems sane.

> later). Co_timestamp is the timestamp of generation of the row, added in
> case a fancy algorithm to detect old queries is made in the future.
>
>
>
> As to how to initially populate the table: well, a bulk initialization
> script is one option, but Domas didn't like it either. ;) :P What I

A bulk initialiser would be possible, but would still incur heavy
expense and isn't so easy to batch up.

> suggested is to populate the table on demand, and maintain it later, and
> after a night's sleep, I thought about how to refine it more:

The basic premise is sound, even if the implementation could be ironed
out a little. I'll sleep on it and mess about with it soon.

> There are probably more efficient ways to do this, but I just wanted to get
> the ball rolling... any comments? Should it be done?

P = "extension goes live"
Q = "extension is optimised"
R = "rob gets peace"

P --> Q and P --> R
¬Q --> ¬R

R, thus Q


Rob Church
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to add Special:Contributors

Rob Church
On 10/01/07, Rob Church <[hidden email]> wrote:
> P = "extension goes live"
> Q = "extension is optimised"
> R = "rob gets peace"
>
> P --> Q and P --> R
> ¬Q --> ¬R
>
> R, thus Q

That should have been R --> P, of course.


Rob Church
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to addSpecial:Contributors

Titoxd@Wikimedia
In reply to this post by Rob Church
Gah, pressed the wrong button.

Anyways, what I wanted to ask is, does this require a core change?

Titoxd.

-----Original Message-----
From: Rob Church [mailto:[hidden email]]
Sent: Tuesday, January 09, 2007 8:03 PM
To: MediaWiki announcements and site admin list
Subject: Re: [Mediawiki-l] Request for comments: Schema change to
addSpecial:Contributors
[snip]
>
>
> As to how to initially populate the table: well, a bulk initialization
> script is one option, but Domas didn't like it either. ;) :P What I

A bulk initialiser would be possible, but would still incur heavy
expense and isn't so easy to batch up.

> suggested is to populate the table on demand, and maintain it later, and
> after a night's sleep, I thought about how to refine it more:

The basic premise is sound, even if the implementation could be ironed
out a little. I'll sleep on it and mess about with it soon.

> There are probably more efficient ways to do this, but I just wanted to
get
> the ball rolling... any comments? Should it be done?

P = "extension goes live"
Q = "extension is optimised"
R = "rob gets peace"

P --> Q and P --> R
¬Q --> ¬R

R, thus Q


Rob Church


_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to addSpecial:Contributors

Rob Church
On 10/01/07, Titoxd@Wikimedia <[hidden email]> wrote:
> Gah, pressed the wrong button.
>
> Anyways, what I wanted to ask is, does this require a core change?

No.


Rob Church

_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to add Special:Contributors

Domas Mituzas
In reply to this post by Rob Church
Hi!

> Domas loves to hate.


I hate just chronic whiners, who start calling you idiot after you misspell
their name or leave weeping after minor criticism, but to not limit
criticism they emit themselves. I do not have issues with other people ;-)

To stay on topic - we might want eventually to put covering index for all
contributions-based operations, but still, for this operation we may want to
keep a separate table (and do that for most queries, that need aggregation).
For different traversing methods covering indexes are way to go.

BR,
Domas
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to add Special:Contributors

Rob Church
On 13/01/07, Domas Mituzas <[hidden email]> wrote:
> I hate just chronic whiners, who start calling you idiot after you misspell
> their name or leave weeping after minor criticism, but to not limit
> criticism they emit themselves. I do not have issues with other people ;-)

I called you an idiot because I had run out of options - you were, in
fact, being remarkably insulting, which is amazingly difficult to do
or so I'm told.

I don't know where this "leave weeping" nonsense comes from, to be
frank - I chose to go on a break in August for my sanity and for the
sanity of others. It did me the world of good and I've pledged to keep
taking more time off this thing. We obviously have an irrevocably
broken working relationship, which is extremely sad, because I respect
your database knowledge, but I have to point out that it wasn't I who
cast the stone to break it.

> To stay on topic - we might want eventually to put covering index for all
> contributions-based operations, but still, for this operation we may want to
> keep a separate table (and do that for most queries, that need aggregation).
> For different traversing methods covering indexes are way to go.

I haven't questioned the necessity of optimising the extension and
agree that a separate table is probably the best approach.


Rob Church

_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
jdd
Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Request for comments: Schema change to add Special:Contributors

jdd
Rob Church wrote:
> On 13/01/07, Domas Mituzas <[hidden email]> wrote:
>> I hate just chronic whiners, who start calling you idiot after you misspell

> taking more time off this thing. We obviously have an irrevocably
> broken working relationship, which is extremely sad

well, folks, do you know it quite amusing to read such post
when you haven't any idea of what is the subject :-))

keep cool... MediaWiki is a nice peace of software that can
or can't be enhanced, good thing but not vital. nobody is
going to die just now :-))

have a nice day, all of you :-)

jdd


--
http://www.dodin.net
Votez pour nous, merci - vote for us, thanks :-)
http://musique.sfrjeunestalents.fr/artiste/Magic-Alliance/

_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l