Provisional notes for proposed revision table restructuring

Brion Vibber-4
I've got an early draft of some notes
<https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2>
for a restructuring of the revision table, to support the following:

* making the revision table itself smaller by breaking large things out
* reducing duplicate string storage for content model/format, username/IP
address, and edit comments
* multi-content revisions ("MCR") - multiple Content blobs of different
types on a page, revisioned consistently
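
To illustrate the second bullet, here is a minimal sketch of de-duplicating edit comments by moving them into their own table and storing only an id on each revision row. It uses Python's built-in sqlite3 purely for demonstration; the table and column names (comment, comment_text, rev_comment_id) and the helper functions are hypothetical and are not taken from the draft notes.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not the proposal's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        comment_id   INTEGER PRIMARY KEY,
        comment_text TEXT NOT NULL UNIQUE
    );
    CREATE TABLE revision (
        rev_id         INTEGER PRIMARY KEY,
        rev_page       INTEGER NOT NULL,
        rev_comment_id INTEGER NOT NULL REFERENCES comment(comment_id)
    );
""")


def intern_comment(conn, text):
    """Return the id for `text`, inserting a comment row only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO comment (comment_text) VALUES (?)", (text,)
    )
    row = conn.execute(
        "SELECT comment_id FROM comment WHERE comment_text = ?", (text,)
    ).fetchone()
    return row[0]


def add_revision(conn, page, comment):
    """Store a revision that points at a shared comment row."""
    cur = conn.execute(
        "INSERT INTO revision (rev_page, rev_comment_id) VALUES (?, ?)",
        (page, intern_comment(conn, comment)),
    )
    return cur.lastrowid


edits = [(1, "typo"), (2, "typo"), (3, "revert vandalism"), (4, "typo")]
for page, comment in edits:
    add_revision(conn, page, comment)

revision_rows = conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
comment_rows = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(revision_rows, comment_rows)
```

Four revisions end up sharing two comment rows, which is the kind of saving the bullet describes. A real schema would probably match on a hash rather than on the full text, but that is a guess and not something stated in this thread.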

There are also some ideas going around about using denormalized summary
tables more aggressively, perhaps changing where the indexes used for
specific purposes live. For instance, a 'contribs' table could hold just
the bits needed for the index lookups for user contributions and then be
joined to the other tables.
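
A rough sketch of the 'contribs' idea, assuming a narrow summary table that carries only the columns needed for the per-user index and is joined back to revision for everything else. Every name here (contribs, contrib_user, contrib_timestamp, contrib_rev, record_edit, user_contribs) is made up for the example, and the revision table is cut down to a few columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (
        rev_id        INTEGER PRIMARY KEY,
        rev_page      INTEGER NOT NULL,
        rev_timestamp TEXT NOT NULL
    );
    -- Hypothetical summary table: only what the contribs lookup needs.
    CREATE TABLE contribs (
        contrib_user      INTEGER NOT NULL,
        contrib_timestamp TEXT NOT NULL,
        contrib_rev       INTEGER NOT NULL REFERENCES revision(rev_id)
    );
    CREATE INDEX contribs_user_timestamp
        ON contribs (contrib_user, contrib_timestamp);
""")


def record_edit(conn, rev_id, page, user, timestamp):
    """Write the revision row and its denormalized contribs row together."""
    with conn:  # one transaction, so the two tables cannot drift apart
        conn.execute(
            "INSERT INTO revision VALUES (?, ?, ?)", (rev_id, page, timestamp)
        )
        conn.execute(
            "INSERT INTO contribs VALUES (?, ?, ?)", (user, timestamp, rev_id)
        )


def user_contribs(conn, user, limit=50):
    """Filter and sort on the narrow table, then join for the other columns."""
    return conn.execute(
        """
        SELECT r.rev_id, r.rev_page, c.contrib_timestamp
        FROM contribs AS c
        JOIN revision AS r ON r.rev_id = c.contrib_rev
        WHERE c.contrib_user = ?
        ORDER BY c.contrib_timestamp DESC
        LIMIT ?
        """,
        (user, limit),
    ).fetchall()


record_edit(conn, 1, 10, 7, "20170213092800")
record_edit(conn, 2, 11, 8, "20170214103800")
record_edit(conn, 3, 10, 7, "20170215220000")

latest = user_contribs(conn, 7)
print(latest)
```

The trade-off is the usual one for denormalization: every write has to touch both tables, in exchange for the user-contribs index no longer living on the big table.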

Initial notes at
https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2
-- I'll be cleaning this up a bit more in response to feedback and concerns.

If we go through with this sort of change, we'll need to carefully consider
the upgrade transition. We'll also need to make sure that all relevant
queries are updated, and that folks using the databases indirectly (via
Tool Labs, etc.) are all able to cleanly handle the new fun stuff. Feedback
will be crucial here. :)

Potentially we might split this into a couple transitions instead, or
otherwise make major changes to the plan. Nothing's set in stone yet!

-- brion
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
Whoops, I forgot to mention in the list post -- we're planning to talk about
this topic in the public ArchCom IRC meeting this Wednesday (21:00 UTC /
2pm PDT).

Already getting good feedback on the page, am updating it, and looking
forward to more.... Thanks all. :)

-- brion

On Mon, Feb 13, 2017 at 9:28 AM, Brion Vibber <[hidden email]> wrote:

> I've got an early draft of some notes
> <https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2>
> for a restructuring of the revision table, to support the following:
>
> * making the revision table itself smaller by breaking large things out
> * reducing duplicate string storage for content model/format, username/IP
> address, and edit comments
> * multi-content revisions ("MCR") - multiple Content blobs of different
> types on a page, revisioned consistently
>
> There's also some ideas going around about using denormalized summary
> tables more aggressively, perhaps changing where the indexes used for
> specific uses live. For instance, a 'contribs' table with just the bits
> needed for the index lookups for user-contribs, then joined to the other
> tables.
>
> Initial notes at
> https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2
> -- I'll be cleaning this up a bit more in response to feedback and concerns.
>
> If we go through with this sort of change, we'll need to carefully
> consider the upgrade transition. We'll also need to make sure that all
> relevant queries are updated, and that folks using the databases indirectly
> (via tool labs, etc) are all able to cleanly handle the new fun stuff.
> Feedback will be crucial here. :)
>
> Potentially we might split this into a couple transitions instead, or
> otherwise make major changes to the plan. Nothing's set in stone yet!
>
> -- brion
>

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
aaaaand that'll be in #wikimedia-office on irc.freenode.net. :)

-- brion

On Tue, Feb 14, 2017 at 10:38 AM, Brion Vibber <[hidden email]>
wrote:

> Whoops I forgot to mention in the list post -- we're planning to talk
> about this topic in the public ArchCom IRC meeting this Wednesday (21:00
> UTC / 2pm PDT).
>
> Already getting good feedback on the page, am updating it, and looking
> forward to more.... Thanks all. :)
>
> -- brion

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
Correction: 22:00 UTC / 2pm PST in #wikimedia-office. Sorry, I used the
wrong time zone offset by mistake!

-- brion

On Tue, Feb 14, 2017 at 10:38 AM, Brion Vibber <[hidden email]>
wrote:

> Whoops I forgot to mention in the list post -- we're planning to talk
> about this topic in the public ArchCom IRC meeting this Wednesday (21:00
> UTC / 2pm PDT).
>
> Already getting good feedback on the page, am updating it, and looking
> forward to more.... Thanks all. :)
>
> -- brion

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
Great feedback everybody -- I'll make more updates and we'll circle back
for another discussion in a week or two!

Meeting summary (full logs linked from there):
https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.html

-- brion

On Wed, Feb 15, 2017 at 9:06 PM, Brion Vibber <[hidden email]> wrote:

> Correction: 22:00 UTC / 2pm PST in #wikimedia-office. Sorry, I calculated
> with the wrong time by mistake!
>
> -- brion

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
On Wed, Feb 15, 2017 at 3:06 PM, Brion Vibber <[hidden email]> wrote:

> Great feedback everybody -- I'll make more updates and we'll circle back
> for another discussion in a week or two!
>
> Meeting summary (full logs linked from there):
> https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.html
>

We're going to have another check-in during the ArchCom IRC meeting time
this Wednesday, 22:00 UTC / 2pm PST in #wikimedia-office.

Documents will be updated shortly reflecting the previous discussion &
ongoing tweaks.

Open questions include:
* should we go straight to the MCR-ready schema or do this in two steps,
one to break up tables & prep, and another for the MCR content model?
* final model for updating archive & text

-- brion

Re: Provisional notes for proposed revision table restructuring

MZMcBride-2
Brion Vibber wrote:

>We're going to have another checkin during ArchCom IRC meeting time this
>Wednesday, 22:00 UTC / 2pm PST in #wikimedia-office
>
>Documents will be updated shortly reflecting the previous discussion &
>ongoing tweaks.
>
>Open questions include:
>* should we go straight to the MCR-ready schema or do this in two steps,
>one to break up tables & prep, and another for the MCR content model?
>* final model for updating archive & text

Re: https://www.mediawiki.org/wiki/?curid=661038

The implementation path isn't clear to me. For a "regular" MediaWiki
installation, will making these changes be a matter of simply updating
MediaWiki's application code and running maintenance/update.php?

For Wikimedia wikis, as far as I know update.php is never run. Are you
planning to write separate maintenance scripts for this?

Regarding scope, this is a lot of changes. How are all of these changes
intended to be divided? Are we able to move forward with some changes
(e.g., adding a comment table) without moving forward with other changes
(e.g., adding a user_entry table)? Some parts of this proposal seem to be
well-received and popular (yay). Other parts, particularly dealing with
users, seem to be hairier and less settled.

MZMcBride




Re: Provisional notes for proposed revision table restructuring

James Forrester-4
On Mon, 6 Mar 2017 at 16:52 MZMcBride <[hidden email]> wrote:

> For a "regular" MediaWiki installation, will making these changes be a
> matter of simply updating MediaWiki's application code and running
> maintenance/update.php?
>

Yes.


> For Wikimedia wikis, as far as I know update.php is never run.


Correct; it'd take down the cluster.


> Are you planning to write separate maintenance scripts for this?
>

Yes.

As is "normal" with schema changes, in Wikimedia production this will be done
manually by the DBAs <https://wikitech.wikimedia.org/wiki/Schema_changes>.
It is a careful, very slow process that manages the otherwise-impossible.
It will take months of their time, is seriously laborious, and blocks any
other such changes. A recent user-facing example is T69223
<https://phabricator.wikimedia.org/T69223>, which was required to support
translation from non-English languages on multi-content wikis. This is why
the DBAs' views are so important. :-)

Once the schema change is done, we may/will back-fill old rows to populate
the new schema, using maintenance scripts for each wiki. However, given
that the table we're talking about is revision, which has over three
quarters of a billion rows on enwiki alone, that will be exceptionally
slow-running.
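
As a rough picture of what such a back-fill could look like, the sketch below walks the table in primary-key order and commits one small batch at a time, so no single transaction touches more than a few rows. It is not the actual maintenance script: the schema, the column names (rev_comment, rev_comment_id) and the backfill helper are all assumptions made for the example, and sqlite3 stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        comment_id   INTEGER PRIMARY KEY,
        comment_text TEXT NOT NULL UNIQUE
    );
    CREATE TABLE revision (
        rev_id         INTEGER PRIMARY KEY,
        rev_comment    TEXT NOT NULL,  -- old inline column
        rev_comment_id INTEGER         -- new column, NULL until back-filled
    );
""")
conn.executemany(
    "INSERT INTO revision (rev_id, rev_comment) VALUES (?, ?)",
    [(i, "typo" if i % 2 else "revert") for i in range(1, 11)],
)
conn.commit()


def backfill(conn, batch_size=3):
    """Populate rev_comment_id in rev_id order, one transaction per batch."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT rev_id, rev_comment FROM revision "
            "WHERE rev_id > ? AND rev_comment_id IS NULL "
            "ORDER BY rev_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        with conn:  # commit after each batch
            for rev_id, text in rows:
                conn.execute(
                    "INSERT OR IGNORE INTO comment (comment_text) VALUES (?)",
                    (text,),
                )
                comment_id = conn.execute(
                    "SELECT comment_id FROM comment WHERE comment_text = ?",
                    (text,),
                ).fetchone()[0]
                conn.execute(
                    "UPDATE revision SET rev_comment_id = ? WHERE rev_id = ?",
                    (comment_id, rev_id),
                )
        last_id = rows[-1][0]
        batches += 1
        # Assumption: a production script would pause here between batches,
        # for example to let replicas catch up.


batches = backfill(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM revision WHERE rev_comment_id IS NULL"
).fetchone()[0]
print(batches, remaining)
```

Because the loop only ever selects rows whose new column is still NULL, it can be interrupted and restarted without redoing finished work, which matters when the run takes a very long time.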

Once all *that* is done, we could do a further schema change to drop the
old bits of the schema that are no longer used (again, slow), and then drop
the backwards-compatible database code from MediaWiki. But that's optional.
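
For the period in between, "backwards-compatible database code" might amount to something like the following: a read path that prefers the new column and falls back to the old one for rows that have not been back-filled yet. This is only a sketch under assumed names (comment, rev_comment, rev_comment_id, get_comment), written against sqlite3 for illustration, not MediaWiki's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        comment_id   INTEGER PRIMARY KEY,
        comment_text TEXT NOT NULL
    );
    CREATE TABLE revision (
        rev_id         INTEGER PRIMARY KEY,
        rev_comment    TEXT,     -- old column, still set on unmigrated rows
        rev_comment_id INTEGER   -- new column, set on migrated rows
    );
    INSERT INTO comment VALUES (1, 'new-style summary');
    INSERT INTO revision VALUES (100, 'old-style summary', NULL);
    INSERT INTO revision VALUES (101, NULL, 1);
""")


def get_comment(conn, rev_id):
    """Prefer the comment table; fall back to the old inline column."""
    row = conn.execute(
        """
        SELECT COALESCE(c.comment_text, r.rev_comment)
        FROM revision AS r
        LEFT JOIN comment AS c ON c.comment_id = r.rev_comment_id
        WHERE r.rev_id = ?
        """,
        (rev_id,),
    ).fetchone()
    return row[0] if row else None


old_style = get_comment(conn, 100)
new_style = get_comment(conn, 101)
print(old_style, "|", new_style)
```

Once every row is migrated the fallback branch is dead weight, which is why dropping the old column and the compatibility code can be treated as a final, optional step.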


> Regarding scope, this is a lot of changes. How are all of these changes
> intended to be divided? Are we able to move forward with some changes
> (e.g., adding a comment table) without moving forward with other changes
> (e.g., adding a user_entry table)?


Yes, but given that this round will take years to complete, deciding to
delay some of the things means upsetting a lot of plans.

J.
--

James D. Forrester
Lead Product Manager, Editing
Wikimedia Foundation, Inc.
jforrester at wikimedia.org | @jdforrester

Re: Provisional notes for proposed revision table restructuring

Brion Vibber-4
Summary from March 8 irc meeting:
https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-03-08-22.03.html

-- brion

On Mon, Mar 6, 2017 at 9:43 AM, Brion Vibber <[hidden email]> wrote:

> On Wed, Feb 15, 2017 at 3:06 PM, Brion Vibber <[hidden email]>
> wrote:
>
>> Great feedback everybody -- I'll make more updates and we'll circle back
>> for another discussion in a week or two!
>>
>> Meeting summary (full logs linked from there):
>> https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.html
>>
>
> We're going to have another checkin during ArchCom IRC meeting time this
> Wednesday, 22:00 UTC / 2pm PST in #wikimedia-office
>
> Documents will be updated shortly reflecting the previous discussion &
> ongoing tweaks.
>
> Open questions include:
> * should we go straight to the MCR-ready schema or do this in two steps,
> one to break up tables & prep, and another for the MCR content model?
> * final model for updating archive & text
>
> -- brion
>