Help needed on fixing discrepancies in db schemas

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Help needed on fixing discrepancies in db schemas

Amir Ladsgroup
Hey,
As a pet project of mine, I have been trying to find and fix discrepancies
between the database schema in the codebase (like tables.sql in mediawiki
core) and production. I wrote a script to do a basic analysis and got the
results but it was so big that I needed to cluster them so now we have this
reports: https://phabricator.wikimedia.org/T104459#4314828

Some are hot fixes (like tmp_2 indexes) that probably need to be added to
the codebase (with proper name) or dropped from production which needs
investigation.

Some are unimplemented schema changes, like user_options column in user
table that was dropped in the code nine years ago but fell into the cracks
and never got fixed in production.

Some are changes that partially implemented like page_no_title_convert
field on page table that only appear in 32 hosts (around two third of the
hosts) and there is mention of adding it in HISTORY but I can't find any
mention of removing it neither its existence so I have no idea how to
proceed here.

The problem is that these cases need investigation before we can hand it
off to our DBAs. For example, rc_cur_time field in recentchanges table has
been dropped from the mediawiki core in 2014 but it exists in production
but it also still exists in oracle, potgres and mssql schemas but not in
mssql, sqlite or mysql schemas. In this case, we need to fix both
production and several schemas in mediawiki core.

Or text table on 21 hosts (around half of them) has an extra field called
inverse_timestamp which I can't find any mention of it in the code but
there used to be a field with this name in revision table that got dropped
in 2005 and I have no idea how to proceed here.

It would be great if you take a look the clustered reports [1] and make a
phabricator ticket for them and investigate how to proceed here. Thank you

[1]: https://phabricator.wikimedia.org/T104459#4310482

Best
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Help needed on fixing discrepancies in db schemas

Brad Jorsch (Anomie)
On Mon, Jul 9, 2018 at 1:10 PM, Amir Ladsgroup <[hidden email]> wrote:

> Some are changes that partially implemented like page_no_title_convert
> field on page table that only appear in 32 hosts (around two third of the
> hosts) and there is mention of adding it in HISTORY but I can't find any
> mention of removing it neither its existence so I have no idea how to
> proceed here.
>

Looks like it was added in r16524
<http://mediawiki.org/wiki/Special:Code/MediaWiki/16524> and reverted
(except for the release note entry) in r16526
<http://mediawiki.org/wiki/Special:Code/MediaWiki/16526>.


> Or text table on 21 hosts (around half of them) has an extra field called
> inverse_timestamp which I can't find any mention of it in the code but
> there used to be a field with this name in revision table that got dropped
> in 2005 and I have no idea how to proceed here.
>

Revisions used to be stored much like how images still are: there was the
"cur" table that had data about the current revision, including its actual
content, and the "old" table for previous revisions.

When all this was redone to have "page" and "revision" and "text" (r6710
<http://mediawiki.org/wiki/Special:Code/MediaWiki/6710>), the "old" table
was just renamed to "text" since most of the text was already in that
table. Which is why all the fields in text use "old_" as a prefix. It seems
that the extraneous columns weren't dropped on all wikis.


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l