Unclear Meaning of $baseRevId in WikiPage::doEditContent

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Unclear Meaning of $baseRevId in WikiPage::doEditContent

Daniel Kinzler
Hi all.

We (the Wikidata team) ran into an issue recently with the value that gets
passed as $baseRevId to Content::prepareSave(), see Bug 67831 [1]. This comes
from WikiPage::doEditContent(), and, for core, is nearly always set to "false"
(e.g. by EditPage).

We interpreted this rev ID to be the revision that is the nominal base revision
of the edit, and implemented an edit conflict check based on it. Which works
with the way we use doEditContent() for wikibase on wikidata, and with most
stuff in core (which generally has $baseRevId = false). But as it turns out, it
does not work with rollbacks: WikiPage::commitRollback sets $baseRevId to the ID
of the revision we revert *to*.

Now, is that correct, or is it a bug? What does "base revision" mean?

The documentation of WikiPage::doEditContent() is unclear about this (yes, I
wrote this method when introducing the Content class - but I copied the
interface WikiPage::doEdit(), and mostly kept the code as it was). And in the
code, $baseRevId is not used at all except for passing it to hooks and to
Content::prepareSave - which doesn't do anything with it for any of the Content
implementations in core - only in Wikibase we tried to implement a conflict
check here, which should really be in WikiPage, I think.

So, what *does* $baseRevId mean? If you happen to know when and why $baseRevId
was introduced, please enlighten me. I can think of three possibilities:

1) It's the edit's reference revision, used to detect edit conflicts (this is
how we use this in Wikibase). That is, an edit is done with respect to a
specific revision, and that revision is passed back to WikiPage when saving, so
a check for edit conflicts can be done as close to the actual edit as possible
(ideally, in the same DB transaction). Compare bug 56849 [2].

2) The edit's "physical parent": that would be the same as (1), unless there is
a conflict that was detected early and automatic resolved by rebasing the edit.
E.g. if an edit is performed based on revision 11, but revision 12 was added
since, and the edit was successfully rebased, the "parent" would be 12, not 11.
This is what WikiPage::doEditContent() calls $oldid, and what gets saved in
rev_parent_id. Since WikiPage::doEditContent() makes the distinction between
$oldid and $baseRevId, this is probably not what  $baseRevId was intended to be.

3) It could be the "logical parent": this would be identical to (2), except for
a rollback: if I revert revision 15 and 14 back to revision 13, the new
revision's logical parent would be rev 13's parent. The idea is that you are
restoring rev 13 as it was, with the same parent rev 13 had. Something like this
seems to be the intention of what commitRollback() currently does, but the way
it is now, the new revision would have rev 13 as its logical parent (which, for
a rollback, would have identical content).

So at present, what commitRollback currently does is none of the above, and I
can't see how what it does makes sense.

I suggest we fix it, define $baseRevId to mean what I explained under (1), and
implement a "late" conflict check right in the DB transaction that updates the
revision (or page) table. This might confuse some extensions though, we should
double check AbuseFilter, if nothing else.

Is that a good approach? Please let me know.

-- daniel

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=65831
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=56849

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Brad Jorsch (Anomie)
On Wed, May 28, 2014 at 7:25 AM, Daniel Kinzler <[hidden email]>wrote:

> If you happen to know when and why $baseRevId
> was introduced, please enlighten me.


"When" is easy enough:
http://mediawiki.org/wiki/Special:Code/MediaWiki/34987

If I had to guess at the "why", I'd guess it was probably for FlaggedRevs
and auto-reviewing.


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Aaron Schulz
Yes it was for auto-reviewing new revisions. New revisions are seen as a combination of (base revision, changes). If the base revision was reviewed and the user is trusted, then so is the new revision. MW core had the obvious cases of rollback and null edits, which are (base revision, no changes). Their is a lot more "base revision" detection in FlaggedRevs for the remaining cases, some less obvious (user supplied baseRevId, X-top edit undo, fall back to prior edit).

If baseRevId is always set to the revision the user started from it would cause problems for that extension for the cases where it was previously false.

It would indeed be useful to have a casRevId value that was the current revision at the time of editing just for CAS style conflict detection.
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Daniel Kinzler
Am 29.05.2014 21:07, schrieb Aaron Schulz:
> Yes it was for auto-reviewing new revisions. New revisions are seen as a
> combination of (base revision, changes).

But EditPage in core sets $baseRevId to false. The info isn't there for the
"standard" case. In fact, the ONLY thing in core that sets it to anything but
false is commitRollback() , and that sets it to a value that to me doesn't make
much sense to me - the revision we revert to, instead of either the revision we
revert *from* (base/physical parent), or at least the *parent* of the revision
we revert to (logical parent).

Also, if you want (base revision, changes), you would use $oldid in
doEditContent, not $baseRevId. Perhaps it's just WRONG to pass $baseRevId to the
hooks called by doEditCOntent, and it should have been $oldid all along? $oldid
is what you need if you want to diff against the previous revision - so
presumably, that's NOT what $baseRevId is.

> If baseRevId is always set to the revision the user started from it would
> cause problems for that extension for the cases where it was previously
> false.

"false" means "don't check", I suppose - or "there is no base", but that could
be identified by the EDIT_NEW flag.

I'm not proposing to change the cases where baseRevId is false. They can stay as
they are. I'm proposing to set baseRevId to the revision the user started with,
OR false, so we can detect conflicts safely & sanely.

> It would indeed be useful to have a casRevId value that was the current
> revision at the time of editing just for CAS style conflict detection.

Indeed - but changing the method signature would be painful, and the existing
$baseRevId parameter does not seem to be used at all - or at least, it's used in
such an inconsistent way as to be useless, of not misleading and harmful.

For now, I propose to just have commitRollback call doEditContent with
$baseRevId = false, like the rest of core does. Since core itself doesn't use
this value anywhere, and sets it to false everywhere, that seems consistent. We
could then just clarify the documentation. This way, Wikibase could use the
$baseRevId value for conflict detection - actually, core could, and should, do
just that in doEditContent; this wouldn't do anything in core until the
$baseRevId is supplied at least by EditPage.

Of course, we need to check FlaggedRevs and other extensions, but seeing how
this argument is essentially unused, I can't imagine how this change could break
anything for extensions.

-- daniel



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Brad Jorsch (Anomie)
On Fri, May 30, 2014 at 4:06 AM, Daniel Kinzler <[hidden email]>
wrote:

> Am 29.05.2014 21:07, schrieb Aaron Schulz:
> > Yes it was for auto-reviewing new revisions. New revisions are seen as a
> > combination of (base revision, changes).
>
> But EditPage in core sets $baseRevId to false. The info isn't there for the
> "standard" case. In fact, the ONLY thing in core that sets it to anything
> but false is commitRollback() , and that sets it to a value that to me
> doesn't make much sense to me - the revision we revert to, instead of
> either the revision we revert *from* (base/physical parent), or at least
> the *parent* of the revision we revert to (logical parent).
>

I think you need to look again into how FlaggedRevs uses it, without the
preconceptions you're bringing in from the way you first interpreted the
name of the variable. The current behavior makes perfect sense for that
specific use case. Neither of your proposals would work for FlaggedRevs.

As for the EditPage code path, note that it has already done edit conflict
resolution so "base revision = current revision of the page". Which is
probably the intended meaning of false.


> Of course, we need to check FlaggedRevs and other extensions, but seeing
> how this argument is essentially unused, I can't imagine how this change
> could break anything for extensions.
>

Except FlaggedRevs.


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Aaron Schulz
In reply to this post by Daniel Kinzler
FlaggedRevs uses the NewRevisionFromEditComplete hook. Grepping for that, I see reasonable values set in the callers at a quick glance. This cover various null edit scenarios too. The $baseRevId in WikiPage is just one of the cases of that value passed to the hook, and is fine there (being mostly false). "false" indeed means "not determined" and that behavior is needed for the hook values. The values given in that hook variable make sense and are more or less consistent.

As I said before, if the NewRevisionFromEditComplete hook is given the same base revision ID values for all cases, then I don't care to much what happens to the $baseRevId value semantics in doEditContent(). As long as everything is changed to keep that part consistent then it won't effect anything. However, just naively change the $baseRevId values for the non-false cases will break the extension using it.

As as side note, FlaggedRevs doesn't just end up using $oldid. It only uses that as the last resort after picking other values in difference scenarios it detects.
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Daniel Kinzler
In reply to this post by Brad Jorsch (Anomie)
Am 30.05.2014 15:38, schrieb Brad Jorsch (Anomie):
> I think you need to look again into how FlaggedRevs uses it, without the
> preconceptions you're bringing in from the way you first interpreted the
> name of the variable. The current behavior makes perfect sense for that
> specific use case. Neither of your proposals would work for FlaggedRevs.

As far as I understand the rather complex FlaggedRevs.hooks.php code, it assumes
that

a) if $newRev === $baseRevId, it's a null edit. As far as I can see, this does
not work, since $baseRevId will be null for a null edit (and all other regular
edits).

b) if $newRev !== $baseRevId but the new rev's hash is the same as the base
rev's hash, it's a rollback. This works with the current implementation of
commitRollback(), but does not for manual reverts or trivial undos.

So, FlaggedRevs assumes that EditPage resp WikiPage set $baseRevId to the edits
logical parent (basically, the revision the user loaded when starting to edit).
That's what I described as option (3) in my earlier mail, except for the
rollback case; It would be fined with me to use the target rev as the base for
rollbacks, as is currently done.

FlaggedRevs.hooks.php also injects a baseRevId form field and uses it in some
cases, adding to the confusion.

In order to handle manual reverts and null edits consistently, EditPage should
probably have a base revision as a form field, and pass it on to doEditContent.
As far as I can tell, this would work with the current code in FlaggedRevs.

> As for the EditPage code path, note that it has already done edit conflict
> resolution so "base revision = current revision of the page". Which is
> probably the intended meaning of false.

Right. If that's the case though, WikiPage::doEditContent should probably set
$baseRevId = $oldid, before passing it to the hooks.

Without changing core, it seems that there is no way to implement a late/strict
conflict check based on the base rev id. That would need an additional "anchor
revision" for checking.

The easiest solution for the current situation is to simply drop the strict
conflict check in Wikibase and accept a race condition that may cause a revision
to be silently overwritten, as is currently the case in core.

-- daniel


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Adam Wight-2
It looks like we should leave the existing hook parameters values alone for
the moment, but it would improve the situation if we renamed variables
which seem to be overloaded or unclear, in MediaWiki core and in
FlaggedRevs.  What do you think of the following conventions,

'oldid' (index.php parameter) -- keep this name only to preserve interface
compatibility.  This refers to a historical revision when used in the
action=view case, and to the latest revision ID of the page at the time an
edit session begins.

$oldid -- keep as-is in the action=view codepath, rename to $parentRevId in
action=edit

$parentRevId -- latest available revision ID at the time an edit session
begins.  Used to detect conflicts, and identify the parent revision record
upon save.  This is updated during successful automatic rebase.  I don't
see a good use case for preserving what Daniel calls the "reference
revision," the parentRevId before rebase.

$baseRevId and $baseId -- rename everywhere to $contentsRevId, but
examining the code contexts for the smell of confounding with $parentRevId.

$contentsRevId -- revision ID of the source text to copy when performing
undo or rollback.  We will probably want to supplement hooks that only
passed $contentsRevId, such as NewRevisionFromEditComplete, with
$parentRevId as an additional parameter.

A refactor along these lines would keep me from losing already scant
marbles as I attempt to fix related issues in core:
https://gerrit.wikimedia.org/r/#/c/94584/ , I see now that I've already
begun to introduce mistakes caused by the difficult common-sense
interpretation of current variable naming.

-Adam


On Mon, Jun 2, 2014 at 1:22 AM, Daniel Kinzler <[hidden email]> wrote:

> Am 30.05.2014 15:38, schrieb Brad Jorsch (Anomie):
> > I think you need to look again into how FlaggedRevs uses it, without the
> > preconceptions you're bringing in from the way you first interpreted the
> > name of the variable. The current behavior makes perfect sense for that
> > specific use case. Neither of your proposals would work for FlaggedRevs.
>
> As far as I understand the rather complex FlaggedRevs.hooks.php code, it
> assumes
> that
>
> a) if $newRev === $baseRevId, it's a null edit. As far as I can see, this
> does
> not work, since $baseRevId will be null for a null edit (and all other
> regular
> edits).
>
> b) if $newRev !== $baseRevId but the new rev's hash is the same as the base
> rev's hash, it's a rollback. This works with the current implementation of
> commitRollback(), but does not for manual reverts or trivial undos.
>
> So, FlaggedRevs assumes that EditPage resp WikiPage set $baseRevId to the
> edits
> logical parent (basically, the revision the user loaded when starting to
> edit).
> That's what I described as option (3) in my earlier mail, except for the
> rollback case; It would be fined with me to use the target rev as the base
> for
> rollbacks, as is currently done.
>
> FlaggedRevs.hooks.php also injects a baseRevId form field and uses it in
> some
> cases, adding to the confusion.
>
> In order to handle manual reverts and null edits consistently, EditPage
> should
> probably have a base revision as a form field, and pass it on to
> doEditContent.
> As far as I can tell, this would work with the current code in FlaggedRevs.
>
> > As for the EditPage code path, note that it has already done edit
> conflict
> > resolution so "base revision = current revision of the page". Which is
> > probably the intended meaning of false.
>
> Right. If that's the case though, WikiPage::doEditContent should probably
> set
> $baseRevId = $oldid, before passing it to the hooks.
>
> Without changing core, it seems that there is no way to implement a
> late/strict
> conflict check based on the base rev id. That would need an additional
> "anchor
> revision" for checking.
>
> The easiest solution for the current situation is to simply drop the strict
> conflict check in Wikibase and accept a race condition that may cause a
> revision
> to be silently overwritten, as is currently the case in core.
>
> -- daniel
>
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Aaron Schulz
I suppose that naming scheme is reasonable.

$contentsRevId sounds awkward, maybe $sourceRevId or $originRevId is better.
Reply | Threaded
Open this post in threaded view
|

Re: Unclear Meaning of $baseRevId in WikiPage::doEditContent

Adam Wight-2
On Fri, Jun 6, 2014 at 4:08 PM, Aaron Schulz <[hidden email]> wrote:

> I suppose that naming scheme is reasonable.
>
> $contentsRevId sounds awkward, maybe $sourceRevId or $originRevId is
> better.
>

What about "rollbackRevId"?  I want the variable name to make its purpose
very clear.

-Adam


>
>
> --
> View this message in context:
> http://wikimedia.7.x6.nabble.com/Unclear-Meaning-of-baseRevId-in-WikiPage-doEditContent-tp5028661p5029674.html
> Sent from the Wikipedia Developers mailing list archive at Nabble.com.
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l