Gerrit Commit Wars

classic Classic list List threaded Threaded
98 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

Gerrit Commit Wars

Tyler Romeo
Hi everybody,

I cannot believe I have to say something about this, but I guess it's no
surprise.

Wikipedia has a notorious policy against edit warring, where users are
encouraged to discuss changes and achieve consensus before blindly
reverting. This applies even more so to Gerrit, since changes to software
have a lot bigger effect.

Here's a nice example:
https://gerrit.wikimedia.org/r/114400
https://gerrit.wikimedia.org/r/117234
https://gerrit.wikimedia.org/r/117247

Some key points to note here:
* The revert commit was not linked to on the original commit
* The time between the revert patch being uploaded and +2ed was a mere two
minutes
* All the reviewers on the revert patch were also reviewers on the original
patch

This is unacceptable behavior, and is extremely disrespectful to the
developers here. If you are going to revert a patch for reasons other than
a blatant code review issue (such as a fatal error or the likes), you
should *at the very least* give the original patch reviewers time to
understand why the patch is being reverted and give their input on the
matter. Otherwise it defeats the entire point of the code review process
and Gerrit in the first place.

The argument being made in this specific case is that the change broke the
workflow of mobile, and that the revert was announced on mobile-l. This is
not sufficient for a number of reasons:

1) not everybody is subscribed to mobile-l, so you cannot expect the
original reviewers to see or know about it
2) this is an issue with MobileFrontend, not MediaWiki core
3) code being merged does not automatically cause a deployment, and if code
being deployed breaks something in production, it is the operations team's
job to undeploy that change

Overall, the lesson to take away here is to be more communicative with
other developers, especially when you are negating their changes or
decisions.

Thanks in advance,
*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Steven Walling
On Thu, Mar 6, 2014 at 8:29 PM, Tyler Romeo <[hidden email]> wrote:

> 1) not everybody is subscribed to mobile-l, so you cannot expect the
> original reviewers to see or know about it
> 2) this is an issue with MobileFrontend, not MediaWiki core
> 3) code being merged does not automatically cause a deployment, and if code
> being deployed breaks something in production, it is the operations team's
> job to undeploy that change
>

If your patch causes a serious UX regression like this, it's going to get
reverted. The core patch involved was being deployed to Wikimedia sites /
impacting MobileFrontEnd users today. If we had more time in the deployment
cycle to wait and the revert was a simple disagreement, then waiting would
be appropriate. It is obvious in this case no one tested the core change on
mobile. That's unacceptable.

And yes, Jon should have made sure the revert and the original patch were
cross-referenced. I'm sure he'll do that next time he commits a revert.

Steven
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Chris McMahon
In reply to this post by Tyler Romeo
In over two years at WMF I have never been involved in a discussion like
this, but here goes:

In this case, I think it was entirely appropriate to revert immediately and
pick up the pieces later.  The source of the code is immaterial, if Tim
Starling  or Brion Vibber had merged this we would have done exactly the
same thing.

As Steven noted, the immediate issue was that it created a serious problem
with the mobile account creation process.  This blocked our ability to test
other aspects of mobile account creation and login that have changed
recently.  And, since this occurred on Thursday morning in the run-up to
the weekly deployment, we had little time to prevent this going live to
production.

Beyond that, there are serious concerns with any feature that a) requires
javascript support in the client in order to create an account on the wiki
and b) does not honor the characters that the user types in the username
and password fields.  I know of at least one historical instance where
violating b) caused a significant problem in UniversalLanguageSelector.
We prevented the ULS problem from going live to production at the time,
also.

-Chris




On Thu, Mar 6, 2014 at 1:29 PM, Tyler Romeo <[hidden email]> wrote:

> Hi everybody,
>
> I cannot believe I have to say something about this, but I guess it's no
> surprise.
>
> Wikipedia has a notorious policy against edit warring, where users are
> encouraged to discuss changes and achieve consensus before blindly
> reverting. This applies even more so to Gerrit, since changes to software
> have a lot bigger effect.
>
> Here's a nice example:
> https://gerrit.wikimedia.org/r/114400
> https://gerrit.wikimedia.org/r/117234
> https://gerrit.wikimedia.org/r/117247
>
> Some key points to note here:
> * The revert commit was not linked to on the original commit
> * The time between the revert patch being uploaded and +2ed was a mere two
> minutes
> * All the reviewers on the revert patch were also reviewers on the original
> patch
>
> This is unacceptable behavior, and is extremely disrespectful to the
> developers here. If you are going to revert a patch for reasons other than
> a blatant code review issue (such as a fatal error or the likes), you
> should *at the very least* give the original patch reviewers time to
> understand why the patch is being reverted and give their input on the
> matter. Otherwise it defeats the entire point of the code review process
> and Gerrit in the first place.
>
> The argument being made in this specific case is that the change broke the
> workflow of mobile, and that the revert was announced on mobile-l. This is
> not sufficient for a number of reasons:
>
> 1) not everybody is subscribed to mobile-l, so you cannot expect the
> original reviewers to see or know about it
> 2) this is an issue with MobileFrontend, not MediaWiki core
> 3) code being merged does not automatically cause a deployment, and if code
> being deployed breaks something in production, it is the operations team's
> job to undeploy that change
>
> Overall, the lesson to take away here is to be more communicative with
> other developers, especially when you are negating their changes or
> decisions.
>
> Thanks in advance,
> *-- *
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2016
> Major in Computer Science
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Tyler Romeo
In reply to this post by Steven Walling
On Thu, Mar 6, 2014 at 4:15 PM, Steven Walling <[hidden email]>wrote:

> If your patch causes a serious UX regression like this, it's going to get
> reverted. The core patch involved was being deployed to Wikimedia sites /
> impacting MobileFrontEnd users today. If we had more time in the deployment
> cycle to wait and the revert was a simple disagreement, then waiting would
> be appropriate. It is obvious in this case no one tested the core change on
> mobile. That's unacceptable.
>

You quoted my email, but didn't seem to read it. Changes to MediaWiki core
should not have to take into account extensions that incorrectly rely on
its interface, and a breakage in a deployed extension should result in an
undeployment and a fix to that extension, not a revert of the core patch.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Benjamin Lees
In reply to this post by Tyler Romeo
Special:Code automatically generated a list of followup revisions for each
revision (based on linking/mentioning).  It would be nice to have that in
Gerrit.
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Erik Bernhardson
In reply to this post by Tyler Romeo
On Thu, Mar 6, 2014 at 2:08 PM, Tyler Romeo <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 4:15 PM, Steven Walling <[hidden email]
> >wrote:
>
> > If your patch causes a serious UX regression like this, it's going to get
> > reverted. The core patch involved was being deployed to Wikimedia sites /
> > impacting MobileFrontEnd users today. If we had more time in the
> deployment
> > cycle to wait and the revert was a simple disagreement, then waiting
> would
> > be appropriate. It is obvious in this case no one tested the core change
> on
> > mobile. That's unacceptable.
> >
>
> You quoted my email, but didn't seem to read it. Changes to MediaWiki core
> should not have to take into account extensions that incorrectly rely on
> its interface, and a breakage in a deployed extension should result in an
> undeployment and a fix to that extension, not a revert of the core patch.
>
>
I don't think core is in any way special here.  It doesn't matter what
broke what, the whole is much more important than the individual parts.  If
the patch to core is what broke things reverting it is the appropriate
course of action.

*-- *
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2016
> Major in Computer Science
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Jon Robson
In reply to this post by Tyler Romeo
Communication in the wikiverse is hard.

To clarify, this is _not_ an issue with MobileFrontend. The same
problem effects users without JavaScript. There was a fundamental
problem with this patch that sadly didn't get caught during code
review. It broke the workflow of mobile on an important page in
production which is a bad thing. On a side note it saddens me that
mobile gets very little attention during code review on essential
parts of our infrastructure. If anyone has any ideas on how this can
be remedied please let me know.

Moan about the deployment train:
The code was merged at the final hour before a deployment train (this
is another issue that our deployment train doesn't distinguish between
patches that have been sitting on master for a week with patches that
have been sitting there for 1 hour). Had this been merged on a
Thursday morning we would have had more luxury and a revert maybe
could have been avoided (but I still don't think that patch was in a
mergeable format).

In answer to a few statements you made...

"Wikipedia has a notorious policy against edit warring, where users
are encouraged to discuss changes and achieve consensus..."
Agreed but that consensus should also be achieved during review. It
seems during the code review process [1] there was an open concerns
that had been raised and a -1 from Steven that was unaddressed. In
this case we have the luxury to discuss this more and explore problems
and in my opinion it was not worthy of a rushed merge. Yes we can't
please everyone but it would have been good to get more people
involved in this conversation.

"not everybody is subscribed to mobile-l, so you cannot expect the
original reviewers to see or know about it"
Yes, and posts to wikimedia-l go straight to my archive, so I usually
miss them so I wasn't aware of this mail until someone pointed me to
it. Communicating so everyone gets a message is hard :-).

That said I did screw up here though in that I didn't comment on the
patchset with a link to the mobile-l mailing list.  In fact I started
to and then got distracted by a conversation and forgot to hit save. I
Will be more careful in future. All conversations about code should
start in code and I'm sorry I didn't adhere to this rule this time.

[1] https://gerrit.wikimedia.org/r/#/c/114400/

On Thu, Mar 6, 2014 at 2:08 PM, Tyler Romeo <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 4:15 PM, Steven Walling <[hidden email]>wrote:
>
>> If your patch causes a serious UX regression like this, it's going to get
>> reverted. The core patch involved was being deployed to Wikimedia sites /
>> impacting MobileFrontEnd users today. If we had more time in the deployment
>> cycle to wait and the revert was a simple disagreement, then waiting would
>> be appropriate. It is obvious in this case no one tested the core change on
>> mobile. That's unacceptable.
>>
>
> You quoted my email, but didn't seem to read it. Changes to MediaWiki core
> should not have to take into account extensions that incorrectly rely on
> its interface, and a breakage in a deployed extension should result in an
> undeployment and a fix to that extension, not a revert of the core patch.
>
> *-- *
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2016
> Major in Computer Science
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



--
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Brion Vibber-4
In reply to this post by Tyler Romeo
On Thu, Mar 6, 2014 at 2:08 PM, Tyler Romeo <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 4:15 PM, Steven Walling <[hidden email]
> >wrote:
>
> > If your patch causes a serious UX regression like this, it's going to get
> > reverted. The core patch involved was being deployed to Wikimedia sites /
> > impacting MobileFrontEnd users today. If we had more time in the
> deployment
> > cycle to wait and the revert was a simple disagreement, then waiting
> would
> > be appropriate. It is obvious in this case no one tested the core change
> on
> > mobile. That's unacceptable.
> >
>
> You quoted my email, but didn't seem to read it. Changes to MediaWiki core
> should not have to take into account extensions that incorrectly rely on
> its interface, and a breakage in a deployed extension should result in an
> undeployment and a fix to that extension, not a revert of the core patch.
>

Changes to MediaWiki core should avoid breaking Wikipedia in production,
especially since we aggressively push new versions of core and extensions
to Wikipedia every few weeks.

For years and years and years we've been very free about reverting things
that break. No one, including old-timers like me and Tim, has the "right"
to not have something reverted. If it needs to be reverted it will be
reverted -- there is nothing personal in a revert. Remember it can always
be put back once all problems are resolved.

Is there anything specific in the communications involved that you found
was problematic, other than a failure to include a backlink in the initial
revert?

-- brion
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

George William Herbert
To take a couple of steps back...

This happened because testing isn't robust enough?

That should be discussed and followed up on.



On Thu, Mar 6, 2014 at 3:34 PM, Brion Vibber <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 2:08 PM, Tyler Romeo <[hidden email]> wrote:
>
> > On Thu, Mar 6, 2014 at 4:15 PM, Steven Walling <[hidden email]
> > >wrote:
> >
> > > If your patch causes a serious UX regression like this, it's going to
> get
> > > reverted. The core patch involved was being deployed to Wikimedia
> sites /
> > > impacting MobileFrontEnd users today. If we had more time in the
> > deployment
> > > cycle to wait and the revert was a simple disagreement, then waiting
> > would
> > > be appropriate. It is obvious in this case no one tested the core
> change
> > on
> > > mobile. That's unacceptable.
> > >
> >
> > You quoted my email, but didn't seem to read it. Changes to MediaWiki
> core
> > should not have to take into account extensions that incorrectly rely on
> > its interface, and a breakage in a deployed extension should result in an
> > undeployment and a fix to that extension, not a revert of the core patch.
> >
>
> Changes to MediaWiki core should avoid breaking Wikipedia in production,
> especially since we aggressively push new versions of core and extensions
> to Wikipedia every few weeks.
>
> For years and years and years we've been very free about reverting things
> that break. No one, including old-timers like me and Tim, has the "right"
> to not have something reverted. If it needs to be reverted it will be
> reverted -- there is nothing personal in a revert. Remember it can always
> be put back once all problems are resolved.
>
> Is there anything specific in the communications involved that you found
> was problematic, other than a failure to include a backlink in the initial
> revert?
>
> -- brion
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



--
-george william herbert
[hidden email]
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Tyler Romeo
In reply to this post by Brion Vibber-4
On Thu, Mar 6, 2014 at 6:34 PM, Brion Vibber <[hidden email]> wrote:

> Is there anything specific in the communications involved that you found
> was problematic, other than a failure to include a backlink in the initial
> revert?
>

I think this entire thing was a big failure in basic software development
and systems administration. If MobileFrontend is so tightly coupled with
the desktop login form, that is a problem with MobileFrontend. In addition,
the fact that a practically random code change was launched into production
an hour later without so much as a test... That's the kind of thing that
gets people fired at other companies.

But apparently I'm the only person that thinks this, so the WMF can feel
free to do what it wants.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Ryan Lane-2
On Thu, Mar 6, 2014 at 3:54 PM, Tyler Romeo <[hidden email]> wrote:

> I think this entire thing was a big failure in basic software development
> and systems administration. If MobileFrontend is so tightly coupled with
> the desktop login form, that is a problem with MobileFrontend. In addition,
> the fact that a practically random code change was launched into production
> an hour later without so much as a test... That's the kind of thing that
> gets people fired at other companies.
>
>
At shitty companies maybe.

Things break. You do a post-mortem and track the things that lead to an
outage and try to make sure it doesn't break again, ideally by adding
automated tests, if possible.

- Ryan
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Brion Vibber-4
In reply to this post by Tyler Romeo
On Thu, Mar 6, 2014 at 3:54 PM, Tyler Romeo <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 6:34 PM, Brion Vibber <[hidden email]>
> wrote:
>
> > Is there anything specific in the communications involved that you found
> > was problematic, other than a failure to include a backlink in the
> initial
> > revert?
> >
>
> I think this entire thing was a big failure in basic software development
> and systems administration. If MobileFrontend is so tightly coupled with
> the desktop login form, that is a problem with MobileFrontend.


As noted already in the thread, the commit was broken for non-JS users as
well as for mobile; it's not a deficiency in MobileFrontend specifically.


> In addition,
> the fact that a practically random code change was launched into production
> an hour later without so much as a test...


Aha -- this seems to strike to the heart of the matter. Would you agree
this incident has more to do with problems with the branch deployment
scheduling than with commit warring?

-- brion


That's the kind of thing that

> gets people fired at other companies.
>
> But apparently I'm the only person that thinks this, so the WMF can feel
> free to do what it wants.
>
> *-- *
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2016
> Major in Computer Science
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Rob Lanphier-4
In reply to this post by Tyler Romeo
Hi Tyler,

I understand you're frustrated here.  As Jon says: "communication in the
wikiverse is hard".  Also, running a top 10 website is also hard.

Others have covered many of the other points, but I wanted to make sure I
addressed one of the points that hasn't been covered yet:

On Thu, Mar 6, 2014 at 2:08 PM, Tyler Romeo <[hidden email]> wrote:

> Changes to MediaWiki core
> should not have to take into account extensions that incorrectly rely on
> its interface, and a breakage in a deployed extension should result in an
> undeployment and a fix to that extension, not a revert of the core patch.


I wholeheartedly disagree with this.  Changes to core should definitely
take into account uses by widely-deployed extensions (where
"widely-deployed" can either mean by installation count or by end-user
count), even if the usage is "incorrect".  We need to handle these things
on a case by case basis, but in general, *all* of the following are options
when a core change introduces an unintentional extension incompatibility:
1.  Fix the extension quickly
2.  Revert the change
3.  Undeploy the extension until its fixed to be compatible with core

#3 is last and least.  It should be exceedingly rare that we undeploy a
long-running extension because of a newly-introduced core change.

It is often difficult to know exactly how core "should" be used, and
sometimes, extension developers need to do things that seem hacky or wrong
to achieve the desired result.  It will often be the case that we'll need
to continue to support misfeatures because breaking them would be too
disruptive.  Over time, if we improve our practices, this sort of tradeoff
will need to happen less-and-less, but it does need to happen.

Drawing the analogy to wiki world, what has happened here is exactly this:
https://en.wikipedia.org/wiki/Wikipedia:BOLD,_revert,_discuss_cycle

We're in the discuss part, which is actually where we should be.

Rob
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Erik Bernhardson
In reply to this post by Brion Vibber-4
On Thu, Mar 6, 2014 at 4:05 PM, Brion Vibber <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 3:54 PM, Tyler Romeo <[hidden email]> wrote:
>
> > On Thu, Mar 6, 2014 at 6:34 PM, Brion Vibber <[hidden email]>
> > wrote:
> >
> > > Is there anything specific in the communications involved that you
> found
> > > was problematic, other than a failure to include a backlink in the
> > initial
> > > revert?
> > >
> >
> > I think this entire thing was a big failure in basic software development
> > and systems administration. If MobileFrontend is so tightly coupled with
> > the desktop login form, that is a problem with MobileFrontend.
>
>
> As noted already in the thread, the commit was broken for non-JS users as
> well as for mobile; it's not a deficiency in MobileFrontend specifically.
>
>
> > In addition,
> > the fact that a practically random code change was launched into
> production
> > an hour later without so much as a test...
>
>
> Aha -- this seems to strike to the heart of the matter. Would you agree
> this incident has more to do with problems with the branch deployment
> scheduling than with commit warring?
>
> -- brion
>
> t
Does core have any  policies related to merging?  The core features team
has adopted a methodology(although slightly different) that we learned of
from the VE team.  Essentially +2 for 24 hours before a deployment branch
is cut is limited to fixes for bugs that  were introduced since the last
deployment branch was cut or reverts for patches that turned out to not be
ready for deployment.  Core is certainly bigger and with more participants,
but perhaps a conversation about when to +2 and how that effects the
deployment process would be benefitial?


> That's the kind of thing that
> > gets people fired at other companies.
> >
> > But apparently I'm the only person that thinks this, so the WMF can feel
> > free to do what it wants.
> >
> > *-- *
> > *Tyler Romeo*
> > Stevens Institute of Technology, Class of 2016
> > Major in Computer Science
> > _______________________________________________
> > Wikitech-l mailing list
> > [hidden email]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Chris McMahon
In reply to this post by Tyler Romeo
On Thu, Mar 6, 2014 at 4:54 PM, Tyler Romeo <[hidden email]> wrote:

> On Thu, Mar 6, 2014 at 6:34 PM, Brion Vibber <[hidden email]>
> wrote:
>
> > Is there anything specific in the communications involved that you found
> > was problematic, other than a failure to include a backlink in the
> initial
> > revert?
> >
>
> I think this entire thing was a big failure in basic software development
> and systems administration. If MobileFrontend is so tightly coupled with
> the desktop login form, that is a problem with MobileFrontend. In addition,
> the fact that a practically random code change was launched into production
> an hour later without so much as a test...


It was in fact our automated browser test suite that alerted us that a
change to some other area of the software overnight had broken some central
MobileFrontend functionality.  It was rather unexpected, and we moved
quickly to identify the issue and revert it in the short amount of time we
had before the code went to production.


> That's the kind of thing that
> gets people fired at other companies.
>
> But apparently I'm the only person that thinks this, so the WMF can feel
> free to do what it wants.


That sort of thing is not necessary.

-Chris
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Bartosz Dziewoński
In reply to this post by Tyler Romeo
It seems that commenters here believe that the patch made it impossible to create an account if JavaScript was disabled, or via MobileFrontend – this is obviously not true, it just required an additional confirmation (which was by design and +1'd by five people). Please stop spreading this disinformation.

--
Matma Rex

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Chris Steipp
In reply to this post by Erik Bernhardson
On Thu, Mar 6, 2014 at 4:08 PM, Erik Bernhardson <[hidden email]

> wrote:
>
> Does core have any  policies related to merging?  The core features team
> has adopted a methodology(although slightly different) that we learned of
> from the VE team.  Essentially +2 for 24 hours before a deployment branch
> is cut is limited to fixes for bugs that  were introduced since the last
> deployment branch was cut or reverts for patches that turned out to not be
> ready for deployment.  Core is certainly bigger and with more participants,
> but perhaps a conversation about when to +2 and how that effects the
> deployment process would be benefitial?
>
>
Formally, no (not that I know of). Informally, I know a lot of us do a lot
of merging on Fridays, partly for this reason. I resisted merging a big
patch this morning because I want it to sit in beta for a while. I know a
few patches were merged this morning so that they *would* make it into
today's deploy. Everyone with +2 should always think about how/when things
will be deployed, and merge as appropriate. And it seems like most people
use good judgement most of the time.

If this is coming up a lot, then yeah, let's make some policy about it, or
just enforce it in how we do the branch cutting.
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Greg Grossmeier-2
In reply to this post by George William Herbert
<quote name="George Herbert" date="2014-03-06" time="15:53:50 -0800">
> To take a couple of steps back...
>
> This happened because testing isn't robust enough?
>
> That should be discussed and followed up on.

Ish.

The suite of automatic tests caught this bug, actually. It's how the
mobile team found out about it as they got to work this morning. So the
testing is quite robust.


I'd argue, that the revert didn't happen fast enough.

"Whoa! Greg! Don't be incendiary!"

What I mean by that is:

I want us (where 'us' == any developer writing MediaWiki or extension
code) to get to the point where we reject and revert any commit which
breaks the test suite. Basically, when a test fails after a commit you
(as the developer) should:

A) fix it right away (like, now)
or
B) revert your commit that broke it and work on a fix


B enables other people to continue working with a good state of the
software.

Doing C, which is what happened here, makes *your work stop other
people's productivity*. Period.

This should happen no matter where the test fails; if it's "your code"
or not. Your code caused it (in the sense that the test didn't fail
before), so you should work on fixing it.

There's a bit more nuance here:

A test can fail for any number of reasons including badly written tests
or the test infrastructure failing somehow. Part of the above decision
tree should include determining what actually broke. If the test was
badly written, rewrite it or remove it if it no longer applies.


We aren't to this point yet; we need a bit more test coverage and we
need to speed up the feedback cycle for auto browser tests, but we're
headed in this direction. On purpose.

We need to get in the habit of making every commit deployable. No more
breaking beta cluster for a day while we work something out. No more
breaking other parts of the code base and ignoring it because 'it's not
core'.


Greg

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
OQ
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

OQ
So I'm confused on the timeline here.

Did the commit get merged before the testsuite found the breakage, or did
the commit get merged despite the testsuite failing?

Either way sounds like "When to merge a changeset" needs reviewed.


On Thu, Mar 6, 2014 at 6:38 PM, Greg Grossmeier <[hidden email]> wrote:

> <quote name="George Herbert" date="2014-03-06" time="15:53:50 -0800">
> > To take a couple of steps back...
> >
> > This happened because testing isn't robust enough?
> >
> > That should be discussed and followed up on.
>
> Ish.
>
> The suite of automatic tests caught this bug, actually. It's how the
> mobile team found out about it as they got to work this morning. So the
> testing is quite robust.
>
>
> I'd argue, that the revert didn't happen fast enough.
>
> "Whoa! Greg! Don't be incendiary!"
>
> What I mean by that is:
>
> I want us (where 'us' == any developer writing MediaWiki or extension
> code) to get to the point where we reject and revert any commit which
> breaks the test suite. Basically, when a test fails after a commit you
> (as the developer) should:
>
> A) fix it right away (like, now)
> or
> B) revert your commit that broke it and work on a fix
>
>
> B enables other people to continue working with a good state of the
> software.
>
> Doing C, which is what happened here, makes *your work stop other
> people's productivity*. Period.
>
> This should happen no matter where the test fails; if it's "your code"
> or not. Your code caused it (in the sense that the test didn't fail
> before), so you should work on fixing it.
>
> There's a bit more nuance here:
>
> A test can fail for any number of reasons including badly written tests
> or the test infrastructure failing somehow. Part of the above decision
> tree should include determining what actually broke. If the test was
> badly written, rewrite it or remove it if it no longer applies.
>
>
> We aren't to this point yet; we need a bit more test coverage and we
> need to speed up the feedback cycle for auto browser tests, but we're
> headed in this direction. On purpose.
>
> We need to get in the habit of making every commit deployable. No more
> breaking beta cluster for a day while we work something out. No more
> breaking other parts of the code base and ignoring it because 'it's not
> core'.
>
>
> Greg
>
> --
> | Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
> | identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Gerrit Commit Wars

Chris McMahon
On Thu, Mar 6, 2014 at 5:49 PM, OQ <[hidden email]> wrote:

> So I'm confused on the timeline here.
>
> Did the commit get merged before the testsuite found the breakage, or did
> the commit get merged despite the testsuite failing?


The commit was merged late Wednesday.  The automated tests that
demonstrated the problem failed over Wednesday night and we analyzed the
failures early Thursday morning, which is routine.

As noted above, code committed late on Wednesday or early Thursday only
resides in the test environment on beta labs for a short time before going
to production wikis.  We intend to improve this situation in the not too
distant future, but for now that is the situation on the ground.
-Chris
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
12345