QA: Holding our code to better standards.

QA: Holding our code to better standards.

Jon Robson-2
Dear Greg, and anyone else that is involved in deployment

This is a follow-up from Dan Duvall's talk today during the metrics
meeting about voting browser tests.

Background:
This quarter, with the help of Dan Duvall, the Reading Web team has
made huge strides in our QA infrastructure. The extensions Gather,
MobileFrontend and now the new extension QuickSurveys are all running
browser tests on a per-commit basis. A selected set of MobileFrontend
@smoke tests (a subset of all the tests) runs in 15 minutes on every
commit, and the entire set of Gather browser tests runs in around
21 minutes. It marginally slows down getting patches deployed... but I
think this is a good thing. The results speak for themselves.
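
(For anyone who hasn't seen tag-based test selection before: our actual
tests are Cucumber scenarios, where @smoke is a Cucumber tag, but the
idea is roughly the same as in this hypothetical Python/pytest analogue.
A handful of fast, critical scenarios carry a marker and only those run
on every commit, with the full suite left to the daily builds. The
fixture and test names below are made up for illustration.)

    # Hypothetical pytest analogue of a tagged "smoke" subset; the real
    # MobileFrontend tests are Cucumber scenarios tagged @smoke, and the
    # mobile_client fixture below is made up for illustration.
    import pytest

    @pytest.mark.smoke
    def test_main_page_renders(mobile_client):
        # Critical path: the mobile main page must load for anonymous users.
        response = mobile_client.get("/wiki/Main_Page")
        assert response.status_code == 200

    def test_full_editing_workflow(mobile_client):
        # Slower end-to-end scenario; only exercised by the full daily build.
        ...

    # Per-commit job: run only the smoke subset
    #   pytest -m smoke
    # Daily job: run everything
    #   pytest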

In the past month (August 4th-September 4th) only 3 of 33 builds failed
for MobileFrontend's daily smoke test build [1] (all 3 due to issues
with the Jenkins infrastructure). For the full set of tests, only 10 of
33 failed in the Chrome daily build [3]: 8 were due to flaky tests that
need improvement or to issues with the Jenkins infrastructure, and the
other 2 were serious bugs [4,5] introduced by work the performance team
had been doing, which we were able to fix shortly afterwards.

In Firefox [2] there were only 6 failures, and only 2 of those were
serious bugs, again caused by changes outside MobileFrontend [4,6]. One
of them was particularly serious: we had started loading JavaScript for
users with legacy browsers such as IE6. Both were caught before the
daily builds, when our MobileFrontend commits suddenly stopped merging.

The future!:
Given this success:
1) I would like to see us run @integration tests on core, but I
understand that, given the number of bugs, this might not be feasible yet.
2) We should run @integration tests prior to deployments to the
cluster via the train and communicate out when we have failures (and
make a decision to push broken code)
3) I'd like to see other extensions adopt browser test voting on their
extensions. Please feel free to reach out to me if you need help with
that. The more coverage across our extensions we have, the better.

Going forward we really have no excuse for pushing broken code out to
our users, and at the very least we need to make it visible to each
other when we are deploying broken code. We have a responsibility to
our users.

Thoughts? Reactions? Who's with me?!

[1] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/
[2] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/
[3] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/
[4] https://phabricator.wikimedia.org/T108045
[5] https://phabricator.wikimedia.org/T108191
[6] https://phabricator.wikimedia.org/T111233


Re: [WikimediaMobile] QA: Holding our code to better standards.

Greg Grossmeier-2
Looping in QA list and dropping our team private list.

<quote name="Jon Robson" date="2015-09-03" time="11:45:47 -0700">

> Dear Greg, and anyone else that is involved in deployment
>
> This is a follow-up from Dan Duvall's talk today during the metrics
> meeting about voting browser tests.
>
> Background:
> The reading web team this quarter with the help of Dan Duvall has made
> huge strides in our QA infrastructure. The extensions Gather,
> MobileFrontend and now the new extension QuickSurveys are all running
> browser tests on a per commit basis. A selected set of MobileFrontend
> @smoke tests (a selected subset of all he tests) are running in
> 15minutes on every commit and the entire set of Gather browser tests
> is running in around 21minutes. It marginally slows down getting
> patches deployed... but I think this is a good thing. The results
> speak for themselves.
>
> In the past month (August 4th-September 4th) only 3/33 builds failed
> for MobileFrontend's daily smoke test build [1] (all 3 due to issues
> with the Jenkins infrastructure). For the full set of tests only 10/33
> failed in the Chrome daily build [3], 8 of which were due to tests
> being flakey and needing improvement or issues with the Jenkin
> infrastructure and the two others serious bugs [4,5] brought about by
> work the performance team had been doing that we were able to fix
> shortly after.
>
> In Firefox [2] there were only 6 failures and only 2 of these were
> serious bugs, again caused by things outside MobileFrontend [4,6]. One
> of these was pretty serious - we had started loading JavaScript for
> users with legacy browsers such as IE6. These were caught prior to the
> daily builds when suddenly our MobileFrontend commits would not merge.
>
> The future!:
> Given this success:
> 1) I would like to see us run @integration tests on core, but I
> understand given the number of bugs this might not be feasible so far.
> 2) We should run @integration tests prior to deployments to the
> cluster via the train and communicate out when we have failures (and
> make a decision to push broken code)
> 3) I'd like to see other extensions adopt browser test voting on their
> extensions. Please feel free to reach out to me if you need help with
> that. The more coverage across our extensions we have, the better.
>
> We really have no excuse going forward to push broken code out to our
> users and at the very least we need to be visible to each other when
> we are deploying broken code. We have a responsibility to our users.
>
> Thoughts? Reactions? Who's with me?!
>
> [1] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/
> [2] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/
> [3] https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/
> [4] https://phabricator.wikimedia.org/T108045
> [5] https://phabricator.wikimedia.org/T108191
> [6] https://phabricator.wikimedia.org/T111233
>
> _______________________________________________
> Mobile-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/mobile-l

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |


Re: QA: Holding our code to better standards.

Matthew Flaschen-2
In reply to this post by Jon Robson-2
On 09/03/2015 02:45 PM, Jon Robson wrote:

> The future!:
> Given this success:
> 1) I would like to see us run @integration tests on core, but I
> understand that, given the number of bugs, this might not be feasible yet.
> 2) We should run @integration tests prior to deployments to the
> cluster via the train and communicate out when we have failures (and
> make a decision to push broken code)
> 3) I'd like to see other extensions adopt browser test voting on their
> extensions. Please feel free to reach out to me if you need help with
> that. The more coverage across our extensions we have, the better.

+100%  I assume #2 should be "make a decision whether to push broken code".

Matt Flaschen



Re: QA: Holding our code to better standards.

Pine W
I just want to say that I appreciate this overview.

Pine

Re: Holding our code to better standards.

Greg Grossmeier-2
In reply to this post by Jon Robson-2
(I've set the TO: field to the QA list only, and put everyone else on
BCC now. If you're curious about this topic, please join the QA mailing
list and follow along. It's not a very high-traffic list.)

<quote name="Jon Robson" date="2015-09-03" time="11:45:47 -0700">
> Dear Greg, and anyone else that is involved in deployment

Hi there :)

> <successes over the past month>

Awesome :)

> The future!:
> Given this success:
> 1) I would like to see us run @integration tests on core, but I
> understand that, given the number of bugs, this might not be feasible yet.

https://integration.wikimedia.org/ci/view/BrowserTests/view/Core/

Those are pretty stable, but limited (rightfully) in number. Looks like
the last run of the first job took 8 minutes. Not too bad.

> 2) We should run @integration tests prior to deployments to the
> cluster via the train and communicate out when we have failures (and
> make a decision to push broken code)

The way I hear this is: Run @integration tests on merge to wmfXX
branches. Is that an accurate rephrasing?

If not, then it sounds like what we're planning to do with respect to
the "Staging" cluster work we started, paused (due to time constraints),
and will restart in Q3. tl;dr: the staging cluster will run the full
suite of tests nightly against a git tag. In the morning we'll be able
to make a go/no-go decision on deploying that tag.

That "nightly" part can, of course, be modified to whatever frequency we
can support (which can probably be pretty fast).
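
(Just to make the shape of that concrete, here's a very rough sketch of
what such a nightly gate could look like; the repository URL, tag name
and test command below are placeholders rather than our actual setup.)

    # Hypothetical nightly go/no-go gate against a git tag; the repo URL,
    # tag and test command are placeholders, not the real staging setup.
    import subprocess
    import sys

    REPO = "https://git.example.org/mediawiki/core.git"  # placeholder
    TAG = "wmf/1.26wmf21"                                 # placeholder

    def run(cmd, cwd=None):
        """Run a command and return its exit code without raising."""
        return subprocess.run(cmd, cwd=cwd).returncode

    def main():
        if run(["git", "clone", "--depth", "1", "--branch", TAG, REPO, "checkout"]) != 0:
            sys.exit("NO-GO: could not fetch the tagged code")
        # Placeholder for the full browser test run against the staging cluster.
        status = run(["bundle", "exec", "cucumber", "--tags", "@integration"], cwd="checkout")
        print("GO" if status == 0 else "NO-GO")
        sys.exit(status)

    if __name__ == "__main__":
        main()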

> 3) I'd like to see other extensions adopt browser test voting on their
> extensions. Please feel free to reach out to me if you need help with
> that. The more coverage across our extensions we have, the better.

Thanks for the offer of help, Jon! That's awesome. I love the idea of
teams/groups helping other teams/groups! You, personally, have been
great at this so far, and I thank you for that.

Greg

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |


Re: QA: Holding our code to better standards.

Steven Walling
In reply to this post by Pine W
Just to hop on the bandwagon here: this seems like the only sane path going
forward. One unmentioned benefit is that this is a step toward continuous
deployment. Having integration tests run on every commit and then block
when there are failures is pretty much a requirement if Wikimedia ever
wants to get there.

On Thu, Sep 3, 2015 at 1:43 PM Pine W <[hidden email]> wrote:

> I just want to say that I appreciate this overview.
>
> Pine

Re: QA: Holding our code to better standards.

Gabriel Wicke-3
In the services team, we found that prominent coverage metrics are a very
powerful motivator for keeping tests in order. We have set up 'voting'
coverage reports, which fail the overall tests if coverage falls, and make
it easy to check which lines aren't covered yet (via coveralls). In all
repositories we enabled this for, test coverage has since stabilized around
80-90%.
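
(As a purely illustrative sketch -- the services repositories actually
use coveralls wired into their existing test runs -- the same "fail the
build if coverage falls below a threshold" gate looks roughly like this
with Python's coverage.py; the threshold is an example value, not our
real configuration.)

    # Illustrative "voting" coverage gate; the threshold is an example value
    # and this is not the services team's actual coveralls configuration.
    import subprocess
    import sys

    THRESHOLD = 80  # example value

    # Run the test suite under coverage measurement.
    subprocess.run(["coverage", "run", "-m", "pytest"], check=True)

    # 'coverage report --fail-under' exits non-zero when total coverage drops
    # below the threshold, which fails ("votes against") the build.
    result = subprocess.run(["coverage", "report", f"--fail-under={THRESHOLD}"])
    sys.exit(result.returncode)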

Gabriel

On Thu, Sep 3, 2015 at 4:31 PM, Steven Walling <[hidden email]> wrote:

> Just to hop on the bandwagon here: this seems like the only sane path going
> forward. One unmentioned benefit is that this is a step toward continuous
> deployment. Having integration tests run on every commit and then block
> when there are failures is pretty much a requirement if Wikimedia ever
> wants to get there.

--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

Re: QA: Holding our code to better standards.

Greg Grossmeier-2
<quote name="Gabriel Wicke" date="2015-09-03" time="17:03:03 -0700">
> In the services team, we found that prominent coverage metrics are a very
> powerful motivator for keeping tests in order. We have set up 'voting'
> coverage reports, which fail the overall tests if coverage falls, and make
> it easy to check which lines aren't covered yet (via coveralls). In all
> repositories we enabled this for, test coverage has since stabilized around
> 80-90%.

We (RelEng), too, are interested in this. Given the nature of our
projects we'll probably need to start this on a case-by-case basis,
(un)fortunately. :)

There are two parts to this (as I see it): informational and enforcement.

Informational:
* "Generate code coverage reports for extensions"
** https://phabricator.wikimedia.org/T71685
* Add ^^^ to "QA Health scoreboard"
** https://phabricator.wikimedia.org/T108768

Enforcement:
* What Gabriel described above.
** There's no single ticket tracking this across repos right now; I'll
create one...
** https://phabricator.wikimedia.org/T111546

Greg

PS: I didn't mean to, but I forked this thread across the wikitech-l
and qa lists (I don't think my BCC to wikitech-l made it through
mailman). See the other sub-thread on adding @integration test runs on
wmf deploy branch creation at:
https://lists.wikimedia.org/pipermail/qa/2015-September/thread.html

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |


Re: QA: Holding our code to better standards.

Željko Filipin
In reply to this post by Jon Robson-2
On Thu, Sep 3, 2015 at 8:45 PM, Jon Robson <[hidden email]> wrote:

> This is a follow-up from Dan Duvall's talk today during the metrics
> meeting about voting browser tests.
>

If you did not see it (34:30-44:30):

https://youtu.be/Hy307xn99-c?t=34m26s

Please note the explanation of the Release Engineering team, by Evil
Greg: "Delivering deliverables delivery since our delivery."

I am getting that tattooed. :)

Željko