Recent thumbnail problems and problem reporting.

Gregory Maxwell
Two days ago the disk filled up on one of our servers, Bacon,
(http://ganglia.wikimedia.org/pmtpa/graph.php?c=Miscellaneous&h=bacon.wikimedia.org&v=0.070&m=disk_free&r=week&z=medium&jr=&js=&vl=GB).

The full disk resulted in some thumbnails failing to render.

The root problem was resolved, but some of the failed thumbnails
remained failed. They could be resolved by purging the image page, or
by simply waiting for the cache to expire for them.  The technical
team considered the matter closed.
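
For reference, purging a single image page amounts to requesting that
page with MediaWiki's action=purge parameter. A minimal Python sketch
of building such a URL follows. The helper name purge_url, the default
host, and the title Image:Example.jpg are illustrative assumptions, not
anything the tech team used. It builds the URL for one page only and
sends no request.

```python
from urllib.parse import urlencode


def purge_url(title, host="en.wikipedia.org"):
    """Build the URL that asks MediaWiki to purge one page's cache.

    `title` and `host` are placeholders chosen for illustration.
    """
    # urlencode percent-escapes the colon in the namespace prefix.
    query = urlencode({"title": title, "action": "purge"})
    return f"https://{host}/w/index.php?{query}"


if __name__ == "__main__":
    # Hypothetical image title, used only as an example.
    print(purge_url("Image:Example.jpg"))
```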

Sometime today awareness of broken thumbs on English Wikipedia rocketed up.

Rather than successfully flagging the tech team's attention, a series
of inaccurate sitenotices were placed on English Wikipedia and on
several other language Wikipedias. The English notice in particular
was displayed to the general public.

The notices claimed that the issue was being worked on. This was not
correct. The notice most likely caused people to not report the
problems they were seeing.

None of the active tech team were aware of any ongoing issue. It was
understood that some images would fail to display until their cache
expired but this was not believed to be an issue significant enough in
scale to justify any action.

When I happened to browse over to enwp as a reader I saw the notice.
I asked ST47 to remove the notice.  I got a hold of our resident
caching god, Mark Bergsma, and went ahead and mass-purged all the
thumbnails.

Sometime after that point the incorrect notice was restored on English
Wikipedia and revised several times, and in its last version it
attempted to give bad directions on how to purge images. It is
generally inadvisable to instruct the general public to purge pages on
a wide scale for a number of reasons.

All in all, this issue was handled poorly all around. On the tech side
a status report should have gone out after the fix, and on the
Wikipedia admins' side no claim should ever be made that a problem is
being worked on unless you are darn sure that it is the case.

There are also some issues related to how we communicate with the
public, but I'll leave it to someone else to complain about that.

My biggest fear is that had there been a second issue it might have
persisted for days with the techs unaware of the problem. I've seen
some prior examples of over eagerness to claim something is being
worked on in the past in our user communities. It frightens me for
this reason.

Hopefully future events will be handled better and this message will
increase awareness of the potential issues involved.

Thanks for your time.

_______________________________________________
WikiEN-l mailing list
[hidden email]
To unsubscribe from this mailing list, visit:
http://lists.wikimedia.org/mailman/listinfo/wikien-l

Re: Recent thumbnail problems and problem reporting.

Gregory Maxwell
On 9/16/07, Gregory Maxwell <[hidden email]> wrote:
[snip]
> I asked ST47 to remove the notice.  I got a hold of our resident
> caching god, Mark Bergsma, and went ahead and mass-purged all the
> thumbnails.
[snip]

I intended to state "and he went ahead and mass-purged all the
thumbnails which were believed to be affected by the issue".

My excuse for this failure to proofread is that I've been spoiled by
the ability to revise my own comments on the wikis. Yeah... spoiled...
that's the ticket.

Re: Recent thumbnail problems and problem reporting.

Mark Ryan
In reply to this post by Gregory Maxwell
On 17/09/2007, Gregory Maxwell <[hidden email]> wrote:
*snip*
> I've seen
> some prior examples of over eagerness to claim something is being
> worked on in the past in our user communities.
*snip*

On IRC when Wikipedia goes down, I always set the channel entry
message to say that our "Technical Response Group" is working to fix
the problem, because with something as serious as a Wikipedia
downtime, the techs generally are already upon it. Is that wrong?

~Mark Ryan

Re: Recent thumbnail problems and problem reporting.

Gregory Maxwell
On 9/16/07, Mark Ryan <[hidden email]> wrote:

> On 17/09/2007, Gregory Maxwell <[hidden email]> wrote:
> *snip*
> > I've seen
> > some prior examples of over eagerness to claim something is being
> > worked on in the past in our user communities.
> *snip*
>
> On IRC when Wikipedia goes down, I always set the channel entry
> message to say that our "Technical Response Group" is working to fix
> the problem, because with something as serious as a Wikipedia
> downtime, the techs generally are already upon it. Is that wrong?

In cases of serious issues, if you do not have direct personal
knowledge that someone with shell access
(http://meta.wikimedia.org/wiki/Developers) is working on or at least
acutely aware of the issue, please do not make the claim that it is
being worked on. Allow those who have direct knowledge to make the
claim.

Since you mention IRC... you are welcome to join #wikimedia-tech.
Please listen for a moment before asking. And be aware that technical
banter between folks doesn't mean the right people are aware of the
issue. Many problems can only be addressed by people on the sysadmin
end of the spectrum, and there are a large number of people, including
some MediaWiki developers, who are not sysadmins and cannot actually
fix many problems even if they understand them and are talking about
them. Do not assume that any person who knows more than you can fix
the issue, will fix the issue, or will even bother to report it to
someone who can.


In cases where the site is down, yes... Tech folks will know about it,
but there is no harm in not making the statement unless you are sure.

In cases which are serious but are not a total site-down event, it is
somewhat more likely that we've had some new and exciting mode of
failure that the monitoring tools cannot yet catch. In these cases it
is especially important that we do not prematurely suppress trouble
reports.

In all cases over-reporting is preferable to under-reporting. The tech
IRC channel can be set moderated. Emails and OTRS messages can be
filtered. And, of course, if you see one of the people listed with
shell access saying "Hush we know already!" then it's a safe bet that
the issue is actually being worked on. ;)

Also, if you do decide to contact any of the tech team yourself, please
try to be detailed and constructive. Entering the tech IRC channel and
saying "The darn site is broken AGAIN!" doesn't help fix anything.
Instead say something like "When I load any page, like
http://en.wikipedia.org/wiki/Foo all the images are upside down. I'm
running Firefox on Windows and this has been going on for two hours!".
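
As a rough illustration of the kind of detail that helps, the Python
sketch below assembles a report from a few fields. The ProblemReport
class and its field names are hypothetical, invented for this example;
they are not part of any Wikimedia tool.

```python
from dataclasses import dataclass


@dataclass
class ProblemReport:
    """Hypothetical container for the details a useful report carries."""
    url: str        # a page where the problem can be seen
    symptom: str    # what actually goes wrong
    browser: str    # browser in use
    os: str         # operating system in use
    duration: str   # how long the problem has been observed

    def to_message(self) -> str:
        # Put the concrete facts first so a sysadmin can act on them.
        return (
            f"When I load {self.url}, {self.symptom}. "
            f"I'm running {self.browser} on {self.os} "
            f"and this has been going on for {self.duration}."
        )


if __name__ == "__main__":
    report = ProblemReport(
        url="http://en.wikipedia.org/wiki/Foo",
        symptom="all the images are upside down",
        browser="Firefox",
        os="Windows",
        duration="two hours",
    )
    print(report.to_message())
```

The point of the fixed fields is only that a report missing any of them
is harder to act on.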

Re: Recent thumbnail problems and problem reporting.

Arne 'Timwi' Heizmann
In reply to this post by Gregory Maxwell
Gregory Maxwell wrote:
> Rather than successfully flagging the tech team's attention, a series
> of inaccurate sitenotices were placed on English Wikipedia and on
> several other language Wikipedias. The English notice in particular
> was displayed to the general public. [...] None of the active tech
> team were aware of any ongoing issue.

I don't understand this train of thought. If, as you say, the notice was
displayed _to the general public_, how can the tech team remain unaware
of it? Are they a bunch of robots sitting in a basement who act only
upon direct command and who never browse Wikipedia as a member of the
general public?

Presumably the main reason something was mentioned in the sitenotice but
not to the tech team is that out of all active Wikipedia admins, a great
majority (myself included) probably know how to put something in the
sitenotice but not how to contact the "active tech team". If the
sitenotice is the only course of action known to any particular admin,
then that admin will naturally take that course of action (I know I
would if I had been there).

Timwi

Re: Recent thumbnail problems and problem reporting.

Anthony-73
In reply to this post by Gregory Maxwell
On 9/17/07, Gregory Maxwell <[hidden email]> wrote:
> In all cases over-reporting is preferable to under reporting.

In that spirit, the pages-meta-history dump broke, again.  Someone
please report this to someone who can fix it, and if you could have
someone report back letting us know what the problem is and whether or
not it'll ever be fixed, that'd be awesome.