Massive image loss

Tim Starling-2
This is a triple-crosspost. I suggest you reply to wikitech-l only.

A mistake I made caused the loss of 496 full-resolution images from
Wikimedia servers.

I have recovered as many images as I can, drawing on the following sources:

* Squid cache (pmtpa, knams and yaseo)
* May 8 backup of some wikis on storage1
* Duplicates with the same signature, found on the same or other wikis
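The third source, duplicates with the same signature, can be sketched roughly as follows. This is only an illustration: it assumes the "signature" is a SHA-1 content hash, and the function names and the `lost_hashes` input format are hypothetical, not taken from the recovery scripts actually used.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_surviving_copies(lost_hashes: Dict[str, str], search_root: Path) -> Dict[str, List[Path]]:
    """Map each lost file name to surviving files with the same content hash.

    lost_hashes maps a lost file's name to its recorded hex SHA-1
    (a hypothetical input format chosen for this sketch).
    """
    by_hash = defaultdict(list)
    for candidate in sorted(search_root.rglob("*")):
        if candidate.is_file():
            by_hash[sha1_of(candidate)].append(candidate)
    # Only report lost files for which at least one identical copy exists.
    return {name: by_hash[h] for name, h in lost_hashes.items() if h in by_hash}
```

A match on content hash finds copies that were renamed or uploaded to another wiki, which a path comparison alone would miss.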

That brought the number lost down from about 3000 to the current 496. For
the remaining files, I made a copy of their thumbnail directories:

http://upload.wikimedia.org/lost-image-thumb-backup/

A list of missing images can be found here:

http://noc.wikimedia.org/~tstarling/missing-images-2008-09

If anyone has any ideas about where to find more backup files, I'd be
willing to hear them. Otherwise, the community will just have to reupload
as many as possible.

The technical details were as follows: I fixed a bug in File.php, and
without checking what other changes were made to it, deployed the most
recent version of the file on the Wikimedia servers, without also updating
the rest of MediaWiki. Because FileRepo::$thumbDir was unset,
LocalFile::migrateThumbFile() had the effect of deleting the source image
for any thumbnail request which reached the backend. I reverted the change
after about 20 minutes, following a report on IRC.
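
The failure mode described above can be modelled in a few lines. This is a simplified sketch in Python, not the MediaWiki PHP code: the path layout, the function names and the cleanup rule ("a regular file sitting where the thumbnail directory belongs is removed") are assumptions made for illustration.

```python
import posixpath
from typing import Optional, Set


def source_path(zone_root: str, hash_rel: str, name: str) -> str:
    """Path of the full-resolution file, e.g. /data/a/ab/Foo.jpg."""
    return posixpath.join(zone_root, hash_rel, name)


def thumb_directory(zone_root: str, thumb_dir: Optional[str], hash_rel: str, name: str) -> str:
    """Per-file thumbnail directory, e.g. /data/thumb/a/ab/Foo.jpg.

    If thumb_dir is unset (None or empty), the prefix silently disappears
    and the result is the same string as source_path().
    """
    return posixpath.join(zone_root, thumb_dir or "", hash_rel, name)


def migrate_thumb_file(files: Set[str], thumb_directory_path: str) -> None:
    """Assumed legacy cleanup: delete a regular file found at the directory path."""
    if thumb_directory_path in files:
        files.remove(thumb_directory_path)
```

With `thumb_dir` set to `"thumb"` the cleanup never touches the original. With it unset, `thumb_directory()` returns the source path itself, so the cleanup removes the original.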

My sincere apologies.

-- Tim Starling

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Foundation-l] Massive image loss

Magnus Manske-2
On Fri, Sep 5, 2008 at 11:11 AM, Tim Starling <[hidden email]> wrote:

> This is a triple-crosspost. I suggest you reply to wikitech-l only.
>
> A mistake I made caused the loss of 496 full-resolution images from
> Wikimedia servers.
>
> I have recovered as many images as I can, drawing on the following sources:
>
> * Squid cache (pmtpa, knams and yaseo)
> * May 8 backup of some wikis on storage1
> * Duplicates with the same signature, found on the same or other wikis
>
> That brought the number lost down from about 3000 to the current 496. For
> the remaining files, I made a copy of their thumbnail directories:
>
> http://upload.wikimedia.org/lost-image-thumb-backup/
>
> A list of missing images can be found here:
>
> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>
> If anyone has any ideas about where to find more backup files, I'd be
> willing to hear them. Otherwise, the community will just have to reupload
> as many as possible.

At least one of them (Clan_member_crest_badge_-_Clan_MacTavish.svg)
was reuploaded by coincidence :-)

How about a script adding a message to the talk page of the respective uploader?

Magnus

Re: [Foundation-l] Massive image loss

David Gerard-2
In reply to this post by Tim Starling-2
2008/9/5 Tim Starling <[hidden email]>:

> A mistake I made caused the loss of 496 full-resolution images from
> Wikimedia servers.


*facepalm* One of them just had to be the Flag of Palestine, didn't it ... ;-p


- d.

Re: [Foundation-l] Massive image loss

Hay (Husky)
In reply to this post by Magnus Manske-2
I think it would be very helpful to have a thumbnail gallery of all
missing images. I'm sure people still have them lying around on their
hard disks or somewhere on the internet.

-- Hay / Husky

On Fri, Sep 5, 2008 at 12:44 PM, John at Darkstar <[hidden email]> wrote:

> Perhaps also add to the image list which pages each image was used on?
> That way you increase the chance of people noticing images they have
> uploaded. Also, could the list be split by the project each image was
> used on?
> John
>
> Magnus Manske skrev:
>> On Fri, Sep 5, 2008 at 11:11 AM, Tim Starling <[hidden email]> wrote:
>>> This is a triple-crosspost. I suggest you reply to wikitech-l only.
>>>
>>> A mistake I made caused the loss of 496 full-resolution images from
>>> Wikimedia servers.
>>>
>>> I have recovered as many images as I can, drawing on the following sources:
>>>
>>> * Squid cache (pmtpa, knams and yaseo)
>>> * May 8 backup of some wikis on storage1
>>> * Duplicates with the same signature, found on the same or other wikis
>>>
>>> That brought the number lost down from about 3000 to the current 496. For
>>> the remaining files, I made a copy of their thumbnail directories:
>>>
>>> http://upload.wikimedia.org/lost-image-thumb-backup/
>>>
>>> A list of missing images can be found here:
>>>
>>> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>>>
>>> If anyone has any ideas about where to find more backup files, I'd be
>>> willing to hear them. Otherwise, the community will just have to reupload
>>> as many as possible.
>>
>> At least one of them (Clan_member_crest_badge_-_Clan_MacTavish.svg)
>> was reuploaded by coincidence :-)
>>
>> How about a script adding a message to the talk page of the respective uploader?
>>
>> Magnus

Re: [Foundation-l] Massive image loss

Andre Engels
In reply to this post by Tim Starling-2
Could we perhaps get a list of links to these images' image pages?
That way we might be able to recover a few of them by noting that
their original source is still available.


--
André Engels, [hidden email]

Re: Massive image loss

Steve Summit
In reply to this post by Tim Starling-2
I'm not sure I'd call 496, out of however many hundreds of
thousands of images we have, "massive".

Is there enough metainformation available to derive the uploaders
or recent editors of the lost images?  That'd make it much easier
for concerned editors to grep -- er, search :-) -- for images
they might be in a position to reupload.

Re: [Foundation-l] Massive image loss

Mormegil
In reply to this post by Andre Engels
2008/9/5 Andre Engels <[hidden email]>:
> Could we perhaps get a list of links to these images' image pages?
> That way we might be able to recover a few of them by noting that
> their original source is still available.

I've made the list of links at
http://meta.wikimedia.org/wiki/Missing_images_2008-09

(Just by formatting the original list
http://noc.wikimedia.org/~tstarling/missing-images-2008-09)

-- [[cs:User:Mormegil | Petr Kadlec]]

Re: [Commons-l] Massive image loss

Gregory Maxwell
In reply to this post by Tim Starling-2
On Fri, Sep 5, 2008 at 6:11 AM, Tim Starling <[hidden email]> wrote:
> This is a triple-crosspost. I suggest you reply to wikitech-l only.
>
> A mistake I made caused the loss of 496 full-resolution images from
> Wikimedia servers.
[snip]
> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>
> If anyone has any ideas about where to find more backup files, I'd be
> willing to hear them. Otherwise, the community will just have to reupload
> as many as possible.
[snip]

I have 30 of the 496 images in that list based on an exact path match.
It's possible that I have more based on hash matches for images which
were moved between sites or 'renamed' after my last sync.

I have some chores to run, but I will later pull the hashes from the
database and check for hash matches.

I would likely have had nearly all of them if the rsync push to me had
not been down most of the year.

:(

Re: [Commons-l] Massive image loss

Tim Starling-2
Gregory Maxwell wrote:
> On Fri, Sep 5, 2008 at 6:11 AM, Tim Starling <[hidden email]> wrote:
>> This is a triple-crosspost. I suggest you reply to wikitech-l only.
                                             ^^^^^^^^^^^^^^^^^^^^^^^^
I think some people missed this line.


>> A mistake I made caused the loss of 496 full-resolution images from
>> Wikimedia servers.
> [snip]
>> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>>
>> If anyone has any ideas about where to find more backup files, I'd be
>> willing to hear them. Otherwise, the community will just have to reupload
>> as many as possible.
> [snip]
>
> I have 30 of the 496 images in that list based on an exact path match.
> It's possible that I have more based on hash matches for images which
> were moved between sites or 'renamed' after my last sync.
>
> I have some chores to run, but I will later pull the hashes from the
> database and check for hash matches.
>
> I would likely have had nearly all of them if the rsync push to me had
> not been down most of the year.

If it helps, this file has the hashes already:

http://noc.wikimedia.org/~tstarling/pass-3-targets-hashes

-- Tim Starling


Re: [Foundation-l] Massive image loss

Chad
In reply to this post by Tim Starling-2
On Fri, Sep 5, 2008 at 6:11 AM, Tim Starling <[hidden email]> wrote:

> [snip]
>
> The technical details were as follows: I fixed a bug in File.php, and
> without checking what other changes were made to it, deployed the most
> recent version of the file on the Wikimedia servers, without also updating
> the rest of MediaWiki. Because FileRepo::$thumbDir was unset,
> LocalFile::migrateThumbFile() had the effect of deleting the source image
> for any thumbnail request which reached the backend. I reverted the change
> after about 20 minutes, following a report on IRC.
>
> My sincere apologies.
>
> -- Tim Starling

And mine as well. I introduced the $thumbDir code in r40385. I thought
I had set a sane default of 'thumb/' in the constructor (which would work
per current behavior of using hardcoded 'thumb/'). Is there a code-path in
which $thumbDir isn't being set? If so, that needs fixing asap. Would a
revert be in order, or is everything ok as-is?

-Chad

Re: [Foundation-l] Massive image loss

Tim Starling-2
Chad wrote:

> On Fri, Sep 5, 2008 at 6:11 AM, Tim Starling <[hidden email]> wrote:
>> [snip]
>>
>> The technical details were as follows: I fixed a bug in File.php, and
>> without checking what other changes were made to it, deployed the most
>> recent version of the file on the Wikimedia servers, without also updating
>> the rest of MediaWiki. Because FileRepo::$thumbDir was unset,
>> LocalFile::migrateThumbFile() had the effect of deleting the source image
>> for any thumbnail request which reached the backend. I reverted the change
>> after about 20 minutes, following a report on IRC.
>>
>> My sincere apologies.
>
> And mine as well. I introduced the $thumbDir code in r40385. I thought
> I had set a sane default of 'thumb/' in the constructor (which would work
> per current behavior of using hardcoded 'thumb/'). Is there a code-path in
> which $thumbDir isn't being set? If so, that needs fixing asap. Would a
> revert be in order, or is everything ok as-is?

If you had followed my example and used an accessor function, instead of
having the File class access member variables of the repo directly, then
there would have been no problem. Adding an accessor is good style in any
case, and you should make that change. But it wasn't your fault.

I patched two files in quick succession: GlobalFunctions.php and then
File.php. With GlobalFunctions.php, I checked the diff carefully for any
dependencies before I updated it on Wikimedia. There were no changes other
than my own. With File.php, I assumed it would be OK and didn't check. I
didn't think about it at the time; I was working quickly. Call it
cognitive bias, loss of concentration, laziness, whatever. Not your fault.

There was a second programming error here, and that was the fact that I
put an unlink() call in the code in the first place. It didn't seem
dangerous at the time, but obviously migrateThumbFile() is a recipe for
disaster if there's a potential for adverse input coming from getThumbPath().
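
One defensive pattern for that second error is to refuse to delete anything that does not resolve to a location inside the thumbnail tree. The sketch below is in Python rather than PHP, and `safe_unlink_thumb` is a hypothetical helper, not an existing MediaWiki function.

```python
from pathlib import Path


def safe_unlink_thumb(thumb_root: Path, target: Path) -> bool:
    """Delete target only if it is a regular file strictly inside thumb_root.

    Returns True if the file was deleted, False if the request was refused.
    """
    root = thumb_root.resolve()
    resolved = target.resolve()
    # Refuse the root itself and anything that is not a descendant of it.
    if resolved == root or root not in resolved.parents:
        return False
    if not resolved.is_file():
        return False
    resolved.unlink()
    return True
```

With a check like this, a bad path coming from the caller results in a refused deletion instead of a lost original.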

However, the thumb directory is inherently temporary, and lots of things
delete from it. I think I'd be most comfortable not having the thumbDir
feature at all. Is there some reason for it?

-- Tim Starling


Re: [Foundation-l] Massive image loss

Chad
On Fri, Sep 5, 2008 at 11:14 AM, Tim Starling <[hidden email]> wrote:

> [snip]
>
> However, the thumb directory is inherently temporary, and lots of things
> delete from it. I think I'd be most comfortable not having the thumbDir
> feature at all. Is there some reason for it?
>
> -- Tim Starling

Customization options for sysadmins. Reverted in r40504. Largely
useless unless someone can think up a use-case for _needing_ it
to be a different location than /thumb.

In any case, I've removed it pending a reason for it (or at least a
better implementation with accessors and the like).

-Chad

Re: Massive image loss

Ashar Voultoiz-2
In reply to this post by Tim Starling-2
Tim Starling wrote:
<snip>
> A list of missing images can be found here:
>
> http://noc.wikimedia.org/~tstarling/missing-images-2008-09

I used your list to generate a basic gallery:

   http://noc.wikimedia.org/~hashar/200809-missing/

Maybe it can help people.

--
Ashar Voultoiz - WP++++
http://en.wikipedia.org/wiki/User:Hashar
http://www.livejournal.com/community/wikitech/
IM: [hidden email]


Re: Massive image loss

Steve Sanbeg
In reply to this post by Tim Starling-2
On Fri, 05 Sep 2008 20:11:10 +1000, Tim Starling wrote:

> This is a triple-crosspost. I suggest you reply to wikitech-l only.
>
> A mistake I made caused the loss of 496 full-resolution images from
> Wikimedia servers.
>
> I have recovered as many images as I can, drawing on the following sources:
>
> * Squid cache (pmtpa, knams and yaseo)
> * May 8 backup of some wikis on storage1
> * Duplicates with the same signature, found on the same or other wikis
>
> That brought the number lost down from about 3000 to the current 496. For
> the remaining files, I made a copy of their thumbnail directories:
>
> http://upload.wikimedia.org/lost-image-thumb-backup/
>
> A list of missing images can be found here:
>
> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>
> If anyone has any ideas about where to find more backup files, I'd be
> willing to hear them. Otherwise, the community will just have to reupload
> as many as possible.
>
> The technical details were as follows: I fixed a bug in File.php, and
> without checking what other changes were made to it, deployed the most
> recent version of the file on the Wikimedia servers, without also updating
> the rest of MediaWiki. Because FileRepo::$thumbDir was unset,
> LocalFile::migrateThumbFile() had the effect of deleting the source image
> for any thumbnail request which reached the backend. I reverted the change
> after about 20 minutes, following a report on IRC.
>
> My sincere apologies.
>
> -- Tim Starling

I just checked that list with my collection; it looks like I've got about
250 of them.  Is there someplace I can drop a tarball or something?




Re: [Commons-l] Massive image loss

Gregory Maxwell
In reply to this post by Tim Starling-2
On Fri, Sep 5, 2008 at 8:54 AM, Tim Starling <[hidden email]> wrote:
> If it helps, this file has the hashes already:
> http://noc.wikimedia.org/~tstarling/pass-3-targets-hashes

Thanks. Saved me a step… and fortunately I already had base conversion
code handy.
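
For readers wondering what the base conversion involves, here is a minimal sketch. It assumes the hash list stores SHA-1 values in base 36, zero-padded to 31 digits, while tools such as sha1sum print 40 hexadecimal digits. The thread does not state the bases, so treat both as assumptions; the function names are made up for this example.

```python
import string

# Digits 0-9 followed by a-z, giving 36 symbols.
DIGITS = string.digits + string.ascii_lowercase


def hex_to_base36(hex_digest: str, width: int = 31) -> str:
    """Convert a hexadecimal digest to base 36, left-padded with zeros."""
    value = int(hex_digest, 16)
    out = ""
    while value:
        value, remainder = divmod(value, 36)
        out = DIGITS[remainder] + out
    return out.rjust(width, "0")


def base36_to_hex(text: str, width: int = 40) -> str:
    """Convert a base-36 string back to zero-padded hexadecimal."""
    return format(int(text, 36), "x").rjust(width, "0")
```

Thirty-one base-36 digits are enough to hold any 160-bit SHA-1 value, so the round trip is lossless.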

Sadly, it takes a long time to SHA1 many terabytes of data. I started the
process this morning, but I had made an error in assuming the xargs
parallel argument (-P) wouldn't result in badly interleaved output,
since it didn't in a limited test.  Turns out it did so I had to start
the hashing over again.
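
One way to avoid interleaved output when hashing in parallel is to let the workers return their results and have a single writer emit complete lines. The sketch below does this in Python with a thread pool instead of xargs; `hash_tree` and `sha1_line` are hypothetical names, and the `digest  path` line format is simply borrowed from sha1sum.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def sha1_line(path: Path) -> str:
    """Hash one file and return a complete 'digest  path' line."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"


def hash_tree(root: Path, workers: int = 4) -> List[str]:
    """Hash every file under root in parallel; results keep the sorted input order."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, and only the caller ever
        # writes them out, so lines cannot be interleaved.
        return list(pool.map(sha1_line, files))
```

Because the workers never write to a shared stream, the caller can print or save the returned lines one at a time.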

(Might I suggest, beyond not invoking unlink(), that if your filesystem
can handle some additional inode pressure you make daily or
weekly hardlink snapshots in a directory tree inaccessible to the web
front end? It's not as good as a real backup system, but it's cheap
and easy. On my system (xfs) I have a dozen or so hardlink snapshots
of the Wikimedia image collection: while I was getting updates I was
creating snapshots which roughly coincided with the released database
dumps.)
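
A hardlink snapshot of the kind suggested here can be sketched as follows. This is an illustration only: `hardlink_snapshot` is a hypothetical helper, the snapshot directory must be on the same filesystem as the source, and it must not be located inside the source tree.

```python
import os
from pathlib import Path


def hardlink_snapshot(source_root: Path, snapshot_root: Path) -> int:
    """Mirror source_root into snapshot_root using hard links.

    Returns the number of files linked. No file data is copied; each
    snapshot entry is a second name for the same inode.
    """
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        relative = Path(dirpath).relative_to(source_root)
        target_dir = snapshot_root / relative
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            os.link(Path(dirpath) / name, target_dir / name)
            linked += 1
    return linked
```

Such a snapshot survives an unlink() of the original name, which is the failure seen here. It does not protect against a file being overwritten in place, since both names share the same data.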

Since the hashing is going to take a while I'll hop on IRC and pass
you a link to a tar with the file name matches. Turns out that I have
*most* of them based on name match alone. (dunno why my earlier count
was wrong… perhaps a unicode handling bug on my part, I'd just woken
up when I sent my prior email)

Re: [Foundation-l] Massive image loss

Waerth
In reply to this post by Tim Starling-2
Whoops... it is actually a miracle big mistakes like this haven't
occurred before over the years! Which says a lot about the high quality
of the developers and maintainers of the site. Don't worry too much, Tim,
it will work itself out. No more 24-hour days behind the computer though ;)

Walter van Kalken (waerth)

> This is a triple-crosspost. I suggest you reply to wikitech-l only.
>
> A mistake I made caused the loss of 496 full-resolution images from
> Wikimedia servers.
>
> I have recovered as many images as I can, drawing on the following sources:
>
> * Squid cache (pmtpa, knams and yaseo)
> * May 8 backup of some wikis on storage1
> * Duplicates with the same signature, found on the same or other wikis
>
> That brought the number lost down from about 3000 to the current 496. For
> the remaining files, I made a copy of their thumbnail directories:
>
> http://upload.wikimedia.org/lost-image-thumb-backup/
>
> A list of missing images can be found here:
>
> http://noc.wikimedia.org/~tstarling/missing-images-2008-09
>
> If anyone has any ideas about where to find more backup files, I'd be
> willing to hear them. Otherwise, the community will just have to reupload
> as many as possible.
>
> The technical details were as follows: I fixed a bug in File.php, and
> without checking what other changes were made to it, deployed the most
> recent version of the file on the Wikimedia servers, without also updating
> the rest of MediaWiki. Because FileRepo::$thumbDir was unset,
> LocalFile::migrateThumbFile() had the effect of deleting the source image
> for any thumbnail request which reached the backend. I reverted the change
> after about 20 minutes, following a report on IRC.
>
> My sincere apologies.
>
> -- Tim Starling



Re: Massive image loss

Steve Summit
In reply to this post by Mormegil
Petr Kadlec wrote:
> 2008/9/5 Andre Engels <[hidden email]>:
> > Could we perhaps get a list of links to these images' image pages?
>
> I've made the list of links at
> http://meta.wikimedia.org/wiki/Missing_images_2008-09

And I've made a list augmented with each image's uploader/editor(s).
http://meta.wikimedia.org/wiki/Missing_images_%2B_editors_2008-09

Re: Massive image loss

Gregory Maxwell
On Fri, Sep 5, 2008 at 6:16 PM, Steve Summit <[hidden email]> wrote:
> Petr Kadlec wrote:
>> 2008/9/5 Andre Engels <[hidden email]>:
>> > Could we perhaps get a list of links to these images' image pages?
>>
>> I've made the list of links at
>> http://meta.wikimedia.org/wiki/Missing_images_2008-09
>
> And I've made a list augmented with each image's uploader/editor(s).
> http://meta.wikimedia.org/wiki/Missing_images_%2B_editors_2008-09

It turned out that I had even more than I thought, thanks to
Platonides, who has been running a bot on my system that has the stale
mirror. The bot has been patiently mirroring every file uploaded to
Commons, so the files in his directory added another 150 to the 308
that I had, and a number of other people filled in some as well.

I'm still generating SHA1SUMs so I still may find a few more yet based
on content hashes.

The last concrete number I heard was 47 missing, but I think it's
probably less than that now.

Re: Massive image loss

Tim Starling-2
In reply to this post by Steve Sanbeg
Steve Sanbeg wrote:
> I just checked that list with my collection; it looks like I've got about
> 250 of them.  Is there someplace I can drop a tarball or something?

We don't have any FTP upload server set up if that's what you mean. The
easiest thing would be if you could set up an HTTP server that I can
download the tarball from. If that's not feasible, grab me on IRC and
we'll sort something out.

-- Tim Starling


Re: Massive image loss

Huji Lee
An "out of the blue" idea that I haven't checked: Are those pages stored in
archive.org? Because if yes, then a copy of the image may also be there.

Hojjat (aka Huji)

On 9/6/08, Tim Starling <[hidden email]> wrote:

>
> Steve Sanbeg wrote:
> > I just checked that list with my collection; it looks like I've got about
> > 250 of them.  Is there someplace I can drop a tarball or something?
>
>
> We don't have any FTP upload server set up if that's what you mean. The
> easiest thing would be if you could set up an HTTP server that I can
> download the tarball from. If that's not feasible, grab me on IRC and
> we'll sort something out.
>
>
> -- Tim Starling