scaled media (thumbs) as *temporary* files, not stored forever


Re: scaled media (thumbs) as *temporary* files, not stored forever

Brion Vibber
On Wed, Sep 5, 2012 at 2:00 PM, Roan Kattouw <[hidden email]> wrote:

> On Wed, Sep 5, 2012 at 12:35 PM, Asher Feldman <[hidden email]>
> wrote:
> > Browser scaling is also at least worth
> > experimenting with.  Instances where browser scaling would be bad are
> > likely instances where the image is already subpar if viewed on a
> high-dpi
> > / retina display.
> Other instances where browser scaling is bad are:
> * PoS browsers that don't render SVGs (how old are these by now?)
>

IE up through 8, and Android stock browser through 2.3. Neither are dead
yet, so we still gotta deal with rasterization for them.

> * Even modern browsers have subpar SVG rendering at 1x, PNG looks better
>

Examples? Sounds like bugs need to be filed with some of those browsers. :)


> * Some media types are "scaled" in unusual ways (SVGs, but also video
> stills, PDF pages, ...)
> * Some original images are really friggin' large (20-30 megapixels
> sometimes), so at least some downscaling is needed there
>

You'd absolutely want to do server-side downscaling to the base sizes in
the appropriate file formats -- we wouldn't try to download multi-megapixel
originals just to make a tiny thumbnail, and some formats require
conversion to a format the browser can read.

For example, if we were to standardize on sizes (we wouldn't use these
actual sizes, because they do NOT fit our usage):
* 32px
* 64px
* 128px
* 256px
* 512px
* 1024px
* 2048px

Then somebody requesting a 400px image might get the next size up, the
512px image delivered and scaled down in the browser. (On a high-resolution
display, you might fetch the 1024px image.)
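
A minimal sketch of the "next size up" selection described above. The bucket list is the illustrative one from this message (explicitly not a real proposal), and `pick_bucket` and its parameters are hypothetical names, not anything in MediaWiki:

```python
from bisect import bisect_left

# Illustrative buckets copied from the example list above; not real production sizes.
STANDARD_WIDTHS = [32, 64, 128, 256, 512, 1024, 2048]


def pick_bucket(requested_px, device_pixel_ratio=1.0, widths=STANDARD_WIDTHS):
    """Return the smallest standard width >= requested_px * device_pixel_ratio.

    If the target is larger than every bucket, fall back to the largest bucket.
    `widths` must be sorted in ascending order.
    """
    target = requested_px * device_pixel_ratio
    index = bisect_left(widths, target)
    return widths[min(index, len(widths) - 1)]


if __name__ == "__main__":
    print(pick_bucket(400))        # 400px request on a 1.0x display
    print(pick_bucket(400, 2.0))   # same request on a 2.0x display
```

The browser would then scale the delivered image down to the requested width, so only the bucket sizes ever need to be rendered and stored.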

In reality we'd want sizes that fit most common usage, and perhaps make
future markup & visual editor widgets promote the use of standard sizes to
minimize the cases where you end up with something that's not an exact fit.


> * Mobile clients will want to minimize the amount of data transferred
>

This is a good reason for picking appropriate default sizes that would fit
with actual common usage.

Note that SVG originals can be either much smaller or much larger
than a rasterized image -- in many cases we have SVGs that are much more
detailed than they need to be. So serving SVG doesn't guarantee a
bandwidth saving, though it can in well-designed cases.

Mobile also has the case that many (possibly even most in some markets)
devices have a greater than 1.0 device-to-CSS pixel ratio, so loading 1.5X
or 2.0X versions of raster images may be something we want in many cases.
In theory you could make a switch -- just as we have a 'disable images'
switch, we could make it a three-way control 'no images - low-resolution
images - high-resolution images'.

[And just to screw with people, Windows 8 / Windows RT is going with 1.4x
and 1.8x scaling factors instead of the 1.5x and 2.0x that Android and iOS
-- and Windows Phone 7 -- use. Fun huh! We're probably not going to have
exact scaled versions of them, they'll get the 1.5 or 2.0 and scale it down
a little probably.]

-- brion
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: scaled media (thumbs) as *temporary* files, not stored forever

MZMcBride-2
In reply to this post by Ariel Glenn WMF
Ariel T. Glenn wrote:

> So it's time to have this discussion again.  At least, I think we're
> having it again, though I could not find previous threads on this list
> about the subject.
>
> In short, scaled media is currently generated on the fly for any size
> and for any user.  The resulting files are kept around forever or until
> we run perilously short of space, at which point we make some guesses
> about what we can toss and then do a mass purge. Last time we did so, we
> had the rotation bug going at the same time, which made for a real fine
> mess.
>
> A little bit of crunching shows me that we have about 6 million images
> in use on the projects, and yet we manage to have around 130 million
> thumbnails.  Just for fun I checked to see how many thumbs each image
> has, what sizes we are looking at, etc.  Here's the results.

Only really tangentially related, but I remember thinking when reading this
thread: are there any pages (on wikitech.wikimedia.org or elsewhere) that
document Wikimedia's current media infrastructure? It's always been a bit of
a mystery to me.

MZMcBride




Re: scaled media (thumbs) as *temporary* files, not stored forever

Aude-2
In reply to this post by Jon Robson
On Wed, Sep 5, 2012 at 6:40 PM, Jon Robson <[hidden email]> wrote:

> I just wanted to clarify something... is there any protection in place in
> the thumbnail generator to prevent denial of service attacks? For instance
> if someone wanted to they could run a script which uploaded photos then
> fired off requests for thumbnails of it of size 20px,21px,22px...1024px
>
> I'm guessing the servers wouldn't like that. This is why I'd be keen to
> limit the sizes.
>

The ability to request an image of whatever size I need is one of my most
favorite MediaWiki features.  It's very nice to save the extra step of
resizing it after downloading it.  It makes Commons images all the more
easily reusable.

It's quite nice to also be able to have thumbnails of whatever size you
want on Wikipedia, overriding the typical size settings.

I'd be fine with throttling such requests to prevent DoS, and maybe with other
technical measures to make the feature harder to abuse. But I would be sad to see
it eliminated.

While it's also nice to hotlink to the images (or via InstantCommons), some
expiry on thumbnails might be acceptable.

Cheers,
Katie



>
> May I suggest someone analyses the sizes currently used on wikipedia and we
> limit to those as an initial step and then review the less frequently used
> ones and standardise on some sizes?
> On Sep 5, 2012 9:15 AM, "Roan Kattouw" <[hidden email]> wrote:
>
> > On Sun, Sep 2, 2012 at 5:59 PM, Tim Starling <[hidden email]>
> > wrote:
> > > The other reason for the existence of the backend thumbnail store is
> > > to transport images from the thumbnail scalers to the 404 handler. For
> > > that purpose, the image only needs to exist in the backend for a few
> > > seconds. It could be replaced by a better 404 handler, that sends
> > > thumbnails directly by HTTP. Maybe the Swift one does that already.
> > >
> > My understanding is that thumb.php already streamed the thumbnail back
> > to the 404 handler via HTTP and has done so for at least the past two
> > years or so.
> >
> > Roan
> >



--
Board member, Wikimedia District of Columbia
http://wikimediadc.org
@wikimediadc / @wikimania2012

Re: scaled media (thumbs) as *temporary* files, not stored forever

Jon Robson
Is there a bug open for this yet? If not there probably should be...
(apologies if there is.. scanning through I cannot see one)

In terms of supporting non-standard files - there is no reason why, to get
an obscure size (e.g. 224px), you couldn't get for example the 240px image and
resize it with CSS...

On Tue, Sep 18, 2012 at 7:01 PM, aude <[hidden email]> wrote:

> On Wed, Sep 5, 2012 at 6:40 PM, Jon Robson <[hidden email]> wrote:
>
> > I just wanted to clarify something... is there any protection in place in
> > the thumbnail generator to prevent denial of service attacks? For
> instance
> > if someone wanted to they could run a script which uploaded photos then
> > fired off requests for thumbnails of it of size 20px,21px,22px...1024px
> >
> > I'm guessing the servers wouldn't like that. This is why I'd be keen to
> > limit the sizes.
> >
>
> The ability to request an image of whatever size I need is one of my most
> favorite MediaWiki features.  It's very nice to save the extra step of
> resizing it after downloading it.  It makes Commons images all the more
> easily reusable.
>
> It's quite nice to also be able to have thumbnails of whatever size you
> want on Wikipedia, overriding the typical size settings.
>
> I'd be fine if we can throttle such requests to prevent DOS and maybe other
> technical measures to make the feature less abused. But would be sad to see
> it eliminated.
>
> While it's also nice to hotlink to the images (or via InstantCommons), some
> expiry on thumbnails might be acceptable.
>
> Cheers,
> Katie



--
Jon Robson
http://jonrobson.me.uk
@rakugojon

Re: scaled media (thumbs) as *temporary* files, not stored forever

Aude-2
On Wed, Sep 19, 2012 at 7:02 AM, Jon Robson <[hidden email]> wrote:

> Is there a bug open for this yet? If not there probably should be...
> (apologies if there is.. scanning through I cannot see one)
>
> In terms of supporting non-standard files - there is no reason why to get
> an obscure size e.g. 224px you could get for example the 240px image and
> resize it with css...
>
>
1) That adds an unnecessary extra step to reuse images.

2) Not every reuse case involves CSS.

3) Although it's not that super complicated to do in CSS, not every reuser
knows CSS.

4) Some browsers do a poor job at rescaling images, although other browsers
have improved in this area.

If anything, I think in the download button / dialog in Commons, we should
have an option to allow the user to choose an image of any size to download, in
addition to the preset choices. :)    The thumbnails can be temporary I
suppose, and I hope no one uses them to hotlink.  (my humble opinion!)

Cheers,
Katie





--
Board member, Wikimedia District of Columbia
http://wikimediadc.org
@wikimediadc / @wikimania2012

Re: scaled media (thumbs) as *temporary* files, not stored forever

David Gerard-2
On 19 September 2012 09:20, aude <[hidden email]> wrote:

> If anything, I think in the download button / dialog in Commons, we should
> have an option to allow user to choose image of any size to download, in
> addition to the preset choices. :)    The thumbnails can be temporary I
> suppose, and hope no one uses them to hotlink.  (my humble opinion!)


Arbitrary-sized thumbnails are a much-used feature, and I don't think
this feature should be removed.

The Commons reuse guide [1] notes that hotlinking thumbnails is
allowed, but it's a terrible idea and you should either store the
image locally or use InstantCommons (which works wonderfully).

As I noted, this thread was started with Ariel warning space was
getting low on the image server. Removing much-used functionality when
you could just remove unused images still strikes me as a weird
response.


- d.


Re: scaled media (thumbs) as *temporary* files, not stored forever

David Gerard-2
On 19 September 2012 09:25, David Gerard <[hidden email]> wrote:

> The Commons reuse guide [1] notes that hotlinking thumbnails is
> allowed, but it's a terrible idea and you should either store the
> image locally or use InstantCommons (which works wonderfully).

[1] https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia#Hotlinking_or_InstantCommons


- d.


Re: scaled media (thumbs) as *temporary* files, not stored forever

Eric Sun-4
Maybe I'm doing it wrong, but it seems the way to request thumbnails has
changed, at least for SVGs.

For example, for
http://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg

http://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Flag_of_New_Zealand.svg/720px-Flag_of_New_Zealand.svg

used to work, but no longer.

Now, adding .png

http://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Flag_of_New_Zealand.svg/720px-Flag_of_New_Zealand.svg.png

works.

Is this expected?  Can I rely on this going forward?
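
For illustration only, the URL pattern observed above can be written down as a small helper. This is a sketch of what was seen in these two URLs, assuming the path layout `/<project>/<site>/<h>/<hh>/<filename>`; `thumb_url` is a hypothetical name, and whether the pattern is stable is exactly the open question:

```python
from urllib.parse import urlsplit, urlunsplit


def thumb_url(original_url, width_px):
    """Derive a thumbnail URL from an original upload URL.

    Assumes the path looks like /<project>/<site>/<h>/<hh>/<filename>,
    as in the Flag_of_New_Zealand.svg example in this message.
    SVG originals get a ".png" suffix, matching the URL that worked.
    """
    parts = urlsplit(original_url)
    segments = parts.path.split("/")  # leading "" because the path starts with "/"
    filename = segments[-1]
    thumb_name = f"{width_px}px-{filename}"
    if filename.lower().endswith(".svg"):
        thumb_name += ".png"  # rasterized output, per the observation above
    # Insert "thumb" after /<project>/<site> and append the sized file name.
    new_segments = segments[:3] + ["thumb"] + segments[3:] + [thumb_name]
    return urlunsplit(parts._replace(path="/".join(new_segments)))


if __name__ == "__main__":
    print(thumb_url(
        "http://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg",
        720,
    ))
```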

Thanks,
Eric



On Wed, Sep 19, 2012 at 1:29 AM, David Gerard <[hidden email]> wrote:

> On 19 September 2012 09:25, David Gerard <[hidden email]> wrote:
>
> > The Commons reuse guide [1] notes that hotlinking thumbnails is
> > allowed, but it's a terrible idea and you should either store the
> > image locally or use InstantCommons (which works wonderfully).
>
> [1]
> https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia#Hotlinking_or_InstantCommons
>
>
> - d.

Re: scaled media (thumbs) as *temporary* files, not stored forever

Brion Vibber
In reply to this post by Aude-2
On Wed, Sep 19, 2012 at 1:20 AM, aude <[hidden email]> wrote:

> On Wed, Sep 19, 2012 at 7:02 AM, Jon Robson <[hidden email]> wrote:
> > In terms of supporting non-standard files - there is no reason why to get
> > an obscure size e.g. 224px you could get for example the 240px image and
> > resize it with css...
> >
> >
> 1) That adds an unnecessary extra step to reuse images.
>

Not necessarily -- if you simply plop the image into an <img src="..."
width="..." height="..."> -- and that's what you should usually be doing
anyway -- then the browser will resize a received image that's not actually
the requested size.

That should actually be what you already get when you request a thumbnail
larger than the size of the original.


> 4) Some browsers do a poor job at rescaling images, although other browsers
> have improved in this area.
>

Still true, though most of em are pretty good these days.


> If anything, I think in the download button / dialog in Commons, we should
> have an option to allow user to choose image of any size to download, in
> addition to the preset choices. :)    The thumbnails can be temporary I
> suppose, and hope no one uses them to hotlink.  (my humble opinion!)
>

I'd usually be content with manually sizing to my perfect dimensions from
the original source, probably, but that is a nice shortcut. :)

-- brion

Re: scaled media (thumbs) as *temporary* files, not stored forever

Mark Bergsma-2
In reply to this post by Asher Feldman
To revive this old thread...

On Sep 5, 2012, at 9:35 PM, Asher Feldman <[hidden email]> wrote:

> On Tue, Sep 4, 2012 at 3:11 PM, Platonides <[hidden email]> wrote:
>
>> On 03/09/12 02:59, Tim Starling wrote:
>>> I'll go for option 4. You can't delete the images from the backend
>>> while they are still in Squid, because then they would not be purged
>>> when the image is updated or action=purge is requested. In fact, that
>>> is one of only two reasons for the existence of the backend thumbnail
>>> store on Wikimedia. The thumbnail backend could be replaced by a text
>>> file that stores a list of thumbnail filenames which were sent to
>>> Squid within a window equivalent to the expiry time sent in the
>>> Cache-Control header.
>>> -- Tim Starling
>>
>> The second one seems easy to fix. The first one should IMHO be fixed in
>> squid/varnish by allowing wildcard purges (ie. PURGE
>> /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)

> fast.ly  implements group purge for varnish like this via a proxy daemon
> that watches backend responses for a "tag" response header (i.e. all
> resolutions of Tim_starling.jpg would be tagged that) and builds an
> in-memory hash of tags->objects which can be purged on.  I've been told
> they'd probably open source the code for us if we want it, and it is
> interesting (especially to deal with the fact that we don't purge articles
> at all of their possible url's) albeit with its own challenges.  If we
> implemented a backend system to track thumbnails that exist for a given
> orig, we may be able to remove our dependency on swift container listings
> to purge images, paving the way for a second class of thumbnails that are
> only cached.
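
As a rough illustration of the tag-based group purge described in the quoted text: this is not fast.ly's code (which is not shown in this thread), only a toy model of the idea. The header name `X-Purge-Tag` and the class `TagIndex` are hypothetical:

```python
from collections import defaultdict


class TagIndex:
    """In-memory map of tag -> cached URLs, built by watching backend responses."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def observe(self, url, headers):
        """Record a backend response; responses without a tag are ignored."""
        tag = headers.get("X-Purge-Tag")  # hypothetical header name
        if tag:
            self._by_tag[tag].add(url)

    def purge(self, tag):
        """Return the sorted URLs to purge for `tag` and forget them."""
        return sorted(self._by_tag.pop(tag, set()))


if __name__ == "__main__":
    index = TagIndex()
    for width in (120, 240):
        url = f"/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/{width}px-Tim_starling.jpg"
        index.observe(url, {"X-Purge-Tag": "Tim_starling.jpg"})
    print(index.purge("Tim_starling.jpg"))
```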


How about this idea:

Just "purge all images with this prefix" doesn't really work in Squid or Varnish, because they don't store their cache database in a format that makes it cheap to determine which objects would match that. Varnish could do it with their "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds of purges a second we do, this would quickly add up to a massive ban list.

But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do, is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail postfix. Then the original and all its associated thumbnails end up at the same hash key in the hash table, and only a single purge for the original would nuke them all out of the cache.

This relies on Varnish having an efficient implementation for multiple objects at a single hash key. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.
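
A toy model of the hashing idea, written in Python rather than VCL and not tied to any real Varnish API. It assumes the thumbnail URL layout seen on upload.wikimedia.org in this thread; `THUMB_RE`, `hash_key`, `variant` and `ToyCache` are hypothetical names:

```python
import re
from collections import defaultdict

# Assumed layout: /<project>/<site>/thumb/<h>/<hh>/<name>/<width>px-<thumbname>
THUMB_RE = re.compile(r"^(/[^/]+/[^/]+)/thumb(/[^/]+/[^/]+/[^/]+)/(\d+)px-[^/]+$")


def hash_key(path):
    """Map a thumbnail path to the path of its original; other paths are unchanged."""
    m = THUMB_RE.match(path)
    return m.group(1) + m.group(2) if m else path


def variant(path):
    """Return the thumbnail width as a string, or "original" for non-thumbnail paths."""
    m = THUMB_RE.match(path)
    return m.group(3) if m else "original"


class ToyCache:
    """All variants of one image live under the hash key of the original."""

    def __init__(self):
        self._buckets = defaultdict(dict)

    def store(self, path, body):
        self._buckets[hash_key(path)][variant(path)] = body

    def lookup(self, path):
        return self._buckets.get(hash_key(path), {}).get(variant(path))

    def purge(self, original_path):
        """Drop the original and every thumbnail; return how many objects went away."""
        return len(self._buckets.pop(original_path, {}))
```

With this layout, a single `purge("/wikipedia/commons/5/5c/Tim_starling.jpg")` removes the original and every stored thumbnail size at once, which is the property the proposal is after.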

Of course this won't work for Squid, but I'm pretty close to being able to replace Squid with Varnish entirely for upload.

--
Mark Bergsma <[hidden email]>
Lead Operations Architect
Wikimedia Foundation






Re: scaled media (thumbs) as *temporary* files, not stored forever

Mark Bergsma-2

On Oct 24, 2012, at 11:36 AM, Mark Bergsma <[hidden email]> wrote:
> How about this idea:
>
> Just "purge all images with this prefix" doesn't really work in Squid or Varnish, because they don't store their cache database in a format that makes it cheap to determine which objects would match that. Varnish could do it with their "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds of purges a second we do, this would quickly add up to a massive ban list.
>
> But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do, is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail postfix. Then the original and all its associated thumbnails end up at the same hash key in the hash table, and only a single purge for the original would nuke them all out of the cache.
>
> This relies on Varnish having an efficient implementation for multiple objects at a single hash key. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.


I checked, and Varnish stores all variant objects in a linked list per hash table entry. So once it looks up the hash entry for the URL of the original, it'll have to do a linear search for the right thumbnail size, matching each against a Vary header string. If we do this, we'll need to restrict the number of variants (thumb sizes) so we don't get hundreds/thousands on a single hash key.

Here's a little proof of concept to demonstrate how it could work:

        https://gerrit.wikimedia.org/r/#/c/29805/2
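
Separately from the proof of concept linked above, the lookup cost can be illustrated with a toy model of a single hash-table entry. This is not Varnish code; `VariantBucket` and the `max_variants` cap are hypothetical, and the cap value is arbitrary:

```python
class VariantBucket:
    """Toy model of one hash-table entry holding its variants in a list."""

    def __init__(self, max_variants=8):  # hypothetical cap, value chosen arbitrarily
        self.max_variants = max_variants
        self.entries = []  # list of (vary_string, body) pairs

    def lookup(self, vary_string):
        """Linear scan; return (body or None, number of comparisons made)."""
        for comparisons, (key, body) in enumerate(self.entries, start=1):
            if key == vary_string:
                return body, comparisons
        return None, len(self.entries)

    def store(self, vary_string, body):
        """Insert or replace a variant; refuse new variants beyond the cap."""
        for i, (key, _) in enumerate(self.entries):
            if key == vary_string:
                self.entries[i] = (vary_string, body)
                return True
        if len(self.entries) >= self.max_variants:
            return False
        self.entries.append((vary_string, body))
        return True
```

A miss costs one comparison per stored variant, which is why the number of thumbnail sizes per original would need to stay small.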

--
Mark Bergsma <[hidden email]>
Lead Operations Architect
Wikimedia Foundation




