Making Toolserver work - rate limited OSM


Marlen Caemmerer-3
Hello,

over the last few days Toolserver experienced web page outages caused by too many queries from only a few hosts.
They are using OSM images and - please don't ask me why - single IPs tend to query about 40-50 pictures per second for minutes or hours; peaks can be worse.
At some point our web server then gives up.
Yes, sorry ;). I can proudly say that today alone about 11.7 million web queries were answered somehow.

I tried to mitigate the problem of "too many requests per IP" via blocking, but that is not an option.
One problem is that users of at least one portal then complain, and another is that the IP addresses seem random - some even come from dial-up ranges.
There might be something badly wrong with the cache-control headers for the images (or perhaps we can tweak something at that point) or - I don't know what it could be.


To make a long story short - I rate limited the OSM tile delivery to 40 images per second per IP - the allowed burst is 55.
Users will then get a 503 error while their rate exceeds the limit, until it decreases - but delivery isn't stopped completely.

It seems to work: I have some log notices showing which IPs were throttled, and these are IPs with heavy usage.

I used this here to throttle: http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
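
For illustration, the accept/reject behaviour of such a limit can be sketched as a leaky bucket. This is a simplified Python model, not the nginx implementation: the class name and the timestamps are made up for the example, only the rate of 40 requests per second and the burst of 55 come from the setup described above, and whether requests inside the burst are delayed or served immediately is not modelled.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Simplified per-IP leaky bucket (hypothetical model, not nginx code)."""
    rate: float = 40.0    # sustained requests per second (figure from the post)
    burst: float = 55.0   # tolerated excess above the sustained rate (from the post)
    excess: float = 0.0   # current fill level of the bucket
    last: float = 0.0     # timestamp of the last accepted request, in seconds

    def allow(self, now: float) -> bool:
        # Drain the bucket for the elapsed time, then count this request.
        drained = max(self.excess - self.rate * (now - self.last), 0.0)
        candidate = drained + 1.0
        if candidate > self.burst:
            return False  # over the limit: this request would get the 503
        self.excess = candidate
        self.last = now
        return True


if __name__ == "__main__":
    bucket = LeakyBucket()
    # One client firing 100 requests at the same instant.
    accepted = sum(bucket.allow(0.0) for _ in range(100))
    print(f"accepted {accepted} of 100 simultaneous requests")
```

In this model a client that stays at or below 40 requests per second is never rejected, while a client that bursts gets errors only until its bucket has drained again.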

I don't want to keep this option configured forever - I would rather we do something about caching or give the projects themselves the pictures they need (I doubt we have to deliver hill shading pictures for everyone - this is Toolserver).

If anyone has an idea what to do, or any questions, please let me know.

Cheers
  Marlen/nosy

_______________________________________________
Toolserver-l mailing list ([hidden email])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Maps-l] Making Toolserver work - rate limited OSM

Kai Krueger
Hello,

do you have any more details on which tile layers are getting hit? Is it
low or high zoom tiles? What referrers / user-agents do they come from? Is
it the tiles that get served through mod_tile, the hillshading tiles or
the tiles for the wiki mini atlas?

Too high load from individual clients has been an issue on many other
tileservers as well. Mostly it comes from various mobile apps that
let their users download large areas (e.g. Germany) for offline
use. These areas then potentially cover millions of tiles, which the
clients then try to download as fast as the connection allows.
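
To get a feeling for the "millions of tiles" figure, here is a rough Python sketch that counts the tiles covering a bounding box using the usual slippy-map tile numbering. The bounding box for Germany is only approximate and the zoom range 0-16 is an assumption chosen for the example; neither comes from any particular app.

```python
import math


def deg2tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Slippy-map tile (x, y) that contains a WGS84 coordinate at a zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range for this zoom level.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(south: float, west: float, north: float, east: float,
                  zoom: int) -> int:
    """Number of tiles needed to cover a bounding box at one zoom level."""
    x_min, y_max = deg2tile(south, west, zoom)   # y grows towards the south
    x_max, y_min = deg2tile(north, east, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# Approximate bounding box around Germany (south, west, north, east); assumed.
GERMANY = (47.3, 5.9, 55.1, 15.0)

if __name__ == "__main__":
    total = 0
    for zoom in range(0, 17):   # zoom range is an assumption for the example
        count = tiles_in_bbox(*GERMANY, zoom)
        total += count
        print(f"zoom {zoom:2d}: {count} tiles")
    print(f"total: {total} tiles")
```

Each additional zoom level roughly quadruples the count, so the highest zoom levels dominate the total.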

For that reason, the tileservers on osm.org have a significant list of
user-agents that they block completely, and in addition they also have
automatic rate limiting per IP. There is also a specific tile usage
policy ( https://wiki.openstreetmap.org/wiki/Tile_usage_policy ) that
governs how you are technically allowed to access the tile servers
(once you have downloaded the tiles, their use is governed by the
CC-BY-SA licence).

Other tileservers, like opencyclemap, have similar restrictions, and
mod_tile (the Apache module used to deliver tiles) has a number of
features available to limit traffic. mod_tile also has a complex system
to try to ensure maximum cacheability of tiles while still keeping them
up to date. This system can furthermore be tuned either towards
freshness or cacheability as needed.

My impression so far was that this has never been an issue with the
toolserver, and I wasn't aware of any explicit policies on how the
toolserver tiles are allowed to be accessed, so I never activated any of
the limiting features. But if it is becoming an issue, we can see how
best to combat it.

At least on the munin graphs for ptolemy, I don't see much increased
load. But if it is the hillshading tiles, or the WMA tiles, those don't
get served through ptolemy as far as I am aware.

Kai



On 10/03/2013 03:08 PM, Marlen Caemmerer wrote:

> [...]

Re: [Maps-l] Making Toolserver work - rate limited OSM

Marlen Caemmerer-3
Hello,

Kai, thank you for your prompt and informative response.

On Thu, 3 Oct 2013, Kai Krueger wrote:

>
> do you have any more details on which tile layers are getting hit? Is it
> low or high zoom tiles? What referrers / user-agents do they come from? Is
> it the tiles that get served through mod_tile, the hillshading tiles or
> the tiles for the wiki mini atlas?

I can send you some example log lines of the throttled IPs:

2013/10/04 08:11:30 [error] 28822#0: *53658093 limiting requests, excess: 55.240 by zone "hikebike", client: 213.73.96.44, server: toolserver.org, request: "GET /tiles/hikebike/15/17169/11177.png HTTP/1.1", host: "toolserver.org", referrer: "http://www.gpsies.com/map.do?fileId=gbcojbhrdfqlglbc"

2013/10/04 08:11:02 [error] 28822#0: *53650597 limiting requests, excess: 55.120 by zone "hikebike", client: 85.0.37.63, server: toolserver.org, request: "GET /tiles/osm/3/4/0.png HTTP/1.1", host: "c.www.toolserver.org", referrer: "http://toolserver.org/~kolossos/openlayers/embed.html?layer=mapnik&bbox=39.68865498340449,43.55243395214844,39.778011016595514,43.614232047851566&marker=43.583333,39.733333

If you want more logs, I can send them to you privately.

It seems to relate especially to the hikebike/cmarq tools.
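
To see which layers and referrers the throttled requests belong to, log lines in the format shown above could be tallied with a small script. This is only a sketch: the regular expression is written against the two sample lines and may need adjusting for other entries, and the function and variable names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Pattern derived from the sample "limiting requests" lines; an assumption,
# not an official description of the nginx error log format.
LIMIT_RE = re.compile(
    r'limiting requests, excess: (?P<excess>[\d.]+) by zone "(?P<zone>[^"]+)", '
    r'client: (?P<client>[^,]+), .*?request: "\S+ (?P<path>\S+)[^"]*"'
    r'(?:.*?referrer: "(?P<referrer>[^"]*)")?'
)


def summarize(lines):
    """Count throttled requests per (client IP, tile layer, referrer host)."""
    counts = Counter()
    for line in lines:
        match = LIMIT_RE.search(line)
        if not match:
            continue  # not a rate-limiting entry
        parts = match["path"].split("/")
        # Paths look like /tiles/<layer>/<zoom>/<x>/<y>.png in the samples.
        layer = parts[2] if len(parts) > 2 and parts[1] == "tiles" else "?"
        host = urlsplit(match["referrer"] or "").hostname or "-"
        counts[(match["client"], layer, host)] += 1
    return counts


SAMPLE = (
    '2013/10/04 08:11:30 [error] 28822#0: *53658093 limiting requests, '
    'excess: 55.240 by zone "hikebike", client: 213.73.96.44, '
    'server: toolserver.org, request: "GET /tiles/hikebike/15/17169/11177.png '
    'HTTP/1.1", host: "toolserver.org", '
    'referrer: "http://www.gpsies.com/map.do?fileId=gbcojbhrdfqlglbc"'
)

if __name__ == "__main__":
    for key, count in summarize([SAMPLE]).most_common():
        print(count, *key)
```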

>
> Too high load from individual clients has been an issue on many other
> tileservers as well. Mostly it comes from various mobile apps that
> let their users download large areas (e.g. Germany) for offline
> use. These areas then potentially cover millions of tiles, which the
> clients then try to download as fast as the connection allows.

Sounds plausible.

>
> For that reason, the tileservers on osm.org have a significant list of
> user-agents that they block completely, and in addition they also have
> automatic rate limiting per IP. There is also a specific tile usage
> policy ( https://wiki.openstreetmap.org/wiki/Tile_usage_policy ) that
> governs how you are technically allowed to access the tile servers
> (once you have downloaded the tiles, their use is governed by the
> CC-BY-SA licence).
>
> Other tileservers, like opencyclemap, have similar restrictions, and
> mod_tile (the Apache module used to deliver tiles) has a number of
> features available to limit traffic. mod_tile also has a complex system
> to try to ensure maximum cacheability of tiles while still keeping them
> up to date. This system can furthermore be tuned either towards
> freshness or cacheability as needed.
>
> My impression so far was that this has never been an issue with the
> toolserver, and I wasn't aware of any explicit policies on how the
> toolserver tiles are allowed to be accessed, so I never activated any of
> the limiting features. But if it is becoming an issue, we can see how
> best to combat it.

gpsies.com stated they use the cache-control header, which as far as I could see is sometimes not set reasonably - I had a look at these hikebike URLs delivered from cmarq.
They will look at the problem more closely on their side, so I expect some more details in the next few days.

I could set cache-control headers in the nginx that acts as load balancer for TS, for tiles where it makes sense.
Do you have any advice on this? Roughly which tiles stay unchanged for how long?


>
> At least on the munin graphs for ptolemy, I don't see much increased
> load. But if it is the hillshading tiles, or the WMA tiles, those don't
> get served through ptolemy as far as I am aware.

It seems these tiles are delivered via ortelius/wolfsbane. Unfortunately the periods of high load caused munin to stop graphing, so I don't have any actual data apart from the error logs and the load seen on the console.

Cheers
  nosy


Re: [Maps-l] Making Toolserver work - rate limited OSM

Kolossos-2
Hello,
the second line in the log has the referrer:
http://toolserver.org/~kolossos/openlayers/embed.html
This script is included inside geohack, so it's no wonder that a lot of
requests come with this referrer from different IPs. But if a lot of
requests come from only one IP, then perhaps a kid wants to create
the next big search engine after Google. So it's correct to throttle
them. Perhaps we should also check our robots.txt.

Gpsies.com is a special case: as they have many users, they can have a
positive effect for OSM. They still use cached hillshading generated by
us, but no updates will be necessary in the near future.
The tiles from the hikebike styles can change every minute, and it depends
on the user which delay is acceptable. For a mapper the hikebike map is the
only chance to see their own modifications to hiking routes, so a delay of
10 minutes is the maximum. For a normal user a delay of a week or a month
should be no problem.

Greetings Tim alias Kolossos


On 04.10.2013 23:39, Marlen Caemmerer wrote:

> [...]

Re: [Maps-l] Making Toolserver work - rate limited OSM

Kai Krueger
In reply to this post by Marlen Caemmerer-3
On 10/04/2013 03:39 PM, Marlen Caemmerer wrote:

> Hello,
>
> [...]
> gpsies.com stated they use the cache-control header, which as far as
> I could see is sometimes not set reasonably - I had a look at
> these hikebike URLs delivered from cmarq.
> They will look at the problem more closely on their side, so I expect
> some more details in the next few days.
>
> I could set cache-control headers in the nginx that acts as load
> balancer for TS, for tiles where it makes sense. Do you have any advice
> on this? Roughly which tiles stay unchanged for how long?

Mod_tile has some "sophisticated" heuristics to try to set the cache
control headers. However, since the update schedule moved from weekly to
minutely many years ago, it is impossible to know exactly when which
tiles get updated, hence the need for heuristics.

Currently mod_tile sets the cache control headers to somewhere between
15 minutes and 7 days.

The first distinction it uses is whether a tile is known to be "dirty",
i.e. the data for that tile has changed in the database, but the
rendering hasn't yet updated the corresponding tiles. In this case the
served tiles are known to be out-of-date, and a rather short 15 - 30
minutes cache-control header is set.

For clean tiles, it looks at when the tile was last modified, on the
assumption that tiles that have recently changed (e.g. because they are
in a high activity area) are more likely to get updated again soon than
tiles that e.g. haven't changed in the last year. In this case the
cache-control header varies between 1 and 7 days, depending on the age.

All of these heuristics can be changed in the Apache config.

For toolserver, it probably is worth modifying the default values, as a
couple of the assumptions don't really hold.

E.g. the assumption behind the short expiry for dirty tiles is
that rendering might not quite succeed within the 2 - 3 second timeout for
serving stale tiles, but will finish shortly afterwards. However, due to
the slow rendering, the overloaded server and the very long queues on
ptolemy, it might take days rather than seconds for the rendering
request in the queues to finally get processed. So having a
cache-control of only 15 minutes probably doesn't make sense in this case.

For clean tiles, a timeout of only 1 - 2 days might also make little
sense if, due to overload, it is unlikely that tiles can be re-rendered
at a faster pace than that anyway.

So perhaps it would make sense to simply set the cache-control headers
to always e.g. 7 days. Or at least a minimum of 1 day.
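
As a rough illustration of the heuristic described above and of the suggested minimum, here is a Python sketch. It is not mod_tile's code: the function names, the linear ramp and the one-year cutoff are assumptions made for the example; only the 15 minutes for dirty tiles, the 1 to 7 days for clean tiles and the idea of a floor come from the description.

```python
MINUTE = 60
DAY = 24 * 60 * MINUTE


def max_age_seconds(dirty: bool, age_days: float, floor: int = 0) -> int:
    """Cache lifetime for a tile, loosely following the description above.

    dirty    -- data changed but the tile has not been re-rendered yet
    age_days -- days since the tile was last modified
    floor    -- optional minimum lifetime (the proposed toolserver tweak)
    """
    if dirty:
        ttl = 15 * MINUTE  # short expiry for tiles known to be out of date
    else:
        # Assumption: ramp linearly from 1 day (just changed) to 7 days
        # (unchanged for a year or more).
        fraction = min(max(age_days / 365.0, 0.0), 1.0)
        ttl = int(DAY + fraction * 6 * DAY)
    return max(ttl, floor)


def cache_control_header(dirty: bool, age_days: float, floor: int = 0) -> str:
    """Value for the Cache-Control response header."""
    return f"max-age={max_age_seconds(dirty, age_days, floor)}"


if __name__ == "__main__":
    print(cache_control_header(dirty=True, age_days=0))             # default
    print(cache_control_header(dirty=True, age_days=0, floor=DAY))  # with floor
    print(cache_control_header(dirty=False, age_days=400))
```

With a floor of one day, even dirty tiles would be cached for a day, which matches the "minimum of 1 day" variant; setting the floor to 7 days gives the "always 7 days" variant.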

Kai
