best practice for rate limits for accessing the English Wikipedia with the API


best practice for rate limits for accessing the English Wikipedia with the API

rdhyee
Hi everyone,

I plan to be making a fair number of read access calls to the Wikipedia
API over the next several months and would like to know what the best
practices are for efficient, fast access that doesn't hog resources.  I've
found that having a single thread that makes a call and waits for a
response before making the next call has been extremely reliable (much
more so than basically any other web API I've used before).  What I'd like
to do is make my application multithreaded and make simultaneous calls to
the Wikipedia (since the speed of my application is limited by the rate at
which I can read from the Wikipedia).

I have the following questions:

1) What limits should I observe in terms of number of calls I make per
second and how many calls I should have going simultaneously?

2) How would I know when I'm accessing the API too quickly or too
often?  I read at http://www.mediawiki.org/wiki/API:Errors_and_warnings
that there is a ratelimited error message, but so far, I've not seen that
error myself.  If I don't get a ratelimited error, does that mean I'm
doing ok with respect to being a good API citizen?

3) Even if I am requiring read access, should I identify myself
explicitly to the API by logging in for the read access -- so that I can
be contacted should there be a problem?

4) Does it make sense to try to obtain bot privileges (even for read
only access)?  My understanding is that bots get access to larger
payload in some API calls.

Note: since I'm looking at recent changes to the Wikipedia, downloading
a data dump of the Wikipedia to work on doesn't help me.

Thanks,

-Raymond Yee

(User:RaymondYee)

_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Re: best practice for rate limits for accessing the English Wikipedia with the API

Max Semenik
On 17.11.2010, 16:39 Raymond wrote:

> Hi everyone,

> I plan to be making a fair number of read access calls to the Wikipedia
> API over the next several months and would like to know what the best
> practices for efficient, fast access that doesn't hog resources.  I've
> found that having a single thread that makes a call and waits for a
> response before making the next call has been extremely reliable (much
> more so
> than basically any other web API I've used before).   What I'd like to
> do make my application multithreaded for reading from the Wikipedia and
> make simultaneous calls to the Wikipedia (since the speed of my
> application is limited by the rate at which I can read from the Wikipedia.)

> I have the following questions:

> 1) What limits should I observe in terms of number of calls I make per
> second

Per Domas, fewer requests with larger limits are better.

> and how many calls I should have going simultaneously?

One.
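
For illustration, the "fewer requests with larger limits" advice might look like this in Python. The helper below is hypothetical; 50 titles per query is the usual action API limit for anonymous clients, 500 for bots:

```python
# A rough sketch of batching: instead of one API call per page, pack
# many titles into a single query request (50 per call anonymously,
# 500 with bot rights). Parameter names follow the action API.

def batched_params(titles, chunk_size=50):
    """Yield one query-parameter dict per batch of titles."""
    for i in range(0, len(titles), chunk_size):
        chunk = titles[i:i + chunk_size]
        yield {
            "action": "query",
            "format": "json",
            "prop": "info",
            # the API accepts pipe-separated lists of titles
            "titles": "|".join(chunk),
        }
```

Each yielded dict would be sent as one GET request, so 120 pages cost 3 requests instead of 120.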

> 2) How would I know when I'm accessing the API too quickly or too
> often?  I read at
> http://www.mediawiki.org/wiki/API:Errors_and_warnings 
> that there is ratelimited  error message, but so far, I've not seen that
> error myself.  If I don't get a ratelimited error, does that mean I'm
> doing ok with respect to being a good API citizen.

Rate limits are for editing and logging in only.

> 3) Even if I am requiring read access, should I identify myself
> explicitly to the API by logging in for the read access -- so that I can
> be contacted should there be a problem?

Logging in can help you get a higher limit if you're a sysop or
bot. However, identifying yourself with a User-Agent header is much
more important.
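
As a minimal sketch, an identifying header could be set like this (the agent string below is made up; the point is to include a tool name, version, and contact details so operators can reach you):

```python
import urllib.request

# Hypothetical descriptive User-Agent: tool name, version, and a
# contact point, per the advice above.
USER_AGENT = "RecentChangesMonitor/0.1 (User:RaymondYee; rdhyee@example.org)"

def build_request(url):
    """Return a urllib Request carrying the identifying User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Every request the tool makes would then be traceable back to its operator.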

> 4) Does it make sense to try to obtain bot privileges (even for read
> only access)?  My understanding is that bots get access to larger
> payload in some API calls.

See above. If your bot will generate a significant load, it's always
better to consult the sysadmins and the bot approvals group (for the
English Wikipedia) first. The more details you provide, the more
precise the answer will be.


--
Best regards,
  Max Semenik ([[User:MaxSem]])



Re: best practice for rate limits for accessing the English Wikipedia with the API

b-jorsch
On Wed, Nov 17, 2010 at 05:44:57PM +0300, Max Semenik wrote:

> On 17.11.2010, 16:39 Raymond wrote:
>
> > 2) How would I know when I'm accessing the API too quickly or too
> > often?  I read at
> > http://www.mediawiki.org/wiki/API:Errors_and_warnings 
> > that there is ratelimited  error message, but so far, I've not seen that
> > error myself.  If I don't get a ratelimited error, does that mean I'm
> > doing ok with respect to being a good API citizen.
>
> Rate limits are for editing and logging in only.

Also, use and handle maxlag=5 on all queries so your bot will
automatically pause when the database servers are overloaded. See
http://www.mediawiki.org/wiki/Manual:Maxlag_parameter
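
A rough Python sketch of that retry pattern, assuming a fetch callable that returns the decoded JSON response (the error shape follows the Manual:Maxlag_parameter page; the function name is illustrative):

```python
import time

def query_with_maxlag(fetch, params, maxlag=5, retry_after=5.0):
    """Call fetch(params) with maxlag set; back off and retry on a
    maxlag error so the bot pauses while the database servers lag."""
    params = dict(params, maxlag=maxlag)
    while True:
        result = fetch(params)
        error = result.get("error", {})
        if error.get("code") == "maxlag":
            # servers are overloaded; wait, then retry the same query
            time.sleep(retry_after)
            continue
        return result
```

In practice a real client would also cap the number of retries and honor the Retry-After header if one is sent.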


Re: best practice for rate limits for accessing the English Wikipedia with the API

rdhyee
I use mwclient, which, I think, handles maxlag
(https://fisheye.toolserver.org/changelog/Bryan/mwclient/trunk/errors.py?cs=310).  
I'm also looking into using mw-peachy, which
also seems to implement the maxlag parameter
(http://www.google.com/codesearch?q=maxlag+package:http://mw-peachy\.googlecode\.com&origq=maxlag&btnG=Search+Trunk)

Am I understanding mwclient's and Peachy's implementation correctly?

Thanks,
-Raymond

On 11/17/10 10:07 AM, Brad Jorsch wrote:

> On Wed, Nov 17, 2010 at 05:44:57PM +0300, Max Semenik wrote:
>> On 17.11.2010, 16:39 Raymond wrote:
>>
>>> 2) How would I know when I'm accessing the API too quickly or too
>>> often?  I read at
>>> http://www.mediawiki.org/wiki/API:Errors_and_warnings
>>> that there is ratelimited  error message, but so far, I've not seen that
>>> error myself.  If I don't get a ratelimited error, does that mean I'm
>>> doing ok with respect to being a good API citizen.
>> Rate limits are for editing and logging in only.
> Also, use and handle maxlag=5 on all queries so your bot will
> automatically pause when the database servers are overloaded. See
> http://www.mediawiki.org/wiki/Manual:Maxlag_parameter



Re: best practice for rate limits for accessing the English Wikipedia with the API

Bryan Tong Minh
On Wed, Nov 17, 2010 at 4:21 PM, Raymond Yee <[hidden email]> wrote:
> I use mwclient, which, I think, handles maxlag
> (https://fisheye.toolserver.org/changelog/Bryan/mwclient/trunk/errors.py?cs=310).
mwclient handles maxlag correctly. However, you should use the latest
version, which can be found on SourceForge at http://mwclient.sf.net/


Re: best practice for rate limits for accessing the English Wikipedia with the API

rdhyee
Bryan, I want to say how much I really like using mwclient!

-Raymond

On 11/17/10 10:28 AM, Bryan Tong Minh wrote:

> On Wed, Nov 17, 2010 at 4:21 PM, Raymond Yee<[hidden email]>  wrote:
>> I use mwclient, which, I think, handles maxlag
>> (https://fisheye.toolserver.org/changelog/Bryan/mwclient/trunk/errors.py?cs=310).
> mwclient handles maxlag correctly. However, you should used the latest
> version that can be found on sourceforge at http://mwclient.sf.net/
>

Re: best practice for rate limits for accessing the English Wikipedia with the API

Bryan Tong Minh
On Wed, Nov 17, 2010 at 4:36 PM, Raymond Yee <[hidden email]> wrote:
> Bryan, I want to say how much I really like using mwclient!
I appreciate it :)


Re: best practice for rate limits for accessing the English Wikipedia with the API

Platonides
In reply to this post by rdhyee
Raymond Yee wrote:
> 4) Does it make sense to try to obtain bot privileges (even for read
> only access)?  My understanding is that bots get access to larger
> payload in some API calls.

See whether the API modules you use have higher limits for bots.


> Note: since I'm looking at recent changes to the Wikipedia, downloading
> a data dump of the Wikipeida to work on doesn't help me.

If you are fetching the list of recent changes, you should consider
having your bot listening on irc://irc.wikimedia.org/en.wikipedia and
reacting to the events there.

If you're concerned just with a subset of edits, you could reduce your
requests to almost nothing. If you need some extra processing, there may
not be much difference (since enwiki has constant changes).
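
A bare-bones sketch of such a listener in Python (the nick and helper names are illustrative; a production client would also answer server PINGs and strip the feed's IRC formatting codes before parsing):

```python
import socket

def handshake_lines(nick, channel="#en.wikipedia"):
    """Return the raw IRC commands needed to register and join the
    recent-changes channel on irc.wikimedia.org."""
    return [
        "NICK %s\r\n" % nick,
        "USER %s 0 * :%s\r\n" % (nick, nick),
        "JOIN %s\r\n" % channel,
    ]

def connect(nick, host="irc.wikimedia.org", port=6667):
    """Open a socket and send the registration handshake."""
    sock = socket.create_connection((host, port))
    for line in handshake_lines(nick):
        sock.sendall(line.encode("utf-8"))
    return sock
```

After joining, each PRIVMSG in the channel corresponds to one recent change, so the bot reacts to pushed events instead of polling the API.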


Re: best practice for rate limits for accessing the English Wikipedia with the API

Soxred93@gmail.com
In reply to this post by rdhyee
Peachy also does handle maxlag correctly.

-X!

On Nov 17, 2010, at 10:21 AM, Raymond Yee wrote:

> I use mwclient, which, I think, handles maxlag
> (https://fisheye.toolserver.org/changelog/Bryan/mwclient/trunk/errors.py?cs=310).  
> I'm also looking into using mw-peachy, which
> also seems to implement the maxlag parameter
> (http://www.google.com/codesearch?q=maxlag+package:http://mw-peachy\.googlecode\.com&origq=maxlag&btnG=Search+Trunk)
>
> Am I understanding mwclient's and Peachy's implementation correctly?
>
> Thanks,
> -Raymond
>
> On 11/17/10 10:07 AM, Brad Jorsch wrote:
>> On Wed, Nov 17, 2010 at 05:44:57PM +0300, Max Semenik wrote:
>>> On 17.11.2010, 16:39 Raymond wrote:
>>>
>>>> 2) How would I know when I'm accessing the API too quickly or too
>>>> often?  I read at
>>>> http://www.mediawiki.org/wiki/API:Errors_and_warnings
>>>> that there is ratelimited  error message, but so far, I've not seen that
>>>> error myself.  If I don't get a ratelimited error, does that mean I'm
>>>> doing ok with respect to being a good API citizen.
>>> Rate limits are for editing and logging in only.
>> Also, use and handle maxlag=5 on all queries so your bot will
>> automatically pause when the database servers are overloaded. See
>> http://www.mediawiki.org/wiki/Manual:Maxlag_parameter
>
>