Access to MediaWiki API for Yandex bot

Access to MediaWiki API for Yandex bot

Остапук Наталья
Hello,
 
My name is Natalia; I'm a software developer at Yandex (https://www.yandex.ru). My team is building a large database of real-world objects, their attributes, and the various types of relationships between them.
 
Wikipedia is one of our main data sources. Until recently we downloaded data from it semi-manually, but now we want to do it automatically with our bots. This works fine for everything we need except the API pages, since robots.txt states: "Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please."
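For reference, the rules that block us look roughly like the following (an illustrative excerpt, not the full file; api.php and other dynamically generated pages live under /w/):

```
# Dynamically generated pages, including /w/api.php, are disallowed
User-agent: *
Disallow: /w/
```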
 
Our bot honors crawl-delay, so it won't contact you more often than you allow. Moreover, we plan to make no more than 20-30 requests per 5 minutes.
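To illustrate the pacing, here is a minimal sketch (names and numbers are illustrative, not our production code) of a throttle that spreads ~25 requests evenly over a 5-minute window:

```python
import time


class CrawlThrottle:
    """Allow at most `max_requests` per `window_seconds` by spacing
    requests a fixed minimum interval apart (25 per 300 s -> 12 s)."""

    def __init__(self, max_requests=25, window_seconds=300):
        self.min_interval = window_seconds / max_requests
        self.last_request = None  # timestamp of the previous request

    def wait_time(self, now):
        """Seconds to wait before the next request is allowed."""
        if self.last_request is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self.last_request))

    def record(self, now):
        """Note that a request was just made."""
        self.last_request = now


def fetch_politely(throttle, do_request):
    """Sleep as long as the throttle requires, then make one request."""
    delay = throttle.wait_time(time.monotonic())
    if delay > 0:
        time.sleep(delay)
    throttle.record(time.monotonic())
    return do_request()
```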
 
Is there some way to add an exception to robots.txt for our bot?
 
Thank you for your time.
 
Best regards,
Ostapuk Natalia

_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Re: Access to MediaWiki API for Yandex bot

Adam Baso
Hi Ostapuk Natalia,

Thanks for writing to the list. There may be some alternative approaches; I'll follow up with you off-list.

-Adam



On Mon, Oct 24, 2016 at 4:11 AM, Остапук Наталья <[hidden email]> wrote:


