Maintain up-to-date mirror

Svavar Kjarrval
Hi.

Recently I've been toying with writing code to automatically list
articles in certain wikiprojects based on certain criteria, for example
the use of particular templates or detected spelling errors. To let the
code notice relatively quickly when articles have been "fixed", I'd need
to update the database more frequently than the XML dumps allow. Then I
thought of the MediaWiki API. Which methods do you think are best suited
to the task?

With regards,
Svavar Kjarrval


_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Re: Maintain up-to-date mirror

Petr Onderka
That depends on what exactly you need.
If you want article wikitext, then you should use something like action=query&prop=revisions&rvprop=content.
If you want usage of some template, that would be action=query&prop=templates (probably together with tltemplates=TheTemplateYoureLookingFor).
If you want something else, there's probably another API module for that.
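
As a rough illustration of the two queries above, here is a minimal Python sketch. It assumes the standard JSON response layout (a "pages" object keyed by page ID); the endpoint URL, the User-Agent string, and all function names are placeholders made up for this example, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: any MediaWiki api.php endpoint works here; this one is only an example.
API_URL = "https://en.wikipedia.org/w/api.php"


def wikitext_params(titles):
    """Build action=query&prop=revisions&rvprop=content for the given titles."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "|".join(titles),
        "format": "json",
    }


def template_usage_params(titles, template):
    """Build action=query&prop=templates, filtered to one template via tltemplates."""
    return {
        "action": "query",
        "prop": "templates",
        "tltemplates": template,
        "titles": "|".join(titles),
        "format": "json",
    }


def pages_using_template(response):
    """Return sorted titles whose page entry carries a non-empty 'templates' list."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(p["title"] for p in pages.values() if p.get("templates"))


def call_api(params, url=API_URL):
    """Send one GET request and decode the JSON reply (hypothetical helper)."""
    request = Request(
        url + "?" + urlencode(params),
        headers={"User-Agent": "mirror-sketch/0.1 (example only)"},
    )
    with urlopen(request) as reply:
        return json.load(reply)
```

Something like pages_using_template(call_api(template_usage_params(["Some page"], "Template:Stub"))) would then list the pages that transclude the template, assuming the titles exist on the wiki being queried.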

Petr Onderka
[[en:User:Svick]]


On Wed, Jul 17, 2013 at 3:58 AM, Svavar Kjarrval <[hidden email]> wrote:
> [...]

Re: Maintain up-to-date mirror

Svavar Kjarrval
Thanks for your response.

I was thinking of fetching the wikitext and parsing it against the criteria. Since I want to parse for more than just templates, I figured that maintaining an up-to-date database of wikitext would be the best way to do that with minimal load on the WMF servers. I already have a method for detecting templates in wikitext from the XML dumps, but I would like access to a more recent, regularly updated copy of the database.

To elaborate, I was also looking for pointers on how to keep the database updated, preferably with a method that doesn't depend on the update process running continuously. I see that the API allows queries based on timestamps. Are there known trade-offs between timestamp range and update frequency that would still allow catching up if something stops working for a short period of time? The wikiprojects I'm thinking about aren't as popular as the English Wikipedia, although I would like the code to have no (significant) problems there either.
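
To make the timestamp idea concrete, here is a minimal Python sketch of a catch-up pass built on list=recentchanges with rcstart and rcdir=newer. It assumes the current "continue" style of API continuation and ISO 8601 timestamps. The function names and the injected fetch callable are hypothetical, chosen so the loop can be exercised without network access. Wikis keep recent changes only for a limited period, so a long outage would presumably still need a fresh dump.

```python
def recent_changes_params(since, limit="max"):
    """Build a list=recentchanges query that walks forward in time from `since`."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,      # with rcdir=newer this is the oldest timestamp wanted
        "rcdir": "newer",
        "rcprop": "title|timestamp",
        "rclimit": limit,
        "format": "json",
    }


def catch_up(fetch, since):
    """Collect titles changed at or after `since`.

    `fetch` is any callable that takes a parameter dict and returns the decoded
    JSON response (an assumption of this sketch, to keep it testable offline).
    Returns (changed_titles, checkpoint), where checkpoint is the newest
    timestamp seen and can be stored as the starting point of the next run.
    """
    params = recent_changes_params(since)
    changed = set()
    checkpoint = since
    while True:
        response = fetch(params)
        for change in response.get("query", {}).get("recentchanges", []):
            changed.add(change["title"])
            # ISO 8601 UTC timestamps sort correctly as plain strings.
            checkpoint = max(checkpoint, change["timestamp"])
        if "continue" not in response:
            return changed, checkpoint
        # Merge the continuation values into the next request.
        params = {**params, **response["continue"]}
```

Because the checkpoint is inclusive, the next run sees the last change again. That is harmless here, since re-fetching a page's wikitext twice gives the same result, and it avoids missing edits that share a timestamp.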

Are there any existing solutions that I could adapt for that purpose (updating the database)?

- Svavar Kjarrval

On 17/07/13 09:23, Petr Onderka wrote:
> [...]