How to get full list of references for a Wiki revision through the API?

How to get full list of references for a Wiki revision through the API?

Bertel Teilfeldt Hansen
Hi Mediawiki-api mailing listers!

I'm trying to extract the full list of references for different revisions of Wikipedia pages. This seemed like it would be easy enough with "prop=references", but I keep getting the following error:

{
    "error": {
        "code": "citestoragedisabled",
        "info": "Cite extension reference storage is not enabled.",
        "*": "See https://en.wikipedia.org/w/api.php for API usage."
    }
}

Even the example on the API's documentation page returns this error (https://en.wikipedia.org/w/api.php?action=help&modules=query%2Breferences). 
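For reference, the failing call has this shape (a sketch; the title is just an example, and `urlencode` merely builds the query string):

```python
from urllib.parse import urlencode

# Build the action API request that asks for stored references.
# "Albert Einstein" is an example title; the other values are the
# standard query/prop=references parameters.
params = {
    "action": "query",
    "prop": "references",
    "titles": "Albert Einstein",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```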

Why does this occur? And is there no way of getting references through the API?

Thanks! 

Best wishes,

Bertel





_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Re: How to get full list of references for a Wiki revision through the API?

Brad Jorsch (Anomie)
On Tue, Dec 20, 2016 at 1:18 PM, Bertel Teilfeldt Hansen <[hidden email]> wrote:
Why does this occur? And is there no way of getting references through the API?
As the message says, it occurs when reference storage is not enabled (i.e. when $wgCiteStoreReferencesData is false).

Even if reference storage is enabled, note that prop=references always returns the references for the current revision of the page. It does not provide references for old revisions.

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation


Re: How to get full list of references for a Wiki revision through the API?

Gergo Tisza
On Tue, Dec 20, 2016 at 10:18 AM, Bertel Teilfeldt Hansen <[hidden email]> wrote:
And is there no way of getting references through the API? 

There is no nice way, but you can always get the HTML (or the parse tree, depending on whether you want parsed or raw refs) and process it; references are not hard to extract. For the wikitext version, there is a Python tool: https://github.com/mediawiki-utilities/python-mwrefs
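That approach can be sketched as follows. This is a deliberately simplified stand-in for python-mwrefs: a regular expression that pulls `<ref>...</ref>` bodies out of wikitext, ignoring edge cases such as self-closing named-ref reuse and nested tags:

```python
import re

# Simplified extraction of <ref>...</ref> contents from wikitext.
# Self-closing reuse markers like <ref name="x" /> are skipped; a real
# parser (e.g. python-mwrefs) handles more edge cases.
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

def extract_refs(wikitext: str) -> list[str]:
    return [m.strip() for m in REF_RE.findall(wikitext)]

# Example wikitext (made up for illustration):
sample = (
    "Einstein was born in Ulm.<ref>{{cite book |title=Subtle is the Lord}}</ref> "
    "He moved to Bern.<ref name=pais>{{cite journal |title=Annalen der Physik}}</ref>"
)
print(extract_refs(sample))
```

The wikitext itself can be fetched for any revision via action=query&prop=revisions&rvprop=content, so this works for old revisions too.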


Re: How to get full list of references for a Wiki revision through the API?

Bertel Teilfeldt Hansen
Hi Brad and Gergo,

Thanks for your responses!

@Brad: Yeah, that was also my impression, but I wasn't sure. Seemed strange that the example in the official docs would point to a place where the feature was disabled. Thank you for clearing that up!

@Gergo: I've been looking at action=parse, but as far as I understand it, it is limited to one revision per API request, which makes it quite slow to get a bunch of older revisions from a large number of articles. action=query&prop=revisions&rvprop=content omits the rendered references from the output (it just gives the string "{{reflist}}" after "References"). "mwrefs" sounds very promising, though! I will definitely check that out - thank you!
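The multi-revision batching mentioned here can be sketched like this (50 revisions per request is the usual action API limit for regular accounts; the revision IDs below are placeholders):

```python
from urllib.parse import urlencode

# Chunk revision IDs into groups of up to 50 per request (the standard
# action API limit for non-bot accounts) and build one query URL each.
def revision_query_urls(revids, batch_size=50):
    urls = []
    for i in range(0, len(revids), batch_size):
        batch = revids[i:i + batch_size]
        params = {
            "action": "query",
            "prop": "revisions",
            "revids": "|".join(str(r) for r in batch),
            "rvprop": "content",
            "format": "json",
        }
        urls.append("https://en.wikipedia.org/w/api.php?" + urlencode(params))
    return urls

urls = revision_query_urls(list(range(1, 121)))  # 120 placeholder ids -> 3 requests
print(len(urls))
```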

Best,

Bertel


Re: How to get full list of references for a Wiki revision through the API?

Gabriel Wicke
Bertel, another option is to use the REST API:

Hope this helps,

Gabriel

--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation


Re: How to get full list of references for a Wiki revision through the API?

Bertel Teilfeldt Hansen
Hi Gabriel,

The REST API looks promising - thank you!

Having played around with it a bit, I seem to only be able to get one revision per request. Is that correct, or am I doing something wrong? My project requires every revision and its references from a large number of articles, so that would make a lot of requests. The regular API allows for multiple revisions per request (only with action=query, though).

Thanks!

Bertel

Re: How to get full list of references for a Wiki revision through the API?

Gabriel Wicke
Bertel,

On Mon, Jan 2, 2017 at 7:40 AM, Bertel Teilfeldt Hansen <[hidden email]> wrote:

Having played around with it a bit, I seem to only be able to get one revision per request. Is that correct, or am I doing something wrong?


This is correct. The requests themselves are quite cheap, and can be parallelized up to the rate limit set out in the API documentation.
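A bounded-parallelism sketch of that pattern (the /page/html/{title}/{revision} path follows the REST API's per-revision HTML endpoint; the worker here only builds URLs rather than fetching, and max_workers is deliberately modest):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

BASE = "https://en.wikipedia.org/api/rest_v1/page/html"

def revision_url(title: str, revid: int) -> str:
    # The REST API serves one rendered revision per request.
    return f"{BASE}/{quote(title, safe='')}/{revid}"

def fetch(title_revid):
    title, revid = title_revid
    url = revision_url(title, revid)
    # A real fetcher would GET `url` here (e.g. urllib.request) and
    # honor the documented rate limit; this sketch returns the URL.
    return url

# Placeholder revision IDs for illustration.
jobs = [("Albert Einstein", r) for r in (1001, 1002, 1003)]
with ThreadPoolExecutor(max_workers=4) as pool:  # modest parallelism
    results = list(pool.map(fetch, jobs))
print(results[0])
```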

 
My project requires every revision and its references from a large number of articles, so that would make a lot of requests. The regular API allows for multiple revisions per request (only with action=query, though).


There is a caveat here in that we currently don't store all revisions for all articles. This means that requests for really old revisions will trigger a more expensive on-demand parse, just as with the action API. Can you say more about the number of articles you are targeting, and how this list is selected? Regarding the selection, I am mainly wondering if you are targeting especially frequently edited articles.

Thanks,

Gabriel
 


Re: How to get full list of references for a Wiki revision through the API?

Bertel Teilfeldt Hansen
Hi Gabriel,

Oh yeah, I see now that the REST API doesn't mind parallel requests. I was going off of the etiquette section in the documentation for the other API (https://www.mediawiki.org/wiki/API:Etiquette), which prefers requests in series.

Ah, ok - that caveat is actually quite relevant for my project. It requires all revisions of certain pages along with all revisions of their talk pages (along with a bunch of other stuff). So perhaps the REST API is not for me. I am not specifically targeting frequently edited articles; rather, I'm looking at articles related to particular real-world conflicts (international and civil wars). I am a postdoc at Copenhagen University funded by the Danish government (grant information at the bottom of this page: http://ufm.dk/en/research-and-innovation/funding-programmes-for-research-and-innovation/who-has-received-funding/2015/postdoc-grants-from-the-danish-council-for-independent-research-social-sciences-february-2015?set_language=en&cl=en). Let me know if you want more identification or anything.

I actually have another question about the REST API, if that's ok. I'm using it to get page views over time for the pages that I'm interested in. However, the data don't seem to stretch very far back in time - is that correct? And if so, is there a better way of getting page views (short of using the raw files at https://dumps.wikimedia.org/other/pagecounts-raw/)?

Thanks for your help so far!

Bertel

Re: How to get full list of references for a Wiki revision through the API?

Gabriel Wicke
On Thu, Jan 5, 2017 at 6:20 AM, Bertel Teilfeldt Hansen <[hidden email]> wrote:
Ah, ok - that caveat is actually quite relevant for my project. It requires all revisions of certain pages along with all revisions of their talk pages (along with a bunch of other stuff). So perhaps the REST api is not for me. I am not targeting especially frequently edited articles specifically; rather, I'm look at articles related to particular real-world conflicts (international and civil wars). I am a postdoc at Copenhagen University funded by the Danish government (grant information at the bottom of this page: http://ufm.dk/en/research-and-innovation/funding-programmes-for-research-and-innovation/who-has-received-funding/2015/postdoc-grants-from-the-danish-council-for-independent-research-social-sciences-february-2015?set_language=en&cl=en). Let me know if you want more identification or anything.


My concern was mainly about the overall volume of uncached requests. It sounds like what you are interested in is a fairly small subset of all pages, so I think this should be fine. Perhaps don't max out the parallelism in this case. In any case, making the same requests to the action API would result in even more on-demand parses, as only the very latest revision is cached there.
 

I actually have another question about the REST api, if that's ok. I'm using it to get page views over time for the pages that I'm interested in. However, the data don't seem to stretch very far back in time - is that correct? And if so, is there a better way of getting page views (short of using the raw files at https://dumps.wikimedia.org/other/pagecounts-raw/)?

Yes, the pageview API is relatively new, and only has recent data at this point. I am not certain if the analytics team plans to back-fill more historic data over time. I vaguely remember that there might be difficulties with changes in what is considered a pageview, so the numbers might not be completely comparable. I cc'ed Nuria and Dan from the analytics team, who should be able to speak to this.
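For context, the per-article pageview endpoint being discussed can be called like this (the path segments follow the documented Wikimedia pageview API; the title and date range are examples):

```python
from urllib.parse import quote

# Build a pageview API request for per-article counts.
# Dates are YYYYMMDD strings; granularity can be "daily" or "monthly".
def pageviews_url(project, article, start, end,
                  access="all-access", agent="all-agents", granularity="daily"):
    return ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            f"{project}/{access}/{agent}/{quote(article, safe='')}/"
            f"{granularity}/{start}/{end}")

url = pageviews_url("en.wikipedia", "Albert_Einstein", "20160101", "20160131")
print(url)
```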


 

Re: How to get full list of references for a Wiki revision through the API?

Bertel Teilfeldt Hansen
Ok, so if I wanted to get the entire revision history for a page, it would actually be cheaper to run parallel requests on the REST API than to do multi-revision requests on the action API? I can get up to 50 revisions per request on the action API, so I kinda assumed that would be cheaper than getting only one revision per request.

Thanks for the cc! Looking forward to hearing from them!

Best,

Bertel
