BREAKING CHANGE: limits for prop= modules

Roan Kattouw
As of r37270 [1], the prop=links, prop=templatelinks, prop=langlinks, prop=extlinks, prop=categories and prop=images modules have limit and continue parameters, just like the list= modules do. This means they no longer list all links/templates/whatever on a page (which could be hard on the database), so you'll have to do the usual query-continue dance to get them all. Note that this doesn't hurt backwards compatibility: pre-r37270 versions of MediaWiki will just give you all results at once, without a <query-continue>. If you really must know in advance whether these prop= modules are limited, use action=paraminfo:

api.php?action=paraminfo&querymodules=links

If the limits aren't in effect, the limit and continue parameters won't be listed. (If you get an error about paraminfo being an invalid action, the limits aren't in effect either, since the introduction of these limits predates action=paraminfo by a long time.)
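The continuation protocol above can be sketched in Python. The responses here are simulated in-memory (no network request is made), but the merge loop follows the query-continue convention for prop=links; the fetch_all_links helper and the sample payloads are hypothetical illustrations, not part of the API itself:

```python
def fetch_all_links(fetch):
    """Follow query-continue until exhausted, merging prop=links results.

    `fetch` is a hypothetical function: given the continue value (or None
    for the first request) it returns a decoded format=json response dict.
    """
    results = []
    cont = None
    while True:
        resp = fetch(cont)
        for page in resp["query"]["pages"].values():
            results.extend(page.get("links", []))
        qc = resp.get("query-continue")
        if not qc:
            break  # pre-r37270 servers never send query-continue
        cont = qc["links"]["plcontinue"]
    return results

# Simulated two-batch exchange, shaped like
# api.php?action=query&prop=links&pllimit=2&format=json
batches = [
    {"query": {"pages": {"1": {"links": [{"title": "A"}, {"title": "B"}]}}},
     "query-continue": {"links": {"plcontinue": "1|0|C"}}},
    {"query": {"pages": {"1": {"links": [{"title": "C"}]}}}},
]
links = fetch_all_links(lambda cont: batches[0] if cont is None else batches[1])
```

The loop keeps requesting until the response carries no query-continue element, which is also why it terminates immediately against an older, unlimited server.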

Roan Kattouw (Catrope)

[1] http://svn.wikimedia.org/viewvc/mediawiki/?view=rev&revision=37270


_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Mirroring Wikipedia through the API?

Dirk Riehle-2
Hi,

looking through the API: http://en.wikipedia.org/w/api.php I can't find
any way to get at the actual page contents. Is this correct?

I assume this is deliberate, to avoid Wikipedia mirroring or the like? I
remember discussions where WP mirroring was frowned upon. While I can
see that folks may not like it if someone uses Wikipedia to make money
through AdSense, I don't quite understand how you can prevent it (you
can't, given the GFDL, I think).

More importantly, I can think of many legitimate uses of Wikipedia where
someone wants to mirror it and enhance the functionality. I can envision
many users who are better served through specific apps in front of
Wikipedia than by passive content added to WP the way bots do it. And
the WMF is unlikely to write all these apps, nor will it want to operate
them, I assume. How is that handled? Simply not allowed?

Finally, and that's why I'm sending this email to this mailing list: How
does Powerset do this? Go to powerset.com, search for something you
might find in Wikipedia, and see how it provides an up-to-date
(click-through) copy of the Wikipedia page. My hunch is that they
use a database dump for search and then screen-scrape, or is there a
better explanation?

Thanks,
Dirk

--
Phone: + 1 (650) 215 3459, Web: http://www.riehle.org



Re: Mirroring Wikipedia through the API?

MinuteElectron
Dirk Riehle wrote:
> looking through the API: http://en.wikipedia.org/w/api.php I can't find
> any way to get at the actual page contents. Is this correct?

I'm not sure how to do this via the API (I believe there is a way,
though), but you can use action=raw on index.php, e.g. [1].

> I assume this is deliberate to avoid Wikipedia mirroring or the like? I
> remember discussions where WP mirroring was frowned upon. While I can
> see that folks may not like it if someone uses Wikipedia to make money
> through Adsense I don't quite understand how you can prevent it (you
> can't, given the GFDL, I think).
>
> More importantly, I can think of many legitimate uses of Wikipedia where
> someone wants to mirror it and enhance the functionality. I can envision
> many users who are better served through specific apps in front of
> Wikipedia than passive contents added to WP like bots do it. And the WMF
> is unlikely to write all these apps, nor will it want to operate them I
> assume. How is that handled? Simply not allowed?

It is not permitted, as it puts stress on the database servers; you can
purchase a live feed though, see [2].

> Finally, and that's why I'm sending this email to this mailing list: How
> does Powerset do this: Go to powerset.com, search for something you
> might find in Wikipedia, and see how it provides an up-to-date
> (click-through) copy of the Wikipedia page. My hunch is that they
> use a database dump for search and then screen-scrape, or is there a
> better explanation?

One would assume they use a database dump, which contains the page text,
and simply parse the wikitext for information. The database dump can
also be used for searching and other tasks (nearly everything the API
can do, and more) when imported into a wiki, which they would be able
to do. It is also possible they are using a live feed to stay up to
date. Screen scraping is not necessary for this.

[1] http://en.wikipedia.org/w/index.php?action=raw&title=Main_Page
[2] http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service


Re: Mirroring Wikipedia through the API?

Stephen Bain
In reply to this post by Dirk Riehle-2
On Sun, Jul 13, 2008 at 5:52 AM, Dirk Riehle <[hidden email]> wrote:
>
> looking through the API: http://en.wikipedia.org/w/api.php I can't find
> any way to get at the actual page contents. Is this correct?

Sure there is; you use something like this:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main%20Page&rvlimit=1&rvprop=content

That's the contents of the current revision of the Main Page.

Search for "prop=revisions" in the API documentation.

--
Stephen Bain
[hidden email]


Re: Mirroring Wikipedia through the API?

Roan Kattouw
In reply to this post by Dirk Riehle-2
Dirk Riehle wrote:
> Hi,
>
> looking through the API: http://en.wikipedia.org/w/api.php I can't find
> any way to get at the actual page contents. Is this correct?
>  
As someone else said before me, you can get unparsed wikitext through
prop=revisions&rvprop=content. If you want HTML without the sidebar and
all that, you can get it with

index.php?action=render&title=Foo

> Finally, and that's why I'm sending this email to this mailing list: How
> does Powerset do this: Go to powerset.com, search for something you
> might find in Wikipedia, and see how it provides an up-to-date
> (click-through) copy of the Wikipedia page. My hunch is that they
> use a database dump for search and then screen-scrape, or is there a
> better explanation?
You can actually search Wikipedia through the API:

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Foo&srwhat=text&srlimit=5
(gets a list of 5 pages containing 'Foo')

http://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=Foo&gsrwhat=text&gsrlimit=5&prop=revisions&rvprop=content
(gets the contents of those pages)

You can also get search suggestions in the OpenSearch format with
http://en.wikipedia.org/w/api.php?action=opensearch&search=Te
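An action=opensearch response is a JSON array whose first element echoes the query and whose second lists the suggestion titles. A minimal parse, with a simulated response body (the suggestion titles below are made up for illustration, not real API output):

```python
import json

def parse_opensearch(body):
    """Split an OpenSearch response body into (query, suggestions)."""
    query, suggestions = json.loads(body)[:2]
    return query, suggestions

# Simulated body for api.php?action=opensearch&search=Te
body = '["Te", ["Television", "Texas", "Technology"]]'
query, suggestions = parse_opensearch(body)
```

Because the format is a plain JSON array rather than the usual query structure, any JSON parser handles it without API-specific code.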

Roan Kattouw (Catrope)
