empty extract field with exintro=True


Bertel Teilfeldt Hansen
Hi Mediawiki-api mailing listers!

I'm trying to get the intros for a list of Wikipedia pages using the "extracts" property with "exintro=True". This works fine for most pages, but for a few of them the API returns an empty extract field. See for example:

Looking at the page "https://en.wikipedia.org/wiki/Anthem", there definitely seems to be text before the first section, so I think I should be getting something. Indeed, without the "exintro" parameter I get the expected result.
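
For reference, a minimal sketch of the two requests being compared. It assumes the English Wikipedia endpoint and JSON output; the helper name build_extract_url is hypothetical and only builds the URLs without sending anything. Note that MediaWiki treats a boolean parameter such as "exintro" as true whenever it is present, whatever its value.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only):
    """Build a prop=extracts query URL (hypothetical helper, no network access)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",
    }
    if intro_only:
        # Boolean API parameters count as true when present, so the flag
        # is simply added or left out.
        params["exintro"] = "1"
    return API_URL + "?" + urlencode(params)


with_intro = build_extract_url("Anthem", intro_only=True)
without_intro = build_extract_url("Anthem", intro_only=False)
print(with_intro)
print(without_intro)
```

The first URL is the kind of request that comes back with an empty extract for this page; the second one returns the full text.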

Any idea why this occurs?

Best,

Bertel


_______________________________________________
Mediawiki-api mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Re: empty extract field with exintro=True

Brad Jorsch (Anomie)
On Sat, Aug 24, 2019 at 7:02 AM Bertel Teilfeldt Hansen <[hidden email]> wrote:
Any idea why this occurs?

"exintro" assumes that the first heading tag (<h1> to <h6>) indicates the end of the intro. In the HTML of that page, {{TOC_Right}} places the table of contents before the visible text, and the table of contents includes an <h2>, so the extract is cut off there, before any of the intro text.
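
The effect can be illustrated with a small sketch. This is not the extension's actual code, only a simplified model of the rule described above, and the two HTML snippets are made-up stand-ins for real page output.

```python
import re

# Matches the start of any heading tag from <h1> to <h6>.
HEADING_RE = re.compile(r"<h[1-6]\b", re.IGNORECASE)
# Crude tag stripper, good enough for this illustration.
TAG_RE = re.compile(r"<[^>]*>")


def intro_before_first_heading(html):
    """Return everything before the first heading tag (simplified model)."""
    match = HEADING_RE.search(html)
    return html if match is None else html[:match.start()]


def visible_text(html):
    """Drop tags and surrounding whitespace."""
    return TAG_RE.sub("", html).strip()


# Made-up examples: a typical page, and one where the table of contents
# (with its own <h2>) comes before the intro paragraph.
normal_page = "<p>Intro text.</p><h2>History</h2><p>Body.</p>"
toc_first_page = (
    '<div class="toc"><h2>Contents</h2></div>'
    "<p>Intro text.</p><h2>History</h2><p>Body.</p>"
)

normal_intro = visible_text(intro_before_first_heading(normal_page))
toc_first_intro = visible_text(intro_before_first_heading(toc_first_page))
print(repr(normal_intro))
print(repr(toc_first_intro))
```

In the second case the cut lands inside the table of contents, so nothing visible is left for the extract.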


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: empty extract field with exintro=True

Bertel Teilfeldt Hansen
Hi Brad,

Sorry, for some reason I didn't see your email until just now :/ But thank you for the reply, it makes total sense, although it is probably not the desired behavior.

Keep up the radical Wiki work - love the site!

Best,

Bertel

On Mon, Aug 26, 2019 at 4:59 PM Brad Jorsch (Anomie) <[hidden email]> wrote:
"exintro" assumes that the first heading tag (<h1> to <h6>) indicates the end of the intro. In the HTML of that page, {{TOC_Right}} places the table of contents before the visible text, and the table of contents includes an <h2>, so the extract is cut off there, before any of the intro text.

