Thumbnail image hinting in articles via metadata tags

13 messages

Thumbnail image hinting in articles via metadata tags

James Pearson
Hello.

I'm a developer at reddit, and this morning I was looking into how we
generate thumbnail images from Wikipedia (and I suppose all MediaWiki)
articles, due to a user report that they had an unexpected thumbnail on
their submission[0].

You can read my post there for more details, but essentially since we can't
find an og:image or other similar metadata tag hinting to us what we should
use for a thumbnail, we iterate through all the linked images on the page
and pull out the largest one (you can view the code online if you're
curious[1]).  While this works reasonably well as a general heuristic,
Wikipedia articles often have some more structure that could give us a
better image to use.
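As a rough illustration of the heuristic described above (prefer an og:image hint, otherwise take the largest linked image), here is a minimal Python sketch that uses only the standard library. It is not reddit's actual media.py code. That code fetches the images to measure them, whereas this sketch assumes the page declares width/height attributes on its <img> tags. The names ThumbnailHintParser and pick_thumbnail are hypothetical.

```python
from html.parser import HTMLParser


def _to_int(value):
    """Parse a width/height attribute, treating missing or junk values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


class ThumbnailHintParser(HTMLParser):
    """Collects an og:image hint and <img> candidates from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.images = []  # list of (src, width, height)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image" and self.og_image is None:
            self.og_image = a.get("content")
        elif tag == "img" and a.get("src"):
            self.images.append((a["src"], _to_int(a.get("width")), _to_int(a.get("height"))))


def pick_thumbnail(html):
    """Prefer an explicit og:image; otherwise fall back to the largest <img>."""
    parser = ThumbnailHintParser()
    parser.feed(html)
    parser.close()
    if parser.og_image:
        return parser.og_image
    if not parser.images:
        return None
    # Largest declared area wins (assumption: declared sizes are trustworthy).
    src, _, _ = max(parser.images, key=lambda img: img[1] * img[2])
    return src
```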

While writing this, I recalled that the Wikipedia Android app displays
thumbnails in its search results.  I think that's pulling from OpenSearch
with the PageImages extension[2]? but I haven't really delved into that
yet.  I'm curious how those images get pulled - if it's taking into account
infoboxes or such, or just the first image on the page, or what.

Would it be feasible to include an og:image tag on pages for which we have
a reasonable guess as to the thumbnail?  Open Graph[3] is supported by what
seems anecdotally to me to be a wide range of services, so good hints there
would improve thumbnails for links on not just reddit, but Facebook,
Twitter, various chat clients, I think several Wordpress plugins, etc.
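For reference, an Open Graph hint is just a set of <meta property="..."> tags in the page's <head>; ogp.me lists og:title, og:type, og:image and og:url as the basic set. A minimal Python sketch that renders them, where the function name and the choice to omit og:image when there is no good guess are illustrative assumptions:

```python
from html import escape


def open_graph_tags(title, url, image_url=None):
    """Render minimal Open Graph <meta> tags for a page's <head>."""
    props = [("og:title", title), ("og:type", "article"), ("og:url", url)]
    if image_url:  # only hint an image when there is a reasonable guess
        props.append(("og:image", image_url))
    # escape() also escapes quotes, so values can't break out of the attribute.
    return "\n".join(
        f'<meta property="{escape(name)}" content="{escape(value)}" />'
        for name, value in props
    )
```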

Thanks,
 - P

[0]: https://www.reddit.com/r/bugs/comments/317n1v/thumbnail_acquisition_from_wikipedia_went_haywire/
[1]: https://github.com/reddit/reddit/blob/master/r2/r2/lib/media.py#L485-L542
[2]: https://www.mediawiki.org/wiki/API:Opensearch
[3]: http://ogp.me/
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: Thumbnail image hinting in articles via metadata tags

Max Semenik
On Thu, Apr 2, 2015 at 11:35 AM, James Pearson <[hidden email]> wrote:

> While writing this, I recalled that the Wikipedia Android app displays
> thumbnails in its search results.  I think that's pulling from OpenSearch
> with the PageImages extension[2]? but I haven't really delved into that
> yet.  I'm curious how those images get pulled - if it's taking into account
> infoboxes or such, or just the first image on the page, or what.
>

It uses a scoring system that takes position on page, size and w:h ratio
into account.
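To make the idea of such a scoring system concrete, here is a toy Python version that combines the same three signals (position on page, size, and width:height ratio). The weights, thresholds, and names are invented for illustration and are not taken from the PageImages extension.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    position: int  # 0 = first image in the article
    width: int
    height: int


def score(c):
    """Toy score: earlier, larger, and less extreme aspect ratios win.

    The weights below are made up for illustration; they are NOT
    the values used by the PageImages extension.
    """
    position_score = max(0, 10 - 2 * c.position)
    size_score = min(c.width, 400) / 40  # saturates at 10 for 400px+
    ratio = c.width / c.height if c.height else 0
    ratio_score = 10 if 0.5 <= ratio <= 2 else 0  # penalize banners and strips
    return position_score + size_score + ratio_score


def best_image(candidates):
    """Return the highest-scoring candidate, or None if there are none."""
    return max(candidates, key=score, default=None)
```

With these made-up weights, a wide banner at the top of the page loses to a portrait photo just below it, which is the kind of behaviour the ratio signal is there for.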


> Would it be feasible to include an og:image tag on pages for which we have
> a reasonable guess as to the thumbnail?  Open Graph[3] is supported by what
> seems anecdotally to me to be a wide range of services, so good hints there
> would improve thumbnails for links on not just reddit, but Facebook,
> Twitter, various chat clients, I think several Wordpress plugins, etc.
>

https://phabricator.wikimedia.org/T33338



--
Best regards,
Max Semenik ([[User:MaxSem]])

Re: Thumbnail image hinting in articles via metadata tags

Jon Robson
Hi James! Thanks for mailing out.

On the subject of OG tags,  we have a bunch of bugs around explicitly
using og tags ([1] for example)
This thread is very enlightening: [2]

tldr: Essentially I think many of us, myself included, would like to
add 'og:image' tags but the Wikipedia community as a whole is not 100%
sure this is aligned with the mission.

I wonder if adopting http://schema.org would be a less controversial
move and help you towards your goal.

[1] https://phabricator.wikimedia.org/T32113
[2] http://www.gossamer-threads.com/lists/wiki/wikitech/545421

--
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon


Re: Thumbnail image hinting in articles via metadata tags

James Pearson
Thanks, it looks like I have some reading to do.
 - P


Re: Thumbnail image hinting in articles via metadata tags

S Page
In reply to this post by James Pearson
On Thu, Apr 2, 2015 at 11:35 AM, James Pearson <[hidden email]> wrote:

>  I recalled that the Wikipedia Android app displays
> thumbnails in its search results.  I think that's pulling from OpenSearch
> with the PageImages extension[2]? but I haven't really delved into that
> yet.  I'm curious how those images get pulled - if it's taking into account
> infoboxes or such, or just the first image on the page, or what.
>

I'm writing an article on that subject. It hasn't been reviewed, but it's
interesting.  Your feedback is welcome.
https://www.mediawiki.org/wiki/API:Page_info_in_search_results
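For an external client such as reddit's scraper, the relevant piece is the prop=pageimages query module that the PageImages extension adds to the API. The sketch below only builds the request URL and reads thumbnails out of a response. It assumes English Wikipedia's endpoint and the default (format version 1) JSON layout, where query.pages is keyed by page ID; it does not perform the HTTP request, and the function names are hypothetical.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def pageimages_url(titles, thumb_size=100):
    """Build a query asking PageImages for each title's thumbnail."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": thumb_size,  # requested thumbnail size in pixels
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "format": "json",
    }
    return API + "?" + urlencode(params)


def thumbnails_from_response(data):
    """Map page title -> thumbnail URL, skipping pages without a page image."""
    pages = data.get("query", {}).get("pages", {})
    return {
        page["title"]: page["thumbnail"]["source"]
        for page in pages.values()
        if "thumbnail" in page
    }
```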

--
=S Page  WMF Tech writer

Re: Thumbnail image hinting in articles via metadata tags

Brian Wolff
In reply to this post by Jon Robson
On Apr 2, 2015 2:58 PM, "Jon Robson" <[hidden email]> wrote:

>
> Hi James! Thanks for mailing out.
>
> On the subject of OG tags,  we have a bunch of bugs around explicitly
> using og tags ([1] for example)
> This thread is very enlightening: [2]
>
> tldr: Essentially I think many of us, myself included, would like to
> add 'og:image' tags but the Wikipedia community as a whole is not 100%
> sure this is aligned with the mission.

What's the actual objection? Our community (or at least a significant and
very vocal portion of it) does not want gaudy share buttons for various
reasons, but I don't recall anyone objecting to adding metadata to allow
automatic identification of the primary image of an article.

--bawolff

Re: Thumbnail image hinting in articles via metadata tags

MZMcBride
Brian Wolff wrote:

>On Apr 2, 2015 2:58 PM, "Jon Robson" <[hidden email]> wrote:
>>On the subject of OG tags, we have a bunch of bugs around explicitly
>> using og tags ([1] for example)
>> This thread is very enlightening: [2]
>>
>> tldr: Essentially I think many of us, myself included, would like to
>> add 'og:image' tags but the Wikipedia community as a whole is not 100%
>> sure this is aligned with the mission.
>
>What's the actual objection? Our community (or at least a significant and
>very vocal portion of it) does not want gaudy share buttons for various
>reasons, but I don't recall anyone objecting to adding metadata to allow
>automatic identification of the primary image of an article.

There's related discussion at <https://phabricator.wikimedia.org/T64811>.

Whether it's Twitter, Facebook, or some future social site, MediaWiki and
Wikimedia need to figure out what a reasonable level of support that we're
willing to offer each of these services looks like, in my opinion.

MZMcBride

Re: Thumbnail image hinting in articles via metadata tags

Brian Wolff
On Apr 2, 2015 11:02 PM, "MZMcBride" <[hidden email]> wrote:

>
> There's related discussion at <https://phabricator.wikimedia.org/T64811>.
>
> Whether it's Twitter, Facebook, or some future social site, MediaWiki and
> Wikimedia need to figure out what a reasonable level of support that we're
> willing to offer each of these services looks like, in my opinion.

I agree that we should not add every proprietary meta tag that happens
to exist.

However, there is clearly a desire to be able to identify a representative
image for an article. This need is exhibited across many websites, including
reddit, Facebook, Google+, etc., but also our own site, as noted by the
PageImages extension for mobile. It's clear there are multiple parties that
want to be able to accurately extract such information programmatically from
any arbitrary website on the internet. I would argue supporting this use
case is not a Wikipedia issue, but a MediaWiki issue.

We should research which metadata scheme is the de facto standard for
declaring this sort of information (whether that be Open Graph, schema.org,
or something else) and implement it (and only one; implementing this 10
different ways would be silly).

In many ways I think this is similar to RSS feeds (a specific piece of info
multiple people want, with somewhat competing standards to implement it).

--bawolff

Re: Thumbnail image hinting in articles via metadata tags

Stas Malyshev
In reply to this post by Max Semenik
Hi!

>> Would it be feasible to include an og:image tag on pages for which we have
>> a reasonable guess as to the thumbnail?  Open Graph[3] is supported by what
>> seems anecdotally to me to be a wide range of services, so good hints there
>> would improve thumbnails for links on not just reddit, but Facebook,
>> Twitter, various chat clients, I think several Wordpress plugins, etc.
>>
>
> https://phabricator.wikimedia.org/T33338

I wonder if this can somehow be connected to Wikidata's image attribute
(https://www.wikidata.org/wiki/Property:P18).
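A minimal Python sketch of what reading P18 could look like from the outside: it builds a wbgetclaims request for an item and turns the first P18 value in a response into a Commons Special:FilePath URL. It assumes the standard wbgetclaims JSON shape (claims, then P18, then mainsnak, then datavalue, then value), ignores statement ranks for simplicity, and does not perform the HTTP request. The function names are hypothetical.

```python
from urllib.parse import quote, urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def p18_claims_url(item_id):
    """Build a wbgetclaims request for an item's image (P18) statements."""
    return WIKIDATA_API + "?" + urlencode(
        {"action": "wbgetclaims", "entity": item_id, "property": "P18", "format": "json"}
    )


def first_image_url(claims_response, width=None):
    """Return a Commons URL for the first usable P18 value, or None."""
    for claim in claims_response.get("claims", {}).get("P18", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue"/"somevalue" snaks carry no filename
        filename = snak["datavalue"]["value"].replace(" ", "_")
        url = "https://commons.wikimedia.org/wiki/Special:FilePath/" + quote(filename)
        return url + (f"?width={width}" if width else "")
    return None
```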

--
Stas Malyshev
[hidden email]

Re: Thumbnail image hinting in articles via metadata tags

Daniel Friesen
In reply to this post by Brian Wolff
On 2015-04-02 8:44 PM, Brian Wolff wrote:

> However, there is clearly a desire to be able to identify a representative
> image for an article. This need is exhibited across many websites, including
> reddit, Facebook, Google+, etc., but also our own site, as noted by the
> PageImages extension for mobile. It's clear there are multiple parties that
> want to be able to accurately extract such information programmatically from
> any arbitrary website on the internet. I would argue supporting this use
> case is not a Wikipedia issue, but a MediaWiki issue.
>
> We should research which metadata scheme is the de facto standard for
> declaring this sort of information (whether that be Open Graph, schema.org,
> or something else) and implement it (and only one; implementing this 10
> different ways would be silly).

Facebook exclusively supports Open Graph.

Google+ recommends schema.org microdata and uses Open Graph.

Twitter exclusively uses their proprietary Twitter cards markup ( <meta
name="twitter:card" content="summary" /> ...) and requires you to
validate and submit your site for approval before they'll display cards.

Reddit uses embed.ly, which is supposed to support a variety of formats:
Open Graph, oEmbed, etc.

Bing uses schema.org and Open Graph but states that they "currently only
[use] this information to enhance the visual display of search results
of a limited number of publishers". Bing just uses everything it can:
Microdata, Microformats, RDFa, etc.

Google uses schema.org in microdata, RDFa, and JSON-LD formats for rich
data (I'm not sure if they bother with page level metadata at all,
standard HTML title and meta description generally covers what they output).

----

So my opinion would be to support Open Graph, optionally add some
schema.org, and screw Twitter and their unwillingness to play nice with
attempts to standardize metadata.

We should also consider oEmbed where it makes sense.
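The optional schema.org part could be emitted as JSON-LD, one of the formats listed above for Google. A minimal Python sketch, assuming an Article type with headline, url and image properties; which schema.org type and properties MediaWiki would actually emit is still open in this thread, and the function name is hypothetical.

```python
import json


def schema_org_jsonld(headline, url, image_url):
    """Render a schema.org Article description as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "image": image_url,  # the representative image being hinted
    }
    # Escape "</" so a value can't close the <script> element early;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```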

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]


Re: Thumbnail image hinting in articles via metadata tags

Jon Robson
On Thu, Apr 2, 2015 at 10:04 PM, Daniel Friesen
<[hidden email]> wrote:

> So my opinion would be to support Open Graph, optionally add some
> schema.org, and screw Twitter and their unwillingness to play nice with
> attempts to standardize metadata.

+1 and if someone writes the patch I'll +2 it. We've been talking
about this for far too long :-)


Re: Thumbnail image hinting in articles via metadata tags

James Pearson
In reply to this post by Daniel Friesen-2
On Apr 2, 2015 10:05 PM, "Daniel Friesen" <[hidden email]>
wrote:
> Twitter exclusively uses their proprietary Twitter cards markup ( <meta
> name="twitter:card" content="summary" /> ...) and requires you to
> validate and submit your site for approval before they'll display cards.

This isn't quite correct. They have their own markup, which lets you
give some Twitter-specific data, but fairly standard attributes
(like the thumbnail image) fall back to Open Graph.

https://dev.twitter.com/cards/markup
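The fallback described above can be sketched in a few lines of Python: collect <meta> tags keyed by either name= (used by Twitter cards) or property= (used by Open Graph), then prefer twitter:image and fall back to og:image. This models only the image fallback mentioned here, not Twitter's full card processing, and the names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> content values keyed by their name= or property=."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Twitter cards use name="twitter:*"; Open Graph uses property="og:*".
        key = a.get("name") or a.get("property")
        if key and key not in self.meta:  # first occurrence wins
            self.meta[key] = a.get("content")


def card_image(html):
    """Prefer a Twitter-specific image, falling back to og:image."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta.get("twitter:image") or collector.meta.get("og:image")
```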

You do have to submit a request, though, for cards to be shown. I think
it's a pretty painless process, but I wasn't the one handling it when we
implemented card support.

> Reddit uses embed.ly, which is supposed to support a variety of formats:
> Open Graph, oEmbed, etc.

Depending on what embedly tells us it can embed given certain conditions
(for instance, the embed needs to support https if the requested page was
https), we sometimes use embedly for thumbnails, and sometimes use our own
scraper, with the code I linked to in my first email. It will currently
pick up on opengraph tags, but if you decide to implement another standard
we don't currently support, I will gladly build it in (pending some project
scheduling, so perhaps not immediately).
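The choice between the embedding service and the in-house scraper can be sketched as a small decision function. This is a hypothetical Python illustration that models only the https condition mentioned above; it is not reddit's actual code, and EmbedInfo and choose_thumbnail_source are invented names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class EmbedInfo:
    """Hypothetical summary of what an embedding service reported."""
    thumbnail_url: Optional[str]
    supports_https: bool


def choose_thumbnail_source(page_url, embed):
    """Return "embed" or "scraper" for where to take the thumbnail from."""
    if embed is None or not embed.thumbnail_url:
        return "scraper"  # nothing usable from the service
    if urlparse(page_url).scheme == "https" and not embed.supports_https:
        return "scraper"  # don't downgrade an https submission
    return "embed"
```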

From what I've seen, the various web-based IRC replacements support Open
Graph as well, if they do any automatic link embedding.

Re: Thumbnail image hinting in articles via metadata tags

Derk-Jan Hartman
In reply to this post by Stas Malyshev

> On 3 Apr 2015, at 06:27, Stas Malyshev <[hidden email]> wrote:
>
> Hi!
>
>>> Would it be feasible to include an og:image tag on pages for which we have
>>> a reasonable guess as to the thumbnail?  Open Graph[3] is supported by what
>>> seems anecdotally to me to be a wide range of services, so good hints there
>>> would improve thumbnails for links on not just reddit, but Facebook,
>>> Twitter, various chat clients, I think several Wordpress plugins, etc.
>>>
>>
>> https://phabricator.wikimedia.org/T33338
>
> I wonder if this can somehow be connected to Wikidata's image attribute
> (https://www.wikidata.org/wiki/Property:P18).
This is a very good idea, because it circumvents the problem of autodetection that PageImages has for instance and that takes away the ability of editors to ‘author’ the result.
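The precedence being suggested (an editor-authored Wikidata image first, automatic detection only as a fallback) can be sketched as follows. This is a hypothetical Python illustration; the inputs stand in for a P18 filename and a PageImages guess, and the function name is invented.

```python
def representative_image(wikidata_p18, pageimages_guess):
    """Return (filename, source), preferring the editor-chosen Wikidata image."""
    if wikidata_p18:
        return wikidata_p18, "wikidata"  # authored by editors
    if pageimages_guess:
        return pageimages_guess, "pageimages"  # heuristic guess
    return None, None
```

Returning the source alongside the filename lets a consumer treat an explicit editorial choice differently from a heuristic one.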

DJ
