Programmatically categorizing media in the Commons with Machine Learning

Programmatically categorizing media in the Commons with Machine Learning

Jordan Adler

Hey folks!


A few months back a colleague of mine was looking for some unstructured images to analyze as part of a demo for the Google Cloud Vision API.  Luckily, I knew just the place, and the resulting demo, built by Reactive Inc., is pretty awesome.  It was shared on-stage by Jeff Dean during the keynote at GCP NEXT 2016.


I wanted to quickly share the data from the programmatically identified images so it could be used to help categorize the media in the Commons.  There are about 80,000 images' worth of data:


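  • map.txt (5.9MB): A single text file mapping id to filename in an "id : filename" format, one per line

  • results.tar.gz (29.6MB): a tgz'd directory of JSON files representing the output of the API, in the format "${id}.jpg.json"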
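
For anyone picking up the data, here is a rough sketch (not part of the original message) of how the "id : filename" lines could be read and matched to the "${id}.jpg.json" result files mentioned in this thread. The function names and the sample lines are made up for illustration, and it assumes the id never contains a colon; the real files may need extra handling.

```python
from typing import Dict, Iterable


def parse_map_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse "id : filename" lines into an {id: filename} dict."""
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, in case a filename
        # itself contains a colon (assumption: the id does not).
        image_id, sep, filename = line.partition(":")
        if not sep:
            continue  # not in the expected format; skip it
        mapping[image_id.strip()] = filename.strip()
    return mapping


def result_json_name(image_id: str) -> str:
    """Name of the API output file for an id, per "${id}.jpg.json"."""
    return f"{image_id}.jpg.json"


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real map.txt.
    sample = ["1 : Example one.jpg\n", "2 : Another: example.jpg\n"]
    for image_id, filename in parse_map_lines(sample).items():
        print(image_id, filename, result_json_name(image_id))
```

With the real data, the lines would come from opening map.txt, and the JSON names would be looked up inside the unpacked results.tar.gz directory.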


We're making this data available under the CC0 license, and these links will likely be live for at least a few weeks.


If you're interested in working with the Cloud Vision API to tag other images in the Commons, talk to the WMF Community Tech team.


Thanks for your help!


_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: Programmatically categorizing media in the Commons with Machine Learning

Jordan Adler
Looks like some of these images still need categorization. I think there's still an unrealized opportunity here to use the results I shared to work through the backlog of the category on the Commons.

On Thu, Aug 11, 2016 at 1:47 PM Pine W <[hidden email]> wrote:

Forwarding.

Pine

---------- Forwarded message ----------
From: "Jordan Adler" <[hidden email]>
Date: Aug 11, 2016 13:06
Subject: [Commons-l] Programmatically categorizing media in the Commons with Machine Learning
To: "[hidden email]" <[hidden email]>
Cc: "Ray Sakai" <[hidden email]>, "Ram Ramanathan" <[hidden email]>, "Kazunori Sato" <[hidden email]>


Re: Programmatically categorizing media in the Commons with Machine Learning

Daniel Mietchen
Hi Jordan,
can your pipeline help with video or perhaps even audio as well?
There are lots of such files as well that need categorization.
Thanks,
Daniel

On Tue, Apr 4, 2017 at 12:05 AM, Jordan Adler <[hidden email]> wrote:

> Looks like some of these images still need categorization. I think there's
> still an unrealized opportunity here to use the results I shared to work the
> backlog of the category on the Commons.
>
> On Thu, Aug 11, 2016 at 1:47 PM Pine W <[hidden email]> wrote:
>>
>> Forwarding.
>>
>> Pine
>>
>> ---------- Forwarded message ----------
>> From: "Jordan Adler" <[hidden email]>
>> Date: Aug 11, 2016 13:06
>> Subject: [Commons-l] Programmatically categorizing media in the Commons
>> with Machine Learning
>> To: "[hidden email]" <[hidden email]>
>> Cc: "Ray Sakai" <[hidden email]>, "Ram Ramanathan"
>> <[hidden email]>, "Kazunori Sato" <[hidden email]>
>>
>> Hey folks!
>>
>>
>> A few months back a colleague of mine was looking for some unstructured
>> images to analyze as part of a demo for the Google Cloud Vision API.
>> Luckily, I knew just the place, and the resulting demo, built by Reactive
>> Inc., is pretty awesome.  It was shared on-stage by Jeff Dean during the
>> keynote at GCP NEXT 2016.
>>
>>
>> I wanted to quickly share the data from the programmatically identified
>> images so it could be used to help categorize the media in the Commons.
>> There's about 80,000 images worth of data:
>>
>>
>> map.txt (5.9MB): A single text file mapping id to filename in a "id :
>> filename" format, one per line
>>
>> results.tar.gz (29.6MB): a tgz'd directory of json files representing the
>> output of the API, in the format "${id}.jpg.json"
>>
>>
>> We're making this data available under the CC0 license, and these links
>> will likely be live for at least a few weeks.
>>
>>
>> If you're interested in working with the Cloud Vision API to tag other
>> images in the Commons, talk to the WMF Community Tech team.
>>
>>
>> Thanks for your help!

Re: Programmatically categorizing media in the Commons with Machine Learning

Jordan Adler
GCP has a number of models-as-a-service that might be useful.

On Mon, Apr 3, 2017 at 6:46 PM Daniel Mietchen <[hidden email]> wrote:

> Hi Jordan,
> can your pipeline help with video or perhaps even audio as well?
> There are lots of such files as well that need categorization.
> Thanks,
> Daniel
