Now live: Shared structured data

Yuri Astrakhan-2
Gift season! We have launched structured data on Commons, available from
all wikis.

TL;DR: One data store. Use everywhere. Upload table data to Commons, with
localization, and use it to create wiki tables, lists, or use directly in
graphs. Works for GeoJSON maps too. Must be licensed as CC0. Try this
per-state GDP map demo, and select multiple years. More demos at the bottom.
US Map state highlight
<https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight>

Data can now be stored as *.tab and *.map pages in the data namespace on
Commons. That data may contain localization, so a table cell could be in
multiple languages. And that data is accessible from any wiki, by Lua
scripts, Graphs, and Maps.
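
To make the localization point concrete, here is a minimal Python sketch (not the Lua API the wikis use) of how a consumer might pick one language out of a multilingual cell. The dataset layout is assumed from Help:Tabular_Data; the field names, the sample values, and the helpers `pick_localized` and `localized_rows` are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory copy of a small Data:*.tab page.
# The layout is assumed from Help:Tabular_Data; the values are made up.
dataset = {
    "license": "CC0-1.0",
    "schema": {
        "fields": [
            {"name": "month", "type": "localized"},
            {"name": "avg_high_c", "type": "number"},
        ]
    },
    "data": [
        [{"en": "January", "fr": "janvier"}, 4],
        [{"en": "July", "de": "Juli"}, 29],
    ],
}


def pick_localized(cell, lang, fallback="en"):
    """Return the text for `lang`, else the fallback language, else None."""
    if not isinstance(cell, dict):
        return cell  # plain (non-localized) values pass through unchanged
    if lang in cell:
        return cell[lang]
    return cell.get(fallback)


def localized_rows(dataset, lang):
    """Resolve every localized cell in the dataset to a single language."""
    return [[pick_localized(cell, lang) for cell in row] for row in dataset["data"]]


if __name__ == "__main__":
    for row in localized_rows(dataset, "fr"):
        print(row)
```

Falling back to English when a translation is missing is a design choice of this sketch, not something the announcement specifies.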

Lua lets you generate wiki tables from the data by filtering, converting,
mixing, and formatting the raw data. Lua also lets you generate lists. Or
any wiki markup.
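
As a rough illustration of the "filter, then format" idea, the following Python sketch renders rows as standard MediaWiki table markup. On the wikis this would be done in a Lua module; the function name `to_wikitable`, the `keep` parameter, and the sample rows are assumptions made up for this example.

```python
def to_wikitable(headers, rows, keep=lambda row: True):
    """Render the rows accepted by `keep` as MediaWiki table markup."""
    lines = ['{| class="wikitable"']
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        if not keep(row):
            continue  # filtering step: skip rows the caller does not want
        lines.append("|-")
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Made-up sample rows standing in for data read from a .tab page.
headers = ["Month", "Avg high (C)"]
rows = [["January", 4], ["April", 16], ["July", 29]]

if __name__ == "__main__":
    # Keep only the months with an average high above 10 degrees.
    print(to_wikitable(headers, rows, keep=lambda r: r[1] > 10))
```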

Graphs can use both .tab and .map directly to visualize the data and let
users interact with it. The GDP demo above uses a map from Commons, and
colors each segment based on the values in a data table.

Kartographer (<maplink>/<mapframe>) can use the .map data as an extra layer
on top of the base map. This way we can show endangered species' habitat.

== Demo ==
* Raw data example
<https://commons.wikimedia.org/wiki/Data:Weather/New_York_City.tab>
* Interactive Weather data
<https://en.wikipedia.org/wiki/Template:Graph:Weather_monthly_history>
* Same data in Weather template
<https://en.wikipedia.org/wiki/User:Yurik/WeatherDemo>
* Interactive GDP map
<https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight>
* Endangered Jemez Mountains salamander - habitat
<https://en.wikipedia.org/wiki/Jemez_Mountains_salamander#/maplink/0>
* Population history
<https://en.wikipedia.org/wiki/Template:Graph:Population_history>
* Line chart <https://en.wikipedia.org/wiki/Template:Graph:Lines>

== Getting started ==
* Try creating a page at Data:Sandbox/<user>.tab on Commons. Don't forget
the .tab extension, or it won't work.
* Try using some data with the Line chart graph template
A thorough guide is needed, help is welcome!

== Documentation links ==
* Tabular help <https://www.mediawiki.org/wiki/Help:Tabular_Data>
* Map help <https://www.mediawiki.org/wiki/Help:Map_Data>
If you find a bug, create a Phabricator ticket with the #tabular-data tag, or
comment on the documentation talk pages.

== FAQ ==
* Relation to Wikidata:  Wikidata is about "facts" (small pieces of
information). Structured data is about "blobs" - large amounts of data like
the historical weather or the outline of the state of New York.

== TODOs ==
* Add a nice "table editor" - editing JSON by hand is cruel. T134618
* "What links here" should track data usage across wikis. Will allow
quicker auto-refresh of the pages too. T153966
* Support data redirects. T153598
* Mega epic: Support external data feeds.
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Now live: Shared structured data

Brad Jorsch (Anomie)
On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan <[hidden email]>
wrote:

> Gift season! We have launched structured data on Commons, available from
> all wikis.
>

I was momentarily excited, then I read a little farther and discovered this
isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: Now live: Shared structured data

Yuri Astrakhan-2
Yes, there seems to have been a bit of a naming collision.  Tabular data and
map data have been jointly known as structured data, but there is also the
Structured Data project, which IMO should be called Structured Metadata
project :)  Naming suggestions are welcome!

P.S. Brad, I'm sorry tabular and map data did not excite you :(


Re: Now live: Shared structured data

David Cuenca Tudela
In reply to this post by Brad Jorsch (Anomie)
On Thu, Dec 22, 2016 at 8:38 PM, Brad Jorsch (Anomie) <[hidden email]> wrote:

> On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan <[hidden email]>
> wrote:
>
> > Gift season! We have launched structured data on Commons, available from
> > all wikis.
> >
>
> I was momentarily excited, then I read a little farther and discovered this
> isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.
>

Same here, I think it needs a better name...

What about calling it datasets or structured datasets?

Cheers,
Micru

Re: Now live: Shared structured data

Yuri Astrakhan-2
Micru, thanks, I think Datasets sounds like a good name too!


Re: Now live: Shared structured data

David Cuenca Tudela
Anyway, this is great news! I hope that it gets adopted by the community.
Congratulations, Yuri!

I was going to suggest a Wikidata property, but I see that the data type
for datasets is not there yet:
https://phabricator.wikimedia.org/T151334

--
Etiamsi omnes, ego non

Re: [discovery] Now live: Shared structured data

Yuri Astrakhan-2
In reply to this post by Yuri Astrakhan-2
Svetlana, thanks for the suggestion. I think we should create a portal similar
to the Structured Data one, and put some examples there.  Deciding on the
name is difficult :)   "Commons Datasets" does sound good.

There has been a very prolonged discussion on where to host this feature -
https://meta.wikimedia.org/wiki/User:Yurik/Storing_data. Wikidata would
have been a good choice, but users expect all the data there to be in the
public domain, and we may add more licensing choices later.

> An inline example with English commentary -- straight on the first page about
> this new technology without making users click links -- could be nice. The
> text you typed up does not seem to be on a wiki page, so I am unable to
> edit it...

Which page are you referring to?

On Thu, Dec 22, 2016 at 4:03 PM Svetlana Tkachenko <[hidden email]>
wrote:

> Hello,
>
> Maybe 'commons store' or 'commons datasets' could work? I would suggest
> that the name reflects on the fact that the datasets are shared
> ('common') and are not on Wikidata.
>
> If I may ask, why is it in commons.wikimedia.org/wiki/Data:* and not at
> Meta (like Global user pages) or Wikidata (like structured data about
> lots of things)?
>
> An inline example with English commentary -- straight on the first page
> about this new technology without making users click links -- could be
> nice. The text you typed up does not seem to be on a wiki page, so I am
> unable to edit it...
>
> Svetlana.
>

Re: [Maps-l] Now live: Shared structured data

Susanna Ånäs-2
In reply to this post by Yuri Astrakhan-2
Great work!

I'm happy with the new naming, Commons Datasets.

For historical maps we have been waiting to have a way to store data about
the rectification with the map image. Here it is! It brings us one notch
closer to being able to work with zoomable historical maps in Wikipedia.

Some have noted that the datasets are contrary to what Wikidata is about.
Instead, they are complementary! Not all data is either suitable for
Wikidata or licensed openly enough. Or not yet. Many great datasets can be
introduced to the Wikimedia online communities. The data owners will pay
attention to more open licensing, seeing their data being used. The
wikidatans will pick up interesting datasets and work to prepare missing
properties for them in Wikidata and ignite their bots. Sometimes the data
can just be used as is.

This is one part of the puzzle and it will be interesting to see how the
pieces fall into their places and evolve further.

In the coming few days there'll be time to digest and experiment.

Happy holidays
Susanna

2016-12-23 5:22 GMT+02:00 Bohdan Melnychuk <[hidden email]>:

> Yay :)
>
> As someone who already has plans to actively use it in both my metapedian
> role (e.g. CEE Spring article writing contest statistics data for building
> Graphs from being stored like
> https://meta.wikimedia.org/w/index.php?title=User:BaseBot/CEES/MMXVI/Per_country_sums_(general)&action=edit
> <https://meta.wikimedia.org/w/index.php?title=User:BaseBot/CEES/MMXVI/Per_country_sums_(general)&action=edit&section=2>
> to a better format of
> https://commons.wikimedia.org/wiki/Data:Wikimedia/CEE_Spring/Statistics/MMXVI/Per_country_sums_(general).tab
> which can be turned by Lua to the same output
> but now with it being controlled on wiki rather than bot code part) and
> exopedianish for actual articles I think it is wonderful.
>
> I do think that it needs tight cross linking with Wikidata and perfectly a
> way to run queries against both the sources at the same time (e.g. "give me
> the weather in all the current capitals in the date the comet Whatever was
> closest to the Sun the last time" or whatever else more useful thing may
> come into one's mind), but that does not deny the fact that it is very
> useful already.
>
> It can also be used as an intermediate location for data on the way to be
> imported to Wikidata, IMHO.
>
> --Base
>
>
> _______________________________________________
> Maps-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/maps-l

Re: Now live: Shared structured data

mathieu lovato stumpf guntz
In reply to this post by Yuri Astrakhan-2
Hi Yuri,

Seems very interesting. Am I wrong thinking this could help to create a
multilingual glossary as drafted in
https://phabricator.wikimedia.org/T150263#2860014 ?



Re: Now live: Shared structured data

Yuri Astrakhan-2
Hi Mathieu, yes, I think you can totally build up this glossary in a
dataset. Just remember that each string can be no longer than 400 chars,
and the total size must stay under 2 MB.
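
A glossary could be checked against these two limits before upload with something like the Python sketch below. It is only a sketch under stated assumptions: the limit is taken as 400 characters per string and 2 MiB of UTF-8 serialized JSON, which may not be exactly how the server measures either one, and the names `check_limits` and `iter_strings` are hypothetical.

```python
import json

MAX_STRING_CHARS = 400             # per-string limit mentioned above
MAX_TOTAL_BYTES = 2 * 1024 * 1024  # total limit, assumed to mean 2 MiB of serialized JSON


def iter_strings(value):
    """Yield every string found in a nested structure of lists and dicts."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def check_limits(dataset):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for text in iter_strings(dataset.get("data", [])):
        if len(text) > MAX_STRING_CHARS:
            problems.append(f"string of {len(text)} chars exceeds {MAX_STRING_CHARS}")
    size = len(json.dumps(dataset, ensure_ascii=False).encode("utf-8"))
    if size > MAX_TOTAL_BYTES:
        problems.append(f"serialized size {size} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems


if __name__ == "__main__":
    # Made-up glossary row whose French definition is one character too long.
    glossary = {"data": [[{"en": "term", "fr": "x" * 401}]]}
    print(check_limits(glossary))
```

Each translation inside a localized cell is checked separately here, on the assumption that the limit applies per string rather than per cell.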


Re: Now live: Shared structured data

mathieu lovato stumpf guntz
Thank you Yuri. Is there a rationale behind these limits? I
understand a total size limit for performance reasons, and 2 MB already seems
very large for the intended glossaries. But 400 chars might be problematic
for some definitions, I guess, especially since translations can lead to
varying length needs.



Re: Now live: Shared structured data

Yuri Astrakhan-2
The 400 char limit is to be in sync with Wikidata, which has the same
limitation. The origin of this limit is to encourage storage of "values"
rather than full strings (sentences). Also, it discourages storage of wiki
markup.

On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <
[hidden email]> wrote:

> Thank you Yuri. Is there some rational explanation behind this limits? I
> understand the limit over performance concern, and 2Mb seems already
> very large for intented glossaries. But 400 chars might be problematic
> for some definition I guess, especially since translations can lead to
> varying lenght needs.
>
>
> Le 25/12/2016 à 17:03, Yuri Astrakhan a écrit :
> > Hi Mathieu, yes, I think you can totally build up this glossary in a
> > dataset. Just remember that each string can be no longer then 400 chars,
> > and total size under 2mb.
> >
> > On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <
> > [hidden email]> wrote:
> >
> >> Hi Yuri,
> >>
> >> Seems very interesting. Am I wrong thinking this could helpto create
> >> multi-lingual glossary as drafted in
> >> https://phabricator.wikimedia.org/T150263#2860014 ?
> >>
> >>

Re: Now live: Shared structured data

mathieu lovato stumpf guntz


On 28/12/2016 at 23:08, Yuri Astrakhan wrote:
> The 400-char limit is to be in sync with Wikidata, which has the same
> limitation. The origin of this limit is to encourage storage of "values"
> rather than full strings (sentences).
Well, that's probably not the best constraint for a glossary then. To
my mind, a 400-char limit regardless of the language is rather surprising.
Surely you can say much more with 400 ideograms than with, well,
whichever language happens to have the longest average sentence
length (any idea?). Also, at least for some translation pairs,
translations tend to be longer than the original[1].

[1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf
>   Also, it discourages storage of wiki
> markup.
What about disallowing it explicitly? You might even enforce that with a
quick parsing step that prevents saving, or simply show a reminder when
such a string is detected, to avoid blocking users in legitimate corner cases.
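
To make the suggestion above concrete, here is a minimal sketch of such a
check. It is only an illustration: the function name, the regular expression
and the error/warning split are my own assumptions, not the actual validation
rules, and I am assuming the 400-char limit counts characters rather than
bytes.

```python
import re

MAX_CHARS = 400  # per-string limit mentioned in this thread

# Hypothetical heuristic for "looks like wiki markup": links, templates,
# bold/italic quotes, and HTML-like tags. Not the real validation rules.
WIKI_MARKUP = re.compile(r"\[\[|\]\]|\{\{|\}\}|'''?|</?[a-zA-Z][^>]*>")


def check_string(value: str) -> list:
    """Return notices for one string cell; an empty list means it is fine."""
    notices = []
    # len() counts Unicode code points here (an assumption about the limit).
    if len(value) > MAX_CHARS:
        notices.append(
            f"error: {len(value)} chars exceeds the {MAX_CHARS}-char limit"
        )
    # Only a warning, so legitimate corner cases are not blocked.
    if WIKI_MARKUP.search(value):
        notices.append("warning: string looks like it contains wiki markup")
    return notices
```

A caller could refuse to save on "error" notices and merely display the
"warning" ones, which is the softer of the two options suggested above.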

>
> On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <
> [hidden email]> wrote:
>
>> Thank you Yuri. Is there some rational explanation behind these limits? I
>> understand the size limit as a performance concern, and 2 MB already seems
>> very large for the intended glossaries. But 400 chars might be problematic
>> for some definitions I guess, especially since translations can lead to
>> varying length needs.
>>
>>
>> On 25/12/2016 at 17:03, Yuri Astrakhan wrote:
>>> Hi Mathieu, yes, I think you can totally build up this glossary in a
>>> dataset. Just remember that each string can be no longer than 400 chars,
>>> and the total size must be under 2 MB.
>>>
>>> On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <
>>> [hidden email]> wrote:
>>>
>>>> Hi Yuri,
>>>>
>>>> Seems very interesting. Am I wrong in thinking this could help to create
>>>> a multilingual glossary, as drafted in
>>>> https://phabricator.wikimedia.org/T150263#2860014 ?
>>>>
>>>>

Re: Now live: Shared structured data

mathieu lovato stumpf guntz

Since this is, to my mind, a very interesting topic, I searched a bit more.

https://www.w3.org/International/articles/article-text-size.en
     which quotes
http://www-01.ibm.com/software/globalization/guidelines/a3.html

According to these, for English source strings longer than 70
characters, you might expect translations to average about 130% of the
source length. So, with an admittedly very loose inference
(400 / 1.3 ≈ 307), the 400-character limit for all languages is
equivalent to a 307-character limit for English. Would you say that
a 307-character limit would seem OK there?
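
For the record, the arithmetic behind the 307 figure can be sketched as
follows. The function name is mine, and treating the 130% figure as a flat
multiplier for every long string is of course the "very loose" part.

```python
def english_budget(limit: int, expansion_percent: int) -> int:
    """Largest source length whose translation still fits within `limit`,
    assuming the translation is `expansion_percent` percent as long as
    the source. Integer arithmetic avoids floating-point rounding."""
    return limit * 100 // expansion_percent


# 400-char limit, translations assumed to be 130% of the English length.
print(english_budget(400, 130))  # 307
```

In other words, 307 English characters expand to about 399 under this
assumption, while 308 would already exceed 400.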


On 29/12/2016 at 12:11, mathieu stumpf guntz wrote:

>
>
> On 28/12/2016 at 23:08, Yuri Astrakhan wrote:
>> The 400-char limit is to be in sync with Wikidata, which has the same
>> limitation. The origin of this limit is to encourage storage of
>> "values"
>> rather than full strings (sentences).
> Well, that's probably not the best constraint for a glossary then. To
> my mind, a 400-char limit regardless of the language is rather
> surprising. Surely you can say much more with 400 ideograms
> than with, well, whichever language happens to have the longest
> average sentence length (any idea?). Also, at least for some
> translation pairs, translations tend to be longer
> than the original[1].
>
> [1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf
>>   Also, it discourages storage of wiki
>> markup.
> What about disallowing it explicitly? You might even enforce that with
> a quick parsing step that prevents saving, or simply show a reminder when
> such a string is detected, to avoid blocking users in legitimate corner
> cases.
>
>>
>> On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <
>> [hidden email]> wrote:
>>
>>> Thank you Yuri. Is there some rational explanation behind these
>>> limits? I
>>> understand the size limit as a performance concern, and 2 MB already seems
>>> very large for the intended glossaries. But 400 chars might be problematic
>>> for some definitions I guess, especially since translations can lead to
>>> varying length needs.
>>>
>>>
>>> On 25/12/2016 at 17:03, Yuri Astrakhan wrote:
>>>> Hi Mathieu, yes, I think you can totally build up this glossary in a
>>>> dataset. Just remember that each string can be no longer than 400
>>>> chars,
>>>> and the total size must be under 2 MB.
>>>>
>>>> On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <
>>>> [hidden email]> wrote:
>>>>
>>>>> Hi Yuri,
>>>>>
>>>>> Seems very interesting. Am I wrong in thinking this could help to create
>>>>> a multilingual glossary, as drafted in
>>>>> https://phabricator.wikimedia.org/T150263#2860014 ?
>>>>>
>>>>>