New files for geo coded Wikimedia stats

Erik Zachte
Today I released two new json files [2][4].
Both complement the visualization 'Wikipedia Views Visualized' [1] (aka
WiViVi), but both can be useful in other contexts as well.
1) File 'demographics_from_world_bank_for_wikimedia.json' [2] was built by
harvesting World Bank API files.
It contains yearly figures for four metrics (more could be added rather
easily):
- population counts,
- percentage of internet users,
- percentage of mobile subscriptions,
- GDP per capita.
The static demographics charts on Meta [3] are also based on these
metrics.
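Because the file holds both population counts and percentages, absolute figures can be derived from it. The Python sketch below estimates the number of internet users as population × percentage / 100. It assumes a hypothetical layout {country: {metric: {year: value}}} and hypothetical metric keys "population" and "internet_users_pct"; the actual key names and nesting of the World Bank json file may differ.

```python
from typing import Dict, Optional

# Hypothetical layout: {country_code: {metric_name: {year: value}}}.
# The metric names "population" and "internet_users_pct" are assumptions,
# not the actual keys used in the World Bank json file.
Demographics = Dict[str, Dict[str, Dict[str, float]]]


def internet_users(data: Demographics, country: str, year: str) -> Optional[float]:
    """Estimate the absolute number of internet users for one country and year."""
    metrics = data.get(country, {})
    population = metrics.get("population", {}).get(year)
    pct = metrics.get("internet_users_pct", {}).get(year)
    if population is None or pct is None:
        return None  # one of the two figures is missing for that year
    return population * pct / 100.0


# Illustrative numbers only, not real World Bank figures.
sample: Demographics = {
    "XX": {
        "population": {"2017": 1_000_000},
        "internet_users_pct": {"2017": 62.5},
    }
}
print(internet_users(sample, "XX", "2017"))  # 625000.0
```

Returning None instead of 0 keeps "no data for that year" distinguishable from a real zero.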
2) File 'datamaps-data.json' [4] contains the equivalent of the 3 rather
complex (*) csv files that feed WiViVi. It combines demographics data with
pageviews (by country, by region and by language), and adds meta info.
This json file is meant for external use, as it is much easier to parse
than the 3 csv files WiViVi itself uses [5].
(*) Complex, because the csv files encode a hierarchy with nested delimiters.
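To illustrate why the json file is easier to consume, here is a minimal Python sketch contrasting a hand-written parser for nested delimiters with a single standard-library json call. The delimiters '|', ':' and ',' in parse_nested are hypothetical (the real csv syntax is described in [5]). Nothing is assumed about the structure of datamaps-data.json beyond it being valid json. load_datamaps is only defined here, not called, because it needs network access.

```python
import json
import urllib.request
from typing import Any, Dict, List

DATAMAPS_URL = "https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json"


def load_datamaps(url: str = DATAMAPS_URL) -> Any:
    """Download and parse the json file; one standard-library call does the parsing."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def parse_nested(field: str) -> Dict[str, List[int]]:
    """Parse a field like 'en:120,80|de:40,30' into {'en': [120, 80], 'de': [40, 30]}.

    The delimiters '|', ':' and ',' are hypothetical; the real csv syntax is
    documented in [5].
    """
    result: Dict[str, List[int]] = {}
    for entry in field.split("|"):                 # level 1: one entry per key
        key, _, values = entry.partition(":")      # level 2: key vs. its values
        result[key] = [int(v) for v in values.split(",")]  # level 3: value list
    return result


# The same made-up content, once with nested delimiters and once as json.
nested = "en:120,80|de:40,30"
as_json = '{"en": [120, 80], "de": [40, 30]}'
print(parse_nested(nested) == json.loads(as_json))  # True
```

Every extra nesting level in the csv format needs another split step and another convention to document. With json, the nesting is part of the format itself.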
--
Details:
World Bank files come in different formats (some csv, some json) and use a
variety of country indexes (some use ISO 3166-1 alpha-2 codes, others
ISO 3166-1 alpha-3).
The script for file 1) first normalizes the data, then aggregates, filters
and indexes it.
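A minimal Python sketch of the normalization step and the subsequent indexing follows. It is not the wikistats script [2]. The alpha-3 to alpha-2 table is a small hypothetical subset, "aggregate" is read here simply as merging rows from several source files into one structure, and the order of the steps is simplified.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical subset of the ISO 3166-1 alpha-3 -> alpha-2 mapping;
# a real script would need the full table.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "DEU": "DE", "USA": "US", "JPN": "JP"}


def normalize_code(code: str) -> Optional[str]:
    """Map a country code to ISO 3166-1 alpha-2, or None if it cannot be mapped."""
    code = code.strip().upper()
    if len(code) == 2:
        return code  # assumed to be alpha-2 already (not validated here)
    return ALPHA3_TO_ALPHA2.get(code)  # None for codes missing from the table


def build_index(
    rows: List[Tuple[str, str, Optional[float]]]
) -> Dict[str, Dict[str, float]]:
    """Merge (country_code, year, value) rows from several files into {alpha2: {year: value}}."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for code, year, value in rows:
        alpha2 = normalize_code(code)          # step 1: normalize the country index
        if alpha2 is None or value is None:    # step 2: filter unmappable or empty rows
            continue
        index[alpha2][year] = value            # step 3: merge into one index
    return dict(index)


# Made-up rows mixing alpha-3 and alpha-2 codes, as if read from different files.
rows = [("NLD", "2017", 1.0), ("nl", "2016", 2.0), ("XYZ", "2017", 3.0), ("USA", "2017", None)]
print(build_index(rows))  # {'NL': {'2017': 1.0, '2016': 2.0}}
```

Normalizing first means every later step can rely on a single key type, whatever format or code system the source file used.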
Json file 1) replaces two csv files that until now were filled from
Wikipedia pages [6][7].
Although Wikipedia lists nowadays also use World Bank data, they do not do
so consistently; see [8][9].
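The inconsistencies discussed in [8][9] can be spotted by comparing both sources country by country. The Python sketch below assumes both sources are already normalized into dicts keyed by the same country code. Using the World Bank figure as reference and the 2% tolerance are arbitrary choices for illustration, not taken from the wikistats scripts.

```python
from typing import Dict, List, Tuple


def discrepancies(
    wikipedia: Dict[str, float],
    world_bank: Dict[str, float],
    tolerance: float = 0.02,
) -> List[Tuple[str, float]]:
    """List countries whose two figures differ by more than `tolerance` (relative).

    The World Bank figure is used as the reference value.
    """
    flagged: List[Tuple[str, float]] = []
    for country in sorted(wikipedia.keys() & world_bank.keys()):  # only countries in both
        reference = world_bank[country]
        if reference == 0:
            continue  # relative difference undefined
        rel = abs(wikipedia[country] - reference) / abs(reference)
        if rel > tolerance:
            flagged.append((country, round(rel, 4)))
    return flagged


# Made-up values, not actual population counts.
wp = {"AA": 105.0, "BB": 100.0, "CC": 50.0}
wb = {"AA": 100.0, "BB": 101.0, "DD": 10.0}
print(discrepancies(wp, wb))  # [('AA', 0.05)]
```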
[1] Viz: https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
[2] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demographics.json
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank
[3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics
[4] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/traffic
[5] Syntax: https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html
[6] Article: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
[7] Article: https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
[8] Talk page: https://bit.ly/2L5Z2P4, section 'Wikipedia vs Worldbank population counts'
[9] Talk page: https://bit.ly/2NJUoIu, section 'Wikipedia vs Worldbank internet percentages'
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: New files for geo coded Wikimedia stats

Leila Zia
Thanks for this, Erik. This can be helpful for a variety of projects
including
https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Robustness_across_languages
and the next steps for this project.

L

On Wednesday, July 11, 2018, Erik Zachte <[hidden email]> wrote:

> [...]

--
Leila Zia
Senior Research Scientist
Wikimedia Foundation