Fwd: Traffic to the portal from Zero providers


Oliver Keyes-4
Cross-posting to research and analytics, too!


---------- Forwarded message ----------
From: Oliver Keyes <[hidden email]>
Date: 6 May 2015 at 13:11
Subject: Traffic to the portal from Zero providers
To: [hidden email]


Hey all,

(Throwing this to the public list, because transparency is Good)

I recently gave a presentation analysing traffic to the Wikipedia
"home page" - www.wikipedia.org.[1]

One of the biggest visualisations, in impact terms, showed that a lot
of portal traffic - far more, proportionately, than traffic to
Wikipedia overall - is coming from India and Brazil.[2] One of the
hypotheses was that this could be Zero traffic.

I've done a basic analysis of the traffic, looking specifically at the
Zero headers,[3] and this hypothesis turns out to be incorrect -
almost no Zero traffic is hitting the portal. The traffic we're seeing
from Brazil and India is not Zero-based.
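
As a rough illustration of the kind of check described here, the sketch below computes, per country, the share of portal requests that carry a Zero tag. Everything about the record layout is an assumption made for the example: the `host`, `country` and `tags` field names, the `key=value;key=value` tag string, and the sample rows are all hypothetical, not the real request-log schema or real data.

```python
from collections import Counter


def parse_tags(header):
    """Parse a 'key=value;key=value' tag string into a dict (assumed format)."""
    tags = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            tags[key] = value
    return tags


def zero_share_by_country(requests, portal_host="www.wikipedia.org"):
    """Return {country: fraction of portal requests carrying a 'zero' tag}."""
    totals = Counter()
    zero = Counter()
    for req in requests:
        if req["host"] != portal_host:
            continue  # only portal traffic is of interest here
        totals[req["country"]] += 1
        if "zero" in parse_tags(req.get("tags", "")):
            zero[req["country"]] += 1
    return {country: zero[country] / totals[country] for country in totals}


# Made-up sample rows, purely to exercise the function.
sample = [
    {"host": "www.wikipedia.org", "country": "IN", "tags": "https=1"},
    {"host": "www.wikipedia.org", "country": "IN", "tags": "zero=404-01;https=1"},
    {"host": "www.wikipedia.org", "country": "BR", "tags": ""},
    {"host": "en.wikipedia.org", "country": "BR", "tags": "zero=724-05"},
]
print(zero_share_by_country(sample))
```

A share close to zero for a country with a lot of portal traffic would point away from the Zero hypothesis for that country.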

This makes a lot of sense (the reason mobile traffic redirects to the
enwiki home page from the portal is the Zero extension, so presumably
this happens specifically to Zero traffic) but it does mean that our
null hypothesis - that this traffic is down to ISP-level or
device-level design choices and links - is more likely to be correct.

[1] http://ironholds.org/misc/homepage_presentation.html
[2] http://ironholds.org/misc/homepage_presentation.html#/11
[3] https://phabricator.wikimedia.org/T98076

--
Oliver Keyes
Research Analyst
Wikimedia Foundation



_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Fwd: Traffic to the portal from Zero providers

Sam Katz
hey oliver,

I don't mean to be a help vampire...

but what is zero traffic? you think the traffic is being proxied?
perhaps even reverse proxied?

--Sam

On Wed, May 6, 2015 at 1:40 PM, Oliver Keyes <[hidden email]> wrote:

> Cross-posting to research and analytics, too!

Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Traffic through Wikipedia Zero; apologies for not being clear.

On 6 May 2015 at 19:56, Sam Katz <[hidden email]> wrote:

> hey oliver,
>
> I don't mean to be a help vampire...
>
> but what is zero traffic? you think the traffic is being proxied?
> perhaps even reverse proxied?
>
> --Sam

Re: Fwd: Traffic to the portal from Zero providers

Stuart A. Yeates
In reply to this post by Oliver Keyes-4
Reading that excellent presentation, the thought that struck me was:

"If I wanted to subvert the assumption that Wikipedia == en.wiki,
linking to http://www.wikipedia.org/ is what I'd do."

A smarter http://www.wikipedia.org/ might guess geo-location and thus
local languages.

cheers
stuart

--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 6:40 AM, Oliver Keyes <[hidden email]> wrote:

> Cross-posting to research and analytics, too!

Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Agreed! That's one of the changes I'd really like to push ahead with,
although we're going to do some more in-depth data collection before
any redesign :).

On 6 May 2015 at 20:27, Stuart A. Yeates <[hidden email]> wrote:

> Reading that excellent presentation, the thought that struck me was:
>
> "If I wanted to subvert the assumption that Wikipedia == en.wiki,
> linking to http://www.wikipedia.org/ is what I'd do."
>
> A smarter http://www.wikipedia.org/ might guess geo-location and thus
> local languages.

Re: Fwd: Traffic to the portal from Zero providers

Stuart A. Yeates
Probably also an excellent time to consider whether we can do anything
for those languages which don't have wikis yet.

For example, I'm in .nz, which has en, mi and nzs as official
languages, but we're a long way from an nzs.wiki, given that ase.wiki
is still in incubator. With the release of Unicode 8 with Sutton
SignWriting in June, these may or may not kick off in a big way.

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 12:34 PM, Oliver Keyes <[hidden email]> wrote:

> Agreed! That's one of the changes I'd really like to push ahead with,
> although we're going to do some more in-depth data collection before
> any redesign :).

Re: Fwd: Traffic to the portal from Zero providers

Kerry Raymond
In reply to this post by Oliver Keyes-4
http://wikimediafoundation.org/wiki/Wikipedia_Zero

Probably not something you'd know about if you live in the grey bits of
the map.

Kerry


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Oliver Keyes
Sent: Thursday, 7 May 2015 10:06 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

Traffic through Wikipedia Zero; apologies for not being clear.


Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
In reply to this post by Stuart A. Yeates
One thing we could also do is check the Accept-Language header and
prioritise around that; that way we'd be prioritising specifically
"the language the user's browser thinks they want".

On 6 May 2015 at 21:28, Stuart A. Yeates <[hidden email]> wrote:

> Probably also an excellent time to consider whether we can do anything
> for those languages which don't have wikis yet.
>
> For example, I'm in .nz, which has en, mi and nzs as official
> languages, but we're a long way from an nzs.wiki, given that ase.wiki
> is still in incubator. With the release of Unicode 8 with Sutton
> SignWriting in June, these may or may not kick off in a big way.

Re: Fwd: Traffic to the portal from Zero providers

Mark J. Nelson
In reply to this post by Stuart A. Yeates

Stuart A. Yeates <[hidden email]> writes:

> Reading that excellent presentation, the thought that struck me was:
>
> "If I wanted to subvert the assumption that Wikipedia == en.wiki,
> linking to http://www.wikipedia.org/ is what I'd do."
>
> A smarter http://www.wikipedia.org/ might guess geo-location and thus
> local languages.

I'd also like to see something smarter done at the main page, but the
"and thus" bit here is notoriously tricky.

For example most geolocation-based things, like Wikidata by default,
tend to produce funny results in Denmark. A Copenhagener is offered
something like this choice, in order:

* Danish, Greenlandic, Faroese, Swedish, German, ...

The reasoning here is that Danish, Greenlandic, and Faroese are official
languages of the Danish Realm, which includes both Denmark proper and
two autonomous territories, Greenland and the Faroe Islands. And then
Sweden and Germany are the two neighboring countries.

But for the average Copenhagener, the following order is far more
likely:

* Danish, English, Norwegian Bokmål, ...

The reason here is that Norwegian Bokmål is very close to Danish in
written form (more than Swedish is, and especially more than Faroese is)
while English is a widely used semi-official language in business,
government, and education (for example about half of university theses
are now written in English, and several major companies use it as their
official workplace language).

I think it's possible to come up with something that better aligns with
readers' actual preferences, but it's not easy!

-Mark

--
Mark J. Nelson
Anadrome Research
http://www.kmjn.org


Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Totally! As I said, I think Accept-Language is a better variable to
operate from. But these are early days; we're just beginning to
understand the space. Realistically, software changes will come a lot
later :)

On 6 May 2015 at 22:24, Mark J. Nelson <[hidden email]> wrote:

> I'd also like to see something smarter done at the main page, but the
> "and thus" bit here is notoriously tricky.

Re: Fwd: Traffic to the portal from Zero providers

Stuart A. Yeates
In reply to this post by Mark J. Nelson
This seems like a great place to use analytics data: for each division
in the geo-location classification, rank the languages by usage and
present the top N (plus the browser's language settings) as likely
candidates when we need the user to pick a language.
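
A minimal sketch of that idea, assuming the analytics data can be reduced to (region, language, views) rows. The function names, the choice to list browser languages first, and the sample numbers are all made up for illustration; none of it reflects real traffic figures or an existing tool.

```python
from collections import Counter, defaultdict


def top_languages_by_region(pageviews, n=3):
    """Rank languages by total views within each region and keep the top n."""
    counts = defaultdict(Counter)
    for region, language, views in pageviews:
        counts[region][language] += views
    return {
        region: [lang for lang, _ in counter.most_common(n)]
        for region, counter in counts.items()
    }


def candidate_languages(region, browser_languages, ranking):
    """Browser-declared languages first, then the region's top languages, deduplicated."""
    seen = []
    for lang in list(browser_languages) + ranking.get(region, []):
        if lang not in seen:
            seen.append(lang)
    return seen


# Hypothetical view counts, invented for the example.
pageviews = [
    ("DK", "da", 700),
    ("DK", "en", 250),
    ("DK", "nb", 30),
    ("DK", "sv", 15),
    ("DK", "kl", 5),
]
ranking = top_languages_by_region(pageviews, n=3)
print(ranking)
print(candidate_languages("DK", ["de", "da"], ranking))
```

Ranking on observed usage rather than official status is what would let a widely read second language surface ahead of official languages that are rarely read in that region.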

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson <[hidden email]> wrote:

> I'd also like to see something smarter done at the main page, but the
> "and thus" bit here is notoriously tricky.

Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Possibly. But that sounds potentially woolly and sometimes inaccurate.

When a browser makes a web request, it sends a header called the
Accept-Language header
(https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
which lists the languages the browser prefers, in order of preference -
i.e., the languages the user and system are configured to use.

If we're going to make modifications here (I hope we will. But again;
early days) I don't see a good argument for using geolocation, which
is, as you've noted, flawed without substantial time and energy being
applied to map those countries to "probable" languages. The data the
browser already sends to the server contains the /certain/ languages.
We can just use that.
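
For anyone who hasn't looked at the header before, here is a small sketch of how it could be read. The header is a comma-separated list of language tags, each with an optional `q` weight (default 1, and 0 meaning "not acceptable"). The function name and the sample header value are made up for the example, and this is a simplified parser rather than what any production code does.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue  # skip empty items such as a trailing comma
        quality = 1.0  # a tag without an explicit weight counts as q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as unacceptable
        if quality > 0:
            entries.append((quality, position, tag))
    # Highest weight first; ties keep the order the browser sent.
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return [tag for _, _, tag in entries]


print(parse_accept_language("da, en-GB;q=0.8, en;q=0.7, nb;q=0.5"))
```

The resulting list could then be matched against the set of languages that actually have a wiki, falling back to the current default when nothing matches.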

On 6 May 2015 at 22:50, Stuart A. Yeates <[hidden email]> wrote:

> This seems like a great place to use analytics data: for each division
> in the geo-location classification, rank the languages by usage and
> present the top N (plus the browser's language settings) as likely
> candidates when we need the user to pick a language.
>
> cheers
> stuart



--
Oliver Keyes
Research Analyst
Wikimedia Foundation


Re: Fwd: Traffic to the portal from Zero providers

Sam Katz
hey guys, you can't guess geolocation, because occasionally you'd be
wrong. this happens to me all the time. I want to read a site in
Spanish... and then it thinks I'm in Latin America, when I'm not.

--Sam

On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes <[hidden email]> wrote:

> Possibly. But that sounds potentially wooly and sometimes inaccurate.
>
> When a browser makes a web request, it sends a header called the
> Accept-Language header
> (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
> which indicates what languages the browser finds ideal - i.e., what
> languages the user and system are using.
>
> If we're going to make modifications here (I hope we will. But again;
> early days) I don't see a good argument for using geolocation, which
> is, as you've noted, flawed without substantial time and energy being
> applied to map those countries to "probable" languages. The data the
> browser already sends to the server contains the /certain/ languages.
> We can just use that.


Re: Fwd: Traffic to the portal from Zero providers

WereSpielChequers-2
When a reader comes to Wikipedia from the web, we can detect their IP address, and that usually geolocates them to a country. More often than not, that in turn tells you the dominant language of that country.

If we were to default to official or dominant languages, then I predict endless arguments as to which language(s) should be the default in which countries. The large expat community in some parts of the Arab world might prefer English over Arabic. India would want to do things by state, and a whole new front would open up in the Israeli-Palestinian debate.

Regards

Jonathan Cardy


> On 7 May 2015, at 05:06, Sam Katz <[hidden email]> wrote:
>
> hey guys, you can't guess geolocation, because occasionally you'd be
> wrong. this happens to me all the time. I want to read a site in
> Spanish... and then it thinks I'm in Latin America, when I'm not.
>
> --Sam


Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
As I've now said...4 times, I don't think we'd be using geolocation.
We'd be using the accept-language header. See
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

On 7 May 2015 at 00:52, WereSpielChequers <[hidden email]> wrote:

> When a reader comes to Wikipedia from the web, we can detect their IP address, and that usually geolocates them to a country. More often than not, that in turn tells you the dominant language of that country.
>
> If we were to default to official or dominant languages, then I predict endless arguments as to which language(s) should be the default in which countries. The large expat community in some parts of the Arab world might prefer English over Arabic. India would want to do things by state, and a whole new front would open up in the Israeli-Palestinian debate.
>
> Regards
>
> Jonathan Cardy



--
Oliver Keyes
Research Analyst
Wikimedia Foundation


Re: Fwd: Traffic to the portal from Zero providers

Yuvi Panda
In reply to this post by WereSpielChequers-2
Yes, but what about people not on Earth? https://xkcd.com/713/ and
similar have to be taken into consideration as well for such an
important part of the Wikipedia experience, I believe. It's 'Free
knowledge for all', not 'Free knowledge for all that we can accurately
geolocate'.

I wonder if we can set a permanent cookie after asking people with a
large modal dialog box about their language preferences on first load.
Thoughts?
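
For what it's worth, here is a minimal sketch of the cookie half of that
idea using Python's standard http.cookies module. The cookie name
portal_lang and the one-year lifetime are assumptions for illustration;
cookies can't be truly permanent, so a long Max-Age stands in for that.

```python
from http.cookies import SimpleCookie


def language_cookie(code, max_age_days=365):
    """Build a Set-Cookie header value that remembers a language choice."""
    cookie = SimpleCookie()
    cookie["portal_lang"] = code  # hypothetical cookie name
    cookie["portal_lang"]["max-age"] = max_age_days * 24 * 60 * 60  # seconds
    cookie["portal_lang"]["path"] = "/"  # send it for every page on the host
    return cookie["portal_lang"].OutputString()


# The returned string is what would follow "Set-Cookie: " in the response.
print(language_cookie("da"))
```

On later visits the server would read the value back from the Cookie
request header and could skip the dialog box.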

On Wed, May 6, 2015 at 9:52 PM, WereSpielChequers
<[hidden email]> wrote:

> When a reader comes to Wikipedia from the web, we can detect their IP address, and that usually geolocates them to a country. More often than not, that in turn tells you the dominant language of that country.
>
> If we were to default to official or dominant languages, then I predict endless arguments as to which language(s) should be the default in which countries. The large expat community in some parts of the Arab world might prefer English over Arabic. India would want to do things by state, and a whole new front would open up in the Israeli-Palestinian debate.
>
> Regards
>
> Jonathan Cardy



--
Yuvi Panda T
http://yuvi.in/blog


Re: Fwd: Traffic to the portal from Zero providers

Federico Leva (Nemo)
In reply to this post by Mark J. Nelson
Thanks for looking into www.wikipedia.org traffic from India; I've been
"complaining" about it for a while. :) See also:
* https://phabricator.wikimedia.org/T26767
* https://phabricator.wikimedia.org/T5665

Mark J. Nelson, 07/05/2015 04:24:
> But for the average Copenhagener, the following order is far more
> likely:
>
> * Danish, English, Norwegian Bokmål, ...

This is something you can help fix. Please do!
https://www.mediawiki.org/wiki/ULS/FAQ#language-territory

Nemo


Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Thanks for the bugs, Nemo!

(search team: should we take those over?)

On 7 May 2015 at 03:08, Federico Leva (Nemo) <[hidden email]> wrote:

> Thanks for looking into www.wikipedia.org traffic from India; I've been
> "complaining" about it for a while. :) See also:
> * https://phabricator.wikimedia.org/T26767
> * https://phabricator.wikimedia.org/T5665
>
> Mark J. Nelson, 07/05/2015 04:24:
>>
>> But for the average Copenhagener, the following order is far more
>> likely:
>>
>> * Danish, English, Norwegian Bokmål, ...
>
>
> This is something you can help fix. Please do!
> https://www.mediawiki.org/wiki/ULS/FAQ#language-territory
>
> Nemo



--
Oliver Keyes
Research Analyst
Wikimedia Foundation


Re: Fwd: Traffic to the portal from Zero providers

Scott Hale
In reply to this post by Oliver Keyes-4
The accept-language header is the obvious place to start, but there is ample scope to combine multiple approaches.

In addition to accept-language and geolocation data, any logged-in user will have view/edit history related to multiple editions. If the user is requesting a specific article (e.g., https://www.wikipedia.org/wiki/普天間飛行場), we can also take account of which editions actually have the article; the vast majority of content on Wikipedia exists in only one or a few languages. (For example, the above link redirects me to create the article on en-wiki, although it exists on ja-wiki, Japanese is my second preferred language in my accept-language header, and ja-wiki is an edition my edit history shows I edit.)

This isn't an either-or question of which to use, but rather a question of how all these indicators can be used together to create the best experience. I would venture that most users don't change their accept-language header (not even possible on some mobile browsers!) and hence probably list only one language. If so, geography and edit history can be signals for possible second languages beyond the one language in the accept-language header when hitting the homepage without a specific article.
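
A minimal sketch of how those signals could be merged, assuming each one has already been reduced to a list of language codes. The function name rank_languages, the priority order (header first, then edit history, then geography) and the sample data are all assumptions for illustration, not anything the portal or ULS is known to do.

```python
def rank_languages(accept_language, history=(), geo=(), available=None):
    """Merge language signals into one ordered list without duplicates."""
    ranked = []
    # Earlier sources win; later ones only contribute languages not yet seen.
    for source in (accept_language, history, geo):
        for code in source:
            if code not in ranked:
                ranked.append(code)
    # When a specific article is requested, keep only editions that have it.
    if available is not None:
        ranked = [code for code in ranked if code in available]
    return ranked


# Homepage visit: one language in the header, the rest from history and geography.
print(rank_languages(["da"], history=["en"], geo=["da", "kl", "fo"]))

# Article request: the article exists only on ja-wiki.
print(rank_languages(["en", "ja"], available={"ja"}))
```

Passing the set of editions that actually have the article covers the case above, where the link should land on ja-wiki rather than on an empty en-wiki page.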

Cheers,
Scott

P.S. It looks like the Universal Language Selector already uses the accept-language header for its preference screen.

On Thu, May 7, 2015 at 5:58 AM, Oliver Keyes <[hidden email]> wrote:
> As I've now said...4 times, I don't think we'd be using geolocation.
> We'd be using the accept-language header. See
> https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language



Re: Fwd: Traffic to the portal from Zero providers

Oliver Keyes-4
Makes sense! I actually hadn't factored in that sort of action
(although it does happen); I was thinking more of the order of the
main page links on the root www.wikipedia.org page.

On 7 May 2015 at 03:51, Scott Hale <[hidden email]> wrote:

> The accept-language header is the obvious place to start, but there is ample
> scope to combine multiple approaches.
>
> In addition to accept-language and geolocation data, any logged in user will
> have view/edit history related to multiple editions. If the user is
> requesting a specific article, (e.g., https://www.wikipedia.org/wiki/普天間飛行場
> ) we also can take account of what editions actually have the article ---
> the vast majority of content on Wikipedia only exists in one language or a
> few languages. (I.e., the above link redirects me to create the article on
> en-wiki although it exists on ja-wiki and Japanese is my second preferred
> language by my accept-language header and is an edition I edit captured in
> my edit history)
>
> This isn't an either-or question of which to use, but rather a question of
> how all these indicators can be used together to create the best experience.
> I would venture that most users don't change their accept-language header
> (not even possible on some mobile browsers!) and hence probably list
> only one language. If so, geography and edit history can be signals for
> possible second languages beyond the one language in the accept-language
> header when hitting the homepage without a specific article.
>
> Cheers,
> Scott
>
> P.S. It looks like the Universal Language Selector already uses the
> accept-language header for its preference screen.



--
Oliver Keyes
Research Analyst
Wikimedia Foundation
