Chinese conversion, search engines, and auto-detection

Amir E. Aharoni
Hi,

I wondered about some things around the Chinese variant conversion:

* When a person uses a search engine, do the links in the results point
directly to one of the variants? That is, does it point to
https://zh.wikipedia.org/zh-cn/Article_name , etc., or simply to
zh.wikipedia.org/wiki/Article_name ? I guess that among Chinese-speaking
people Google is not necessarily as ubiquitous as elsewhere, so there is
probably a separate answer for each search engine.

* If for any search engine the answer above is "yes", does anybody have an
idea of how that search engine guesses the preferred variant? Usage
of simplified / traditional characters in the search query? Geolocation?
Preferred language settings in the browser ("Accept-Language")? Preferences
in the search engine itself? A combination of all of the above? Something
else?

* Do any of the search engines show direct links to country-based variants
- zh-cn, zh-hk, zh-tw, zh-sg, zh-mo? Or to the more generic zh-hans and
zh-hant?

* For users who didn't log in, is the variant selection remembered in a
cookie or in localStorage?
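
To make the "Accept-Language" possibility above concrete, here is a minimal sketch of how a variant could be guessed from that header and turned into one of the two URL forms mentioned in the first question. It is only an illustration: the function names and the matching rule are assumptions made for this example, not a description of what MediaWiki or any search engine actually does.

```python
# Hypothetical sketch: guess a Chinese variant from an Accept-Language header.
# The variant codes come from the questions above; the helper names and the
# exact-match rule are assumptions for illustration only.
SUPPORTED_VARIANTS = ["zh-cn", "zh-hk", "zh-tw", "zh-sg", "zh-mo", "zh-hans", "zh-hant"]


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by original position in the header.
        weighted.append((-quality, position, tag.strip().lower()))
    weighted.sort()
    return [tag for _, _, tag in weighted]


def pick_variant(header, supported=SUPPORTED_VARIANTS):
    """Return the first requested tag that is a supported variant, else None."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
    return None


def variant_url(title, variant=None):
    """Build the variant-specific URL, or the generic /wiki/ URL if no variant."""
    path = variant if variant else "wiki"
    return f"https://zh.wikipedia.org/{path}/{title}"


if __name__ == "__main__":
    chosen = pick_variant("zh-TW,zh;q=0.9,en;q=0.8")
    print(variant_url("Article_name", chosen))
```

A reader whose browser sends no Chinese tag at all would get `None` here and fall back to the generic `/wiki/` URL, which is exactly the case the questions above are asking about.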

I cannot easily test any of these things myself, because I don't speak
Chinese, I'm not familiar with Chinese search engines, and I don't live in
a Chinese-speaking country (and geolocation matters). But since I care
about language, I'm very curious about this.

Thanks!

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Chinese conversion, search engines, and auto-detection

Brian Wolff
I think the point of all those <link rel=alternate hreflang=foo> tags is
to get Google to link to the right variant, but I am not sure.
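
For anyone who wants to check which alternates a page actually advertises, a small sketch using only Python's standard library could look like the following. The sample markup is made up for illustration and is not copied from a real Wikipedia page; only the tag shape (`rel=alternate` plus `hreflang`) comes from the message above.

```python
# Sketch: collect <link rel="alternate" hreflang="..."> entries from HTML.
# SAMPLE_HEAD is hypothetical markup written for this example.
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects hreflang -> href pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; value may be None.
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.alternates[hreflang.lower()] = href


def collect_alternates(html):
    """Return a dict mapping each hreflang code to its alternate URL."""
    collector = HreflangCollector()
    collector.feed(html)
    return collector.alternates


SAMPLE_HEAD = """
<head>
  <link rel="alternate" hreflang="zh-CN" href="https://zh.wikipedia.org/zh-cn/Article_name">
  <link rel="alternate" hreflang="zh-TW" href="https://zh.wikipedia.org/zh-tw/Article_name"/>
  <link rel="stylesheet" href="/style.css">
</head>
"""

if __name__ == "__main__":
    for code, url in sorted(collect_alternates(SAMPLE_HEAD).items()):
        print(code, url)
```

This only shows what the page declares. Whether a given search engine then links to the declared variant is a separate question that has to be checked in real search results.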

--
brian

On Wednesday, May 17, 2017, Amir E. Aharoni <[hidden email]>
wrote:

> Hi,
>
> I wondered about some things around the Chinese variant conversion:
>
> * When a person uses a search engine, do the links in the results point
> directly to one of the variants? That is, does it point to
> https://zh.wikipedia.org/zh-cn/Article_name , etc., or simply to
> zh.wikipedia.org/wiki/Article_name ? I guess that among Chinese-speaking
> people Google is not necessarily as ubiquitous as elsewhere, so there is
> probably a separate answer for each search engine.

Re: Chinese conversion, search engines, and auto-detection

Amir E. Aharoni
Quite possible, but does it actually happen?


2017-05-17 23:25 GMT+03:00 Brian Wolff <[hidden email]>:

> I think the point of all those <link rel=alternate hreflang=foo> tags was
> so google linked to right variant, but i am unsure.

Re: Chinese conversion, search engines, and auto-detection

Brian Wolff
Supposedly Google listens to them:
https://support.google.com/webmasters/answer/189077?hl=en

On Wednesday, May 17, 2017, Amir E. Aharoni <[hidden email]>
wrote:

> Quite possible, but does it actually happen?

Re: Chinese conversion, search engines, and auto-detection

Amir E. Aharoni
Yeah... but I would love to hear actual experiences of actual Chinese
speakers. (Although I certainly do appreciate other relevant replies, like
yours.)

For every search engine, and often for every search engine _user_, the
actual results are likely different.

If it works well for Chinese speakers—good to know.

If it doesn't, then I'd like to know it, and to think whether we can do
anything about it in our software.


2017-05-18 2:35 GMT+03:00 Brian Wolff <[hidden email]>:

> Supposedly google listens:
> https://support.google.com/webmasters/answer/189077?hl=en

Re: Chinese conversion, search engines, and auto-detection

C. Scott Ananian
In reply to this post by Amir E. Aharoni
David or Liangent might be able to answer your questions.
  --scott

On May 17, 2017 12:01 PM, "Amir E. Aharoni" <[hidden email]>
wrote:

> Hi,
>
> I wondered about some things around the Chinese variant conversion:

Re: Chinese conversion, search engines, and auto-detection

Trey Jones
In reply to this post by Amir E. Aharoni
> * For users who didn't log in, is the variant selection remembered in a
> cookie or in localStorage?

A quick test when I'm not logged in indicates that variant selection isn't
remembered between searches, or even when following links! The default
"/wiki/" version of the page can be mixed Traditional and Simplified
characters (which seems maximally unhelpful).

> If it doesn't, then I'd like to know it, and to think whether we can do
> anything about it in our software.


Discovery is almost ready to deploy changes to improve searching on Chinese
wikis. We are using an Elastic plugin to convert Traditional Chinese
characters to Simplified, and another plugin that does a better job of
segmenting Simplified text into words (the plugin only works on Simplified
text—which is why we haven't used it in the past: it's pretty bad on
Traditional text).

So, searching for Simplified or Traditional text will find the other
(modulo the imperfections in the software—but it's definitely a big
improvement). "Exact" matches are still weighted some, so Traditional and
Simplified variants don't always return the exact same results, but they
will be much more similar compared to the current situation, where they
don't necessarily overlap at all.
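
As a toy illustration of the idea (not the Elastic plugins themselves, and with no word segmentation at all), the sketch below folds Traditional characters to Simplified on both the query and the text, then adds a small bonus when the query also matches exactly as typed. The five-character mapping table, the function names, and the scoring numbers are all assumptions made for this example.

```python
# Toy sketch of "convert Traditional to Simplified, then match", with a small
# boost for exact matches. The mapping table is a tiny hand-picked sample and
# the scores are arbitrary; real conversion tables are far larger.
TRAD_TO_SIMP = {"漢": "汉", "語": "语", "體": "体", "學": "学", "國": "国"}
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text):
    """Fold the Traditional characters we know about to Simplified."""
    return text.translate(_TABLE)


def search(query, documents, exact_boost=0.5):
    """Return (score, document) pairs, best first.

    A document matches if the folded query occurs in the folded document.
    It scores higher if the query also occurs in the original text.
    """
    normalized_query = to_simplified(query)
    results = []
    for doc in documents:
        if normalized_query not in to_simplified(doc):
            continue
        score = 1.0
        if query in doc:
            score += exact_boost  # the "exact" match still counts for more
        results.append((score, doc))
    # sort() is stable, so equally scored documents keep their input order.
    results.sort(key=lambda pair: -pair[0])
    return results


if __name__ == "__main__":
    docs = ["汉语拼音", "漢語拼音", "English only"]
    for score, doc in search("漢語", docs):
        print(score, doc)
```

With this scheme a Traditional query and a Simplified query find the same two documents but rank them in a different order, which mirrors the "much more similar, but not always identical" behaviour described above.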

The changes should go out after the Hackathon (and possibly some vacations
tacked on to the Hackathon), and then we'll have to re-index Chinese wikis
to make them live. So, not yet—but soon!

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

On Thu, May 18, 2017 at 1:41 AM, Amir E. Aharoni <
[hidden email]> wrote:

> Yeah... but I would love to hear actual experiences of actual Chinese
> speakers. (Although I certainly do appreciate other relevant replies, like
> yours.)
>
> For every search engine, and often for every search engine _user_ the
> actual results are likely different.
>
> If it works well for Chinese speakers—good to know.
>
> If it doesn't, then I'd like to know it, and to think whether we can do
> anything about it in our software.

Re: Chinese conversion, search engines, and auto-detection

Federico Leva (Nemo)
In reply to this post by Amir E. Aharoni
Indexing is a known issue, tracked at:
* https://phabricator.wikimedia.org/T93213
* https://phabricator.wikimedia.org/T54429

(Tilman, Kaldari or others with Google Search Console access may quickly
provide an update on https://phabricator.wikimedia.org/T93213#2518417 .)

Nemo
