Language variants

Helder Geovane Gomes de Lima
Hello!

I noticed that at sr.wikipedia there is a "Variant" option under
"Internationalization" in the preferences. How is that different from
the 'sr', 'sr-ec' and 'sr-el' entries shown under the "Language" option
(also under "Internationalization")?

I'm interested in this because there are some differences between
"Brazilian Portuguese" ('pt-br') and "Portuguese of Portugal" ('pt')
which often cause trouble for the admins of the Portuguese
projects, who need to warn users not to change the wording of
texts from one variant to the other (this happens a lot, mainly in
anonymous contributions), because some differences between the
variants look [at first glance] like typos, and people want to
"correct" them...

So, I would like to know if there is currently any feature which could
help us to avoid the problem of having a divided community of users
('pt' x 'pt-br') "fighting" with each other ad infinitum... (and to
avoid proposals like [1] for a new "Brazilian Wikipedia", which
IMHO will not have any good result, and is not the best way of
solving the problem...)

I found [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621
a comment] about the existence of "on-the-fly translation" for some
languages (Chinese and Serbian), but I don't know how it works, or whether
it solves or improves the situation.

And before this I was also thinking of using (a possibly enhanced
version of) a procedure like this: considering that it is currently
possible to show a system message using {{int:MESSAGE}} in the
wikitext in such a way that the result changes according to the user's
language, would it be possible to create new messages in the "MediaWiki:"
namespace just for defining language variants of words which often
appear in the content of the projects? For example, would it be
possible to create "MediaWiki:WORD/pt-br" and "MediaWiki:WORD/pt", and
use them (with {{int:WORD}}) instead of the actual word variant in
wikitext? This isn't likely to be the best solution, but it could be
a first step towards a solution...
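
The lookup this idea relies on can be sketched roughly as follows. This is a
toy model in Python, not MediaWiki code: the MESSAGES dictionary, the
int_message helper, the fallback to the 'pt' page and the example pair
"facto"/"fato" are all assumptions made for illustration.

```python
# Toy model of per-language message pages (page names follow the proposal above).
MESSAGES = {
    "MediaWiki:WORD/pt": "facto",     # illustrative European Portuguese spelling
    "MediaWiki:WORD/pt-br": "fato",   # illustrative Brazilian Portuguese spelling
}

def int_message(name, user_lang, fallback_lang="pt"):
    """Return the text of MediaWiki:<name>/<user_lang>, or the fallback page."""
    page = f"MediaWiki:{name}/{user_lang}"
    if page in MESSAGES:
        return MESSAGES[page]
    # Assumed behaviour: any language without its own page gets the 'pt' text.
    return MESSAGES[f"MediaWiki:{name}/{fallback_lang}"]

print(int_message("WORD", "pt-br"))
print(int_message("WORD", "pt"))
```

Every word that differs between the variants would need its own pair of
pages, which is why the approach only scales to a small vocabulary.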

Any thoughts on how the Portuguese community could improve the situation
at the pt.* projects?
(Is there any other list where I should ask about this?)

Helder

[1] http://strategy.wikimedia.org/wiki/Proposal:A_Brazilian_Portuguese_Wikipedia

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Language variants

Roan Kattouw-2
2009/9/9 Helder Geovane Gomes de Lima <[hidden email]>:

> Hello!
>
> I noticed that at sr.wikipedia there is a "Variant" option under
> "Internationalization" in the preferences. How is that different from
> the 'sr', 'sr-ec' and 'sr-el' entries shown under the "Language" option
> (also under "Internationalization")?
>
> I'm interested in this because there are some differences between
> "Brazilian Portuguese" ('pt-br') and "Portuguese of Portugal" ('pt')
> which often cause trouble for the admins of the Portuguese
> projects, who need to warn users not to change the wording of
> texts from one variant to the other (this happens a lot, mainly in
> anonymous contributions), because some differences between the
> variants look [at first glance] like typos, and people want to
> "correct" them...
>
sr-ec and sr-el refer to the Latin and Cyrillic variants of Serbian
(not sure which is which), and AFAIK the software can convert
everything, even article text, because the conversion rules are so
simple that a computer can execute them. Basically, sr-ec and sr-el
have the same text in the same language, but in different alphabets.
(This is my understanding, which may be completely wrong; in that
case, please correct me.)

The differences between pt and pt-br are more delicate than that, and
a computer can't autoconvert between the two, because of
differences in spelling, word usage and grammar(?).

> So, I would like to know if there is currently any feature which could
> help us to avoid the problem of having a divided community of users
> ('pt' x 'pt-br') "fighting" with each other ad infinitum... (and to
> avoid proposals like [1] for a new "Brazilian Wikipedia", which
> IMHO will not have any good result, and is not the best way of
> solving the problem...)
>
No. We already offer users the choice between having the interface in
pt or pt-br (or any other language, really), but such a choice doesn't
exist for the content.

> I found [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621
> a comment] about the existence of "on-the-fly translation" for some
> languages (Chinese and Serbian), but I don't know how it works, or whether
> it solves or improves the situation.
>
That's the alphabet variant thing I mentioned earlier. If the majority
of the differences between pt and pt-br can be summed up with simple
rules that a computer can handle, we might be able to work something
out. However, that's usually not the case; I don't know Portuguese, but
I do know that handling even simple differences between en-us and
en-gb is too complex already: a system that would successfully convert
'realise' to 'realize' may also wrongly try to convert 'disguise'.
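
The failure mode described here is what a blanket suffix rule produces. A
minimal sketch in Python (a hypothetical rule written for illustration, not
anything taken from MediaWiki):

```python
import re

def naive_ise_to_ize(text):
    """Blindly rewrite every word-final 'ise' as 'ize'."""
    return re.sub(r"ise\b", "ize", text)

print(naive_ise_to_ize("realise"))   # the intended conversion
print(naive_ise_to_ize("disguise"))  # over-applied: yields a non-word
```

The rule has no way of knowing which words take part in the -ise/-ize
alternation, so it rewrites "disguise" as well.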

> And before this I was also thinking of using (a possibly enhanced
> version of) a procedure like this: considering that it is currently
> possible to show a system message using {{int:MESSAGE}} in the
> wikitext in such a way that the result changes according to the user's
> language, would it be possible to create new messages in the "MediaWiki:"
> namespace just for defining language variants of words which often
> appear in the content of the projects? For example, would it be
> possible to create "MediaWiki:WORD/pt-br" and "MediaWiki:WORD/pt", and
> use them (with {{int:WORD}}) instead of the actual word variant in
> wikitext? This isn't likely to be the best solution, but it could be
> a first step towards a solution...
>
This sounds like it could work, but only if the /langcode trick
actually works (I don't know what that depends on) and if there's a
relatively small set of words that makes a relatively big difference
(otherwise it'd be more trouble than it's worth IMO; but that's up to
the community).

Roan Kattouw (Catrope)

Re: Language variants

David Gerard-2
2009/9/9 Roan Kattouw <[hidden email]>:
> 2009/9/9 Helder Geovane Gomes de Lima <[hidden email]>:

>> So, I would like to know if there is currently any feature which could
>> help us to avoid the problem of having a divided community of users
>> ('pt' x 'pt-br') "fighting" with each other ad infinitum... (and to
>> avoid proposals like [1] for a new "Brazilian Wikipedia", which
>> IMHO will not have any good result, and is not the best way of
>> solving the problem...)

> No. We already offer users the choice between having the interface in
> pt or pt-br (or any other language, really), but such a choice doesn't
> exist for the content.


This is a community issue. Having a single pt:wp is a win because
there's more content in one place and it avoids local-POV bias, same
as there's one en:wp rather than US-English and Commonwealth-English.

So you need a community rule.

The rule we have on en:wp is:

1. It doesn't matter.
2. Use the variant spoken in the location, if relevant.
3. Don't change articles from one to the other except per 2.
4. Try not to worry too much about it.

4. is the important step ;-) It should be simple enough to let new
users know the rule and "not to worry about which variant" :-)


- d.

Re: Language variants

Tim Starling-2
In reply to this post by Roan Kattouw-2
Roan Kattouw wrote:
> That's the alphabet variant thing I mentioned earlier. If the majority
> of the differences between pt and pt-br can be summed up with simple
> rules that a computer can handle, we might be able to work something
> out. However, that's usually not the case; I don't know Portuguese, but
> I do know that handling even simple differences between en-us and
> en-gb is too complex already: a system that would successfully convert
> 'realise' to 'realize' may also wrongly try to convert 'disguise'.

I don't know why you're writing this nonsense, you obviously haven't
looked at the code at all.

The language variant system that we have could easily convert between
US and UK English. In fact it already does convert between a language
pair with a far more complex relationship, that is Simplified and
Traditional Chinese.

The language conversion system is very simple, it's just a table of
translated pairs, where the longest match takes precedence. The
translation table in one direction (e.g. UK -> US) can be different to
the table in the other direction (US -> UK). You would not list "ize
-> ise", you would list every word in the dictionary with an -ize
ending that can be translated to -ise without controversy. The current
software could handle 50k pairs or so without serious performance
problems, and it could be extended and optimised to allow millions of
pairs if there was a need for that.

It's possible to handle any pair of languages which are separated only
by vocabulary, and transliteration or spelling. It's only differences
in grammar, such as word order, that would give it trouble.
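
The table-driven scheme described above can be sketched like this. It is a
simplified illustration in Python, not the LanguageConverter code: the convert
function and the two tiny tables are assumptions made for the example.

```python
def convert(text, table):
    """Replace table keys found in text, preferring the longest key at each position."""
    keys = sorted((k for k in table if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches here: copy one character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

# Whole words are listed, not suffix rules, so 'disguise' is simply absent.
UK_TO_US = {"realise": "realize", "colour": "color", "programme": "program"}
# The reverse table is not a mirror image: 'program' is left out because
# UK English also writes 'program' (for software).
US_TO_UK = {"realize": "realise", "color": "colour"}

print(convert("I realise the disguise has no colour", UK_TO_US))
print(convert("a computer program", US_TO_UK))
```

Sorting the keys by length is the simplest way to get longest-match
behaviour; with tens of thousands of pairs a trie or similar structure would
be the more likely choice.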

-- Tim Starling


Re: Language variants

Helder Geovane Gomes de Lima
Nice! ;-)

Do you think tables like these
http://pt.wiktionary.org/wiki/Wikcionário:Versões da língua portuguesa/Tabela
http://pt.wikipedia.org/wiki/Wikipedia:Versões da língua portuguesa/tabela
could be a starting point for a similar conversion system for pt <-> pt-br?

Meanwhile, I was also trying to adapt the Template:LangSwitch from
Wikimedia Commons
(http://commons.wikimedia.org/wiki/Template:LangSwitch), in order to
be able to use the template syntax like this:
{{Language variations| pt = word 1| pt-br = word 2}}

For this, I've created two pages:
* MediaWiki:Lang, with 'pt'
* MediaWiki:Lang/pt-br, with 'pt-br'

and the template code is essentially:
{{#switch:{{int:Lang}}
|pt-br={{{pt-br|}}}
|pt
|#default={{{pt|}}}
}}

But I wasn't able to create a "default" parameter so that we could set
which of the variants is shown by default to anonymous users. It
would be good if we could use {{Language variations| default = pt-br |
pt = word 1| pt-br = word 2}} to get:
(a) word 2, for anonymous users;
(b) word 1, for logged-in users who choose 'pt' in their preferences;
(c) word 2, for logged-in users who choose 'pt-br' in their preferences.
Option (a) would be necessary if we don't want to change an
existing text from 'pt-br' to 'pt' (for anonymous users) just because
we want logged-in users to be able to choose the "content variant".

Is there any way to detect whether the reader is logged in, with something
in the style of {{#if: <what?> | foo| bar}}?
(The problem with {{int:Lang}} is that for anonymous users and for
users who choose 'pt' the result is the same: 'pt', so I can't
distinguish these two cases in the template...)
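
The ambiguity can be seen in a small Python model of the template. The names
int_lang and language_variations, and the use of None for an anonymous
reader, are assumptions made for this sketch; real behaviour depends on how
the wiki resolves {{int:Lang}}.

```python
SITE_DEFAULT = "pt"
# Models the two pages: MediaWiki:Lang ('pt') and MediaWiki:Lang/pt-br ('pt-br').
LANG_MESSAGES = {"pt": "pt", "pt-br": "pt-br"}

def int_lang(user_pref):
    """Model of {{int:Lang}}; user_pref=None stands for an anonymous reader."""
    lang = user_pref if user_pref is not None else SITE_DEFAULT
    return LANG_MESSAGES.get(lang, LANG_MESSAGES[SITE_DEFAULT])

def language_variations(user_pref, pt="", pt_br=""):
    """Model of the #switch: 'pt-br' picks pt_br, everything else picks pt."""
    if int_lang(user_pref) == "pt-br":
        return pt_br
    return pt

for reader in (None, "pt", "pt-br"):
    print(reader, "->", language_variations(reader, pt="word 1", pt_br="word 2"))
```

An anonymous reader and a reader who chose 'pt' both reach the same branch,
so the template has nothing to key a separate default on.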

Anyway, I think it would be better to have some kind of automated
conversion system, even if it doesn't convert all cases (at least for
the words in the tables above it would be useful).

Thank you for everything,

Helder

Re: Language variants

Platonides
Helder Geovane Gomes de Lima wrote:

> But I wasn't able to create a "default" parameter so that we could set
> which of the variants is shown by default to anonymous users. It
> would be good if we could use {{Language variations| default = pt-br |
> pt = word 1| pt-br = word 2}} to get:
> (a) word 2, for anonymous users;
> (b) word 1, for logged-in users who choose 'pt' in their preferences;
> (c) word 2, for logged-in users who choose 'pt-br' in their preferences.
> Option (a) would be necessary if we don't want to change an
> existing text from 'pt-br' to 'pt' (for anonymous users) just because
> we want logged-in users to be able to choose the "content variant".

There's no difference. Anonymous users get the default language.
What you could do is have three "languages": pt (generic Portuguese,
default), pt-pt and pt-br.
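
A rough Python model of that three-code arrangement, assuming (hypothetically)
that 'pt-pt' and 'pt-br' are both selectable in the preferences and that the
generic 'pt' is what anonymous readers get; pick_variant is an illustrative
helper, not an existing template:

```python
def pick_variant(user_lang, words, default):
    """Choose the text for a reader.

    words maps the explicit codes ('pt-pt', 'pt-br') to their wording;
    default names the variant shown to readers on the generic 'pt' setting.
    """
    if user_lang in words:
        return words[user_lang]
    return words[default]

words = {"pt-pt": "word 1", "pt-br": "word 2"}
print(pick_variant("pt", words, default="pt-br"))     # anonymous / generic reader
print(pick_variant("pt-pt", words, default="pt-br"))
print(pick_variant("pt-br", words, default="pt-br"))
```

Because the generic code is distinct from both explicit variants, each page
can keep its existing wording as the default.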

> Is there any way to detect whether the reader is logged in, with something
> in the style of {{#if: <what?> | foo| bar}}?
No.


Re: Language variants

Aryeh Gregor
In reply to this post by Tim Starling-2
On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <[hidden email]> wrote:
> I don't know why you're writing this nonsense, you obviously haven't
> looked at the code at all.

This paragraph is unnecessary.

> The language variant system that we have could easily convert between
> US and UK English. In fact it already does convert between a language
> pair with a far more complex relationship, that is Simplified and
> Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of
> translated pairs, where the longest match takes precedence. The
> translation table in one direction (e.g. UK -> US) can be different to
> the table in the other direction (US -> UK). You would not list "ize
> -> ise", you would list every word in the dictionary with an -ize
> ending that can be translated to -ise without controversy. The current
> software could handle 50k pairs or so without serious performance
> problems, and it could be extended and optimised to allow millions of
> pairs if there was a need for that.
>
> It's possible to handle any pair of languages which are separated only
> by vocabulary, and transliteration or spelling. It's only differences
> in grammar, such as word order, that would give it trouble.

Is there any reason nobody's tried adding such support for us/uk
English?  It would resolve some long-standing tension on enwiki.
Would anons have to be given one variant or the other, or would they
get untransformed text or what?  Does the variant transformation apply
to the edit page as well?

Re: Language variants

Trevor Parscal-2
On 9/10/09 10:06 AM, Aryeh Gregor wrote:
> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling<[hidden email]>  wrote:
>    
>> I don't know why you're writing this nonsense, you obviously haven't
>> looked at the code at all.
>>      
> This paragraph is unnecessary.
>    
Seriously! Please read things aloud before clicking send. You will
hopefully then be able to better detect when it's time to take a break,
eat some fruit and take it down a notch.

The variant system seems poorly understood by most people (including me)
which often tends to cause something (like it for instance) to also be
under-utilized...

Perhaps we need more information on what it intends to provide the user.
All I find in Google on this topic are blurbs about configuration
variables and lots of people confused as to what language variants even
are...

Is there some awesome documentation somewhere I have yet to find?

- Trevor

Re: Language variants

Chad
On Thu, Sep 10, 2009 at 1:39 PM, Trevor Parscal <[hidden email]> wrote:

> The variant system seems poorly understood by most people (including me)
> which often tends to cause something (like it for instance) to also be
> under-utilized...
>
> Perhaps we need more information on what it intends to provide the user.
> All I find in Google on this topic are blurbs about configuration
> variables and lots of people confused as to what language variants even
> are...
>
> Is there some awesome documentation somewhere I have yet to find?
>
> - Trevor

Nope, but there's a bug asking for documentation :)

https://bugzilla.wikimedia.org/show_bug.cgi?id=19044

I certainly agree that it's completely undocumented and thus not usable
to many people. The vast majority of devs--myself included--don't even
understand how it works, much less how to use it. Maybe if we had docs,
it'd be more usable outside of the (very) small minority who do use and
maintain it.

-Chad

Re: Language variants

Ariel Glenn WMF
In reply to this post by Aryeh Gregor
The differences between the UK and American varieties of English are not
limited just to spelling and vocabulary.

Ariel

Re: Language variants

Aryeh Gregor
On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn <[hidden email]> wrote:
> The differences between the UK and American varieties of English are not
> limited just to spelling and vocabulary.

Those account for the large majority of the more noticeable
differences, however.

Re: Language variants

Helder Geovane Gomes de Lima
2009/9/10 Aryeh Gregor <[hidden email]>:

> On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn <[hidden email]>
> wrote:
> > The differences between the UK and American varieties of English are not
> > limited just to spelling and vocabulary.
>
> Those account for the large majority of the more noticeable
> differences, however.


I think this is also the case for Portuguese ('pt' x 'pt-br'). So, even if
the table doesn't solve every case, what it does solve is good enough...

2009/9/10 Aryeh Gregor <[hidden email]>:
>
> Is there any reason nobody's tried adding such support for us/uk
> English?  It would resolve some long-standing tension on enwiki.
> Would anons have to be given one variant or the other, or would they
> get untransformed text or what?  Does the variant transformation apply
> to the edit page as well?
>

I have the same questions...

Helder

Re: Language variants

M. Williamson
It might be possible to make it apply to the edit page as well, but in
zh.wp, sr.wp, and kk.wp currently it does not. I'm guessing (could be
wrong) that it would eat up a lot more resources.

Mark

skype: node.ue



Re: Language variants

Helder Geovane Gomes de Lima
In reply to this post by Tim Starling-2
2009/9/9 Tim Starling <[hidden email]>

> The language variant system that we have could easily convert between
> US and UK English. In fact it already does convert between a language
> pair with a far more complex relationship, that is Simplified and
> Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of
> translated pairs, where the longest match takes precedence. The
> translation table in one direction (e.g. UK -> US) can be different to
> the table in the other direction (US -> UK). You would not list "ize
> -> ise", you would list every word in the dictionary with an -ize
> ending that can be translated to -ise without controversy. The current
> software could handle 50k pairs or so without serious performance
> problems, and it could be extended and optimised to allow millions of
> pairs if there was a need for that.


Hello again!

What would be needed in order to use pages like MediaWiki:Conversiontable/pt
and MediaWiki:Conversiontable/pt-br on the Wikimedia projects in Portuguese
for the conversion? Is it easy to have language conversion enabled?
Could we create the conversion tables gradually?

Sorry for so many questions...

Helder

Re: Language variants

Roan Kattouw-2
In reply to this post by Trevor Parscal-2
2009/9/10 Trevor Parscal <[hidden email]>:

> On 9/10/09 10:06 AM, Aryeh Gregor wrote:
>> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling<[hidden email]>  wrote:
>>
>>> I don't know why you're writing this nonsense, you obviously haven't
>>> looked at the code at all.
>>>
>> This paragraph is unnecessary.
>>
> Seriously! Please read things aloud before clicking send. You will
> hopefully then be able to better detect when it's time to take a break,
> eat some fruit and take it down a notch.
In Tim's defense: I had indeed not looked at the code at all, and what
I wrote was incorrect, so what he wrote was completely true. I also
mentioned that my understanding of the variant conversion system was
limited, and that I might be completely wrong. Turns out I was, and
Tim corrected me. It's true that he probably didn't use the most
friendly tone in the world, but I've seen much worse, so I don't
really care. Let's just drop this before it turns into a flame war;
I'd like to keep those off wikitech-l.

> The variant system seems poorly understood by most people (including me)
> which often tends to cause something (like it for instance) to also be
> under-utilized...
>
Seems I'm not the only one who had a completely wrong idea about how
variants work. We definitely need more documentation and fame for this
system, so its potential doesn't go to waste.

Roan Kattouw (Catrope)

Re: Language variants

Aryeh Gregor
On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <[hidden email]> wrote:
> Seems I'm not the only one who had a completely wrong idea about how
> variants work. We definitely need more documentation and fame for this
> system, so its potential doesn't go to waste.

I theoretically knew that it was just a string-replace system, but it
didn't occur to me that it would be useful for more than
transliteration.  It makes sense now that Tim pointed that out.  How
would it handle word breaks, though?  It would just ignore them, so
color -> colour also changes uncolored -> uncoloured?  What about
things like HTML id's or even attribute/property names (<span
style="color:red">)?  I'm sure I could dig through the code to find
the answers to these, but actually I'm not even sure offhand where the
code *is*.


Re: Language variants

Helder Geovane Gomes de Lima
Hello!

I think the code is in these files:
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00018
http://svn.wikimedia.org/doc/LanguageZh_8php-source.html#l00009

and a comment at
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00258
says:

/* we convert everything except:
   1. html markups (anything between < and >)
   2. html entities
   3. place holders created by the parser
*/

So, I don't think it will convert <span style="color:red">. But I'm
not sure, because I'm still learning PHP...
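
As a rough illustration of exclusions 1 and 2 in that comment, here is a
minimal Python sketch (not the PHP code in LanguageConverter.php). The
regex, the function name convert_outside_markup and the plain
str.replace loop are assumptions made for the example, and parser
placeholders (exclusion 3) are not handled:

```python
import re

# Assumed pattern: an HTML tag (anything between < and >) or an HTML entity.
SKIP = re.compile(r"(<[^>]*>|&#?[A-Za-z0-9]+;)")


def convert_outside_markup(html, table):
    """Apply the replacements in `table` only to text outside tags and entities."""
    parts = SKIP.split(html)
    # Because SKIP has one capture group, re.split returns plain text at
    # even indexes and the matched tags/entities at odd indexes.
    for i in range(0, len(parts), 2):
        for src, dst in table.items():
            parts[i] = parts[i].replace(src, dst)
    return "".join(parts)
```

With this approach, <span style="color:red">color</span> comes out as
<span style="color:red">colour</span>: the attribute is left alone
because it sits inside the tag.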

By the way, I can't understand Chinese, but (after using an on-line
translator) I think the page they have for documenting the system is
this:
http://zh.wikipedia.org/wiki/Help:%E4%B8%AD%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E7%9A%84%E7%B9%81%E7%AE%80%E5%A4%84%E7%90%86

Helder






Re: Language variants

Tim Starling-2
In reply to this post by Aryeh Gregor
Aryeh Gregor wrote:

> On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <[hidden email]> wrote:
>> Seems I'm not the only one who had a completely wrong idea about how
>> variants work. We definitely need more documentation and fame for this
>> system, so its potential doesn't go to waste.
>
> I theoretically knew that it was just a string-replace system, but it
> didn't occur to me that it would be useful for more than
> transliteration.  It makes sense now that Tim pointed that out.  How
> would it handle word breaks, though?  It would just ignore them, so
> color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of
word breaks, so word-break handling has not been implemented. In theory
you could just list every larger word that contains a smaller
transformed word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.
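
The listing approach can be sketched in Python (the converter itself is
PHP). This assumes the replacement prefers the longest matching key at
each position, which is how PHP's strtr() behaves when given an array;
the function convert and both tables are hypothetical:

```python
def convert(text, table):
    """Replace keys of `table` in `text`, preferring the longest key at each position."""
    # Try longer keys first so that "humorous" wins over "humor".
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical tables: the second adds an identity rule for the larger word.
NAIVE = {"humor": "humour"}
TABLE = {"humor": "humour", "humorous": "humorous"}
```

With only humor -> humour in the table, "humorous" becomes "humourous";
adding the identity rule for the longer word prevents that.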

> What about
> things like HTML id's or even attribute/property names (<span
> style="color:red">)?  I'm sure I could dig through the code to find
> the answers to these, but actually I'm not even sure offhand where the
> code *is*.

It's in languages/LanguageConverter.php. There are some rather
inelegant regexes to deal with cases like these, but they seem to work.
The converter operates at a near-HTML stage of the parser, so it's not
too hard to skip attributes.

Note that the FastStringSearch extension is important for achieving
good performance, especially in Chinese.

-- Tim Starling



Re: Language variants

Tim Starling-2
In reply to this post by Ariel Glenn WMF
Ariel T. Glenn wrote:
> The differences between the UK and American varieties of English are not
> limited just to spelling and vocabulary.


Note that the -{...}- structure is available in wikitext to translate
article-specific fragments of text, so you can also translate worldview:

A popular game played with a bat and ball is -{en-gb:Cricket;
en-us:Baseball}-.
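
To make the selection step concrete, here is a minimal Python sketch of
resolving a -{...}- fragment for one variant. It only illustrates the
idea and is not how LanguageConverter parses the markup; the regex, the
function name render and the fallback for a missing variant are
assumptions:

```python
import re

# Assumed pattern for a -{ ... }- fragment; DOTALL lets it span line breaks.
MARKUP = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def render(wikitext, variant):
    """Replace each -{code:text; code:text}- fragment with the text for `variant`."""

    def pick(match):
        rules = {}
        for rule in match.group(1).split(";"):
            code, sep, value = rule.partition(":")
            if sep:
                rules[code.strip()] = value.strip()
        # Hypothetical fallback: keep the inner text if the variant has no rule.
        return rules.get(variant, match.group(1))

    return MARKUP.sub(pick, wikitext)
```

Called with the sentence above, render(text, "en-gb") ends in
"is Cricket." and render(text, "en-us") ends in "is Baseball.".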

-- Tim Starling



Re: Language variants

Ilmari Karonen
Tim Starling wrote:
> Ariel T. Glenn wrote:
>> The differences between the UK and American varieties of English are not
>> limited just to spelling and vocabulary.
>
> Note that the -{...}- structure is available in wikitext to translate
> article-specific fragments of text, so you can also translate worldview:
>
> A popular game played with a bat and ball is -{en-gb:Cricket;
> en-us:Baseball}-.

That reminds me... some time ago, someone proposed to enable
LanguageConverter on Commons (but without any automatic conversion,
presumably) and to (ab?)use it to replace the existing autotranslation
hacks based on {{int:lang}}.  Would that be in any sense feasible?

There would presumably be two major use cases: the easy one, which I do
believe the converter should handle just fine, would be to replace the
current <http://commons.wikipedia.org/wiki/Template:LangSwitch>,
generally used to autotranslate short phrases, with syntax like:

-{de:Eigene Arbeit; en:Own work; fi:Oma teos; fr:Travail personnel; etc.}-

(See <http://commons.wikipedia.org/wiki/Template:Own> for the source of
the example.)

The not-so-simple case would be replacing
<http://commons.wikipedia.org/wiki/Template:Autotranslate>, which is
used to translate entire templates, usually (though by no means
necessarily) combined with a long list of links to the various
translations so that users can easily browse them if the automatically
chosen version is no good or something.  A naive implementation of that
would look something like:

-{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
...and so on for about 70 more languages -->}-

(Source: <http://commons.wikipedia.org/wiki/Template:GFDL>.)

I'd like to hope that there might be some better way of doing it,
though, even if I can't offhand think of what it might look like.

Still, would something like that work, even in theory, and would it be
an improvement over the way these things are currently done (which is
hacky enough itself)?

--
Ilmari Karonen
