RFC: make Parser::getTargetLanguage aware of multilingual wikis

Daniel Kinzler
Hi all!

Tomorrow's RFC discussion[1] on IRC (22:00 UTC at #wikimedia-office) will be
about my proposal to use Parser::getTargetLanguage to allow wiki pages to be
generated in different languages depending on the user's interface language [2].

I would like to take this opportunity to gather some input beforehand about how
we can improve MediaWiki's support for multilingual wikis on the parser level.
In particular, I'm interested to learn about the implications my proposal has
for the Translate extension, the templates currently used on Commons, sites that
use automatic transliteration, etc.


Some context: Currently, MediaWiki doesn't really have a concept of multilingual
content. But some wikis, like Commons and Wikidata, show page content in the
user's language, using a variety of hacks implemented by extensions such as
Translate and Wikibase. It would be nice to make MediaWiki aware of multilingual
content, and add some limited support for this to core. Some bits and pieces
already exist, but they don't quite work for what we need.

One issue is that parser functions (and Lua code) have no good way to know what
the target language for the current page rendering is. Both ParserOptions and
Parser have a getTargetLanguage method, but this is used *only* when displaying
system messages in a different language on pages like MediaWiki:Foo/fr.

I propose to change core so it will set the target language in the parser
options to the user language on wikis/namespaces/pages marked as multilingual.
This would allow parser functions and Lua libraries to generate content in the
desired target language.
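
A minimal sketch of the idea in the paragraph above, written in Python rather than MediaWiki's PHP. The `ParserOptions` class, `options_for_render` and `greeting` are hypothetical stand-ins made up for illustration; they are not the real core classes or any existing parser function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserOptions:
    """Toy stand-in for parser options; only carries the target language."""
    target_language: str


def options_for_render(content_lang: str, user_lang: str,
                       multilingual: bool) -> ParserOptions:
    # On pages marked as multilingual, render in the viewer's language;
    # everywhere else keep the page's content language.
    return ParserOptions(user_lang if multilingual else content_lang)


def greeting(options: ParserOptions) -> str:
    # Hypothetical parser function that localizes its output by asking
    # the options for the target language of the current rendering.
    messages = {"en": "Hello", "de": "Hallo", "fr": "Bonjour"}
    return messages.get(options.target_language, messages["en"])
```

With this model, a German-speaking user viewing a multilingual page gets "Hallo", while the same user viewing an ordinary English page still gets "Hello".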


There is another related method, which I propose to drop, or at least move:
Title::getDisplayLanguage (resp. ContentHandler::getDisplayLanguage). This seems
to be used by wikis that apply transliteration to page content, but the
semantics are a bit unclear. I propose to drop this in favor of
ParserOptions::getTargetLanguage, since the display language is not a property
of the page, but an option defined for the rendering of the page.


Another related issue is anonymous browsing of multilingual content. This will
either go past the web cache layer (as is currently done on Commons), or it's
simply not possible (as currently on Wikidata). I have put up an RFC for that as
well[3], to be discussed at a different time.


[1] <https://phabricator.wikimedia.org/E89>
[2] <https://phabricator.wikimedia.org/T114640>
[3] <https://phabricator.wikimedia.org/T114662>


-- Daniel Kinzler


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Brad Jorsch (Anomie)
This in general reminds me of https://phabricator.wikimedia.org/T4085.

Also, if page content can vary based on user language, what to do about bug
reports that Special:WhatLinksHere, category listings, file usage data at
the bottom of file description pages, and so on don't report a
link/template/category/file that only exists on the page when it's viewed
in a non-default language? Yeah, we already have that with {{int:}} hacks,
but you're talking about making it more of a feature.

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Brian Wolff
On 11/10/15, Brad Jorsch (Anomie) <[hidden email]> wrote:
> This in general reminds me of https://phabricator.wikimedia.org/T4085.
>
> Also, if page content can vary based on user language, what to do about bug
> reports that Special:WhatLinksHere, category listings, file usage data at
> the bottom of file description pages, and so on don't report a
> link/template/category/file that only exists on the page when it's viewed
> in a non-default language? Yeah, we already have that with {{int:}} hacks,
> but you're talking about making it more of a feature.

If I remember correctly, we already parse the page once in the user
language, and once in the content language (canonical parser options)
in order to prevent this issue.

I think the biggest thing we could do for multilingual support is to
introduce a {{USERLANGUAGE}} magic word (and an equivalent for Lua) so
people stop using {{int:}} hacks, which are a poor user experience even by
wikitext standards. Most arguments against it are about parser cache
splitting, which is silly, as people already split the parser cache on
a massive level using {{int:}} hacks on Commons, and the table of
contents does the same on pretty much every other wiki. (As an aside, the
TOC really shouldn't split the parser cache imo, and that's something I'd
like to fix at some point, but as it stands, any page with a TOC is split
by user language.)
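
To illustrate what "splitting the parser cache" means here, a rough Python sketch follows. It assumes a cache key built from the page, the revision, and only those options the parse actually consulted. The key format and the `userlang` option name are invented for the example and do not match MediaWiki's real keys.

```python
from typing import Mapping


def parser_cache_key(page_id: int, revision_id: int,
                     used_options: Mapping[str, str]) -> str:
    # Only options the parse actually read become part of the key, so a
    # page that never looks at the user language is cached once for all.
    parts = [f"{name}={value}" for name, value in sorted(used_options.items())]
    return f"pcache:{page_id}:{revision_id}:" + "!".join(parts)


# A page without user-language-dependent output shares one cache entry.
plain_de = parser_cache_key(1, 10, {})
plain_fr = parser_cache_key(1, 10, {})

# A page using an {{int:}} hack (or a TOC) consults the user language,
# so every language gets its own entry: the cache is "split".
split_de = parser_cache_key(2, 20, {"userlang": "de"})
split_fr = parser_cache_key(2, 20, {"userlang": "fr"})
```

In this model a user-language magic word would add nothing new: it would split the cache in the same way the existing hacks already do.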

The biggest gotcha to look out for imo is things like number
formatting in parser functions. Sometimes users write templates that
make assumptions about the number formatting, and it can vary by page
language (however, it's entirely possible to make proper templates that
don't do that). [Sometimes number formatting seems to use content
language, sometimes it seems to use functionLang]

As for actual proposal, I'm a fan of being able to associate a
language with a specific revision, to override the default wiki
language on a per revision basis. I think it might be interesting to
be able to set 'mul' as the content language, in order to make the
pages always be in the user language, but that's the sort of thing I
think needs some testing to discover forgotten about assumptions about
language that MediaWiki might make.

--
bawolff

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

C. Scott Ananian
I believe the title language support is for the LanguageConverter
extension.  They used to (ab)use the `{{DISPLAYTITLE:title}}` magic
word in order to use the proper language variant, something like:

`{{DISPLAYTITLE:-{en-us:Color; en-gb:Colour}-}}`

Then support was added to avoid the need for this hack, and just Do
The Right Thing.  I don't know the details, but presumably
`Title::getDisplayLanguage` is part of it.


On Tue, Nov 10, 2015 at 4:00 PM, Brian Wolff <[hidden email]> wrote:
> contents on pretty much every other wiki (As an aside, TOC really
> shouldn't split parser cache imo, and that's something I'd like to fix
> at some point, but as it stands, any page with a ToC is split by user
> language)

Then you'll be interested in taking a look at
https://phabricator.wikimedia.org/T114057
 --scott

--
(http://cscott.net)

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Brad Jorsch (Anomie)
In reply to this post by Brian Wolff
On Tue, Nov 10, 2015 at 4:00 PM, Brian Wolff <[hidden email]> wrote:

> On 11/10/15, Brad Jorsch (Anomie) <[hidden email]> wrote:
> > Also, if page content can vary based on user language, what to do about
> > bug reports that Special:WhatLinksHere, category listings, file usage
> > data at the bottom of file description pages, and so on don't report a
> > link/template/category/file that only exists on the page when it's viewed
> > in a non-default language? Yeah, we already have that with {{int:}}
> > hacks, but you're talking about making it more of a feature.
>
> If I remember correctly, we already parse the page once in the user
> language, and once in the content language (canonical parser options)
> in order to prevent this issue.
>

We parse in the content language to avoid T16404
<https://phabricator.wikimedia.org/T16404>, which is somewhat the opposite.

My concern here is that if varying page content on user language becomes a
supported thing, people will probably complain that
{{#ifeq:{{USERLANG}}|en|[[Category:Foo]]|[[Category:Bar]]}} (or the
equivalent in Lua) on a site with 'en' as the default won't show the page
when you look at Category:Bar, even though it probably will show
Category:Bar at the bottom of the page in non-English languages.

T16404 <https://phabricator.wikimedia.org/T16404> was about the fact that
doing the equivalent with {{int:}} hacks used to sometimes put the page in
Category:Foo and sometimes in Category:Bar, depending on the language of
whoever last edited (or null-edited) the page.


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Brian Wolff
Ah. I read your previous email too fast.

Maybe we should have something like:

{{#langswitch:
en=foo
fr=le foo
..
}}

which works like normal #switch, except without dead-branch
elimination. (And for bonus points, implements language fallback
sanely).
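
A small Python sketch of how such a switch could behave, under stated assumptions: `langswitch`, the `FALLBACKS` table and the category regex are all hypothetical and only model the idea. Every branch is scanned for category links (no dead-branch elimination), while only the branch chosen through the fallback chain is displayed.

```python
import re

# Hypothetical fallback chains; MediaWiki defines its own per language.
FALLBACKS = {"de-at": ["de", "en"], "de": ["en"], "fr": ["en"]}

# Very rough matcher for [[Category:...]] links, for illustration only.
CATEGORY_LINK = re.compile(r"\[\[(Category:[^\]|]+)")


def langswitch(branches, user_lang, default_lang="en"):
    """Return (text to display, links recorded from all branches)."""
    # No dead-branch elimination: links are collected from every branch,
    # so link tables do not depend on who happens to view the page.
    links = set()
    for text in branches.values():
        links.update(CATEGORY_LINK.findall(text))

    # Walk the fallback chain until a branch for that language exists.
    chain = [user_lang] + FALLBACKS.get(user_lang, [default_lang])
    for code in chain:
        if code in branches:
            return branches[code], links

    # Nothing matched: fall back to the first branch, if any.
    return next(iter(branches.values()), ""), links
```

For example, with branches for `en` and `fr`, a `de-at` viewer would see the `en` text after falling back through `de`, yet the categories named in both branches would be recorded.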

Or maybe an in-core feature {{#langtransclude:foo}}, which works like
normal {{foo}}, except it transcludes the language subpage instead
(and does smart fallback, and records a transclusion link record for
all the 2-3 letter subpages of the template).

Whatever else we do, I'm really not a fan of the syntax that the Translate
extension uses. If we implement something in core to make
multilingualism easier, I really hope we go with a saner syntax.
[And I say that as a person who loves MW's generally crazy syntax]

--
-bawolff

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Brad Jorsch (Anomie)
On Tue, Nov 10, 2015 at 4:39 PM, Brian Wolff <[hidden email]> wrote:

> Maybe we should have something like:
>
> {{#langswitch:
> en=foo
> fr=le foo
> ..
> }}
>
> which works like normal #switch, except without dead-branch
> elimination. (And for bonus points, implements language fallback
> sanely).
>

That might work in itself. But then {{foo|var={{#langswitch:...}}}} would
probably still have potential issues, and the same sort of thing in
Scribunto however it's implemented there.


> Or maybe an in-core feature {{#langtransclude:foo}}, which works like
> normal {{foo}}, except it transcludes the language subpage instead
> (and does smart fallback, and records a transclusion link record for
> all the 2-3 letter subpages of the template).
>

You'd also have to parse all those 2-3 letter subpages to get their links,
categories, subtemplates, and so on.


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Purodha Blissenbach
In reply to this post by Brian Wolff
On 10.11.2015 22:00, Brian Wolff wrote:

>
> ... Most arguments against it are about parser cache
> splitting, which is silly, as people already split the parser cache
> on a massive level using {{int: hacks on commons, and the table of
> contents on pretty much every other wiki (As an aside, TOC really
> shouldn't split parser cache imo, and that's something I'd like to
> fix at some point, but as it stands, any page with a ToC is split by
> user language)

See https://phabricator.wikimedia.org/T114057#1798538
on that issue.

> I think it might be interesting to
> be able to set 'mul' as the content language, in order to make the
> pages always be in the user language, but that's the sort of thing I
> think needs some testing to discover forgotten about assumptions
> about language that MediaWiki might make.

'mul' is to be used if the page content is in mixed languages.

We need to use another, different marker code internally, which is
replaced by the user language code when the page is rendered.

Purodha


Re: RFC: make Parser::getTargetLanguage aware of multilingual wikis

Daniel Kinzler
In reply to this post by Daniel Kinzler
Quick poke: the IRC discussion is coming up on #wikimedia-office in less than
two hours, at 22:00 UTC.

-- daniel
