Is it possible to change the locale of a scribunto module and have identifiers with locale characters

Mathieu Stumpf Guntz
Hello everybody,

According to the Lua wiki
<http://lua-users.org/wiki/LuaLocales%20In%20Lua%205.1>, in Lua 5.1
"identifiers [are] locale dependent", and since the reference manual
states that "[the documentation is] derived from the Lua 5.1 reference
manual <http://www.lua.org/manual/5.1/index.html>", I guess that
Scribunto is still derived from Lua 5.1.

So, what I would like is to be able to set the locale for a module and
use identifiers with locale characters. But `os.setlocale` isn't
accessible in Scribunto modules.

Could I get some information about the reasons for disabling it, and
feedback on the possibility of enabling it?

See you soon

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Is it possible to change the locale of a scribunto module and have identifiers with locale characters

Brad Jorsch (Anomie)
On Thu, Sep 28, 2017 at 5:19 AM, mathieu stumpf guntz <
[hidden email]> wrote:

> According to lua wiki
> <http://lua-users.org/wiki/LuaLocales%20In%20Lua%205.1>, in Lua 5.1
> "identifiers [are] locale dependent", and from the reference manual which
> states that "[the documentation] derived from the Lua 5.1 reference manual
> <http://www.lua.org/manual/5.1/index.html>", I guess that Scribunto is
> still derived from Lua 5.1.
>

That's correct.


> So, what I would like is being able to set the locale for a module and use
> identifiers with locale characters. But `os.setlocale` isn't accessible in
> scribunto modules.
>

Allowing os.setlocale would very likely cause problems on threaded
webservers where one thread's locale change stomps on another's. It might
even cause trouble for subsequent requests on non-threaded servers if the
locale doesn't get reset, or for other code running during the same request
(e.g. see T107128 <https://phabricator.wikimedia.org/T107128>).

For sanity's sake, on Wikimedia wikis we use C.UTF-8 as the OS-level
locale. This doesn't affect much since MediaWiki usually uses its own i18n
mechanisms instead of using the locale.


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: Is it possible to change the locale of a scribunto module and have identifiers with locale characters

Mathieu Stumpf Guntz


On 28/09/2017 at 16:25, Brad Jorsch (Anomie) wrote:

> On Thu, Sep 28, 2017 at 5:19 AM, mathieu stumpf guntz <
> [hidden email]> wrote:
>
>> According to lua wiki
>> <http://lua-users.org/wiki/LuaLocales%20In%20Lua%205.1>, in Lua 5.1
>> "identifiers [are] locale dependent", and from the reference manual which
>> states that "[the documentation] derived from the Lua 5.1 reference manual
>> <http://www.lua.org/manual/5.1/index.html>", I guess that Scribunto is
>> still derived from Lua 5.1.
>>
> That's correct.
Ok, thank you, I think that's a very important point. It appears to me
that the Lua developers take a rather "we don't care about backward
compatibility" approach, so later versions can have significant
incompatibilities.

By the way, is there an official policy or any other document regarding
the evolution of Scribunto?

>> So, what I would like is being able to set the locale for a module and use
>> identifiers with locale characters. But `os.setlocale` isn't accessible in
>> scribunto modules.
>>
> Allowing os.setlocale would very likely cause problems on threaded
> webservers where one thread's locale change stomps on another's. It might
> even cause trouble for subsequent requests on non-threaded servers if the
> locale doesn't get reset, or for other code running during the same request
> (e.g. see T107128 <https://phabricator.wikimedia.org/T107128>).
Ok, thank you. I had guessed that each Scribunto process was heavily
sandboxed, especially as everything seems to be done to prevent passing
information between successive invocations of the same module. I hadn't
thought of possible side effects on PHP execution as explained in the
ticket. Do we have some nice (or even ugly) diagram of the PHP/Scribunto
execution process, so I could have a clearer picture of what happens
when I request a MediaWiki article page that contains a Scribunto
invocation?
>
> For sanity's sake, on Wikimedia wikis we use C.UTF-8 as the OS-level
> locale. This doesn't affect much since MediaWiki usually uses its own i18n
> mechanisms instead of using the locale.
>
>
Well, ok. I mean, that doesn't seem to be a problem for strings, all the
more so as the mw library provides specific helpers for that.

But that's not the concern I was writing about. That is, I can't use
Unicode identifiers, as in `local plâtrière = préamorçage()`. When I see
UTF-8 somewhere, I expect to be able to use any glyph without problems.
So are my expectations misguided, or is there something wrong with the
way C.UTF-8 is handled somewhere in the software stack?

Re: Is it possible to change the locale of a scribunto module and have identifiers with locale characters

Brad Jorsch (Anomie)
On Fri, Sep 29, 2017 at 6:48 AM, mathieu stumpf guntz <
[hidden email]> wrote:

> By the way is there an official policy or whatever document regarding
> Scribunto evolutions?
>

Not that I know of. The biggest technical blocker to having Scribunto use a
newer version of Lua is that 5.2 heavily changed how function environments
work, so we'd have to redo the sandboxing and put it through a fresh
security review.


> Ok, thank you. I guessed that each Scribunto process was hugely sandboxed,
> especially as everything seems to be done to prevent passing information
> between successive invocations of the same module. I hadn't thought of
> possible side effect on PHP execution as explained in the ticket.
>

The problem with os.setlocale is that its effect is global to the whole
process, not confined to the sandbox. When using luastandalone that's less
of an issue since the Lua code runs in a separate process (but we still
don't start a new process for each #invoke on the page), but when running
with the luasandbox PHP extension it shares the process with PHP.


> Do we have some nice (or even ugly) schema of PHP/Scribunto execution
> process so I could have a clearer representation of what's happening when I
> grab a webpage of a mediawiki article with some Scribunto invocation?
>

Not really. When the parser processes the {{#invoke:}}, it calls
ScribuntoHooks::invokeHook() which loads the module invoked, initializes
it, then calls the method invoked.
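
As a rough illustration of those three steps, here is a sketch that
simulates them in plain Lua, outside MediaWiki. The names `module_chunk`,
`hello` and the stand-in `frame` table are assumptions made for this
example. On a wiki, the body of `module_chunk` would be the content of a
module page (say, a hypothetical Module:Example) invoked as
{{#invoke:Example|hello|Mathieu}}, and Scribunto would supply a real frame
object whose `args` table holds the invocation parameters.

```lua
-- Stand-in for the source of a module page: it builds and returns a table
-- of functions, which is what Scribunto expects from a module.
local function module_chunk()
  local p = {}

  function p.hello(frame)
    -- frame.args holds the parameters given to #invoke
    local name = frame.args[1] or "world"
    return "Hello, " .. name .. "!"
  end

  return p
end

-- Simulated version of the steps described above.
local mod = module_chunk()               -- load and initialize the module
local frame = { args = { "Mathieu" } }   -- hypothetical stand-in frame
local output = mod.hello(frame)          -- call the invoked function

print(output)  -- Hello, Mathieu!
```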


> But that's not the concern I was writing for. That is, I can't use unicode
> identifiers as in `local plâtrière = préamorçage()`. When I see UTF-8
> somewhere, I would expect no problem to use any glyph. So are my
> expectations misguided, or is there something wrong with the way C.UTF-8 is
> handled somewhere in the software stack?
>

Lua's processing operates on C chars (i.e. bytes), and uses C's isalpha()
and isalnum() to recognize which characters are "letters" for the purpose
of identifiers. For single-byte encodings this allows non-ASCII characters
such as 'â', 'è', 'é', and 'ç' to be recognized as "letters", hence the
documentation in Lua 5.1 about that, but in UTF-8 these are all represented
with multiple bytes so that doesn't work.
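
The byte-level view can be checked with a small sketch, meant to be run in
a standalone Lua interpreter rather than on a wiki. The variable names are
made up for this example, and the last line deliberately states no
expected output: with the reference Lua implementation in the C locale the
compilation should fail at the first non-ASCII byte, but other
implementations or locales may behave differently.

```lua
-- "é" is U+00E9, encoded in UTF-8 as the two bytes 195 and 169. Decimal
-- escapes are used so the result does not depend on this file's encoding.
local e_acute = "\195\169"

print(#e_acute)             -- 2 (Lua counts bytes, not characters)
print(e_acute:byte(1, -1))  -- 195 169

-- Try to compile `local préamorçage = 1` from its UTF-8 bytes.
-- loadstring exists in Lua 5.1; later versions use load for strings.
local compile = loadstring or load
local source = "local pr\195\169amor\195\167age = 1"
local chunk, err = compile(source)

-- chunk is nil when the lexer rejects the identifier; err holds the message.
print(chunk ~= nil, err)
```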

Changing that would require rewriting all the Lua input processing to use
functions that can handle "wide" characters, which is well beyond what
we're at all likely to do. It'd have to happen upstream, and then we'd have
to spend the time to actually upgrade to Lua 5.4 or whatever version
implemented it. But since Lua 5.2 actually changed things the other way
("Lua identifiers cannot use locale-dependent letters",
https://www.lua.org/manual/5.2/manual.html#8.1) that too seems unlikely.
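
One possible workaround, not proposed in the thread and offered here only
as a sketch, is to keep the non-ASCII names as string keys of a table:
table keys can be arbitrary strings, so they never go through the
identifier rules of the lexer. The table name `fr` and the returned value
are made up for this example.

```lua
-- Names that are not valid identifiers can still be used as table keys.
local fr = {}

fr["préamorçage"] = function()
  return "primed"
end

-- Equivalent in spirit to `local plâtrière = préamorçage()`.
fr["plâtrière"] = fr["préamorçage"]()

print(fr["plâtrière"])  -- primed
```

The cost is the bracket-and-quote syntax on every access, since the
`fr.name` shorthand is only available for keys that are valid identifiers.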


--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation