Character-counting parser function

Greg L
Hi,

I’m new to this venue so please have patience with me. Jimbo suggested  
I contact Erik and Erik said I should post here.

Wikipedia authors of magic words and templates could really use a  
character-counting parser function. All the background information can  
be found here:

http://en.wikipedia.org/w/index.php?title=User_talk:Jimbo_Wales&oldid=260819871 
  - Developer_support_for_parser_function

In a nutshell though, there is currently a template on en.Wikipedia
called {{val}} that delimits numbers (it places what appear to be
thin spaces every three characters in scientific notation). It currently
must use math-based techniques to parse the value, and this results in
rounding errors 5–10% of the time.

A character-counting parser function would accept interrogations such  
as “Are there more than four characters remaining in the string when  
counting right from the decimal point?” And “If so, feed me three more  
characters.” Such a parser function would be very handy for many other  
purposes. With a good, bullet-proof parser function, our small army of  
template authors could produce some nice new tools.
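
The interrogation described above can be sketched in PHP as plain string slicing. This is only an illustration of the counting logic, not an existing parser function: the name groupFractionDigits and its separator parameter are hypothetical, and a regular space stands in for the thin space.

```php
<?php
// Hypothetical helper, not part of MediaWiki or any extension.
// It groups the digits after the decimal point purely by counting
// characters, so no floating-point arithmetic (and no rounding) is involved.
function groupFractionDigits(string $number, string $separator = ' '): string
{
    $dot = strpos($number, '.');
    if ($dot === false) {
        return $number; // no fractional part to group
    }

    $head = substr($number, 0, $dot + 1); // integer part plus the '.'
    $rest = substr($number, $dot + 1);    // digits still to be emitted
    $groups = [];

    // "Are there more than four characters remaining? If so, feed me three more."
    while (strlen($rest) > 4) {
        $groups[] = substr($rest, 0, 3);
        $rest = substr($rest, 3);
    }
    if ($rest !== '') {
        $groups[] = $rest; // final group of one to four digits
    }

    return $head . implode($separator, $groups);
}

echo groupFractionDigits('1.6021764871'), "\n";
```

Because the value is never converted to a number, the digits come out exactly as they went in; only the separators are added.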

I can be reached at [hidden email]

Greg
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Character-counting parser function

Chad
On Mon, Jan 5, 2009 at 11:54 PM, Greg L <[hidden email]> wrote:

> Wikipedia authors of magic words and templates could really use a
> character-counting parser function.

There's http://www.mediawiki.org/wiki/Extension:StringFunctions, but
it had some issues (which caused a reversal of a merge with Parser
Functions). IIRC, it was mentioned at the time that without some
improvements, it wouldn't get enabled on WMF wikis.

-Chad

Re: Character-counting parser function

Greg L
Yes, I heard about that one. Do you think it is more feasible to fix  
Extension:StringFunctions or to just make a new, specialized one? -Greg


Re: Character-counting parser function

Andrew Garrett
On 06/01/2009, at 0:02, Greg L <[hidden email]> wrote:

> Yes, I heard about that one. Do you think it is more feasible to fix
> Extension:StringFunctions or to just make a new, specialized one? -Greg

Neither. You should write your number formatting in PHP as an
extension, not in a template.

Wikitext is supposed to be a markup language, not a programming  
language, and for this reason we currently have no plans to enable  
StringFunctions or any similar functionality on Wikimedia sites.

This is my understanding of Brion and Tim's position. Please correct  
me if I'm wrong.

Andrew Garrett


Re: Character-counting parser function

Aryeh Gregor
On Wed, Jan 7, 2009 at 8:36 AM, Andrew Garrett <[hidden email]> wrote:
> Wikitext is supposed to be a markup language, not a programming
> language, and for this reason we currently have no plans to enable
> StringFunctions or any similar functionality on Wikimedia sites.
>
> This is my understanding of Brion and Tim's position. Please correct
> me if I'm wrong.

My understanding is that StringFunctions isn't enabled because it
hasn't passed review, and needs improvements before it can, but that
it's planned to eventually merge it into ParserFunctions.  Tim
explicitly recommended that string-related functions be added to
ParserFunctions:

https://bugzilla.wikimedia.org/show_bug.cgi?id=6455#c36


Re: Character-counting parser function

Alex Zaddach
Aryeh Gregor wrote:

> On Wed, Jan 7, 2009 at 8:36 AM, Andrew Garrett <[hidden email]> wrote:
>> Wikitext is supposed to be a markup language, not a programming
>> language, and for this reason we currently have no plans to enable
>> StringFunctions or any similar functionality on Wikimedia sites.
>>
>> This is my understanding of Brion and Tim's position. Please correct
>> me if I'm wrong.
>
> My understanding is that StringFunctions isn't enabled because it
> hasn't passed review, and needs improvements before it can, but that
> it's planned to eventually merge it into ParserFunctions.  Tim
> explicitly recommended that string-related functions be added to
> ParserFunctions:
>
> https://bugzilla.wikimedia.org/show_bug.cgi?id=6455#c36
>

I believe it's LoopFunctions[1] and VariablesExtension[2] that have the
"programming language" issues, and DynamicFunctions[3], which also has
caching problems.

[1] <http://www.mediawiki.org/wiki/Extension:LoopFunctions>
[2] <http://www.mediawiki.org/wiki/Extension:VariablesExtension>
[3] <http://www.mediawiki.org/wiki/DynamicFunctions>
--
Alex (wikipedia:en:User:Mr.Z-man)


Re: Character-counting parser function

Greg L
I’m not a developer, so it would be great if either of you (Aryeh or
Mr.Z-man) could explain whether a character-counting parser function
(or similar tool) is currently available (or could be made) for
template authors to use. As for “we currently have no plans to enable
StringFunctions or any similar functionality on Wikimedia sites”, why
would that be a good plan? If having a character-counting parser
function makes sense to Jimbo, to several template authors, and to
some editors who rely upon templates that could benefit from such
tools, then what is wrong with a character-counting parser function?
Or is there something particular about “StringFunctions” that goes
beyond the straight and narrow requirements of the character-counting
parser function needed to implement {{val}} and {{delimitnum}}?

BTW: Has anyone looked at http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29

The functionality required is explained there.

Greg L

Re: Character-counting parser function

Chad
On Thu, Jan 8, 2009 at 1:31 PM, Greg L <[hidden email]> wrote:

> I'm not a developer so it would be great if either of you (Aryeh or
> Mr.Z-man) could explain whether a character-counting parser function
> (or similar tool) is currently available (or could be made) for
> template authors to use. As for "we currently have no plans to enable
> StringFunctions or any similar functionality on Wikimedia sites", why
> would that be a good plan? If it makes sense to Jimbo to have
> character-counting a parser function, and to several template authors,
> and to some editors who rely upon templates that could benefit from
> such tools, then what is wrong a character-counting parser function?
> Or is there something particular about "StringFunctions" that goes
> beyond the straight and narrow requirements of the character-counting
> parser function as required to implement {val} and {delimitnum}?
>
> BTW: Has anyone looked at
> http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_(dates_and_numbers)/Archive_94#Grouping_of_digits_after_the_decimal_point_.28next_attempt.29
>
> The functionality required is explained there.
>
> Greg L
>
>
Brion outlined his concerns with StringFunctions--when it
was merged with ParserFunctions and he reverted it--back
in r39653. Mainly, the overall package is too memory
intensive as currently written.

-Chad

http://www.mediawiki.org/wiki/Special:Code/MediaWiki/39653

Re: Character-counting parser function

Aryeh Gregor
In reply to this post by Greg L
On Thu, Jan 8, 2009 at 1:31 PM, Greg L <[hidden email]> wrote:
> I'm not a developer so it would be great if either of you (Aryeh or
> Mr.Z-man) could explain whether a character-counting parser function
> (or similar tool) is currently available (or could be made) for
> template authors to use.

Such tools are available, but none has been written well enough that
it could be used on Wikimedia sites.

> As for "we currently have no plans to enable
> StringFunctions or any similar functionality on Wikimedia sites", why
> would that be a good plan?

Andrew was mistaken in that statement.  Variables are currently off
the table, but string functions aren't.  They need someone to write a
good version of them that isn't DOSable and handles things like strip
markers acceptably.

On Thu, Jan 8, 2009 at 1:43 PM, Chad <[hidden email]> wrote:
> Brion outlined his concerns with StringFunctions--when it
> was merged with ParserFunctions and he reverted it--back
> in r39653. Mainly, the overall package is too memory
> intensive as currently written.

His exact comment might be more elucidating: "o_O These look like the
least CPU- and memory-efficient implementations of strlen(), strpos()
etc that could possibly be created..."  For example, the {{#len:}}
function was implemented as the return value of this:

        /**
         * Splits the string into its component parts using preg_match_all().
         * $chars is set to the resulting array of multibyte characters.
         * Returns count($chars).
         */
        function mwSplit ( &$parser, $str, &$chars ) {
                # Get marker prefix & suffix
                $prefix = preg_quote( $parser->mUniqPrefix );
                if( isset($parser->mMarkerSuffix) )
                        $suffix = preg_quote( $parser->mMarkerSuffix );
                else if ( strcmp( MW_PARSER_VERSION, '1.6.1' ) > 0 )
                        $suffix = 'QINU\x07';
                else $suffix = 'QINU';

                # Treat strip markers as single multibyte characters
                $count = preg_match_all('/' . $prefix . '.*?' . $suffix . '|./su', $str, $arr);
                $chars = $arr[0];
                return $count;
        }

Rather than, say, replacing strip markers using the appropriate Parser
method, and then returning mb_strlen().  Or whatever would be
appropriate.  I'm not sure what would be, but I'm pretty sure it
doesn't involve exploding the string into an array to calculate its
length.
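
A minimal sketch of that alternative, under stated assumptions: the helper name markerAwareLength is hypothetical, the marker prefix and suffix are passed in as plain strings instead of being read from the Parser (no MediaWiki method is assumed here), and the mbstring extension is assumed to be available. It collapses each strip marker to one placeholder character and lets mb_strlen() do the counting, so no per-character array is built.

```php
<?php
// Hypothetical stand-alone helper; the marker prefix and suffix are passed in
// rather than read from a Parser object, so no MediaWiki API is assumed.
function markerAwareLength(string $str, string $prefix, string $suffix): int
{
    // Collapse every strip marker into one placeholder character so that it
    // counts as a single character, as mwSplit() intended.
    $pattern = '/' . preg_quote($prefix, '/') . '.*?' . preg_quote($suffix, '/') . '/su';
    $collapsed = preg_replace($pattern, 'x', $str);
    if ($collapsed === null) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 input.
        return 0;
    }

    // Count characters, not bytes, without building a per-character array.
    return mb_strlen($collapsed, 'UTF-8');
}

// Made-up marker delimiters, for demonstration only.
echo markerAwareLength("caf\u{00E9} <<M1>> ok", '<<', '>>'), "\n";
```

This still runs one regular expression over the input, so it is not free, but the working memory is a single copy of the string instead of an array with one element per character. Whether this is what a reviewed implementation should do is an open question.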


Re: Character-counting parser function

Daniel Friesen
If someone is looking for a better implementation of string functions
and similar features, my WikiCode <http://wiki-tools.com/wiki/WikiCode>
extension is GPL, so feel free to extract some of it to build a new
string functions extension or to expand ParserFunctions.
Just remember to credit me for the parts of the code you copy and keep
it under the GPL.

You can see some code in use on the demo page:
http://dev.wiki-tools.com/wiki/WikiCode

If you look at a few of my string functions there, you'll also see
that the ones I've written handle nowiki tags and multibyte strings
quite nicely. If you cross-check against the actual StringFunctions
extension and a bug in Bugzilla, you'll find they are free of the known
issues with string functions and multibyte characters in certain places.
The only string function there that isn't complete is #pad. My
implementation of it isn't as simple because one of my aims was also
to handle nowiki tags in a fairly logical way, which means native
functions can't really be used.
Unfortunately for the extension itself, I haven't really been doing any
MediaWiki related development lately.

~Daniel Friesen (Dantman, Nadir-Seen-Fire)
~Profile/Portfolio: http://nadir-seen-fire.com
-The Nadir-Point Group (http://nadir-point.com)
--It's Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)


