A question about templates parsing and caching

A question about templates parsing and caching

Alex Brollo
I'd like to know something more about template parsing/caching for
performance issues.

My question is: when a template is called, its wikicode, I suppose, is
parsed and translated into "something running" - I can't imagine what
precisely, but I don't care so much about that (so far :-) ). If a second
call comes to the server for the same template, but with different
parameters, is the template parsed again from scratch, or is something from
the previous parse reused, saving a little bit of server load?

If the reply is "yes", i.e. if the "running code" of the whole template is
somehow saved and cached, ready to be used again with new parameters,
perhaps it could be a good idea to build templates as "libraries of
different templates", using the name of the template as a "library name" and
a parameter as the name of a "specific function"; a simple #switch could be
used to select the appropriate code for that "specific function".

On the contrary, if nothing is saved, there would be good reasons to keep
the template code as simple as possible, and this idea of "libraries" would
be a bad one.

Alex
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A question about templates parsing and caching

Brion Vibber
On Fri, Apr 8, 2011 at 2:11 PM, Alex Brollo <[hidden email]> wrote:

> I'd like to know something more about template parsing/caching for
> performance issues.
>
> My question is: when a template is called, it's wikicode, I suppose, is
> parsed and translated into "something running" - I can't imagine what
> precisely, but I don't care so much about (so far :-) ). If a second call
> comes to the server for the same template, but with different parameters,
> the template is parsed again from scratch or something from previous
> parsing
> is used again, so saving a little bit of server load?
>

Currently there's not really a solid intermediate parse structure in
MediaWiki (something we hope to change; I'll be ramping up some
documentation for the soon-to-begin mega parser redo project soon).

Approximately speaking... In the current system, the page is preprocessed
into a partial preprocessor tree which identifies certain structure
boundaries (for templates and function & tag-hook extensions); templates and
some hooks get expanded in, then it's all basically flattened back to
wikitext. Then the main parser takes over, turning the whole wikitext
document into HTML output.

I believe we do locally (in-process) cache the preprocessor structure for
pages and templates, so multiple use of the same template won't incur as
much preprocessor work. But, the preprocessor parsing is usually one of the
fastest parts of the whole parse.
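The in-process caching Brion describes can be sketched in a few lines. This is purely illustrative pseudocode in Python, not MediaWiki's actual (PHP) implementation; all names here are made up. The point is just that the tree for a given piece of wikitext is built once per parse operation, so transcluding the same template many times only pays the preprocessing cost once:

```python
# Illustrative sketch only (not MediaWiki's real code): an in-process
# cache of preprocessor trees, keyed on the raw wikitext string.

class Preprocessor:
    def __init__(self):
        # wikitext -> parse tree; lives for a single parse operation
        self._tree_cache = {}
        self.parse_count = 0  # just to observe cache behaviour

    def preprocess_to_tree(self, wikitext):
        tree = self._tree_cache.get(wikitext)
        if tree is None:
            tree = self._parse(wikitext)  # the (relatively cheap) parse step
            self._tree_cache[wikitext] = tree
        return tree

    def _parse(self, wikitext):
        self.parse_count += 1
        return ("root", wikitext)  # stand-in for a real node tree

pp = Preprocessor()
t1 = pp.preprocess_to_tree("{{Infobox|name=A}}")
t2 = pp.preprocess_to_tree("{{Infobox|name=A}}")
# the second lookup is a cache hit: same tree object, only one parse
```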


> If the reply is "yes", t.i. if the "running code" of the whole template is
> somehow saved and cached, ready to be used again with new parameters,
> perhaps it could be a good idea to build templates as "librares of
> different
> templates", using the name of the template as a "library name" and a
> parameter as the name of "specific function"; a simple #switch could be
> used
> to use the appropriate code of that "specific function".
>

I think for the most part, it'll be preferable to only have to work with the
functions that are needed, rather than fetching a large number of unneeded
functions at once. Even if it's pre-parsed, loading unneeded stuff means
more CPU used, more memory used, more network bandwidth used.

But being able to bundle together related things as a unit that can be
distributed together would be very nice, and should be considered for future
work on new templating and gadget systems.

-- brion

Re: A question about templates parsing and caching

Daniel Friesen-4
On 11-04-08 02:37 PM, Brion Vibber wrote:

> On Fri, Apr 8, 2011 at 2:11 PM, Alex Brollo<[hidden email]>  wrote:
>
>> I'd like to know something more about template parsing/caching for
>> performance issues.
>>
>> My question is: when a template is called, it's wikicode, I suppose, is
>> parsed and translated into "something running" - I can't imagine what
>> precisely, but I don't care so much about (so far :-) ). If a second call
>> comes to the server for the same template, but with different parameters,
>> the template is parsed again from scratch or something from previous
>> parsing
>> is used again, so saving a little bit of server load?
>>
> Currently there's not really a solid intermediate parse structure in
> MediaWiki (something we hope to change; I'll be ramping up some
> documentation for the soon-to-begin mega parser redo project soon).
>
> Approximately speaking... In the current system, the page is preprocessed
> into a partial preprocessor tree which identifies certain structure
> boundaries (for templates and function&  tag-hook extensions); templates and
> some hooks get expanded in, then it's all basically flattened back to
> wikitext. Then the main parser takes over, turning the whole wikitext
> document into HTML output.
>
> I believe we do locally (in-process) cache the preprocessor structure for
> pages and templates, so multiple use of the same template won't incur as
> much preprocessor work. But, the preprocessor parsing is usually one of the
> fastest parts of the whole parse.
>
> -- brion
I could swear we locally cache template wikitext, and save preprocessed
data to the object cache. At least, I think that's what I gathered the
last time I read the code.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: A question about templates parsing and caching

Platonides
Daniel Friesen wrote:

>> I believe we do locally (in-process) cache the preprocessor structure for
>> pages and templates, so multiple use of the same template won't incur as
>> much preprocessor work. But, the preprocessor parsing is usually one of the
>> fastest parts of the whole parse.
>
> I could swear we locally cache template wikitext, and save preprocessed
> data to the object cache. Least I think thats what I gathered last time
> I read the code.
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Yes. Calling a template twice will only fetch the text once, and won't
increase the 'used templates' counter...
Preprocessing output for wikitext over a size threshold is cached in
serialized form (below the threshold it's easier just to reprocess).

On the original question:
The tree will be reused, but it has to be expanded again. It's not clear
that you gain by using a library since you will pay the library costs on
all articles using it. Templates should be kept simple (yes, enwiki is
particularly bad at that).

In early 2007, eswiki implemented a library template
(Plantilla:Interproyecto) which was used for adding any interwikis to
sister projects. It caused server problems and got disabled by the
sysadmins.



Re: A question about templates parsing and caching

Roan Kattouw-2
2011/4/9 Platonides <[hidden email]>:
> Yes. Calling a template twice will only fetch the text once, won't
> increase the 'used templates' counter...
> Preprocessing of wikitext over a threshold is cached serialized (it's
> easier to reprocess if it's too small).
>
To clarify: there's an in-process cache, like Brion said, so a
template that is used twice on the same page is only fetched and
preprocessed once. However, this only applies to templates called with
no parameters. If the template is passed parameters, this in-process
cache won't be used, even if the same set of parameters is used twice.

What we store in memcached is a serialized version of the preprocessor
XML tree, keyed on the MD5 hash of the wikitext input, unless it's too
small, like Platonides said. This means that if the exact same input
is fed to the preprocessor twice, it will do part of the work only once
and cache the intermediate result.
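The memcached scheme Roan describes can be sketched as follows. This is a hedged toy model, not MediaWiki's code: the threshold value, the function names, and the `store` dict (standing in for memcached) are all invented for illustration. What it shows is the key idea: a serialized tree keyed on the MD5 of the input, with small inputs bypassing the cache because reparsing them is cheaper than a cache round-trip:

```python
import hashlib
import pickle

# Sketch of the persistent preprocessor cache (all names hypothetical).
CACHE_THRESHOLD = 1024  # bytes; illustrative value, not MediaWiki's

store = {}  # stand-in for memcached

def cache_key(wikitext):
    return "preprocess:" + hashlib.md5(wikitext.encode("utf-8")).hexdigest()

def preprocess_cached(wikitext, parse):
    if len(wikitext) < CACHE_THRESHOLD:
        return parse(wikitext)          # too small: just reparse
    key = cache_key(wikitext)
    blob = store.get(key)
    if blob is not None:
        return pickle.loads(blob)       # cache hit: skip parsing entirely
    tree = parse(wikitext)
    store[key] = pickle.dumps(tree)
    return tree

calls = []
def toy_parse(text):
    calls.append(text)
    return ("tree", len(text))

tree = preprocess_cached("x" * 2000, toy_parse)
tree_again = preprocess_cached("x" * 2000, toy_parse)
# identical large input: toy_parse ran once, second call came from the store
```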

Roan Kattouw (Catrope)


Re: A question about templates parsing and caching

Platonides
Roan Kattouw wrote:

> 2011/4/9 Platonides <[hidden email]>:
>> Yes. Calling a template twice will only fetch the text once, won't
>> increase the 'used templates' counter...
>> Preprocessing of wikitext over a threshold is cached serialized (it's
>> easier to reprocess if it's too small).
>
> To clarify: there's an in-process cache, like Brion said, so a
> template that is used twice on the same page is only fetched and
> preprocessed once. However, this only applies to templates called with
> no parameters. If the template is passed parameters, this in-process
> cache won't be used, even if the same set of parameters is used twice.

I don't think so. The preprocess-to-tree result is always the same,
regardless of the parameters, and it is always used. It is the expansion
where the parameters change. I don't see that cache for parameterless
templates; maybe it's mTplExpandCache?



Re: A question about templates parsing and caching

Roan Kattouw-2
2011/4/10 Platonides <[hidden email]>:
> I don't think so. The preprocess-to-tree is always the same, regardless
> of the parameters, and it is always used. It is the expansion where
> parameter change. I don't see that cache for parameterless templates,
> maybe it's the mTplExpandCache?
>
Could be, I don't know. This is just something Tim told me about two
years ago; it might not even be accurate anymore.

Roan Kattouw (Catrope)


Re: A question about templates parsing and caching

Andrew Garrett-4
In reply to this post by Roan Kattouw-2
On Mon, Apr 11, 2011 at 5:59 AM, Roan Kattouw <[hidden email]> wrote:
> What we store in memcached is a serialized version of the preprocessor
> XML tree, keyed on the MD5 hash of the wikitext input, unless it's too
> small, like Platonides said. This means that if the exact same input
> is fed to the preprocessor twice, it will do part of the work only one
> and cache the intermediate result.

Yes, I implemented this with Tim's help to try to cut down on the CPU
load caused by lots of Cite templates. If I recall correctly, the
performance benefit was not particularly substantial.

--
Andrew Garrett
http://werdn.us/


Re: A question about templates parsing and caching

Alex Brollo
2011/4/11 Andrew Garrett <[hidden email]>

> On Mon, Apr 11, 2011 at 5:59 AM, Roan Kattouw <[hidden email]>
> wrote:
> > What we store in memcached is a serialized version of the preprocessor
> > XML tree, keyed on the MD5 hash of the wikitext input, unless it's too
> > small, like Platonides said. This means that if the exact same input
> > is fed to the preprocessor twice, it will do part of the work only one
> > and cache the intermediate result.
>
> Yes, I implemented this with Tim's help to try to cut down on the CPU
> load caused by lots of Cite templates, IIRC. If I recall correctly,
> the performance benefit was not particularly substantial.
>

OK, coming back to my idea: building small "libraries" of work-specific
templates inside a single template doesn't seem a particularly brilliant
one; it's something to be done only if the templates merged into one are
simple, and few, and only for contributors' convenience, if any. Thanks
for your interest!

Alex

Re: A question about templates parsing and caching

Tim Starling-2
In reply to this post by Platonides
On 11/04/11 06:32, Platonides wrote:

> Roan Kattouw wrote:
>> 2011/4/9 Platonides<[hidden email]>:
>>> Yes. Calling a template twice will only fetch the text once, won't
>>> increase the 'used templates' counter...
>>> Preprocessing of wikitext over a threshold is cached serialized (it's
>>> easier to reprocess if it's too small).
>>
>> To clarify: there's an in-process cache, like Brion said, so a
>> template that is used twice on the same page is only fetched and
>> preprocessed once. However, this only applies to templates called with
>> no parameters. If the template is passed parameters, this in-process
>> cache won't be used, even if the same set of parameters is used twice.
>
> I don't think so. The preprocess-to-tree is always the same, regardless
> of the parameters, and it is always used. It is the expansion where
> parameter change. I don't see that cache for parameterless templates,
> maybe it's the mTplExpandCache?

The stages are basically preprocessToObj() -> expand() -> internalParse().

preprocessToObj() is the parsing stage of the preprocessor. It is fast
and easily cacheable. It produces an object-based representation of the
parse tree of the text of a single article or template. This object
representation is stored in a cache ($wgParser->mTplDomCache) which
exists for the duration of a single article parse operation. It
depends only on a single input string; it does not expand templates.

There is a persistent cache which stores the result of
preprocessToObj() across multiple requests; however, this provides only
a small benefit.

expand() is slow. Its function is to take the parse tree of an
article, and to expand the template invocations and parser functions
that it sees to produce preprocessed wikitext.

There is a cache of the expand() step which persists for the duration
of a single parse operation ($wgParser->mTplExpandCache), but it only
operates on template invocations with no arguments, like {{!}}. It's
possible in theory to cache the expand() results for templates with
arguments, but I didn't do it because it looked like it would be
difficult to efficiently hash the parse tree of the arguments in order
to retrieve the correct entry from the cache. This would be a good
project for future development work.

I think it's fair to constrain parser functions to require that they
return the same result for the same arguments, during a single parse
operation. That's all you need to do to have an effective expand() cache.

However, the benefit would be limited due to the dominance of
infoboxes and navboxes which appear only once in each article. It's
not guaranteed that the result of expand() will be the same when done
at different times or in different articles.

internalParse() takes preprocessed wikitext and produces HTML. The
final output is cached by the parser cache.
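Tim's three stages and their two per-parse caches can be condensed into a toy model. Only the stage names and the cache names (mTplDomCache, mTplExpandCache) come from his message; everything else here, including the crude string-replacement "expansion" and trivialized HTML output, is a hypothetical Python stand-in, not MediaWiki's PHP implementation:

```python
# Toy model of preprocessToObj() -> expand() -> internalParse().
# Only the cache names are taken from Tim's description.

class Parser:
    def __init__(self):
        self.tpl_dom_cache = {}     # mTplDomCache: wikitext -> tree, per parse op
        self.tpl_expand_cache = {}  # mTplExpandCache: arg-free invocations only
        self.templates = {}         # template name -> wikitext source

    def preprocess_to_obj(self, wikitext):
        # Fast stage: depends only on the input string, so a dict keyed
        # on the wikitext is a safe cache.
        if wikitext not in self.tpl_dom_cache:
            self.tpl_dom_cache[wikitext] = ("tree", wikitext)
        return self.tpl_dom_cache[wikitext]

    def expand(self, name, args):
        # Slow stage. Only invocations with no arguments are cached,
        # mirroring the {{!}} example in Tim's message; caching with
        # arguments would require hashing the argument parse trees.
        if not args and name in self.tpl_expand_cache:
            return self.tpl_expand_cache[name]
        tree = self.preprocess_to_obj(self.templates[name])
        text = tree[1]
        for key, value in args.items():
            text = text.replace("{{{%s}}}" % key, value)  # crude expansion
        if not args:
            self.tpl_expand_cache[name] = text
        return text

    def internal_parse(self, expanded):
        # Final stage: preprocessed wikitext -> HTML (trivialized here);
        # the real output would go to the parser cache.
        return "<p>%s</p>" % expanded

p = Parser()
p.templates["Hello"] = "Hi {{{name}}}!"
html = p.internal_parse(p.expand("Hello", {"name": "Alex"}))
```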

-- Tim Starling



Re: A question about templates parsing and caching

Daniel Friesen-4
On 11-04-10 06:13 PM, Tim Starling wrote:
> [...]
> I think it's fair to constrain parser functions to require that they
> return the same result for the same arguments, during a single parse
> operation. That's all you need to do to have an effective expand() cache.
> [...]
> -- Tim Starling
That /might/ work nicely for #ask.
However, Counter, ArrayExtension, Variables, Random, etc. won't play
nicely with that.

Perhaps a way for parser functions to opt in or opt out, so we can
exclude functions that aren't deterministic.

Side thought... why a #switch library? What happened to the old
{{Foo/{{{1}}}|...}} trick?

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: A question about templates parsing and caching

Alex Brollo
2011/4/11 Daniel Friesen <[hidden email]>

>
> Side thought... why a #switch library? What happened to the old
> {{Foo/{{{1}}}|...}} trick?
>

Simply, {{Foo/{{{1}}}|...}} links to different pages, while
{{Foo|{{{1}}}|...}} points to the same page. I was frustrated when I tried
to use Labeled Section Transclusion to build template libraries :-); that
would be an excellent way to build a "collection of objects" into a wiki
page, both of "methods" and "attributes"... but #lst doesn't parse raw
wiki code "from scratch". If it did (i.e. if #lst read the wiki code "as it
is", before any parsing of it, ignoring entirely the code outside the
labelled section: noinclude, HTML comment tags... anything), interesting
scenarios would arise.

But if there's no performance gain with the {{Foo|{{{1}}}|...}} trick, I'll
use {{Foo/{{{1}}}|...}} for sure. KISS is always a good guideline. :-)

Alex

Re: A question about templates parsing and caching

Daniel Friesen-4
On 11-04-11 12:05 AM, Alex Brollo wrote:

> 2011/4/11 Daniel Friesen<[hidden email]>
>
>> Side thought... why a #switch library? What happened to the old
>> {{Foo/{{{1}}}|...}} trick?
>>
> Simply,  {{Foo/{{{1}}}|...}} links to different pages, while
> {{Foo|{{{1}}}|...}} points to the same page. I had been frustrated when I
> tried to use Labeled Section Transclusion to build template libraries :-),
> that would be an excellent way to build "collection of objects" into a wiki
> page, both of  "methods" and "attributes"... ... but #lst doesn't parse raw
> wiki code "from scratch". If it would (t.i.: if #lst would read wiki code
> "as it is",  before any parsing of it, ignoring at all the code outside
> labelled section: t.i. ignoring noinclude, html comment tags... anything)
> interesting scenarios would raise.
>
> But, if there's no performance gain with {{Foo|{{{1}}}|...}} trick, I'll use
> {{Foo/{{{1}}}|...}} for sure. KISS is always a good guide line. :-)
>
> Alex
Pointing to different pages is essentially the point of the trick.
[[Template:Library]] =
{{#ifexist:Library/{{{1}}}|{{Library/{{{1}}}|...}}|There is no library
function by the name "{{{1}}}".}}
[[Template:Library/a]] = Do a
[[Template:Library/b]] = Do b

{{library|a}} => "Do a"
{{library|b}} => "Do b"

It essentially works the same as:
[[Template:Library]] = {{#switch:{{{1}}}|a=Do a|b=Do b|There is no
library function by the name "{{{1}}}".}}

Except you don't create an obscenely large preprocessed hierarchy which
is cloned in its entirety to multiple places and expanded multiple
times just to get access to individual pieces of the library.


Though, when we're talking about stuff this complex... that line about
using a REAL programming language comes into play...
It would be nice if there were some implemented-in-PHP scripting
language we could use that would work on any wiki. I "had" a project
playing around with that idea, but it's dead.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: A question about templates parsing and caching

Alex Brollo
2011/4/11 Daniel Friesen <[hidden email]>

>
>
> Though, when we're talking about stuff this complex... that line about
> using a REAL programming language comes into play...
> Would be nice if there was some implemented-in-php language script
> language we could use that would work on any wiki. I "had" a project
> playing around with that idea but it's dead.
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>

Are you a Wikisource contributor? If you are, I guess you too considered
this syntax, pointing to what I wrote in my last message:
{{#section:Foo|{{{1}}}}} but... it refuses to run if section {{{1}}} of
page Foo is a "method" (while it runs, obviously, if the section is an
"attribute"). :-)

Alex

Re: A question about templates parsing and caching

Neil Harris
In reply to this post by Daniel Friesen-4
On 11/04/11 02:51, Daniel Friesen wrote:

> On 11-04-10 06:13 PM, Tim Starling wrote:
>> [...]
>> I think it's fair to constrain parser functions to require that they
>> return the same result for the same arguments, during a single parse
>> operation. That's all you need to do to have an effective expand() cache.
>> [...]
>> -- Tim Starling
> That /might/ work nicely for #ask.
> However Counter, ArrayExtension, Variables, Random, etc... won't play
> nicely with that.
>
> Perhaps a way for parser functions to opt-in or opt-out. So we can
> exclude functions that .
>
> Side thought... why a #switch library? What happened to the old
> {{Foo/{{{1}}}|...}} trick?
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>

I would have thought this could be done automatically, by designating
templates as being "pure" or "impure" (in the sense of pure and impure
functions) -- something which could be done recursively all the way down
to basic parser functions, magic words, etc., which would have to be
designated pure or impure by hand as part of the software implementation.
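Neil's idea can be sketched as a recursive purity check. This is a hypothetical illustration in Python, not anything in MediaWiki: the builtin sets and the `uses` dependency map are invented for the example. A template is treated as pure only if every parser function, magic word, and template it invokes is pure, bottoming out in the hand-designated base cases:

```python
# Hypothetical sketch of recursive pure/impure designation.
# Base cases would be designated by hand in the software implementation.
PURE_BUILTINS = {"#switch", "#if", "#expr"}          # deterministic
IMPURE_BUILTINS = {"#time", "Random", "Counter"}     # vary between calls

def is_pure(name, uses, _seen=None):
    """uses maps a template name to the builtins/templates it invokes."""
    if _seen is None:
        _seen = set()
    if name in IMPURE_BUILTINS:
        return False
    if name in PURE_BUILTINS:
        return True
    if name in _seen:
        # Recursion guard for cyclic transclusion; a cycle alone
        # introduces no impurity, so assume pure here.
        return True
    _seen.add(name)
    return all(is_pure(dep, uses, _seen) for dep in uses.get(name, ()))

uses = {
    "Library": {"#switch", "Library/a"},
    "Library/a": {"#expr"},
    "Clock": {"#time"},
}
# Library only reaches pure builtins; Clock reaches the impure #time.
```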

-- Neil

