Parser cache update/migration strategies


Parser cache update/migration strategies

Daniel Kinzler
Hi all!

tl;dr: How to best handle the situation of an old parser cache entry not
containing all the info expected by a newly deployed version of code?


We are currently working to improve our usage of the parser cache for
Wikibase/Wikidata. E.g., we are attaching additional information related to
language links to the ParserOutput, so we can use it in the skin when generating
the sidebar.

However, when we change what gets stored in the parser cache, we still need to
deal with old cache entries that do not yet have the desired information
attached. Here are a few options we have if the expected info isn't in the cached
ParserOutput:

1) ...then generate it on the fly. On every page view, until the parser cache is
purged. This seems bad, especially if generating the required info means hitting
the database.

2) ...then invalidate the parser cache for this page, and then a) just live with
this request missing a bit of output, b) generate it on the fly, or c) trigger a
self-redirect.

3) ...then generate it, attach it to the ParserOutput, and push the updated
ParserOutput object back into the cache. This seems nice, but I'm not sure how
to do that.

4) ...then force a full re-rendering and re-caching of the page, then continue.
I'm not sure how to do this cleanly.
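For concreteness, option 3 amounts to a read-check-backfill-write cycle. The
following is an illustrative Python model only, not MediaWiki code: the key
format, `render_page`, and the `languagelinks` backfill are invented stand-ins.

```python
cache = {}  # stands in for the parser cache backend

def render_page(title):
    """Pretend full parse: expensive, produces a dict of output properties."""
    return {"html": f"<p>{title}</p>", "languagelinks": [f"{title}-ll"]}

def get_output(title):
    key = f"pcache:{title}"         # real key derivation is far more involved
    entry = cache.get(key)
    if entry is None:
        entry = render_page(title)  # cache miss: full parse and store
        cache[key] = entry
    elif "languagelinks" not in entry:
        # old entry from before the deploy: backfill just the new field
        entry["languagelinks"] = [f"{title}-ll"]
        cache[key] = entry          # push the updated object back (option 3)
    return entry
```

The crucial detail is that the write-back uses the exact key the read used, so
the backfill happens at most once per entry.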


So, the simplest solution seems to be 2, but it means that we potentially
invalidate the parser cache of *every* page on the wiki (though we will not hit
the long tail of rarely viewed pages immediately). It effectively means that any
such change requires all pages to be re-rendered eventually. Is that acceptable?

Solution 3 seems nice and surgical, just injecting the new info into the cached
object. Is there a nice and clean way to *update* a parser cache entry like
that, without re-generating it in full? Do you see any issues with this
approach? Is it worth the trouble?


Any input would be great!

Thanks,
daniel

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Parser cache update/migration strategies

Aude
On Tue, Sep 9, 2014 at 12:03 PM, Daniel Kinzler <[hidden email]>
wrote:

> Hi all!
>
> tl;dr: How to best handle the situation of an old parser cache entry not
> containing all the info expected by a newly deployed version of code?
>
> [...]
>
> 3) ...then generate it, attach it to the ParserOutput, and push the
> updated ParserOutput object back into the cache. This seems nice, but I'm
> not sure how to do that.


https://gerrit.wikimedia.org/r/#/c/158879/ is my attempt to update a
ParserOutput cache entry, though it seems too simplistic a solution.

Any feedback on this, or suggestions on how to do it better, would be great.
Or maybe it's a crazy idea. :P

Cheers,
Katie






--
@wikimediadc / @wikidata

Re: Parser cache update/migration strategies

Nikolas Everett
Also, option 5 could be to continue without the data until the parser cache
is invalidated on its own.

Maybe option 6 could be to continue without the data, and invalidate the
cache and completely re-render only some of the time: say 5% of the time
for the first couple of hours, then 25% of the time for a day, then 100% of
the time after that. That would guarantee the cache is good after a certain
amount of time without causing a big spike right after deploys.

All those options are less good than just updating the cache, I think.
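The ramped invalidation in option 6 could look roughly like the Python sketch
below. It is only a model: the deploy timestamp is an assumption, and the ramp
thresholds just encode the percentages given above.

```python
import random

DEPLOY_TIME = 1_000_000  # unix timestamp of the deploy (assumed value)

def invalidation_probability(now):
    """Probability of invalidating a stale entry, ramping up after deploy."""
    age = now - DEPLOY_TIME
    if age < 2 * 3600:        # first couple of hours: 5%
        return 0.05
    if age < 26 * 3600:       # the following day: 25%
        return 0.25
    return 1.0                # after that: always invalidate

def should_invalidate(now, rng=random.random):
    return rng() < invalidation_probability(now)
```

After the final threshold every stale hit is invalidated, which is what bounds
how long old entries can survive.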

Nik
On Sep 9, 2014 6:42 AM, "aude" <[hidden email]> wrote:

> [...]
>
> https://gerrit.wikimedia.org/r/#/c/158879/ is my attempt to update
> ParserOutput cache entry, though it seems too simplistic a solution.
>
> [...]

Re: Parser cache update/migration strategies

Daniel Kinzler
On 09.09.2014 13:45, Nikolas Everett wrote:
> All those options are less good than just updating the cache, I think.

Indeed. And that *sounds* simple enough. The issue is that we have to be sure to
update the correct cache key, the exact one the OutputPage object in question
was loaded from. Otherwise, we'll be updating the wrong key, and will read the
incomplete object again, and try to update again, and again, on every page view.
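That failure mode can be made concrete with a toy Python model (the keys and
field names here are invented): if the save re-derives the cache key
differently than the fetch did, the stale entry is never overwritten and the
backfill work repeats on every page view.

```python
# One old entry, written before the deploy, missing the new field.
cache = {"pcache:Foo:lang=de": {"html": "x"}}
repairs = 0

def fetch_and_repair(fetch_key, save_key):
    global repairs
    entry = dict(cache[fetch_key])   # copy, as a serialized cache would
    if "languagelinks" not in entry:
        repairs += 1                 # the "expensive" backfill runs again
        entry["languagelinks"] = []
        cache[save_key] = entry      # wrong key: the stale entry survives

for _ in range(3):                   # three page views
    fetch_and_repair("pcache:Foo:lang=de", "pcache:Foo:lang=en")
```

After three views the repair has run three times and the originally fetched
key still holds the old object, exactly the loop described above.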

Sadly, the mechanism for determining the parser cache key is quite complicated
and rather opaque. The approach Katie tries in I1a11b200f0c looks fine at a
glance, but even if I can verify that it works as expected on my machine, I have
no idea how it will behave on the more strange wikis on the live cluster.

Any ideas who could help with that?

-- daniel




Re: Parser cache update/migration strategies

Nikolas Everett
On Tue, Sep 9, 2014 at 8:00 AM, Daniel Kinzler <[hidden email]> wrote:

> On 09.09.2014 13:45, Nikolas Everett wrote:
> > All those options are less good than just updating the cache, I think.
>
> Indeed. And that *sounds* simple enough. The issue is that we have to be
> sure to update the correct cache key, the exact one the OutputPage object
> in question was loaded from. [...]
>
> Any ideas who could help with that?

No, not really.  My only experience with the parser cache was accidentally
polluting it with broken pages one time.

I suppose one option is to be defensive about reusing the key: if you
record the key used to fetch from the parser cache and you had a cache hit,
then you know that a put with that same key will at least overwrite
_something_.

Another thing: I believe uncached calls to the parser are wrapped in pool
counter acquisitions to make sure no two processes spend duplicate effort.
You may want to acquire that to make sure anything you do that is heavy
doesn't get done twice.
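The pool counter pattern boils down to serializing expensive regeneration per
cache key. MediaWiki's actual PoolCounter is a separate network service; this
single-process Python sketch only approximates the idea with in-process locks.

```python
import threading

_locks = {}
_locks_guard = threading.Lock()
cache = {}

def _lock_for(key):
    # One lock per key; the guard makes lock creation itself thread-safe.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_or_regenerate(key, regenerate):
    entry = cache.get(key)
    if entry is not None:
        return entry
    with _lock_for(key):          # only one worker regenerates at a time
        entry = cache.get(key)    # re-check: another worker may have won
        if entry is None:
            entry = regenerate()
            cache[key] = entry
    return entry
```

The double-check inside the lock is what prevents two workers from both paying
for the regeneration.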

Once you start talking about that it might just be simpler to invalidate
the whole entry...

Another option:
Kick off some kind of cache invalidation job that _slowly_ invalidates the
appropriate parts of the cache, something like how the Varnish cache is
invalidated on template change. That gives you marginally more control
than randomized invalidation.

Nik

Re: Parser cache update/migration strategies

Tim Starling
In reply to this post by Daniel Kinzler
On 09/09/14 22:00, Daniel Kinzler wrote:
> Sadly, the mechanism for determining the parser cache key is quite complicated
> and rather opaque.

It's only as complicated as it has to be, to support the desired features:

* Options which change the parser output.
* Merging of parser output objects when a given option does not affect
the output for a given input text, even though it may affect the
output for other inputs.

> The approach Katie tries in I1a11b200f0c looks fine at a
> glance, but even if I can verify that it works as expected on my machine, I have
> no idea how it will behave on the more strange wikis on the live cluster.

It will probably work. It assumes that the parser output for the
current article will always be added to the OutputPage before the
SidebarBeforeOutput hook is called. If that assumption was violated on
some popular URL, then it could waste quite a lot of CPU time.

It also assumes that the context page is always the same as the page
which was parsed and added to the OutputPage -- another assumption
which could have nasty consequences if it is violated.

I think it is fine to just invalidate all pages on the wiki. This can
be done by deploying the parser change, then progressively increasing
$wgCacheEpoch over the course of a week or two, until it is higher
than the change deployment time. If you increase $wgCacheEpoch too
fast, then you will get an overload.
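A progressive $wgCacheEpoch ramp could be planned roughly as below. This is
only a Python sketch for planning the steps; the step count and two-week
window are illustrative assumptions, not values taken from Tim's mail.

```python
from datetime import datetime, timedelta

def epoch_schedule(start, deploy_time, days=14, steps=7):
    """Return (when_to_change, new_epoch) pairs that step $wgCacheEpoch
    forward over `days` days until it passes `deploy_time`."""
    span = deploy_time - start
    out = []
    for i in range(1, steps + 1):
        when = start + timedelta(days=days * i / steps)
        epoch = start + span * (i / steps)   # final step reaches deploy_time
        out.append((when, epoch))
    return out
```

Each step invalidates only the slice of entries older than the new epoch, so
re-parsing load is spread out instead of spiking all at once.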

-- Tim Starling

