Re: Query result caching and invalidation (Jeroen De Dauw)

Re: Query result caching and invalidation (Jeroen De Dauw)

James HK
Hi,

Well, let me just jump into this discussion, since we have been running a
query cache solution for the last two months, based on:

## Storage engine
We do not use any database table; instead we rely on an available
caching engine, either APC, memcached, or MediaWiki's own object cache.
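
In MediaWiki terms this simply means grabbing whichever object-cache
backend the wiki has configured; a minimal sketch, assuming core
MediaWiki's wfGetCache():

    // Obtain whichever BagOStuff backend is configured
    // (APC, memcached or MW's default object cache).
    $cache = wfGetCache( CACHE_ANYTHING );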

## Query uniqueness
Before getQueryResult() is executed, a hash key is generated from
$query->getQueryString() . '#' . $query->getLimit() . '#' .
$query->getOffset() . '#' . serialize( $printouts ) . '#' .
serialize( $query->sortkeys ), which gives enough depth to ensure
comparability among queries.
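
Wrapped up as a helper, that key construction could look like this (a
minimal sketch: the SMWQuery accessors are the ones used above, but
buildQueryHash() and the md5() wrapper are assumptions on my part):

    // Sketch: derive one cache key from everything that makes a query
    // result unique: query string, limit, offset, printouts, sort keys.
    function buildQueryHash( SMWQuery $query, array $printouts ) {
        return md5(
            $query->getQueryString() . '#' .
            $query->getLimit() . '#' .
            $query->getOffset() . '#' .
            serialize( $printouts ) . '#' .
            serialize( $query->sortkeys )
        );
    }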

## Associated objects
While $res->getResults() is stored as a single cache object (which is
the simplest of all operations), the more important role goes to the
associated objects. Associated objects are the individual entities
(page, property, etc.) that are part of the result set and the query
condition; each is stored as a separate cache object. Every associated
object uses its own md5 hash key, which makes it easy to track an
object by the same key during any update process. This allows us to
build a chain between each object and the queries that requested its
involvement.
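
In code, the chaining might look roughly like this (a sketch, not our
actual implementation: registerDependencies() and getSerialization()
are hypothetical names, and $cache is the BagOStuff from above):

    // Sketch: after caching a result under $queryHash, record a reverse
    // link from every associated object back to the queries using it.
    function registerDependencies( BagOStuff $cache, $queryHash, array $objects ) {
        foreach ( $objects as $object ) {
            $objectKey = md5( $object->getSerialization() ); // 1:1 per entity
            $queryHashes = $cache->get( $objectKey );
            if ( $queryHashes === false ) {
                $queryHashes = array();
            }
            $queryHashes[$queryHash] = true; // chain: object -> queries
            $cache->set( $objectKey, $queryHashes );
        }
    }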

During each update process (onChangeTitle, onUpdateDataBefore,
onDelete, etc.) an object and its hash key (a cheap lookup thanks to
the 1:1 relation) are checked against the cache pool; if a cache
object exists, its stored array of hash keys points to all queries
involving that object. When one of these objects is detected during
the update process, the affected query results are purged from the
cache ($cache->delete( $row )).
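
A sketch of that purge step (invalidateQueriesFor() is a hypothetical
helper; the actual hook wiring is omitted):

    // Sketch: on any update hook, delete every cached query result the
    // changed object is chained to, then drop the chain entry itself.
    function invalidateQueriesFor( BagOStuff $cache, $objectKey ) {
        $queryHashes = $cache->get( $objectKey );
        if ( $queryHashes !== false ) {
            foreach ( array_keys( $queryHashes ) as $queryHash ) {
                $cache->delete( $queryHash ); // purge the stored result
            }
            $cache->delete( $objectKey );
        }
    }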

If a change happened, the result is rebuilt from scratch. We do not
diff the change; we simply assume that an alteration of one of the
involved objects could have affected the result, and the risk of
serving an invalid result outweighs the cost of computing a new
result set.

Since we only use in-memory cache objects, we don't have to care about
synchronizing database tables, and we simply use the hash key as
comparator, invalidator, and chaining object.

## Result
The greatest benefit emerges for query results that change only
occasionally, while for queries with a high turnover (due to a high
velocity of change on their associated objects) there has been no
measurable downside (we use APC or memcached rather than a database).

Cheers,

mwjames


Re: Query result caching and invalidation (Jeroen De Dauw)

Jeroen De Dauw
Hey James,

This implementation of yours is great, but there are some important differences from having full query management functionality.

Your solution is great because:

* It's very simple
* It has huge benefits for API and Special:Ask queries

While the thing I'm proposing:

* Is complex
* Does not benefit the API and Special:Ask (since it would consciously ignore these to avoid doing unneeded work)

But then again, it solves a different problem than your change:

* it fixes the existing persistent MediaWiki article cache so that it gets invalidated at the correct points
* and it persistently caches results for inline queries

So these two caching solutions can live happily next to each other. Once the new one has been implemented, I suggest disabling the current one for inline queries, but keeping it for all the rest.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--


Re: Query result caching and invalidation (Jeroen De Dauw)

John McClure

Hi James and Jeroen,

Would you please explain what is being gained by query caching? It seems to me that simple transclusion of pages stored on Wikidata would be just as effective and would require no additional code. Thanks for your reply - john


Re: Query result caching and invalidation (Jeroen De Dauw)

Jeroen De Dauw
Hey,

> Would you please explain what is being gained by query caching?

https://en.wikipedia.org/wiki/Cache_%28computing%29

> It seems to me that simple transclusion of pages stored on Wikidata would be just as effective and would require no additional code.

* This has nothing to do with Wikidata
* This has nothing to do with transclusion of pages
* This definitely requires code to work

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--


Re: Query result caching and invalidation (Jeroen De Dauw)

John McClure

I'm sorry, I meant to say that Concepts cache queries, which suggests there's a mechanism driving the creation of Concept pages. It would be interesting to hear about that mechanism and whether it can be used with Concept pages in SMW.

Thanks - john



Re: Query result caching and invalidation (Jeroen De Dauw)

Markus Krötzsch
On 27/07/12 05:12, [hidden email] wrote:
> I'm sorry, I meant to say that Concepts cache queries, so it suggests
> there's a mechanism driving the creation of Concept pages. It'd be
> interesting to hear about that mechanism and whether it can be used with
> Concept pages in SMW.

Concepts have some similarities in that they are also conceived as
"cached queries". The main difference from our current discussion is
that we now want to update these caches automatically to ensure that
they are always up-to-date. This will also require us to store more
information about each query.

The concept-based caching, in contrast, is a manual approach where users
create and update caches. It is intended for queries that are in general
too slow to be computed at page render time (which would be necessary in
all other cases, even if we maintain a cache of the results once they
are computed).

As a side effect, the new solution would also keep track of all queries
that are used on the wiki, which could be useful for other purposes
(including performance analysis). Right now, there is no good way to
find out which queries are used on a wiki.

Markus

_______________________________________________
Semediawiki-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel