Re: [Wikidata] Scaling Wikidata Query Service

Wikipedia Developers mailing list
Thanks, Guillaume - this is very helpful, and it would be great to
have similar information posted/collected on other kinds of limits
and potential approaches to addressing them.

Some weeks ago, we started a project to keep track of such limits,
and I have added pointers to your information there:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata .

If anyone is aware of similar discussions for any of the other limits,
please edit that page to include pointers to those discussions.

Thanks!

Daniel

On Thu, Jun 6, 2019 at 9:33 PM Guillaume Lederrey
<[hidden email]> wrote:

>
> Hello all!
>
> There have been a number of concerns raised about the performance and
> scaling of Wikidata Query Service. We share those concerns and we are
> doing our best to address them. Here is some info about what is going
> on:
>
> In an ideal world, WDQS should:
>
> * scale in terms of data size
> * scale in terms of number of edits
> * have low update latency
> * expose a SPARQL endpoint for queries
> * allow anyone to run any queries on the public WDQS endpoint
> * provide great query performance
> * provide a high level of availability
>
> Scaling graph databases is a "known hard problem", and we are reaching
> a scale where there are no obvious easy solutions to address all the
> above constraints. At this point, just "throwing hardware at the
> problem" is not an option anymore. We need to go deeper into the
> details and potentially make major changes to the current architecture.
> Some scaling considerations are discussed in [1]. This is going to take
> time.
>
> Realistically, addressing all of the above constraints is unlikely to
> ever happen. Some of the constraints are non-negotiable: if we can't
> keep up with Wikidata in terms of data size or number of edits, it does
> not make sense to address query performance. On some constraints, we
> will probably need to compromise.
>
> For example, the update process is asynchronous. It is by nature
> expected to lag. In the best case, this lag is measured in minutes,
> but can climb to hours occasionally. This is a case of prioritizing
> stability and correctness (ingesting all edits) over update latency.
> And while we can work to reduce the maximum latency, this will still
> be an asynchronous process and needs to be considered as such.
>
> We currently have one Blazegraph expert working with us to address a
> number of performance and stability issues. We are planning to hire an
> additional engineer to help us support the service in the long term.
> You can follow our current work in Phabricator [2].
>
> If anyone has experience with scaling large graph databases, please
> reach out to us. We're always happy to share ideas!
>
> Thanks all for your patience!
>
>    Guillaume
>
> [1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
> [2] https://phabricator.wikimedia.org/project/view/1239/
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> _______________________________________________
> Wikidata mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
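
On the asynchronous update lag described in the quoted message: below is a
minimal sketch of how a client might measure that lag and decide whether the
data is fresh enough for its purposes. It assumes the public endpoint at
https://query.wikidata.org/sparql and assumes the endpoint exposes the time of
the last ingested edit as schema:dateModified on <http://www.wikidata.org>.
All function names, the User-Agent string and the threshold parameter are
hypothetical and chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: public WDQS SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: the service records the timestamp of the last ingested edit
# as schema:dateModified on the <http://www.wikidata.org> node.
LAG_QUERY = (
    "PREFIX schema: <http://schema.org/> "
    "SELECT ?t WHERE { <http://www.wikidata.org> schema:dateModified ?t }"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an xsd:dateTime string such as '2019-06-06T19:33:00Z' as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def lag_seconds(last_update: datetime, now: datetime) -> float:
    """Seconds between the last ingested edit and 'now', never negative."""
    return max(0.0, (now - last_update).total_seconds())


def is_fresh_enough(last_update: datetime, now: datetime,
                    max_lag_seconds: float) -> bool:
    """True if the measured lag is within the caller's tolerance."""
    return lag_seconds(last_update, now) <= max_lag_seconds


def fetch_last_update(endpoint: str = ENDPOINT) -> datetime:
    """Ask the endpoint for the timestamp of its last ingested edit."""
    params = urllib.parse.urlencode({"query": LAG_QUERY, "format": "json"})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; use one that describes your own tool.
            "User-Agent": "wdqs-lag-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    value = data["results"]["bindings"][0]["t"]["value"]
    return parse_timestamp(value)
```

A caller could run fetch_last_update(), pass the result and
datetime.now(timezone.utc) to is_fresh_enough() with a tolerance of a few
minutes, and fall back to cached or delayed results when the check fails.
Treating the lag as something to measure, rather than assuming it is zero,
matches the point that the update process is asynchronous by design.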

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l