Major trouble with Cirrus forceSearchIndex.php script

Major trouble with Cirrus forceSearchIndex.php script

Wikipedia Developers mailing list
Hi all,

We're having terrible trouble with the Cirrus search maintenance script
for initialising the elastic indexes:
forceSearchIndex.php --skipLinks --indexOnSkip...

It's happening with MW 1.31 through 1.33. We're using the Redis job queue and
a single Elasticsearch instance on the same host (these are low-traffic
wikis). Debian 10.2, PHP 7.3.

No matter what parameters we use (--queue or not, different --maxJobs,
--fromId/--toId, --batchSize, etc.), we always find that hundreds of
Elasticsearch docs are not being created.

Nothing about the articles themselves is preventing it: if we run the
maintenance script on a single missing one afterwards, it gets created
without a problem. Also, each time this happens, a different set of docs
is missing.
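
To pin down which docs are missing and whether the same ones go missing on every run, the comparison can be scripted. The following is only a minimal sketch: it assumes the expected page IDs and the IDs actually present in the index have already been exported to plain lists (how to export them is not covered here), and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def missing_ids(expected: Iterable[int], indexed: Iterable[int]) -> List[int]:
    """Return the expected page IDs that have no doc in the index, sorted."""
    return sorted(set(expected) - set(indexed))


def stable_and_flaky(runs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Split missing IDs into those missing in every run and those missing in only some."""
    sets = [set(run) for run in runs]
    if not sets:
        return [], []
    always = set.intersection(*sets)
    ever = set.union(*sets)
    return sorted(always), sorted(ever - always)


# Illustrative, made-up IDs (not taken from a real wiki).
expected = list(range(1, 11))
run1 = missing_ids(expected, [1, 2, 4, 5, 6, 8, 9, 10])
run2 = missing_ids(expected, [1, 2, 3, 5, 6, 7, 8, 10])
always, sometimes = stable_and_flaky([run1, run2])
print("missing in every run:", always)
print("missing in only some runs:", sometimes)
```

An empty "missing in every run" list alongside a non-empty "only some runs" list would match the behaviour described above, where the failures do not follow specific articles.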

If anyone has heard of this kind of thing and could point us in the right
direction, that would be awesome!

Thanks a lot,
Aran


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Major trouble with Cirrus forceSearchIndex.php script

David Causse
On Sat, Nov 16, 2019 at 6:58 PM Aran via Wikitech-l <
[hidden email]> wrote:

> Hi all,
> [...]
> If anyone has heard of this kind of thing and could point us in the right
> direction, that would be awesome!
>
>
Hi,
No, I've never encountered such a random scenario. If inspecting the various
logs (MediaWiki and Elasticsearch) did not provide any clues, I would
suggest adding debug log messages to the DataSender::sendData method
(includes/DataSender.php). This is the last method called from MediaWiki
before reaching Elasticsearch.
If you find something interesting or something you think is broken, please
file a task at http://phabricator.wikimedia.org/ under the tag CirrusSearch.
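
If the added debug logging captures the raw response of an Elasticsearch bulk request, the per-item results can be scanned for failures. This is a hedged sketch rather than anything CirrusSearch provides: it assumes the logged body follows the standard `_bulk` response layout (an `items` list of single-key objects with `_id`, `status` and an optional `error`), the function name is hypothetical, and the sample response is made up for illustration. Nothing in this thread confirms that rejected bulk items are the cause.

```python
import json
from typing import Any, Dict, List


def failed_bulk_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the items of a bulk response that carry an error or a non-2xx status."""
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key object such as {"index": {...}} or {"update": {...}}.
        for action, result in item.items():
            status = result.get("status", 200)
            if "error" in result or status >= 300:
                failures.append({
                    "action": action,
                    "id": result.get("_id"),
                    "status": status,
                    "error": result.get("error"),
                })
    return failures


# Made-up sample body in the standard bulk response shape (illustration only).
sample = json.loads("""
{
  "took": 30,
  "errors": true,
  "items": [
    {"update": {"_id": "101", "status": 200}},
    {"update": {"_id": "102", "status": 429,
                "error": {"type": "es_rejected_execution_exception",
                          "reason": "rejected execution"}}},
    {"update": {"_id": "103", "status": 201}}
  ]
}
""")

for failure in failed_bulk_items(sample):
    print(failure["id"], failure["status"], failure["error"]["type"])
```

Logging only the failed items keeps the debug output small while still recording which doc IDs never made it into the index.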

David.

Re: Major trouble with Cirrus forceSearchIndex.php script

David Causse
On Wed, Nov 27, 2019 at 7:05 PM David Causse <[hidden email]> wrote:

> On Sat, Nov 16, 2019 at 6:58 PM Aran via Wikitech-l <
> [hidden email]> wrote:
>
>> Hi all,
>> [...]
>> If anyone has heard of this kind of thing and could point us in the right
>> direction, that would be awesome!
>>
>>
> Hi,
> No, I've never encountered such a random scenario. If inspecting the various
> logs (MediaWiki and Elasticsearch) did not provide any clues, I would
> suggest adding debug log messages to the DataSender::sendData method
> (includes/DataSender.php). This is the last method called from MediaWiki
> before reaching Elasticsearch.
> If you find something interesting or something you think is broken, please
> file a task at http://phabricator.wikimedia.org/ under the tag
> CirrusSearch.
>


I forgot to mention that we host office hours every first Wednesday of the
month; this might be a good opportunity to discuss this.
Details for our next meeting:

Date: Wednesday, Dec 6th, 2019

Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww

Re: Major trouble with Cirrus forceSearchIndex.php script

Trey Jones
David copied my typo on the date. Just to be clear, our office hours will be
this Wednesday, which is the *4th.*


> I forgot to mention that we host office hours every first Wednesday of the
> month; this might be a good opportunity to discuss this.
> Details for our next meeting:
>
> Date: Wednesday, *Dec 4th,* 2019
> Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
> Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
> Google Meet link: https://meet.google.com/vyc-jvgq-dww


Apologies to David and Aran.


On Thu, Nov 28, 2019 at 3:47 AM David Causse <[hidden email]> wrote:

> I forgot to mention that we host office hours every first Wednesday of the
> month; this might be a good opportunity to discuss this.
> Details for our next meeting:
>
> Date: Wednesday, Dec 6th, 2019
>
> Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
> Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
> Google Meet link: https://meet.google.com/vyc-jvgq-dww