CirrusSearch index incompleteness problem

Aran Dunkley
Hello,

I'm managing some MediaWiki 1.27.1 wikis running CirrusSearch 0.2 with
Elasticsearch 1.7.5. I've been noticing that search results are often
missing, so I started running the forceSearchIndex.php script each night
from a cron job.

But I'm still finding results missing. Today I re-ran the script
manually and then found that one of the missing results showed up and
that the result count for that term had increased from 18 to 23. I ran
the script again and it increased further to 37. I ran it several more
times but the result count did not increase any more.

The commands I've been running are:
forceSearchIndex.php --skipLinks --indexOnSkip
forceSearchIndex.php --skipParse

Is this the correct way to do a full index rebuild? Is there some
parameter that can ensure that no pages get missed?

Thanks,
Aran


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: CirrusSearch index incompleteness problem

Erik Bernhardson
On Thu, Jan 26, 2017 at 11:30 AM, Aran <[hidden email]> wrote:

> The commands I've been doing are:
> forceSearchIndex.php --skipLinks --indexOnSkip
> forceSearchIndex.php --skipParse
>
> Is this the correct way to do a full index rebuild? is there some
> parameter that can ensure that no pages get missed?
>
>
This is the correct way to rebuild documents in place. It sounds like
something is running into errors while building documents though. Could you
check your logs for errors related to CirrusSearch?
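
One way to make failures during a rebuild easier to spot is to wrap the two passes in a small script that keeps their output and stops at the first non-zero exit status. This is only a sketch: the script name and flags are the ones quoted above, while MW_ROOT, LOG, PHP_BIN and the helper names run_pass and reindex are assumptions for illustration. It can only catch problems that surface as an exit status or as output.

```shell
#!/bin/sh
# Sketch: run both indexing passes and stop at the first failure.
# MW_ROOT, LOG and PHP_BIN are assumptions; adjust them for your install.
MW_ROOT="${MW_ROOT:-/var/www/mediawiki}"
LOG="${LOG:-/tmp/cirrus-reindex.log}"
PHP_BIN="${PHP_BIN:-php}"
SCRIPT="$MW_ROOT/extensions/CirrusSearch/maintenance/forceSearchIndex.php"

# run_pass FLAGS...: run one pass, append its output to $LOG,
# and return the exit status of the maintenance script.
run_pass() {
    echo "== forceSearchIndex.php $* ==" >> "$LOG"
    "$PHP_BIN" "$SCRIPT" "$@" >> "$LOG" 2>&1
}

# reindex: the two passes from this thread, in the same order.
reindex() {
    run_pass --skipLinks --indexOnSkip || return 1
    run_pass --skipParse || return 1
}
```

From cron, something like `reindex || echo "reindex failed, see $LOG" >&2` would then report a failed pass instead of letting it go unnoticed.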



Re: CirrusSearch index incompleteness problem

Aran Dunkley
I can't see any errors in the debug log. One page had five occurrences
of a search term but only showed up after I rebuilt the index many
times. I kept running it because I knew the page should be showing up in
the results and eventually it got included.



Re: CirrusSearch index incompleteness problem

Aran Dunkley
In reply to this post by Erik Bernhardson
This has finally been solved!

The problem was a parser function that relied on the $wgTitle global,
which was not available in the context of a command-line index update.
It now uses $parser->getTitle() instead of relying on the global, and
all results are present :-)

Thanks for your help,
Aran
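
For anyone hitting the same symptom, one way to look for other code with the same dependency is to search the PHP sources for references to the $wgTitle global. A minimal sketch: find_wgtitle_uses is a hypothetical helper name, the directory to scan is whatever holds your extensions, and the -r and --include options assume GNU or BSD grep. A match is only a candidate for review, not proof of a bug.

```shell
# find_wgtitle_uses DIR: print file:line:text for every line in a .php file
# under DIR that mentions the $wgTitle global (fixed-string match).
# Assumes a grep that supports -r and --include (GNU or BSD grep).
find_wgtitle_uses() {
    grep -rn --include='*.php' -F '$wgTitle' "$1"
}
```

Running `find_wgtitle_uses extensions/` from the wiki root would list each hit so that parser functions and hooks can be checked by hand.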
