Advice on Tuning Search?

Advice on Tuning Search?

Hogan (US), Michael C
Can anyone point me to a starting point for learning about how to tune CirrusSearch (or examples)? I found the CirrusSearchScoreBuilder page [1], which implies it is possible to modify how search results are ranked. But, the documentation page hasn't been created yet. Thank you!

[1]: https://www.mediawiki.org/wiki/Manual:Hooks/CirrusSearchScoreBuilder
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Advice on Tuning Search?

David Causse
On Fri, Nov 2, 2018 at 3:51 AM Hogan (US), Michael C <
[hidden email]> wrote:

> Can anyone point me to a starting point for learning about how to tune
> CirrusSearch (or examples)? I found the CirrusSearchScoreBuilder page [1],
> which implies it is possible to modify how search results are ranked. But,
> the documentation page hasn't been created yet. Thank you!
>

Hi,

There are many ways to tune the ranking of search results.
The hook you mention is designed for extensions that want to tune
everything related to the search query itself. I strongly discourage using
it: it is highly experimental and will be removed in the future.

To understand how CirrusSearch scores documents, I suggest starting with
this documentation [2].
You can then tune the retrieval query using profiles and the
$wgCirrusSearchFullTextQueryBuilderProfiles config array:
E.g.
// Declares one custom full-text query builder profile.
$wgCirrusSearchFullTextQueryBuilderProfiles = [
    'my_custom_profile' => [
        'builder_class' => \CirrusSearch\Query\FullTextSimpleMatchQueryBuilder::class,
        'settings' => [
            'default_min_should_match' => '1',
            'default_query_type' => 'most_fields',
            'default_stem_weight' => 3.0,
            // Per-field weights: either a plain boost or an array of options.
            'fields' => [
                'title' => 0.3,
                'redirect.title' => [
                    'boost' => 0.27,
                    'in_dismax' => 'redirects_or_shingles',
                ],
                'suggest' => [
                    'is_plain' => true,
                    'boost' => 0.20,
                    'in_dismax' => 'redirects_or_shingles',
                ],
                'category' => 0.05,
                'heading' => 0.05,
                'text' => [
                    'boost' => 0.6,
                    'in_dismax' => 'text_and_opening_text',
                ],
                'opening_text' => [
                    'boost' => 0.5,
                    'in_dismax' => 'text_and_opening_text',
                ],
                'auxiliary_text' => 0.05,
                'file_text' => 0.5,
            ],
            'phrase_rescore_fields' => [
                'all' => 0.06,
                'all.plain' => 0.1,
            ],
        ],
    ],
];

And then activate it by default:
$wgCirrusSearchFullTextQueryBuilderProfile = "my_custom_profile";

Please see [3] for more doc on the various settings.

To tune the query-independent signals (the rescoring part in the doc), the
approach is similar: you declare a profile and then activate it by default.
The config var to add a new profile is $wgCirrusSearchRescoreProfiles, and
you can add more by following these examples [4].
The config var to change the default rescore profile is
$wgCirrusSearchRescoreProfile.
Rescore profiles internally use "rescore function chains", which can be
tuned as well using $wgCirrusSearchRescoreFunctionChains [5].
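
As a rough illustration of how the three rescore settings fit together, here
is a minimal sketch for LocalSettings.php. The names 'my_custom_chain' and
'my_custom_rescore' are hypothetical, and the keys and function types are
modeled from memory on the bundled profiles in [4] and [5], so please check
them against the files shipped with your CirrusSearch version before relying
on them.

```php
<?php
// Assumption: this runs in LocalSettings.php after CirrusSearch is loaded.
// 'my_custom_chain' and 'my_custom_rescore' are made-up names.

// A function chain: the list of query-independent signals to apply.
$wgCirrusSearchRescoreFunctionChains['my_custom_chain'] = [
    'functions' => [
        [ 'type' => 'boostlinks' ],  // incoming links
        [ 'type' => 'templates' ],   // template-based boosts
        [ 'type' => 'namespaces' ],  // per-namespace weights
    ],
];

// A rescore profile that applies the chain to the top results.
$wgCirrusSearchRescoreProfiles['my_custom_rescore'] = [
    'supported_namespaces' => 'all',
    'rescore' => [
        [
            'window' => 8192,                // number of top hits to rescore
            'query_weight' => 1.0,           // weight of the retrieval score
            'rescore_query_weight' => 1.0,   // weight of the chain's score
            'score_mode' => 'multiply',
            'type' => 'function_score',
            'function_chain' => 'my_custom_chain',
        ],
    ],
];

// Activate the profile by default.
$wgCirrusSearchRescoreProfile = 'my_custom_rescore';
```

The bundled profiles may carry further keys, so copying one of them from [4]
and editing it is probably the safest starting point.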

I'm sorry if this is a bit dense, and for the lack of comprehensive
documentation. I suggest having a look at the Elasticsearch documentation
as well, since many concepts here are related to Elasticsearch features
(dismax, rescoring, function score, ...).
We also have some integration with the LTR plugin [6].
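
To make the dismax concept a little more concrete, the sketch below
reproduces the scoring rule of an Elasticsearch dis_max query: the best
clause score, plus tie_breaker times the scores of the other matching
clauses. It assumes that fields sharing an 'in_dismax' name in the profile
above end up in one such query, which is what the setting name suggests. The
function name and the sample scores are hypothetical and are not part of
CirrusSearch.

```php
<?php
/**
 * Score of a dis_max query, given the scores of its matching clauses.
 * Hypothetical helper for illustration only.
 *
 * @param float[] $clauseScores scores of the clauses that matched
 * @param float $tieBreaker share of the non-best scores to add (0.0 to 1.0)
 * @return float
 */
function disMaxScore( array $clauseScores, float $tieBreaker = 0.0 ): float {
    if ( $clauseScores === [] ) {
        // No clause matched, so there is nothing to score.
        return 0.0;
    }
    $best = max( $clauseScores );
    // The best clause counts fully; the others are scaled by the tie breaker.
    return $best + $tieBreaker * ( array_sum( $clauseScores ) - $best );
}
```

For example, with made-up clause scores of 2.0 and 1.5, a tie breaker of 0.0
gives 2.0 (only the best field counts), while 0.5 gives 2.75.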

Please let me know if you have specific questions or a specific problem in
mind; I could then point you in a particular direction instead of leaving
you to digest all of this.

Thank you.

[2] https://www.mediawiki.org/wiki/Extension:CirrusSearch/Scoring
[3] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/master/profiles/FullTextQueryBuilderProfiles.config.php#39
[4] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/master/profiles/RescoreProfiles.config.php
[5] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/master/profiles/RescoreFunctionChains.config.php
[6] https://github.com/o19s/elasticsearch-learning-to-rank

Re: Advice on Tuning Search?

Pine W
In reply to this post by Hogan (US), Michael C
Hi Michael,

If you're interested in subjects regarding search then I suggest that you
subscribe to the Discovery mailing list. Your question would be a great fit
for that list (not that it's bad to post it to Wikitech-l). See
https://lists.wikimedia.org/mailman/listinfo/discovery.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )



Re: Advice on Tuning Search?

Erika Bjune
Also, just FYI, the Search Platform team has started holding regular office
hours on the first Wednesday of every month.  Details for our next meeting
were just sent out a couple of days ago:

Date: Wednesday, November 7th, 2018
Time: 16:00 GMT / 08:00 PST / 11:00 EST / 17:00 CET
Google Meet link: https://meet.google.com/vyc-jvgq-dww

Our team will be glad to help you with specific questions in person :)

Cheers,
Erika
------------------------------------
Erika Bjune
Director of Engineering - Search Platform & Fundraising Tech
Wikimedia Foundation


On Fri, Nov 2, 2018 at 1:19 PM Pine W <[hidden email]> wrote:

> Hi Michael,
>
> If you're interested in subjects regarding search then I suggest that you
> subscribe to the Discovery mailing list. Your question would be a great fit
> for that list (not that it's bad to post it to Wikitech-l). See
> https://lists.wikimedia.org/mailman/listinfo/discovery.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )

Re: Advice on Tuning Search?

Trey Jones
We have an Etherpad agenda that I just created, too:
https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours

—Trey

Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation


On Fri, Nov 2, 2018 at 5:03 PM Erika Bjune <[hidden email]> wrote:

> Also, just FYI, the Search Platform team has started holding regular office
> hours on the first Wednesday of every month.  Details for our next meeting
> were just sent out a couple of days ago:
>
> Date: Wednesday, November 7th, 2018
> Time: 16:00 GMT / 08:00 PST / 11:00 EST / 17:00 CET
> Google Meet link: https://meet.google.com/vyc-jvgq-dww
>
> Our team will be glad to help you with specific questions in person :)
>
> Cheers,
> Erika
> ------------------------------------
> Erika Bjune
> Director of Engineering - Search Platform & Fundraising Tech
> Wikimedia Foundation