New Beta Feature: completion suggester

Dan Garry
Hey all,

In the continued quest to make the search bar a better tool, the Wikimedia
Foundation's Discovery Department
<https://www.mediawiki.org/wiki/Wikimedia_Discovery> has put a completion
suggester into Beta Features. The tool works as search-as-you-type, with a
small tolerance for typos and spacing when finding results. Possible
matches are displayed in a drop-down menu as you type, hopefully
eliminating the need to run a full-text search and go through the results
page. You can read more details at mediawiki.org
<https://www.mediawiki.org/wiki/Extension:CirrusSearch/CompletionSuggester>;
for now, please use the talk page there for feedback.

The tool is available now and is only enabled for the article namespace
for the time being. It will progress into full production at some point,
hopefully in early 2016, depending on feedback. It's important to get
feedback from regular contributors who use search, to make sure that the
basic feature requests for searching the main namespace can at least be
addressed while the tool is in Beta Features.

Thanks!

Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: New Beta Feature: completion suggester

Sage Ross
If I'm, say, building a web app that could benefit from that kind of
search suggestion tool, is there an API I can use?

-Sage

On Thu, Dec 17, 2015 at 5:09 PM, Dan Garry <[hidden email]> wrote:

Re: New Beta Feature: completion suggester

John Erling Blad
I tried this on a search for "Sør-Aurdal" (a municipality in Norway): I
dropped the dash, wrote "sørau", and got a hit on "Søraust-Svalbard
naturreservat" among other things. The topmost hit was "søraurdøl", which
is a demonym for someone from Sør-Aurdal. It seems to me that a spelling
error is compensated for with a fuzzy search for long(est?) words, but that
implies nearly completing the word if there is a spelling error.

What if the topmost entry in the list used a less aggressive fuzzy search
and favored shorter words? I tried several other searches, and somehow
"sørau" seems to be difficult. All searches were on nowiki.

I'm a bit impressed... :D

On Sun, Dec 20, 2015 at 9:55 PM, Sage Ross <[hidden email]>
wrote:

Re: New Beta Feature: completion suggester

David Causse
In reply to this post by Sage Ross
On 20/12/2015 21:55, Sage Ross wrote:
> If I'm, say, building a web app that could benefit from that kind of
> search suggestion tool, is there an API I can use?

The API endpoint is action=cirrus-suggest[1], and it accepts 2 parameters:
text for the user input and limit (5 by default).

Example:
/w/api.php?action=cirrus-suggest&format=json&text=albert%20einstein&limit=5

Note that this API is highly experimental and subject to change. I'd
suggest using it only for evaluation purposes at this point. We may
provide better integration with the MediaWiki API ecosystem (i.e.
generators[2]) in the coming weeks.
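A minimal Python sketch of building such a request URL, using only the two parameters described above (`text` and `limit`). The base URL `https://en.wikipedia.org/w/api.php` and the helper name `cirrus_suggest_url` are assumptions for illustration; given the warning above, the module may have changed or disappeared since this was written.

```python
from urllib.parse import quote, urlencode

# Assumed base URL, for illustration only.
API_BASE = "https://en.wikipedia.org/w/api.php"


def cirrus_suggest_url(text: str, limit: int = 5, base: str = API_BASE) -> str:
    """Build an action=cirrus-suggest request URL (hypothetical helper).

    Only the parameters described in this thread are used: `text`
    (the user input) and `limit` (5 by default).
    """
    params = {
        "action": "cirrus-suggest",
        "format": "json",
        "text": text,
        "limit": limit,
    }
    # quote_via=quote encodes spaces as %20, as in the example URL,
    # instead of urlencode's default "+".
    return base + "?" + urlencode(params, quote_via=quote)


print(cirrus_suggest_url("albert einstein"))
```

Fetching the URL with any HTTP client should return JSON, but the response shape is not described in this thread, so it is not parsed here.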

[1]
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=cirrus-suggest&titles=Main%20Page&prop=revisions&rvprop=content&format=jsonfm
[2] https://www.mediawiki.org/wiki/API:Query#Generators

David

Re: New Beta Feature: completion suggester

David Causse
In reply to this post by John Erling Blad
On 20/12/2015 22:19, John Erling Blad wrote:
> I tried this on a search for "Sør-Aurdal" (a municipality in Norway): I
> dropped the dash, wrote "sørau", and got a hit on "Søraust-Svalbard
> naturreservat" among other things. The topmost hit was "søraurdøl", which
> is a demonym for someone from Sør-Aurdal. It seems to me that a spelling
> error is compensated for with a fuzzy search for long(est?) words, but that
> implies nearly completing the word if there is a spelling error.

Thank you, this is exactly the kind of feedback we were looking for when
we deployed this as a beta feature.

In this case, the first thing to note is that "søraurdøl" [1] is a
redirect to "Sør-Aurdal" [2]. The completion suggester won't display
multiple suggestions that have the same target page. Here it receives
both "søraurdøl" and "Sør-Aurdal" internally, but because both lead to
"Sør-Aurdal" it has to decide which one to display, and it chooses
"søraurdøl" because the query "sørau" is a perfect prefix hit.
You can see when the algorithm starts preferring "Sør-Aurdal" by
continuing to type:
"søraud" => "søraurdøl" (still a perfect prefix)
"sørauda" => "Sør-Aurdal" (here neither is a perfect prefix, so it
displays the canonical page "Sør-Aurdal")
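The query-time choice described above can be sketched roughly as follows. This is a simplified illustration, not CirrusSearch's actual code: the `Candidate` type and `pick_suggestion` function are hypothetical, and a plain case-insensitive `startswith` stands in for the real matching (which also tolerates typos and spacing), so it does not reproduce every example above exactly.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str         # title as it would be displayed
    target: str        # page the title leads to (itself, or the redirect target)
    is_redirect: bool


def is_perfect_prefix(query: str, title: str) -> bool:
    # Simplification: case-insensitive prefix test, no fuzziness at all.
    return title.casefold().startswith(query.casefold())


def pick_suggestion(query: str, group: list[Candidate]) -> Candidate:
    """Pick one entry among candidates that share the same target page.

    Prefer a title the query is a perfect prefix of; if there is none,
    fall back to the canonical (non-redirect) page. If several titles
    are perfect prefix hits, this sketch simply takes the first one.
    """
    for cand in group:
        if is_perfect_prefix(query, cand.title):
            return cand
    for cand in group:
        if not cand.is_redirect:
            return cand
    return group[0]


group = [
    Candidate("Søraurdøl", "Sør-Aurdal", is_redirect=True),
    Candidate("Sør-Aurdal", "Sør-Aurdal", is_redirect=False),
]
print(pick_suggestion("sørau", group).title)
print(pick_suggestion("sørauda", group).title)
```

With this toy matcher, "sørau" selects the redirect "Søraurdøl" (a perfect prefix), while "sørauda" matches neither title as a prefix and falls back to the canonical "Sør-Aurdal".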

There are many knobs we could adjust to display better suggestions. Here
I can see two of them:

1. At index time, the suggester groups redirects that are very
similar to the canonical title.
On enwiki the redirect "Albert Enstein" is grouped with its canonical
page "Albert Einstein", so "Albert Enstein" is never proposed to the
suggester, which therefore never has to choose between "Albert Enstein"
and "Albert Einstein": it always displays "Albert Einstein". This
technique lets us display proper suggestions even when the user types
something quite far off, like "alberensten". Here the suggester benefits
from popular pages for which editors have manually created redirects for
common typos.
Unfortunately, such arbitrary decisions also have drawbacks. A
counterexample is "life a": on enwiki this query suggests "Life insurance"
instead of "Life assurance", because the redirect "Life assurance" has
been wrongly grouped with "Life insurance". This is not completely
wrong, since both suggestions lead to the same page, but it's not perfect...
So we could fix the "sørau" problem by increasing the tolerance of this
grouping step, but unfortunately that would also increase the number of
cases like "Life assurance".

2. Change the decision at query time.
We could also change the decision rule to always prefer canonical pages
over redirects, even when the canonical page is not a perfect prefix hit.
I'm not aware of a counterexample here, but since our ranking algorithm
is far from perfect we preferred to choose perfect prefix hits for now.
In the coming months we should be able to include pageview statistics
in the formula; we hope such metrics will bring positive improvements and
allow us to revisit this decision.
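The index-time grouping in point 1 can be illustrated with a similarity threshold. The message does not say how CirrusSearch measures "very similar", so the metric used here (Levenshtein edit distance divided by the longer title's length) and the hypothetical `max_ratio` threshold are assumptions, chosen only to show the trade-off between grouping typo redirects and wrongly merging distinct titles.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def should_group(redirect: str, canonical: str, max_ratio: float = 0.15) -> bool:
    """Hypothetical rule: fold a redirect into its canonical title when the
    edit distance, relative to the longer title, is at most max_ratio."""
    a, b = redirect.casefold(), canonical.casefold()
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= max_ratio


for redirect, canonical in [
    ("Albert Enstein", "Albert Einstein"),  # distance 1 of 15 chars: grouped
    ("Life assurance", "Life insurance"),   # distance 2 of 14 chars: grouped
    ("Søraurdøl", "Sør-Aurdal"),            # distance 2 of 10 chars: not grouped
]:
    print(redirect, "->", should_group(redirect, canonical))
```

Raising `max_ratio` to 0.25 in this sketch would also fold "Søraurdøl" into "Sør-Aurdal", but any looser threshold merges more pairs that, like "Life assurance", are distinct titles worth showing.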

As you can see, the suggester makes arbitrary decisions (sometimes risky
ones) that can be wrong, and this is the whole purpose of having this
feature in beta. Depending on feedback like yours, we may review and
adjust various parameters in the algorithm.

Thank you!

David.

[1] (Redirected from Søraurdøl):
https://no.wikipedia.org/w/index.php?title=S%C3%B8raurd%C3%B8l&redirect=no
[2]
https://no.wikipedia.org/w/api.php?action=query&list=backlinks&bltitle=S%C3%B8r-Aurdal&blfilterredir=redirects

Re: New Beta Feature: completion suggester

Brad Jorsch (Anomie)
In reply to this post by David Causse
On Mon, Dec 21, 2015 at 4:48 AM, David Causse <[hidden email]> wrote:

> On 20/12/2015 21:55, Sage Ross wrote:
>
>> If I'm, say, building a web app that could benefit from that kind of
>> search suggestion tool, is there an API I can use?
>>
>
> The API endpoint is action=cirrus-suggest[1], and it accepts 2 parameters:
> text for the user input and limit (5 by default).
>
> Example:
> /w/api.php?action=cirrus-suggest&format=json&text=albert%20einstein&limit=5
>
> Note that this API is highly experimental and is subject to change.


You should have implemented isInternal() to return true in your module, so
the auto-generated documentation would properly reflect that status.


> I'd suggest using it only for evaluation purposes at this point. We may
> provide a better integration in the mediawiki API ecosystem (i.e.
> generators[2]) in the coming weeks.
>

Does your plan for "better integration" include making it the backend for
action=opensearch when CirrusSearch is installed? That would allow
browsers' search bars to benefit too.

I'd recommend against a non-beta CirrusSearch module for suggestions,
versus something in core that Cirrus provides the backend for. That
something is probably the existing list=prefixsearch.[1]


 [1]: Which, despite the name,[2] doesn't really correspond to
Special:PrefixIndex. That would be list=allpages with apprefix.
 [2]: We may want to look into the increasingly inaccurate name of that
module at some point, but I wouldn't block Cirrus's work on doing anything
more than just updating the apihelp-query+prefixsearch-description message.
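For comparison, the core API modules mentioned here can already be queried with their standard parameters (`search`/`limit` for action=opensearch, `pssearch`/`pslimit` for list=prefixsearch, `apprefix` for list=allpages). A small Python sketch that only builds the request URLs; the base URL and the helper name `api_url` are illustrative assumptions.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://en.wikipedia.org/w/api.php"  # assumed, for illustration


def api_url(**params: object) -> str:
    """Build a MediaWiki API URL from keyword parameters (hypothetical helper).

    Keyword order is preserved, and spaces are encoded as %20.
    """
    return API_BASE + "?" + urlencode(params, quote_via=quote)


# The module behind browser search-bar suggestions.
opensearch = api_url(action="opensearch", search="albert ein", limit=5)

# The core query module suggested above as the home for suggestions.
prefixsearch = api_url(action="query", list="prefixsearch",
                       pssearch="albert ein", pslimit=5, format="json")

# The actual equivalent of Special:PrefixIndex (footnote [1]).
allpages = api_url(action="query", list="allpages",
                   apprefix="Albert Ein", format="json")

for url in (opensearch, prefixsearch, allpages):
    print(url)
```

If CirrusSearch provided the backend for opensearch or list=prefixsearch, as discussed here, clients built this way would pick up the completion suggester without changing their requests.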

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: New Beta Feature: completion suggester

David Causse
On 21/12/2015 16:12, Brad Jorsch (Anomie) wrote:
>
> You should have implemented isInternal() to return true in your module, so
> the auto-generated documentation would properly reflect that status.

I'll fix it, thanks for the advice.

>> I'd suggest using it only for evaluation purposes at this point. We may
>> provide a better integration in the mediawiki API ecosystem (i.e.
>> generators[2]) in the coming weeks.
>>
> Does your plan for "better integration" include making it the backend for
> action=opensearch when CirrusSearch is installed? That would allow
> browsers' search bars to benefit too.

That was the initial plan, but for simplicity's sake I preferred to bind
the MW JS API searchSuggest to the internal cirrus-suggest API.
If the completion suggester proves successful and useful, it will be a
good candidate to replace TitlePrefixSearch in opensearch.

> I'd recommend against a non-beta CirrusSearch module for suggestions,
> versus something in core that Cirrus provides the backend for. That
> something is probably the existing list=prefixsearch.[1]

I agree. On this point I will follow any recommendations from the API
maintainers; my knowledge of the current API ecosystem is too limited to
make a good decision here.

Thanks!

David.