[Wikimedia-l] any open search engine for web project starting


[Wikimedia-l] any open search engine for web project starting

carl hansen
https://about.commonsearch.org/

"We are building a nonprofit search engine for the Web"

Sounds a lot like Knowledge Engine, if there were such a thing.
Any overlap with wikimedia projects?
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: [hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:[hidden email]?subject=unsubscribe>

Re: [Wikimedia-l] any open search engine for web project starting

SarahSV
On Fri, Mar 18, 2016 at 5:17 PM, carl hansen <[hidden email]>
wrote:

> https://about.commonsearch.org/
>
> "We are building a nonprofit search engine for the Web"
>
> Sounds a lot like Knowledge Engine, if there were such a thing.
> Any overlap with wikimedia projects?
>

Thanks for the link, Carl. Erik and Lydia are advisors, so perhaps they
could say a bit more about it.

https://about.commonsearch.org/people

Sarah

Re: [Wikimedia-l] any open search engine for web project starting

Erik Moeller-3
2016-03-18 21:44 GMT-07:00 SarahSV <[hidden email]>:
> On Fri, Mar 18, 2016 at 5:17 PM, carl hansen <[hidden email]>
> wrote:
>
>> https://about.commonsearch.org/
>>
>> "We are building a nonprofit search engine for the Web"
>>
>> Sounds a lot like Knowledge Engine, if there were such a thing.
>> Any overlap with wikimedia projects?

> Thanks for the link, Carl. Erik and Lydia are advisors, so perhaps they
> could say a bit more about it.

Sylvain has been working on this stuff for a while, blissfully
ignorant of Wikimedia's discussions of search engines, rocketships and
so on. He reached out to me shortly before the public announcement and
we've talked a bit about governance, community & funding models. I've
agreed to provide some continued advice along the way but have not
otherwise been involved.

He recently posted on wikitech-l asking for suggestions on how
Wikipedia/Wikidata could be integrated:
https://lists.wikimedia.org/pipermail/wikitech-l/2016-March/084984.html

There's still a lot of heavy lifting to do before Common Search can become
a viable project, even for narrowly defined purposes, but I think it's a
very worthwhile effort. It also is -- I think correctly -- based on
the largest pre-existing open effort to index the web, the Common
Crawl. This could lead to a mutually beneficial relationship between
Common Search and Common Crawl. From a Wikimedia perspective, it might
develop into an opportunity to jointly showcase some of the amazing
stuff that Wikidata can already do.

Erik


Re: [Wikimedia-l] any open search engine for web project starting

Dan Garry
A few of us from Discovery (myself, Tomasz Finc, Erik Bernhardson, and some
others too) had the opportunity to meet Sylvain recently when he was in San
Francisco. We chatted about touch points between Discovery and Common
Search.

One important thing I personally learnt from chatting to Sylvain is that
the challenges are *very* different for building an in-site search (like
Discovery) and building a web search (like Common Search). Data that's
critically important for one may be close to irrelevant for the other.
Scaling issues for one may not even exist for the other. We did identify a
few areas where Common Search may be creating datasets that would be useful
for us in Discovery; Sylvain said he'd be in touch with us if that happens.

We're going to keep in touch and see if we can help each other out in the
future.

Thanks,
Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation

Re: [Wikimedia-l] any open search engine for web project starting

Andreas Kolbe-2
Dan,

I understand you are currently only working on internal search,
representing stage 1 of this project. But does the long-term vision of the
subsequent stages still include things like –

1. incorporation of non-Wikimedia sources in search results,
2. an open source knowledge engine like IBM's Watson (i.e. an answer engine
based on structured data),
3. an open source search engine,
4. public curation of relevance (i.e. volunteer-based search results
ranking)?

1 and 4 remain mentioned in the Discovery FAQ[1] on MediaWiki; 1, 2 and 3
have been mentioned in recent on-wiki discussions by Jimmy Wales.[2] In
fact, I see little in the Knowledge Engine grant agreement that is
incompatible with the FAQ and those recent discussions.

From a fundraising point of view, I could fully understand why the WMF
might consider it a desirable long-term goal to turn wikipedia.org into a
high-traffic search and answer engine. I am not sure how successful such an
attempt would be – internet users have a marked preference for one-stop
shops, and it would take some really nifty features to entice users away
from the established search and answer engines – but I do understand why
the idea would be attractive.

Andreas

[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery/FAQ
[2] https://en.wikipedia.org/wiki/User_talk:Jimbo_Wales


Re: [Wikimedia-l] any open search engine for web project starting

Dan Garry
On 28 March 2016 at 17:48, Andreas Kolbe <[hidden email]> wrote:
>
> I understand you are currently only working on internal search,
> representing stage 1 of this project. But does the long-term vision of the
> subsequent stages still include things like –
>
> 1. incorporation of non-Wikimedia sources in search results,
>

Potentially. However, it's a vague idea, and we've got a long way to go
before this could be seriously considered, so we're not actively working on
it. Right now we have a lot of information even within Wikimedia sites
which is surfaced poorly; surfacing such information is an important part
of Discovery's narrative for FY2016-17
<https://www.mediawiki.org/wiki/Wikimedia_Discovery/FDC_Proposal> (July
2016 - June 2017). I intend for Discovery to work on improving that problem
first.


> 2. an open source knowledge engine like IBM's Watson (i.e. an answer engine
> based on structured data),
>

https://askplatyp.us/ does a pretty good job of this already, and it's
backed by Wikidata. If you want to learn more, you can read the blog post
on wikimedia.de
<https://blog.wikimedia.de/2015/02/23/platypus-a-speaking-interface-for-wikidata/>
and check out the website of the creators <https://projetpp.github.io/>.

In the long term, I could see something like this being incorporated into
search on our sites if it's good enough. Like the above, it's also a long
way off, and we're not actively working on these efforts.


> 3. an open source search engine,
>

Clearly yes, because we're actively building a search engine for Wikipedia
and our work is open source. If you actually mean "a general purpose web
search engine", then this question is already in the FAQ
<https://www.mediawiki.org/wiki/Wikimedia_Discovery/FAQ#Are_you_building_a_search_engine_to_compete_with_Google.3F>
which you referenced, and the answer is no. I presently don't see how
Discovery could offer something worthwhile to users here, especially with
projects like Common Search already working on the problem.


> 4. public curation of relevance (i.e. volunteer-based search results
> ranking)?


Yes, if users are interested. This is an incredibly early idea that is not
fully fleshed out; we don't know how we would achieve something like this
right now. A naïve example of how we could do something like this is by
boosting the score of certain search results based on the presence of
templates on the page. The reality would likely be something significantly
more complex than this.

So, in short, many things are potentially on the table, but they're early
ideas which are not actively being explored, and in exploring them we may
decide not to do them. Sorry if that's not definitive enough of a
statement, but roadmaps are intentionally not set in stone so as to be
flexible and iterative.

Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation

Re: [Wikimedia-l] any open search engine for web project starting

Dan Garry
On 29 March 2016 at 13:36, Dan Garry <[hidden email]> wrote:
>
> Yes, if users are interested. This is an incredibly early idea that is not
> fully fleshed out; we don't know how we would achieve something like this
> right now. A naïve example of how we could do something like this is by
> boosting the score of certain search results based on the presence of
> templates on the page. The reality would likely be something significantly
> more complex than this.
>

A detail I forgot to mention here is that this example is not hypothetical.
I was referring to boost-templates
<https://www.mediawiki.org/wiki/Help:CirrusSearch#boost-templates:>, which
one can already use in CirrusSearch. This work predates the Discovery
Department. So, in that sense, public curation of relevance already
exists. My point was that building on this is not something we're
working on right now.
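
To make the idea of template-based boosting concrete, here is a minimal Python sketch. It is not CirrusSearch's implementation: the Page class, the boost table, the template names, and the choice to multiply factors together are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical search hit: a title, a base relevance score, and its templates."""
    title: str
    base_score: float
    templates: set[str] = field(default_factory=set)


def boosted_score(page: Page, boosts: dict[str, float]) -> float:
    """Multiply the base score by the factor of each boosted template on the page.

    Multiplying the factors is an assumption of this sketch, not a statement
    about how any real search backend combines boosts.
    """
    score = page.base_score
    for template, factor in boosts.items():
        if template in page.templates:
            score *= factor
    return score


def rank(pages: list[Page], boosts: dict[str, float]) -> list[Page]:
    """Return pages ordered from highest to lowest boosted score."""
    return sorted(pages, key=lambda p: boosted_score(p, boosts), reverse=True)


# Hypothetical community-curated boost table: factor > 1 promotes, < 1 demotes.
boosts = {"Template:Featured article": 2.0, "Template:Stub": 0.5}

# Hypothetical hits for a single query.
pages = [
    Page("A", 1.0, {"Template:Stub"}),
    Page("B", 0.8, {"Template:Featured article"}),
    Page("C", 0.9),
]

ranked_titles = [p.title for p in rank(pages, boosts)]
```

In this sketch the curation lives entirely in the boost table: whoever maintains the mapping from templates to factors influences the ranking without touching the underlying scoring.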

Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation