Searching for characters with diacritical marks


Tim Ware
I'm building a searchable index where many of the listings have
letters with diacritical marks (e.g. tilde, umlaut). How can I
enter them so that they're searchable as the character *without* the
diacriticals? So, for instance

Ñato

which is

Nato with a tilde over the "N"

I'd like a search on "Nato" to pull up that Ñato.

Is there an easy way?

Thanks.

Tim
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
Re: Searching for characters with diacritical marks

Fxp
Hi,

Your problem is also of interest to us (Frenchies). I dug into the
MySQL manual to find what could be done:

http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html

If you search for the word "accented" on this page, you will find this remark:

---- Start quote ----
Posted by Jeff Smith on May 27 2004 1:08pm

Keep in mind that although MATCH() AGAINST() is case-insensitive, it
also is basically **accent-insensitive**. In other words, if you do not
want _mangé_ to match with _mange_ (this example is in French), you have
no choice but to use the BOOLEAN MODE with the double quote operator.
This is the only way that MATCH() AGAINST() will make accent-sensitive
matches.

E.g.:

SELECT * FROM quotes_table WHERE MATCH (quote) AGAINST ('"mangé"' IN
BOOLEAN MODE)

For multiword searches:

SELECT * FROM quotes_table WHERE MATCH (quote) AGAINST ('"mangé" "pensé"' IN
BOOLEAN MODE)

SELECT * FROM quotes_table WHERE MATCH (quote) AGAINST ('+"mangé" +"pensé"' IN
BOOLEAN MODE)

Although the double quotes are intended to enable phrase searching, just
like any web search engine for example, you can also use them to signify
single words where accents and other diacritics matter.

The only drawback to this method seems to be that the asterisk operator
is mutually exclusive with the double quote. Or I just haven't been able
to combine both effectively.
---- End quote ----

Of course, it is not a "simple" thing to do. But you could ask someone
(there are plenty of nice guys in this forum) to hack the search
function of the wiki to include this BOOLEAN MODE predicate to meet your
needs.
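
Another possible route, separate from the MySQL behaviour quoted above, is
to strip the diacritics from both the stored listing and the search term
before comparing them. Here is a minimal sketch of that idea in Python. The
helper names fold_accents and matches are made up for illustration, and
MediaWiki itself is written in PHP, so this only shows the principle and is
not a patch for the wiki's search function:

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn"), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


def matches(query: str, listing: str) -> bool:
    """Case- and accent-insensitive substring test (hypothetical helper)."""
    return fold_accents(query).casefold() in fold_accents(listing).casefold()
```

With this, matches("Nato", "Ñato") is true, because both sides are reduced
to "nato" before the comparison. Letters that have no decomposition (for
example "ø") pass through unchanged, so they would need their own mapping.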


Hope it helps... or that someone comes up with a simpler solution.

Sincerely

François
