Search and accents

Lars Aronsson
This is a suggestion to change search so that it ignores
postfix (combining) accents.

Russian dictionaries (including Wiktionary) use accents to
indicate stress on syllables, but these accents are never
seen in plain text.

In Russian Wiktionary, the verb бороться has the
inflected form боритесь (imperative, plural),
which does not have an entry of its own, but
appears in a fact box (table) of inflected forms.
However, since this is a dictionary, the word in
the box is written with an accent: бори́тесь
https://ru.wiktionary.org/wiki/бороться

(I do realize that it would be possible to add
redirect entries for all such inflected forms,
but this has not been done in ru.wiktionary.)

Searching for бори́тесь (which nobody would do)
finds the relevant page,
https://ru.wiktionary.org/w/index.php?search=бори́тесь

but searching for боритесь (the normal thing)
does not find the relevant page,
https://ru.wiktionary.org/w/index.php?search=боритесь
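The mismatch can be reproduced outside the search engine. The
following is a minimal sketch in Python, assuming the page text
stores the stressed form with a combining acute accent (U+0301)
after the vowel. The helper name fold_accents is hypothetical and
is not part of MediaWiki or its search backend; it only
illustrates the kind of folding being suggested.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks that have no precomposed form.

    NFC is applied first so that letters such as й (и + breve) and
    ё (е + diaeresis) are kept as single precomposed letters. Only the
    marks that remain separate after composition, such as the stress
    accent U+0301 on a Russian vowel, are removed.
    """
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if not unicodedata.combining(ch))


# Assumed page content: the stressed dictionary form from the table.
page_text = "бори\u0301тесь"
# What a reader would normally type.
query = "боритесь"

# A plain substring comparison fails because of the extra code point.
print(query in page_text)

# Folding both the indexed text and the query makes them comparable.
print(fold_accents(query) in fold_accents(page_text))
```

Composing with NFC before stripping is a deliberate choice in this
sketch: decomposing first and then dropping every mark would also
turn й into и and ё into е, which are distinct letters in Russian.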

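Note that Unicode doesn't contain precomposed versions of the
Russian vowels with a stress accent. Instead, the accent is made
by suffixing a separate combining sign, U+0301 (COMBINING ACUTE
ACCENT), which is the extra byte pair 314 201 in the second dump.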

$ echo "и" | od -c
0000000 320 270  \n

$ echo "и́" | od -c
0000000 320 270 314 201  \n
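The same inspection can be done per code point rather than per
byte. This is a small sketch using Python's standard unicodedata
module; the function name describe is made up for illustration. It
lists each code point with its Unicode name and UTF-8 bytes (in
hex, where od -c prints octal), and checks that NFC normalization
leaves the pair as two code points, which is what one would expect
when no precomposed letter exists.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, UTF-8 bytes in hex) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<unnamed>")
        utf8_hex = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
        rows.append((code_point, name, utf8_hex))
    return rows


# The stressed vowel from the example: base letter plus combining accent.
stressed = "\u0438\u0301"

for row in describe(stressed):
    print(*row)

# If a precomposed letter existed, NFC would merge the pair into one code point.
print(len(unicodedata.normalize("NFC", stressed)))
```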


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Search and accents

Nikolas Everett
On Fri, Jun 12, 2015 at 5:33 PM, Lars Aronsson <[hidden email]> wrote:

> This is a suggestion to change search, so it ignores
> postfix accents.
>
> Russian dictionaries (including Wiktionary) use accents to
> indicate stress on syllables, but these accents are never
> seen in plain text.
>
> In Russian Wiktionary, the verb бороться has the
> inflected form боритесь (imperative, plural),
> which does not have an entry of its own, but
> appears in a fact box (table) of inflected forms.
> However, since this is a dictionary, the word in
> the box is written with an accent: бори́тесь
> https://ru.wiktionary.org/wiki/бороться
>
> (I do realize that it would be possible to add
> redirect entries for all such inflected forms,
> but this has not been done in ru.wiktionary.)
>
> Searching for бори́тесь (which nobody would do)
> finds the relevant page,
> https://ru.wiktionary.org/w/index.php?search=бори́тесь
>
> but searching for боритесь (the normal thing)
> does not find the relevant page,
> https://ru.wiktionary.org/w/index.php?search=боритесь
>
> Note that Unicode doesn't contain accented versions
> of Cyrillic letters. Instead, the accent is made
> by suffixing a separate accent sign.
>
> $ echo "и" | od -c
> 0000000 320 270  \n
>
> $ echo "и́" | od -c
> 0000000 320 270 314 201  \n
>
>
That makes sense to me. I've filed it as
https://phabricator.wikimedia.org/T102298 and we'll get it prioritized.

Let me know if you don't like how I just copied your (very good) email into
the issue and I'll try to re-summarize.

Nik