Re: Parallel text alignment (was: Push translation)

Lars Aronsson
On 08/07/2010 02:23 AM, Andreas Kolbe wrote:
> Word-processing the Google output to arrive at a readable, written text creates more work than it saves.

This is where our experience differs. I'm working faster with the Google
Translator Toolkit than without.

> If Google want to build up their translation memory, I suggest they pay publishers for permission to analyse existing, published translations, and read those into their memory. This will give them a database of translations that the market judged good enough to publish, written by people who (presumably) understood the subject matter they were working in.

If we forget Google for a while, this is actually something that we could do
on our own. There are enough texts in Wikisource (out of copyright books)
that are available in more than one language. In some cases, we will run
into old spelling and use of language, but it will be better than nothing.
The result could be good input to Wiktionary.

Here is the Norwegian original of Nansen's Eskimoliv,
http://no.wikisource.org/wiki/Indeks:Nansen-Eskimoliv.djvu

And here is the Swedish translation, both from 1891,
http://sv.wikisource.org/wiki/Index:Eskimålif.djvu

Norwegian: Grønland er paa en eiendommelig vis knyttet til vort land og
folk.

Swedish:   Grönland är på ett egendomligt sätt knutet till vårt land och
vårt folk.

As you can see, there is one difference already in this first
sentence: The original ends "to our country and people",
while the translation ends "to our country and our people".

Is there any good free software for aligning parallel texts and
extracting translations? Looking around, I found NAtools,
TagAligner, and Bitextor, but they require texts to be marked
up already. Are these the best and most modern tools available?


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se



_______________________________________________
Wiktionary-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiktionary-l

Re: [Wikisource-l] Parallel text alignment (was: Push translation)

John Mark Vandenberg
On Sun, Aug 8, 2010 at 2:10 PM, Lars Aronsson <[hidden email]> wrote:
> ...
> Is there any good free software for aligning parallel texts and
> extracting translations? Looking around, I found NAtools,
> TagAligner, and Bitextor, but they require texts to be marked
> up already. Are these the best and most modern tools available?

There is a MediaWiki extension that is supposed to provide this:

http://wikisource.org/wiki/Wikisource:DoubleWiki_Extension

It is enabled on all wikisource subdomains.

http://en.wikisource.org/wiki/Crito?match=el

It doesn't work very well because our Wikisource projects have
different layouts, especially templates such as the header on each page.

--
John Vandenberg


Re: [Wikisource-l] Parallel text alignment

Lars Aronsson
On August 9, John Vandenberg wrote:

> On Sun, Aug 8, 2010 at 2:10 PM, Lars Aronsson<[hidden email]>  wrote:
>> Is there any good free software for aligning parallel texts and
>> extracting translations? Looking around, I found NAtools,
>> TagAligner, and Bitextor, but they require texts to be marked
>> up already. Are these the best and most modern tools available?
>
> there is a Mediawiki extension which is supposed to provide this:
> http://wikisource.org/wiki/Wikisource:DoubleWiki_Extension
>
> It is enabled on all wikisource subdomains.
> http://en.wikisource.org/wiki/Crito?match=el

This is a wonderful feature that I didn't know about until now.
But it is not what I'm looking for. In computational
linguistics and natural language processing (NLP), a "text
aligner" is a piece of software that identifies which words
and phrases in a text correspond to which in its translation.
The input is a text together with its translation, and the
output is a dictionary. It's like a more advanced "diff" tool.
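
To make the idea concrete, here is a minimal sketch of dictionary
extraction by co-occurrence counting. It is not how NAtools,
TagAligner, or Bitextor work; it only illustrates the principle.
It assumes the texts have already been split into matching
sentence pairs, and the three short pairs below are toy fragments
made up from words in the Nansen sentence, not quotations from the
book. The names tokenize and extract_dictionary are hypothetical.

```python
from collections import Counter
from itertools import product

PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    words = (w.strip(PUNCT) for w in sentence.lower().split())
    return [w for w in words if w]


def extract_dictionary(sentence_pairs):
    """Map each source word to the target word it co-occurs with most.

    Association is scored with the Dice coefficient:
    2 * (pairs containing both words) / (pairs with s + pairs with t).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(tokenize(src)), set(tokenize(tgt))
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word is a candidate translation of every target
        # word in the same sentence pair.
        pair_count.update(product(src_words, tgt_words))

    best = {}
    for (s, t), n in pair_count.items():
        score = 2 * n / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Toy, already sentence-aligned input (Norwegian, Swedish); assumed data.
pairs = [
    ("vort land", "vårt land"),
    ("vort folk", "vårt folk"),
    ("land og folk", "land och folk"),
]

dictionary = extract_dictionary(pairs)
for src, tgt in sorted(dictionary.items()):
    print(src, "->", tgt)
```

On this toy input the sketch pairs "vort" with "vårt" and "og"
with "och". On real books it would need far more text, and it
cannot handle phrases or one-to-many translations, which is what
the dedicated aligners are for.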


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se


