The need for identifying languages properly for lexicological use

Gerard Meijssen-3
When there is a vote for "yet another" Wikipedia, it is necessary to
have a code that identifies the new database. As Wikipedias are
written in a language, we use a code that identifies that language.
Typically people say we use the ISO-639 codes for that. This would
imply that the code used has a relation to the language that is being
used, and it should also imply that a Wikipedia is indeed in the
particular language recognised by the code.

The way the Wikipedias are is a matter of history, and the continued
abuse of codes makes for often heated political discussions about
languages; it only makes things more complicated. If you are
interested in reading more details on this subject, you can read what
I wrote on my blog.

In many projects we use "Babel" templates to indicate the language
proficiency of people. Particularly in Wiktionary and in WiktionaryZ,
we have to be precise when we indicate a language. It means that when
we are to indicate that a word is in a specific language, it has to be
THAT language and not another language.

I propose that WiktionaryZ and the Babel proficiency templates use the
ISO-639-3 codes exclusively. Where ISO-639-3 does not have a code for
what we need, we will have to use codes that are clearly not
ISO-639-3. These codes may indicate orthographies, dialects and
different scripts, and even languages that have not yet been
recognised as a language.
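
To make "clearly not ISO-639-3" concrete, here is a minimal sketch of
how a code could be classified by its shape alone. It assumes that an
ISO-639-3 identifier is exactly three lowercase letters. The "x-"
prefix for locally defined codes and the function name classify_code
are hypothetical, chosen only for illustration; they are not an agreed
convention.

```python
import re

# ISO 639-3 identifiers are three lowercase ASCII letters.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")

# Hypothetical convention for illustration only: locally defined codes
# (orthographies, dialects, scripts, unrecognised languages) carry a prefix
# so that they can never be mistaken for an ISO 639-3 code.
LOCAL_PREFIX = "x-"


def classify_code(code: str) -> str:
    """Classify a code by shape; this does not check the ISO 639-3 registry."""
    if ISO_639_3_SHAPE.fullmatch(code):
        return "iso-639-3-shaped"
    if code.startswith(LOCAL_PREFIX) and len(code) > len(LOCAL_PREFIX):
        return "local"
    return "other"
```

A shape check like this only tells the two kinds of code apart. Whether
a three-letter code is actually assigned still has to be looked up in
the ISO-639-3 code tables.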

The use of well-defined codes will allow our data to be used reliably
and our content to be defined better. This will enable people to use
our data and make WiktionaryZ a success.

Where possible we will try to connect the codes used by Wikipedia to
ISO-639-3 codes. This will not be possible for several languages, such
as Albanian; the als code has been squatted by what ISO-639 considers
a language family.
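
The connection between Wikipedia codes and ISO-639-3 codes could be
kept as a simple lookup table. The sketch below is illustrative only:
the table name, the function name and the handful of entries are my
assumptions, not an existing WiktionaryZ data set. A value of None
marks a Wikipedia code that cannot be connected because it is used for
something other than what ISO-639-3 assigns it to.

```python
from typing import Dict, Optional

# Illustrative sample only; a real table would cover every Wikipedia.
WIKIPEDIA_TO_ISO_639_3: Dict[str, Optional[str]] = {
    "en": "eng",
    "de": "deu",
    "fr": "fra",
    "nl": "nld",
    # The "als" Wikipedia is not in the language that ISO 639-3 calls "als",
    # so no reliable connection can be made.
    "als": None,
}


def iso_code_for(wikipedia_code: str) -> Optional[str]:
    """Return the ISO 639-3 code for a Wikipedia code, or None if it conflicts.

    Raises KeyError for a Wikipedia code that is not in the table at all,
    so that "unknown" and "known to conflict" stay distinguishable.
    """
    if wikipedia_code not in WIKIPEDIA_TO_ISO_639_3:
        raise KeyError(f"no entry for Wikipedia code {wikipedia_code!r}")
    return WIKIPEDIA_TO_ISO_639_3[wikipedia_code]
```

Keeping conflicting codes in the table with an explicit None, rather
than leaving them out, records that the mismatch is known and
deliberate.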

