The need for identifying languages properly for lexicological use
When there is a vote for "yet another" Wikipedia, a code is needed to
identify the new database. As Wikipedias are written in a language, we
use a code that identifies that language. Typically people say we use
the ISO-639 codes for that. This implies that the code used bears a
relation to the language being used, and it should also imply that a
Wikipedia is indeed in the particular language identified by that
code.
In many projects we use "Babel" templates to indicate the language
proficiency of contributors. Particularly in Wiktionary and in
WiktionaryZ, we have to be precise when we indicate a language: when
we indicate that a word is in a specific language, it has to be THAT
language and not another.
I propose that WiktionaryZ and the Babel proficiency templates
exclusively use the ISO-639-3 codes. Where ISO-639-3 does not provide
a code, we will have to use codes that are clearly not ISO-639-3. Such
codes may indicate orthographies, dialects, different scripts, and
even languages that have not yet been recognised as languages.
The use of well-defined codes will allow our data to be used reliably
and will let us define our content better. This will enable people to
use our data and make WiktionaryZ a success.
Where possible we will try to connect the codes used by Wikipedia to
ISO-639-3 codes. This will not be possible for several languages like
Albanian; the als code has been squatted by the Alemannic Wikipedia,
even though ISO-639 considers als a code for Tosk Albanian.
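To make the connection concrete, here is a minimal sketch (not project code) of what such a mapping could look like. The table and function names are hypothetical illustrations; the 'als' entry shows the clash just described, since the Alemannic Wikipedia uses a code that ISO 639-3 assigns to Tosk Albanian (Alemannic itself is gsw in ISO 639-3).

```python
# Hypothetical mapping from Wikipedia subdomain codes to ISO 639-3 codes.
# The 'als' entry is the problem case: the als.wikipedia is written in
# Alemannic (ISO 639-3: gsw), while ISO 639-3 assigns 'als' to Tosk Albanian.
WIKI_TO_ISO639_3 = {
    "en": "eng",   # English
    "de": "deu",   # German
    "sq": "sqi",   # Albanian
    "als": "gsw",  # Alemannic -- NOT ISO 639-3 'als' (Tosk Albanian)
}

def iso_code(wiki_code: str) -> str:
    """Return the ISO 639-3 code for a Wikipedia code, or raise if unmapped."""
    try:
        return WIKI_TO_ISO639_3[wiki_code]
    except KeyError:
        raise ValueError(f"no ISO 639-3 mapping for wiki code {wiki_code!r}")

def is_squatted(wiki_code: str) -> bool:
    """True when a three-letter wiki code maps to a *different* ISO 639-3
    code than itself, i.e. the code has been 'squatted' by a wiki in
    another language."""
    iso = WIKI_TO_ISO639_3.get(wiki_code)
    return iso is not None and len(wiki_code) == 3 and iso != wiki_code
```

A lookup like iso_code("als") would then yield "gsw", and is_squatted("als") would flag the code as one that cannot be connected directly.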