Before implementing the editorial tools for each language in PALDO, we need to build the data models for each component language. Languages differ in more than words, so the database that contains information about multiple languages will have to account for those differences.
Consider, for example, one difference between English and the Romance languages: English nouns do not have grammatical gender, while nouns in Romance languages like French and Spanish do. So the database should not contain a gender field for English entries, but it should for entries in all the Romance languages.
Swahili nouns each belong to a noun class instead of a gender, meaning that the PALDO database needs to include a noun class field for Swahili words. Noun classes are a defining feature of Bantu languages in general, so many other African languages will also need noun class data in their entries.
Some languages have tones that need to be marked in a pronunciation field, but noted separately from the standard spelling. Others have multiple spelling systems. The complexities are never-ending, and sometimes extremely subtle.
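To make the idea concrete, here is a minimal sketch in Python of how per-language fields might be declared and checked. It is only an illustration, not the actual PALDO data structure: the names `LANGUAGE_FIELDS`, `NounEntry`, `tone_marked_form`, `alternate_spellings`, and `validate` are all hypothetical, and the language codes are just convenient labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical mapping from language code to the grammatical fields
# a noun entry must carry. This is an assumption for illustration,
# not the real PALDO schema.
LANGUAGE_FIELDS = {
    "en": set(),            # English nouns: no gender, no noun class
    "fr": {"gender"},       # French nouns carry grammatical gender
    "es": {"gender"},       # Spanish nouns carry grammatical gender
    "sw": {"noun_class"},   # Swahili nouns belong to a noun class
}


@dataclass
class NounEntry:
    language: str
    headword: str
    # Language-specific grammatical data, e.g. {"gender": "feminine"}.
    features: Dict[str, str] = field(default_factory=dict)
    # Optional tone-marked form, kept apart from the standard spelling.
    tone_marked_form: Optional[str] = None
    # Spellings in other orthographies, for languages that have several.
    alternate_spellings: List[str] = field(default_factory=list)


def validate(entry: NounEntry) -> List[str]:
    """Return a list of problems; an empty list means the entry fits."""
    if entry.language not in LANGUAGE_FIELDS:
        raise ValueError(f"no data model defined for {entry.language!r}")
    allowed = LANGUAGE_FIELDS[entry.language]
    problems = []
    # Fields the language's model does not define.
    for name in sorted(set(entry.features) - allowed):
        problems.append(
            f"unexpected field {name!r} for language {entry.language!r}"
        )
    # Fields the language's model requires but the entry lacks.
    for name in sorted(allowed - set(entry.features)):
        problems.append(
            f"missing field {name!r} for language {entry.language!r}"
        )
    return problems


print(validate(NounEntry("fr", "maison", {"gender": "feminine"})))
print(validate(NounEntry("en", "house", {"gender": "neuter"})))
print(validate(NounEntry("sw", "kitabu")))
```

In this sketch the French entry passes, the English entry is flagged for carrying a gender it should not have, and the Swahili entry is flagged for lacking its noun class. Keeping the per-language rules in one table is just one possible design; it means adding a language is a matter of adding a row rather than writing new checking code.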
We are currently developing the data structures for the initial languages that will go into PALDO. Once we finish the structure for a language, we will program the tools for editing dictionary entries in that language. Because those tools are built on top of the structure, it is extremely important that we get it right the first time.
The working versions of the dictionary data structures are online at http://www.kasahorow.org/content/pan-african-living-dictionaries-online-paldo (and changing minute-by-minute). We are consulting with experts in each language, but we could use more input from a variety of perspectives. If you have any insights, please share them in the comments section for this blog entry.