The goal of this project is to provide a framework for developing language technology for languages with limited computational resources. Using corpora harvested by my web crawler An Crúbadán, together with statistical analyses of those corpora, it is possible to get a simple tool up and running with a minimum of work.
In addition to the flagship Irish version, there are several other language packs currently available (in various stages of completion): Afrikaans (by Petri Jooste and Tjaart van der Walt), Akan (by Paa Kwesi Imbeah), Cornish (by Paul Bowden and Edi Werner), Esperanto (by Tim Morley), French (by Myriam Lechelt and Laurent Godard), Hiligaynon (by Francis Dimzon), Icelandic (by Pétur Thors), Igbo (by Chinedu Uchechukwu), Languedocien (by Bruno Gallart), Scottish Gaelic (by Caoimhín Ó Donnaíle), Tagalog (by Ramil Sagum), Walloon (by Pablo Saratxaga), and Welsh (by Kevin Donnelly). These are kept under CVS at sourceforge.net.
Preliminary work has been done on several other languages, and I hope that some of these will become available under CVS before long: Azerbaijani, Breton, Chichewa, Kashubian, Kinyarwanda, Kurdish, Ladin, Malagasy, Malay, Manx Gaelic, Mongolian, Norwegian, Setswana, Tagalog, Tetum, Upper Sorbian, Xhosa, and Zulu.