Natural Language Processing
Quick links: aimsigh | gramadoir | litreoir | crubadan | ccgb | lsg | gnu | fleiscin
Lectures, Papers, and Conferences
- I am on the program committee for the 4th Web as Corpus Workshop in Marrakech, Morocco, June 1st, 2008.
- I was an invited speaker and served on the program committee for the 3rd Web as Corpus Workshop (WAC3) in Louvain-la-Neuve, Belgium, September 15-16, 2007. (Main talk: ODP, PDF. Panel discussion: ODP, PDF)
- The Crúbadán Project: Corpus building for under-resourced languages, Cahiers du Cental 4 (2007), pp. 5-15, C. Fairon, H. Naets, A. Kilgarriff, G-M de Schryver, eds., "Building and Exploring Web Corpora", Proceedings of the 3rd Web as Corpus Workshop in Louvain-la-Neuve, Belgium, September 2007. (PDF)
- Translations of free software into Irish (with Séamus Ó Ciardhuáin), Translation Ireland 17 (2007), no. 2, 19-30. Special issue on "Translation and Irish in the 21st Century". (PDF)
- Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers (with Oliver Streiter and Mathias Stuflesser), to appear in Machine Translation. (PDF)
- Machine translation for closely related language pairs, Proceedings of the Workshop "Strategies for developing machine translation for minority languages" at LREC 2006, Genoa, Italy, May 2006, pp. 103-107. (PDF)
- I was on the program committee for the workshop TAL et langues peu dotées at TALN 2005 in Dourdan, France, June 6-10, 2005.
- I was also on the program committee for a similar workshop that was held at the ACL meeting in Ann Arbor, June 29-30, 2005: Building and using parallel texts for languages with scarce resources.
- Applications of parallel corpora to the development of monolingual language technologies. (PDF)
- Automatic thesaurus generation for minority languages: an Irish example, Actes de la 10e conférence TALN à Batz-sur-Mer du 11 au 14 Juin 2003, volume 2, pp. 203-212. Paper presented at the workshop Traitement Automatique des Langues Minoritaires et des Petites Langues. (PDF)
- Hyphenation patterns for minority languages, TUGboat 24 (2003), no. 2, 236-239. (PDF)
- Global Software, a lecture on internationalization for undergraduates at Saint Louis University, April 8th, 2002. (PDF Slides)
Software Projects
I have written and actively maintain a number of open source software packages in support of minority languages and other languages with limited computational resources.
Corpora, web crawling, search engines
- aimsigh.com. Linguistically sophisticated search.
- An Crúbadán. A web crawler for building minority language corpora automatically.
- Corpas Comhthreomhar Gaeilge-Béarla. An aligned parallel corpus of Irish and English texts.
- Internet Corpus of Welsh. Contains approximately 100 million words of Welsh. Now in use by the University of Wales Welsh Dictionary.
- Other Corpora. Asturian, Aymara, Basque, Breton, ... Venda, Walloon, Zulu.
Spell Checking and Grammar Checking
- An Gramadóir. An open source grammar checking engine that works with vim, emacs, and OpenOffice.
- GaelSpell. Irish spellcheckers for multiple platforms built from a single, high-quality database.
- Aspell. I'm using web crawling and statistical methods to develop new spell checking packages for a number of minority languages.
Lexicography
- Líonra Séimeantach na Gaeilge. An Irish language semantic network ("WordNet"), available as a traditional thesaurus, or via a cool 3D browser.
- English-Irish-Afrikaans dictionary. Written with Darrin Speegle.
- Hyphenation. An Irish hyphenation dictionary adapted for use with TeX/LaTeX, Scribus, OpenOffice, etc.
Machine Translation
- ga2gd. Robust machine translation between closely related languages.
Human Translation
- GNU/Linux. Ever wonder how to say "in compatibility mode, the last two arguments must be offsets" in Irish? I am team leader at the GNU Translation Project.
- OpenOffice.org. I'm also coordinating the effort to translate OpenOffice.org into Irish.
- Mozilla. Localization of the Firefox web browser, Thunderbird email handler, and Sunbird calendar into Irish.
- KDE. Joint work with Séamus Ó Ciardhuáin.