A Bit of History
A mature spelling checker for Kalaallisut has been one of the top priorities on the Greenlandic wish list ever since Kalaallisut became the official language of Greenland with the Home Rule Act of 1979. For many years, however, the lack of human and financial resources in the field of Kalaallisut linguistics effectively prevented any serious attempt of the kind.
Around the turn of the millennium things slowly started to move: the university (founded in 1984) had produced its first graduates in Greenlandic, Oqaasileriffik was established, and – maybe most important of all – general confidence in the vitality of Kalaallisut gradually grew. Kalaallisut was no longer considered a language on the edge of extinction. On the contrary, a strong belief in the language’s potential as a sustainable language in the longer term became the official political agenda.
The new focus called for new means, including the exploitation of digital technology. A large-scale language technology program, though, was generally considered wishful thinking because the extreme derivational processes that characterize Kalaallisut were assumed to keep the language beyond the reach of technology.
In 2004 Oqaasileriffik received a huge compilation of 350,000 Greenlandic words as a gift from Erik Fleischer in Paamiut. In a (rather desperate) attempt to escape from the attitudinal deadlock, Erik’s word list was compiled into an alpha version of the speller.
The result was discouraging! In spite of the considerable size of the list, the speller only had a coverage of around 25% of running newspaper text. Not only was it useless, but it also drained all available financial resources for the following years.
Still, the disappointment proved one fact, namely that there is no easy alternative to rule-driven spell checking for Kalaallisut. So the next step was to pave the way for a fully fledged project.
Kukkuniiaat version 1
In the spring of 2005 Oqaasileriffik’s senior adviser, Per Langgård, was released from his normal duties through a grant from the Nordic Council in order to launch a language technology program for Kalaallisut. His leave was prolonged in 2006 through another grant from the Home Rule government. The first goal was to provide a spell checker with a performance high enough for practical purposes.
The project got the best possible start because senior lecturer Trond Trosterud of Tromsø University and the Sami Divvun project offered the Greenlandic project invaluable help. Tromsø gave us a place under its technological umbrella by hosting the project on its own server, with access to all the tools the Sami project had developed over the years. In addition, Trond played a most active role in helping to design the project and served as an ever-ready consultant whenever needed.
Tromsø’s help, together with the fact that Erik Fleischer’s word lists made the compilation of the basic lexicons rather painless, paved the way for an unusually rapid development of the basic automaton. As early as the fall of 2006 we could have Lingsoft, inc. compile it as an integrated spell checker and hyphenation tool for MS Office.
The first version had a coverage of around 80% on newspaper text. 80% does not entirely meet modern industrial standards for spellers, but it far exceeded our original expectations.
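Coverage figures such as the 25% and 80% quoted above are usually understood as the share of running word tokens that the speller accepts. The following is a minimal Python sketch of that measurement, assuming this token-based definition. The `accepts` callback, the small word set and the sample tokens are hypothetical stand-ins for illustration only; they are not the actual Kukkuniiaat automaton or its test corpus.

```python
from typing import Callable, Iterable


def coverage(tokens: Iterable[str], accepts: Callable[[str], bool]) -> float:
    """Return the fraction of running tokens that the speller accepts."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    recognised = sum(1 for t in tokens if accepts(t))
    return recognised / len(tokens)


# Hypothetical stand-in for the speller: a plain word-list lookup.
known = {"illu", "illumi", "nuna"}

# Hypothetical sample of running text, already split into tokens.
sample = ["illu", "illumi", "illuminiit", "nuna", "nunatsinni"]

# Three of the five tokens are in the list, so this prints 60%.
print(f"{coverage(sample, known.__contains__):.0%}")
```

With a plain word list, every inflected or derived form that is missing from the list counts against coverage, which is one way to read the poor result of the alpha version.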
Kukkuniiaat version 2
Funding was scarce in 2007, so the project slowed down considerably. Fortunately, the senior adviser was granted three months’ leave from normal duties through internal reorganization and received valuable help from two undergraduate students, who did a marvelous job. As a result, the automaton was debugged and enlarged to a degree that allowed it to be recompiled as version 2.0 of the spell checker.
Version 2.0 of Kukkuniiaat meets the industrial standard with a coverage of around 90%.
The disambiguation engine – first phase
Late in 2007 Per Langgård received a very substantial grant from the Ministry of Finance’s Technology Development Fund. The grant enabled him to take leave from almost all other duties throughout 2008 and also to employ a full-time apprentice in the fall of 2008.
The apprentice was BA Arnannguaq Blytmann, who took over responsibility for maintaining the automaton, thus giving the senior adviser time to design the next step of the Greenlandic technology project, namely the disambiguation engine.
To begin with, written Greenlandic has a very high level of ambiguity because of historical and assimilatory processes and because of its phonemic orthography. On average, a running Greenlandic word has about four different correct interpretations.
On its own, the automaton therefore falls short for a number of tasks. It is, for instance, not possible to POS-tag a corpus automatically until the tagger knows which of several possible readings should be chosen in a given context, not to mention what can happen when automatic translation does not choose properly. We need a program that picks out the correct reading and discards the readings that do not make sense in the actual context. A disambiguator is such a program.
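The idea of discarding readings by context can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the project's actual engine or rule format: the `Token` class, the tag names (`N`, `V`, `Abs`, `Ind`, `Par`), the placeholder word forms and the single rule are all invented for the example and make no claim about real Greenlandic grammar.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    # Each reading is a set of tags proposed by the morphological analyser.
    readings: list[set[str]] = field(default_factory=list)


def remove_if_previous_has(tokens, target_tag, context_tag):
    """Discard readings carrying target_tag when every reading of the
    previous token carries context_tag. A token's last remaining
    reading is never removed."""
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.readings and all(context_tag in r for r in prev.readings):
            kept = [r for r in cur.readings if target_tag not in r]
            if kept:  # safety: keep at least one reading
                cur.readings = kept


def readings_per_word(tokens):
    """Average number of readings per token (the ambiguity level)."""
    return sum(len(t.readings) for t in tokens) / len(tokens)


# Hypothetical two-word input with invented tags.
tokens = [
    Token("word1", [{"N", "Abs"}]),
    Token("word2", [{"N", "Abs"}, {"V", "Ind"}, {"V", "Par"}]),
]

before = readings_per_word(tokens)   # (1 + 3) / 2 = 2.0
# Invented rule: drop noun readings directly after an unambiguous noun.
remove_if_previous_has(tokens, target_tag="N", context_tag="N")
after = readings_per_word(tokens)    # (1 + 2) / 2 = 1.5
print(before, after)
```

A real disambiguator applies many such rules with much richer context conditions. The safeguard of never removing the last reading is a deliberate choice in this sketch, so that every word keeps at least one analysis.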
The disambiguation project is based on Constraint Grammar in the concrete form of the vislcg3 engine developed by Tino Didriksen and Eckhard Bick, both of whom played a most active part in helping us to launch the disambiguation project. Because of their invaluable help, we once again got a flying start.
By the end of 2008, when we again ran out of funding, the ambiguity level had been reduced from around 4 readings per word to below 2 readings per word.
The disambiguation engine – second phase
The disambiguation project had to be closed down at the end of 2008 because of lack of funding, but in late spring 2009 a new grant from the Ministry of Finance’s Technology Development Fund and another grant from the Nordic Council made it possible to reopen the project.
BA Beatrine Heilmann was employed as an apprentice with responsibility for fst maintenance, and Per Langgård started to work full-time on the disambiguator in the fall.
In January 2010 Beatrine and Per received a much-needed boost when BA Judithe Denbæk joined the staff as a research assistant. Judithe is at present standing in for Beatrine, maintaining the fst during Beatrine’s maternity leave, but is expected gradually to move into developing a brand new field of computational semantics for Kalaallisut once Beatrine is back.
The present goal is to have a syntactic disambiguator working with an error rate of at most 10% by the end of 2010.