Guest professorship at CST - Center for Language Technology, University of Copenhagen, funded by Norfa

Hercules Dalianis has a part-time guest professorship at CST - Center for Language Technology, University of Copenhagen, during 2002-2005.

The aim of the guest professorship at CST - Center for Language Technology is to transfer and develop competence in language technology between Swedish and Danish within the following areas.

The amount of information on the Internet is growing rapidly, and we need tools to tame this flow: tools based on filtering and extracting information.


Text summarization

Automatic text summarization is the technique where a computer summarizes a text by extracting the most important information and compiling a new, non-redundant text. An example of this technique is the SweSum automatic text summarizer (Dalianis 2000) for Swedish news text. We adapted the Swedish text summarizer SweSum to Danish, as DanSum, using the Danish STO-lexicon. We also adapted SweSum to Norwegian, as NorSum. DanSum has been evaluated in the project DefSum at Danmarks Elektroniske Forskningsbibliotek. DanSum was also evaluated by creating a Danish extract corpus collection; the evaluation is described in Hassel (2004) and in de Smedt et al. (2005). The evaluation of NorSum is described in Liseth (2004).
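
As an illustration only, the extraction idea can be sketched in Python as below. This is not SweSum's actual algorithm: the stop-word list, the word-frequency scoring and the function names are all assumptions made for the example. Each sentence is scored by the average frequency of its content words, and the highest-scoring sentences are returned in their original order.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=2):
    """Return the max_sentences best-scoring sentences, in document order."""
    sentences = split_sentences(text)
    # Frequency of each content word over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

if __name__ == "__main__":
    print(summarize("Cats chase mice. Cats sleep a lot. Dogs bark.", 2))
```

A summarizer for news text would typically add further cues, such as sentence position, but the extract-and-rank structure stays the same.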

Information retrieval

The SiteSeeker search engine is a (Swedish) language-sensitive search engine for web sites and intranets. We also wanted to adapt SiteSeeker to Danish and Norwegian to improve precision and recall. We started by connecting SiteSeeker to the Scandinavian multilingual web site Nordoknet.
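
For readers unfamiliar with the two measures, the following small Python sketch shows how precision and recall are usually computed for a single query. The function name and the document identifiers are made up for the example and have nothing to do with SiteSeeker's internals.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: documents returned by the search engine
    relevant:  documents judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of the returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical document identifiers.
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
```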

Then we adapted the Swedish stemmer first to Danish and then to Norwegian. The work was rather straightforward since both Danish and Norwegian are closely related to Swedish. We also used the CST lemmatizer to automatically create stemmers from the keyword dictionaries used by each of our text summarizers. The work was a success: the stemmers became very exact, except for Norwegian, where the manual rule-based stemmer was more precise.
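
To give a rough idea of what a manual rule-based stemmer does, here is a minimal Python sketch of suffix stripping. The suffix list is a small Danish-like set invented for the example, not the rule set actually used in our stemmers, and the minimum stem length is likewise an assumption.

```python
# Illustrative suffixes only; a real stemmer uses a much larger, tested rule set.
SUFFIXES = ["erne", "ene", "er", "en", "et", "e"]

def stem(word, suffixes=SUFFIXES, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

if __name__ == "__main__":
    for form in ["bil", "bilen", "biler", "bilerne"]:
        print(form, "->", stem(form))
```

Because all inflected forms are reduced to the same stem, a query containing "biler" can match a document containing "bilen", which is how stemming helps recall in a search engine.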

Cross-language information retrieval was carried out in an early prototype described in Wedekind (2005). In the final period of the guest professorship we obtained funding from the Nordic Council to construct TvärSök, a cross-language search engine for the Scandinavian languages.

References

Dalianis, H. 2000. SweSum - A Text Summarizer for Swedish. Technical report TRITA-NA-P0015, IPLab-174, NADA, KTH, October 2000.

de Smedt, K., A. Liseth, M. Hassel and H. Dalianis. 2005. How short is good? An evaluation of automatic summarization. In Holmboe, H. (ed.) Nordisk Sprogteknologi 2004. Årbog for Nordisk Språkteknologisk Forskningsprogram 2000-2004, pp 267-287, Museum Tusculanums Forlag.

Hassel, Martin. 2004. Evaluation of automatic text summarization – a practical implementation. Licentiate thesis, Stockholm, NADA-KTH.

Liseth, Anja. 2004. Hvor kort er godt? : En evaluering av NorSum: en automatisk tekstsammenfatter for norsk. Hovedoppgave. Department of Linguistics. University of Bergen. (In Norwegian).

Wedekind, J. 2005. Towards Multilingual Retrieval of Document Information on Language Technology. In Holmboe, H. (ed.) Nordisk Sprogteknologi 2005. Årbog for Nordisk Språkteknologisk Forskningsprogram 2000-2004, pp 33-38, Museum Tusculanums Forlag.



Press releases

DanSum, the first text summariser for Danish, Norfa, October 9, 2002.

Summarised news for the mobile phone, March 29, 2004.

Sammanfattade nyheter i mobilen, March 29, 2004, (in Swedish).


Latest change August 18, 2005.