Minutes of the Fifth ScandSum network meeting Jan 25-28, 2003, Fefor Høifjellshotell, Norway

Participants of the meeting (in conjunction with the Nomen Nescio network)

ScandSum
Hercules Dalianis, KTH
Jürgen Wedekind, CST
Koenraad de Smedt, UiB
Martin Hassel, KTH
Sasan Fallahi, LU Lunds Universitet
Till Lech, CognIT
Trine Dahl, NHH-Norges Handelshögskola Bergen.

Nomen Nescio

Janne Bondi Johannessen, UiO
Åsne Thea Fraser Haaland, UiO
Eckhard Bick, SDU
Dorte Haltrup Hansen, CST (also ScandSum)
Andra Bjork Jonsdottir, UiO
Botolv Helleland, UiO
Dimitrios Kokkinakis, GU
Gordana Ilic Holen, UiO
Lars Nygaard, UiO
Paul Meurer, UiB (also ScandSum)
Lilja Øvrelid, UiO
Kristin Hagen, UiO
Anders Nøklestad, UiO

Conference

The ScandSum meeting took place in conjunction with the Nomen Nescio network meeting, and the two together formed a workshop. The program (in PDF) and the Nomen Nescio presentations are available.

Summary of the Summarization talks

Various presentations and ideas on automatic text summarization came up.
Martin Hassel presented SweSum with SweNam, the new Swedish text summarizer with Named Entity (NE) tagging, which slants the summaries toward either the indicative (with NE) or the informative (without NE). It seems that NE tagging can improve summarization in some cases.

Jürgen Wedekind and Dorte Haltrup discussed the evaluation of the newly constructed DanSum, the Danish text summarizer. They had found some problems with Danish abbreviations, which Martin had already fixed. For longer texts, Jürgen proposed some sort of passage-level summarization in which only selected subsections are summarized, e.g. Title, Introduction, Conclusions, etc. Another idea was to look at the keywords in different passages or paragraphs and from them guess topic shifts. Using topic shifts, one can slant the summaries towards the different topics.
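
The topic-shift idea can be illustrated with a minimal sketch. It is not taken from DanSum or SweSum; it simply assumes that a low keyword overlap (Jaccard similarity) between two adjacent paragraphs hints at a topic shift. The function names, the stopword handling and the threshold value are all assumptions made for illustration.

```python
import re

def keywords(paragraph, stopwords=frozenset()):
    """Return the set of lower-cased words in a paragraph, minus stopwords."""
    return {w for w in re.findall(r"\w+", paragraph.lower()) if w not in stopwords}

def topic_shifts(paragraphs, threshold=0.1, stopwords=frozenset()):
    """Return the indices of paragraphs that appear to start a new topic.

    Assumption: a shift is guessed when the Jaccard overlap between the
    keyword sets of two adjacent paragraphs falls below `threshold`.
    """
    shifts = []
    for i in range(1, len(paragraphs)):
        prev = keywords(paragraphs[i - 1], stopwords)
        cur = keywords(paragraphs[i], stopwords)
        union = prev | cur
        overlap = len(prev & cur) / len(union) if union else 1.0
        if overlap < threshold:
            shifts.append(i)
    return shifts

if __name__ == "__main__":
    text = [
        "the summarizer ranks sentences by keywords",
        "keywords rank sentences in the summarizer",
        "the hotel served dinner at seven",
    ]
    print(topic_shifts(text, stopwords=frozenset({"the", "in", "by", "at"})))
```

Each detected index could then mark the start of a passage that is summarized separately, which is one possible way to slant a summary towards a given topic.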

Trine Dahl showed how to use lexical cohesion to summarize texts (Tele-pattan II). This is somewhat similar to the lexical chaining approach. Trine suggested looking into compounds and ranking them highly, since they might have high information density.

Regarding the keyword lists used in SweSum, we also discussed whether we should use only nouns (including compounds) and no adverbs, except time adverbials such as yesterday, today, Saturday, etc.
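
As a rough sketch of the filtering rule discussed here, the snippet below keeps nouns and a small set of time adverbials from a list of already tagged tokens. It does not reflect how SweSum builds its keyword lists; the tag name "NOUN", the input format and the short English word list are assumptions chosen only for illustration.

```python
# Hypothetical list of time adverbials that are kept although they are not nouns.
TIME_ADVERBIALS = frozenset({
    "yesterday", "today", "tomorrow",
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
})

def keyword_candidates(tagged_tokens, noun_tag="NOUN"):
    """Keep nouns (compounds included, if tagged as nouns) and time adverbials.

    `tagged_tokens` is assumed to be a list of (word, part_of_speech) pairs
    produced by some tagger; all other word classes are dropped.
    """
    kept = []
    for word, pos in tagged_tokens:
        if pos == noun_tag or word.lower() in TIME_ADVERBIALS:
            kept.append(word)
    return kept

if __name__ == "__main__":
    sentence = [
        ("Yesterday", "ADV"), ("the", "DET"), ("newspaper", "NOUN"),
        ("quickly", "ADV"), ("published", "VERB"), ("summaries", "NOUN"),
    ]
    print(keyword_candidates(sentence))
```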

Finally, Sasan Fallahi presented a thorough evaluation of SweSum carried out at Sydsvenska Dagbladet. He compared the performance of SweSum with that of human editors in summarizing 334 news texts. He found that SweSum performed well in some cases, but that it sometimes cut sentences in the middle. Also, when summarizing towards the end of longer articles, it could remove the first sentence of a paragraph while keeping the second or third sentence, which decreased the quality of the summarized text. Yet another problem was that, when summarizing plain text, the paragraphs were put on one line instead of being separated by a carriage return.

Sasan also found that for cutting news down to SMS size (max 160 characters), SweSum performed extremely well and could possibly be used for that almost directly. Regarding editorial summarization, for SweSum to be used by the editors it should be seamlessly integrated with tools such as Illustrator, so that one can summarize in a drag-and-drop style.
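
To make the SMS-size constraint concrete, here is a minimal sketch that joins whole sentences, in the order given, until the 160-character limit would be exceeded. It is not SweSum's method; the sentence splitter is a naive assumption, and the order of the input is assumed to come from some ranking or simply from the news lead. Because only whole sentences are kept, it also avoids cutting a sentence in the middle.

```python
import re

SMS_LIMIT = 160  # maximum size mentioned in the minutes

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sms_summary(sentences, limit=SMS_LIMIT):
    """Join leading sentences with single spaces while staying within `limit`.

    Stops at the first sentence that does not fit. Returns an empty string
    if even the first sentence is longer than `limit`.
    """
    chosen = []
    length = 0
    for sentence in sentences:
        extra = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if length + extra > limit:
            break
        chosen.append(sentence)
        length += extra
    return " ".join(chosen)

if __name__ == "__main__":
    news = "The network met in Norway. Several summarizers were shown. More meetings are planned."
    print(sms_summary(split_sentences(news), limit=60))
```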

We also learned from the Nomen Nescio group that they were using six categories for recognition:
Person, Location (Plats), Organization (Organisation), Work of art (Verk, konstnärligt), Event (Händelse), and Other (Annat, excluding time). Dimitrios Kokkinakis, who built the Swedish Name Recognizer, made even more distinctions; e.g. Yoruba-folket (the Yoruba people) is tagged as a people.

Demonstrations: SiteSeeker administration and SiteSeeker Voice

Hercules demonstrated the administration interface for SiteSeeker and how Nordoknet and its subsites are indexed and maintained. Search SiteSeeker Nordoknet!

Hercules also demonstrated SiteSeeker Voice, the speech interface to SiteSeeker, where one calls a phone number and searches by voice. The findings are read out in a summarized/extracted form.

Possible meeting dates

* 3-6 April 2003, Åre, Sweden
* 30 May-1 June 2003, Reykjavik, Iceland in conjunction with NODALIDA 2003.
* Fall 2003, Denmark?

Tasks for the next meeting in Åre

Paul Meurer extracts a Norwegian keyword lexicon from the SCARRIE lexicon and sends it to Martin Hassel to be incorporated in SweSum to create NorSum. We will also need a list of common Norwegian abbreviations. The format can be found here.

Till Lech checks if he can put the Corporum text summarizer on the www.

Slides from meeting (in PDF)

Jürgen Wedekind OH-slides in PDF
Koenraad de Smedt OH-slides in HTML
Trine Dahl OH-slides in PDF
Till Lech OH-slides Corporum tools in PDF
Hercules Dalianis OH-slides SweNam in PDF
Martin Hassel OH-slides SweSum in PDF and Summarizations in PDF
Sasan Fallahi OH-slides in PDF



Latest change March 10, 2003