ScandSum
Hercules Dalianis, KTH
Jürgen Wedekind, CST
Koenraad de Smedt, UiB
Martin Hassel, KTH
Sasan Fallahi, LU Lunds Universitet
Till Lech, CognIT
Trine Dahl, NHH-Norges Handelshögskola Bergen.
Nomen Nescio
Janne Bondi Johannessen, UiO
Åsne Thea Fraser Haaland, UiO
Eckhard Bick, SDU
Dorte Haltrup Hansen, CST (also ScandSum)
Andra Bjork Jonsdottir, UiO
Botolv Helleland, UiO
Dimitrios Kokkinakis, GU
Gordana Ilic Holen, UiO
Lars Nygaard, UiO
Paul Meurer, UiB (also ScandSum)
Lilja Øvrelid, UiO
Kristin Hagen, UiO
Anders Nøklestad, UiO
The ScandSum meeting took place in conjunction with the Nomen Nescio network meeting, and together they formed a workshop. The program in PDF and Nomen Nescio's presentations
Regarding automatic text summarization, various presentations and ideas came up.
Martin Hassel presented SweSum with SweNam, the new Swedish text summarizer with Named
Entity (NE) tagging, which slants the summaries toward either the indicative (with NE)
or the informative (without NE) style. It seems that NE tagging can improve summarization
in some cases.
Jürgen Wedekind and Dorte Haltrup discussed the evaluation of the newly constructed
DanSum, the Danish text summarizer. They had found some problems with Danish
abbreviations, which Martin had already fixed. For summarizing longer texts, Jürgen
proposed some sort of passage-level summarization that summarizes only selected
subsections, e.g. Title, Introduction, Conclusions, etc.
Another idea was to look at the keywords in different passages or paragraphs and
from them guess topic shifts. Using topic shifts, one can slant the summaries towards
the different topics.
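The keyword idea above can be sketched roughly as follows. This is only an illustration, not how SweSum or DanSum work: the function names, the use of plain word sets as "keywords", the Jaccard overlap measure and the threshold value are all assumptions made for the example.

```python
import re

def keywords(paragraph, stopwords=frozenset()):
    """Lowercased word set of a paragraph minus stopwords (a crude keyword proxy)."""
    words = re.findall(r"\w+", paragraph.lower())
    return {w for w in words if w not in stopwords}

def jaccard(a, b):
    """Overlap between two keyword sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def topic_shifts(paragraphs, threshold=0.1, stopwords=frozenset()):
    """Return indices i where paragraph i seems to start a new topic, i.e. its
    keyword overlap with paragraph i-1 falls below the (assumed) threshold."""
    sets = [keywords(p, stopwords) for p in paragraphs]
    return [i for i in range(1, len(sets))
            if jaccard(sets[i - 1], sets[i]) < threshold]

# Hypothetical example data.
STOP = frozenset({"the", "by", "is", "in", "will", "be", "per"})
paragraphs = [
    "The summarizer ranks sentences by keyword frequency.",
    "Keyword frequency is computed per sentence by the summarizer.",
    "The meeting in Reykjavik will be held in June.",
]
print(topic_shifts(paragraphs, threshold=0.1, stopwords=STOP))
```

The paragraphs between two detected shifts could then be summarized as one passage, or weighted higher when the summary should lean toward that topic.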
Trine Dahl showed how to use lexical cohesion to summarize texts (Tele-pattan II). This is somewhat similar to the lexical chaining approach. Trine suggested looking into compounds and ranking them highly, since they might have high information density.
Regarding the keyword lists used in SweSum, we also discussed whether we should
use only nouns (including compounds) and no adverbs, except time adverbials such as
yesterday, today, Saturday, etc.
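A minimal sketch of such a keyword filter, combined with the suggestion to rank compounds highly, might look like this. It assumes the text has already been part-of-speech tagged; the tag names "NOUN" and "ADV", the set of known compounds and the boost factor are hypothetical and do not describe SweSum's actual keyword handling.

```python
# Time adverbials that stay in the keyword list (examples from the discussion).
TIME_ADVERBIALS = {"yesterday", "today", "saturday"}

def keep_as_keyword(word, pos):
    """Keep nouns, plus adverbs only when they are time adverbials."""
    if pos == "NOUN":
        return True
    return pos == "ADV" and word.lower() in TIME_ADVERBIALS

def keyword_weights(tagged, compounds=frozenset(), compound_boost=2.0):
    """Sum a weight per kept keyword; known compounds count extra.

    tagged: iterable of (word, pos) pairs from some tagger (assumed input).
    compounds: lowercased words known to be compounds (assumed lexicon).
    """
    weights = {}
    for word, pos in tagged:
        if not keep_as_keyword(word, pos):
            continue
        key = word.lower()
        weight = compound_boost if key in compounds else 1.0
        weights[key] = weights.get(key, 0.0) + weight
    return weights

# Hypothetical tagged input.
tagged = [
    ("newspaper", "NOUN"), ("editors", "NOUN"), ("quickly", "ADV"),
    ("summarized", "VERB"), ("yesterday", "ADV"), ("editors", "NOUN"),
]
print(keyword_weights(tagged, compounds={"newspaper"}))
```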
Finally, Sasan Fallahi presented a thorough evaluation of SweSum carried out at Sydsvenska
Dagbladet. He compared the performance of SweSum with that of human editors in summarizing
334 news texts. He found that in some cases SweSum performed well, but sometimes it
cut sentences in the middle. Also, when summarizing toward the end of longer articles,
it removed the first sentence of a paragraph while keeping the second or third
sentence, which decreased the quality of the summarized text. Yet another problem
was that when summarizing plain text, the paragraphs were put on one line instead of
having a carriage return between each paragraph.
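The last two problems could be addressed at output time, roughly as sketched below. This is an illustration only and not SweSum's code: the data layout (paragraphs as lists of sentences, the selection as index pairs), the blank-line separator and the `keep_lead` option are assumptions, and pulling the first sentence back in is just one possible remedy that was not proposed at the meeting.

```python
def render_summary(paragraphs, selected, keep_lead=False, separator="\n\n"):
    """Render selected sentences while keeping paragraph breaks.

    paragraphs: list of paragraphs, each a list of sentence strings.
    selected: set of (paragraph_index, sentence_index) pairs chosen by
              some summarizer (assumed input).
    keep_lead: if True, a paragraph that contributes any sentence also
               contributes its first sentence.
    """
    rendered = []
    for p_idx, sentences in enumerate(paragraphs):
        kept = [s_idx for s_idx in range(len(sentences))
                if (p_idx, s_idx) in selected]
        if not kept:
            continue  # nothing selected from this paragraph
        if keep_lead and kept[0] != 0:
            kept.insert(0, 0)
        rendered.append(" ".join(sentences[i] for i in kept))
    return separator.join(rendered)

# Hypothetical example data.
paragraphs = [["A1.", "A2."], ["B1.", "B2.", "B3."]]
selected = {(0, 0), (1, 1)}
print(render_summary(paragraphs, selected))
print(render_summary(paragraphs, selected, keep_lead=True))
```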
Finally, Sasan found that for cutting down news to SMS size (max 160 characters), SweSum performed extremely well and could possibly be used for that almost directly. Regarding editorial summarization: for SweSum to be used by the editors, it should be seamlessly integrated with tools such as Illustrator, so that one can summarize in a drag-and-drop style.
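As an illustration of the SMS case, a summary could be fitted into the limit by keeping whole sentences only, which also avoids cutting a sentence in the middle. The function below is a hypothetical sketch and not part of SweSum. It counts Python characters, which only approximates the real SMS encoding, where some characters take more than one slot.

```python
SMS_LIMIT = 160  # maximum message length mentioned in the evaluation

def fit_to_sms(sentences, limit=SMS_LIMIT):
    """Join ranked summary sentences, in order, without exceeding the limit.

    Whole sentences are kept as long as they fit. If not even the first
    sentence fits, it is cut at the last word boundary instead.
    """
    message = ""
    for sentence in sentences:
        candidate = sentence if not message else message + " " + sentence
        if len(candidate) > limit:
            break
        message = candidate
    if not message and sentences:
        # Fallback: the first sentence alone is too long, so keep whole words.
        for word in sentences[0].split():
            candidate = word if not message else message + " " + word
            if len(candidate) > limit:
                break
            message = candidate
    return message

# Hypothetical summary sentences, best first.
summary = [
    "The council approved the new bridge on Monday.",
    "Construction is planned to start next spring.",
    "Critics say the budget is too optimistic and that costs will rise.",
]
print(fit_to_sms(summary))
```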
We also learned from the Nomen Nescio group that they were using 6 categories for
recognition:
Person, Plats (place), Organisation, Verk (Konstnärligt) (artistic work), Händelse (event),
Annat (dock ej tid) (other, but not time). Dimitrios Kokkinakis, who built the Swedish
Name Recognizer, made even more distinctions. E.g. Yoruba-folket (the Yoruba people) is
tagged as a people.
Hercules demonstrated the administration interface for SiteSeeker and how Nordoknet and its subsites are indexed and maintained. Search SiteSeeker Nordoknet!
Hercules also demonstrated SiteSeeker Voice, the speech interface to SiteSeeker, where one calls a phone number and searches using voice. The findings are read out in a summarized/extracted way.
* 3-6 April 2003, Åre, Sweden
* 30 May-1 June 2003, Reykjavik, Iceland in conjunction with NODALIDA 2003.
* Fall 2003, Denmark?
Paul Meurer extracts a Norwegian keyword lexicon from the SCARRIE lexicon and sends it to Martin Hassel, to be incorporated in SweSum to create NorSum. We will also need a list of common Norwegian abbreviations. The format can be found here.
Till Lech checks if he can put the Corporum text summarizer on the www.
Jürgen Wedekind OH-slides in PDF
Koenraad de Smedt OH-slides in HTML
Trine Dahl OH-slides in PDF
Till Lech OH-slides: Corporum tools in PDF
Hercules Dalianis OH-slides: SweNam in PDF
Martin Hassel OH-slides: SweSum in PDF and Summarizations in PDF
Sasan Fallahi OH-slides in PDF
Latest change March 10, 2003