Program and Participants of the Seventh ScandSum meeting, 20-21 March 2004, Fjällgården, Åre
Participants
Hercules Dalianis - KTH Stockholm
Martin Hassel - KTH Stockholm
Nima Mazdak - KTH and Stockholm University
Koenraad de Smedt - University of Bergen
Anja Liseth - University of Bergen
Till Christopher Lech - Cognit and University of Bergen
Jürgen Wedekind - CST - Copenhagen University
Henrik Holmboe - Norfa - Aarhus School of Business
Kaili Müürisep - University of Tartu
Presentations
FarsiSum by Nima Mazdak
The summarizer for Persian - Master's thesis link
Evaluation strategies by Martin Hassel
Question from Koenraad: why not use many queries per summary to measure
quality in the question-answering scheme?
NorSum evaluation – Anja Liseth
Problems in constructing an extract corpus: when two sentences get the
same selection frequency, that is, the same number of votes, which one
should be selected? We choose the one with the highest position rank.
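The tie-breaking rule above can be sketched as follows. This is a minimal illustration, not the NorSum procedure itself: the function name and the vote data are hypothetical, and "highest position rank" is assumed here to mean the sentence that occurs earliest in the text.

```python
from collections import Counter

def build_extract(votes, n_sentences):
    """Pick the n most-voted sentences from informant selections.

    votes: one set of 0-based sentence indices per informant (hypothetical format).
    Ties in vote count go to the earlier sentence, which is an ASSUMED
    reading of "highest position rank".
    """
    counts = Counter(i for chosen in votes for i in chosen)
    # Sort by descending vote count, then by ascending position in the text.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    # Return the selected sentences in document order.
    return sorted(ranked[:n_sentences])

# Hypothetical votes from three informants.
votes = [{0, 2, 5}, {0, 3, 5}, {0, 2, 3}]
extract = build_extract(votes, 2)
```

Sentences 2, 3 and 5 all receive two votes here, so the second slot in the extract goes to sentence 2, the earliest of the tied candidates.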
What happens if the ideal/gold standard contains mutually exclusive
sentences? Should we include them in the ideal summary? We need methods
to calculate these.
Bergen counts compression rate at the sentence level, Stockholm at the
word level.
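The two ways of counting can give quite different figures for the same summary. A small sketch, assuming that compression rate means summary length divided by source length and that words are split on whitespace (both are assumptions for illustration; the sentences are toy data):

```python
def compression_rates(source_sentences, summary_sentences):
    """Return (sentence-level rate, word-level rate) for a summary.

    Rate is ASSUMED to be summary length / source length.
    """
    sentence_rate = len(summary_sentences) / len(source_sentences)
    source_words = sum(len(s.split()) for s in source_sentences)
    summary_words = sum(len(s.split()) for s in summary_sentences)
    word_rate = summary_words / source_words
    return sentence_rate, word_rate

# Toy text: two long sentences and two short ones.
source = [
    "Summarization systems select the most important sentences from a text.",
    "This is useful.",
    "Evaluation compares system extracts with extracts made by human informants.",
    "It is hard.",
]
summary = [source[0], source[2]]
sentence_rate, word_rate = compression_rates(source, summary)
```

Here the summary keeps half of the sentences but 20 of the 26 words, so the sentence-level and word-level figures are not directly comparable.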
Anja's informants felt that the texts were full of air and therefore
easy to summarize down to four sentences.
KunDoc project and demo of new Summarizer in MS-Office – Till
One can summarize a document in Word and obtain keywords that are used
directly to search with Google.
Semantic web
Make the web structured using ontologies:
XTM, predicate logic, RDF, DAML+OIL, OWL.
The DARPA homepage is full of ontologies in different domains.
Using the Semantic Web one can ask questions like “which project had a
meeting in Åre 2004?”
Tools for the Semantic Web: Text-To-Onto, Protégé.
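To make the example question concrete, here is a toy sketch of how it could be answered once facts are stored as subject-predicate-object triples. It is a plain-Python stand-in, not RDF or OWL tooling: the predicate names, the identifiers and the second meeting are hypothetical.

```python
# Facts as (subject, predicate, object) triples. All names are hypothetical;
# "meeting_b" and "OtherProject" are invented purely as a contrast case.
triples = {
    ("meeting_a", "organizedBy", "ScandSum"),
    ("meeting_a", "place", "Åre"),
    ("meeting_a", "year", 2004),
    ("meeting_b", "organizedBy", "OtherProject"),
    ("meeting_b", "place", "Bergen"),
    ("meeting_b", "year", 2003),
}

def match(pattern):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

def projects_meeting_in(place, year):
    """Which projects had a meeting in the given place and year?"""
    in_place = {s for s, _, _ in match((None, "place", place))}
    in_year = {s for s, _, _ in match((None, "year", year))}
    meetings = in_place & in_year
    return sorted(o for m in meetings
                  for _, _, o in match((m, "organizedBy", None)))

answer = projects_meeting_in("Åre", 2004)
```

The point of the structured representation is that the question becomes a join over explicit facts instead of a keyword search over free text.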
Results from automatic evaluation of DanSum and SweSum
Hercules & Martin
The results show that Danish informants on average agree on 67% of the
extracts at average summarization length.
The results show that Swedish informants on average agree on 61% of the
extracts at average summarization length.
The Danish and Swedish human extracts have average lengths of 32% and
34%, respectively.
Martin found the best extract compared with the majority vote for all
languages. Some of the extracts had 100% overlap with the majority vote.
Now we have to compare these best extracts with SweSum.
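One way to compute such a comparison is sketched below. The definitions are assumptions for illustration, not necessarily those used in the evaluation: a sentence belongs to the majority vote if more than half of the informants selected it, and overlap is the share of majority-vote sentences that an extract contains. The informant data is hypothetical.

```python
from collections import Counter

def majority_extract(extracts):
    """Sentences chosen by more than half of the informants (ASSUMED definition)."""
    counts = Counter(i for chosen in extracts.values() for i in chosen)
    return {i for i, c in counts.items() if c > len(extracts) / 2}

def overlap(extract, reference):
    """Share of the reference sentences that also occur in the extract."""
    if not reference:
        return 0.0
    return len(extract & reference) / len(reference)

# Hypothetical extracts from four informants (sets of sentence indices).
extracts = {
    "A": {0, 2, 5},
    "B": {0, 2, 3},
    "C": {0, 3, 4},
    "D": {1, 2, 3},
}
majority = majority_extract(extracts)
best = max(extracts, key=lambda name: overlap(extracts[name], majority))
```

With this data, informant B's extract coincides with the majority vote and so has full overlap. The same `overlap` function could then be applied to an automatic extract to compare a system against the best human extract.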
Danish summarization – Jürgen
Summarize more coherently. Users would prefer a compression rate of
either 15% (newspaper editing) or 85% (news surveillance). Summarize the
whole article and treat different segments in different ways.
Demonstration - Grim -
http://skrutten.nada.kth.se/grim
a language learning environment for writers of Swedish (spelling and
grammar checking, translation, summarization) - Hercules
Language technology resources for Estonian text summarization - Kaili Müürisep
In Tallinn there are two places for language technology: the
Institute of the Estonian Language (dictionaries, morphology) and the
Institute of Cybernetics (speech processing). In Tartu, the
University of Tartu also works on text summarization on a small
scale. Two systems have been developed. AutoSum, with a powerful
analysis, was a short bachelor project. EstSum is a smaller but more
modular system written in Perl, and its development continues with
small means. Kaili also described that ! and ? give penalty points in
the summary: questions can be removed, and the same goes for
exclamations. EstSum uses a corpus of half a million words to calculate
word frequencies, though no lemmatizer is used yet.
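The scoring ideas described for EstSum can be illustrated with a small sketch. This is not EstSum's actual formula (which the notes do not give, and EstSum itself is written in Perl): the weighting scheme, the penalty value and all names below are assumptions. Words are weighted by their frequency in the document relative to a background corpus, surface forms are used as-is since there is no lemmatizer, and sentences ending in ! or ? lose points.

```python
import re
from collections import Counter

def tokenize(text):
    # Surface forms only, lowercased; no lemmatization.
    return re.findall(r"\w+", text.lower())

def sentence_score(sentence, doc_counts, corpus_counts, penalty=1.0):
    """Score a sentence; weighting and penalty value are ASSUMED, not EstSum's."""
    # Words frequent in the document but rare in the background corpus score high.
    score = sum(doc_counts[t] / (1 + corpus_counts[t]) for t in tokenize(sentence))
    # Questions and exclamations receive penalty points.
    if sentence.rstrip().endswith(("!", "?")):
        score -= penalty
    return score

# Toy background corpus counts and a two-sentence toy document.
corpus_counts = Counter({"the": 1000, "is": 800})
sentences = ["The summarizer ranks sentences.", "Is the summarizer good?"]
doc_counts = Counter(t for s in sentences for t in tokenize(s))
scores = [sentence_score(s, doc_counts, corpus_counts) for s in sentences]
```

Setting the penalty high enough effectively removes questions and exclamations from the summary, which matches the behaviour described above.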
TvärSök proposal - Hercules
Hercules described a research proposal on cross-language information
retrieval in three languages (Danish, Swedish and Norwegian), using
three different approaches: lexicon lookup, fuzzy matching, and
Random Indexing.
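As an illustration of the fuzzy matching idea, the sketch below maps Swedish query terms to similar-looking Danish index terms using Python's standard `difflib` module. The proposal does not say which matching method would be used, so the similarity measure, the cutoff value and the tiny vocabulary are all assumptions.

```python
from difflib import get_close_matches

def fuzzy_translate(query_terms, target_vocabulary, cutoff=0.75):
    """Map each query term to similar-looking terms in the target vocabulary.

    Uses difflib's sequence similarity as a stand-in matching method;
    the cutoff of 0.75 is an arbitrary illustrative choice.
    """
    return {term: get_close_matches(term, target_vocabulary, n=3, cutoff=cutoff)
            for term in query_terms}

# Toy Danish index vocabulary and a Swedish query.
danish_vocabulary = ["sammenfatning", "universitet", "sprog", "søgning"]
swedish_query = ["sammanfattning", "universitet", "språk"]
matches = fuzzy_translate(swedish_query, danish_vocabulary)
```

Close cognates such as "sammanfattning" and "sammenfatning" are matched, while a short pair like "språk" and "sprog" falls below this cutoff. Gaps of that kind are where lexicon lookup or Random Indexing would have to complement string similarity.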
Paper writing
Norfa Årbog contribution
Longer article, 20-25 pages, at an advanced popular-science level,
containing an overview of ScandSum work and future directions. Graphs
rather than numbers.
Koenraad is the editor. Deadline for the paper: June 30, 2004.
Hercules will contact Sasan.
New research proposal and fundraising for continuation of work
Deadlines
Nordpluss nabo deadline March 1, 2005
Nordpluss sprog decision April 15, 2004
Vetenskapsrådet April 20, 2004
Visitrain
NFR IT-funk
FET (cordis.lu)
ADVENTURE / NEST http://www.cordis.lu/nest/adventure.htm
Existing projects
KUNDOC
BATMULT, MULTILINGUA (Marie Curie) http://helmer.hit.uib.no/batmult
Nordic Graduate School in Language Technology
http://www.gslt.hum.gu.se/nordic/
Slides from the meeting (in PDF)
Hercules Dalianis OH-slides 1, 2, and 3
Nima Mazdak OH-slides
Martin Hassel OH-slides
OH-slides
Anja Liseth OH-slides
Jürgen Wedekind OH-slides
Till Lech OH-slides
Kaili Müürisep OH-slides
Latest change 24 March 2004.