Pefna

Example of thread classification and sorting

The list below is a representation of the thread view in the Pefna client. The first column shows the number of articles in each thread, the second column contains the first seven characters of the thread's strongest category, and the rest of the line is the subject line.

  1 Samplin DMX FAQ - REV 1.04  (9/3/96).
  1 Intelli best brand of Gaff tape
  1 Intelli Other possible FAQ subjects
  1 Intelli WTB: Strand and L-86 Dimmer modules
  1 Cogniti Flaming/sparkling necklace
  1 Cogniti THEATRE FAQ FOR THIS (& OTHER) NEWSGROUP - Example
  1 Cogniti EXAMPLE OF A FAQ WHICH COULD BE USED (Longish)
  2 Recordi Need to amplify sound port on mac.
  2 Recordi Re: FOG
  1 Recordi Smoking bird?
  2 Recordi Re: Gaffer Tape 
  1 Recordi List your service on the W3 - FREE!!!!!!
 32 Recordi Re: Gaffer Tape
  4 Recordi Re: Attaching foam to paneling?
  4 Recordi Burning Necklace?
  1 Recordi Things never said :-)
  2 Recordi Hum & Lights ---> Thanks
  3 Recordi Smoking Bird?
  1 Audio C Stagecraft for Bands
  1 Audio C TD/ Shop Foreman Leaving
  3 Audio C Design Software for IBM Compatibles
  1 Buildin Job Descriptions!!!
  1 Buildin Need license and certification info - ASAP
  1 Lightin Re:A most unusual request
  1 Lightin Anyone from the Irvine-Barclay Theatre
  1 Lightin SM & ASM postions in OPERA
  1 Lightin Gaffer tape
 11 Lightin WHY ASK FOR INFORMATION IF YOU CAN'T BE BOTHERED TO READ ??
  1 Lightin request subscribe
  1 Lightin Summer (and now) work
  4 Lightin Safety with Mirror balls
  2 Lightin Re: Scenic Design CAD software for the MAc
  1 Lightin Re: Stage blood recipe?
  9 Lightin Lighting Links and Logos
  1 Lightin RE: Lighting Links and Logos
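
The row layout can be reproduced with a short formatting function. This is a minimal sketch in Python, not Pefna's own code: the `Thread` record and `format_thread_line` are hypothetical names, and the column widths (three characters for the count, seven for the category) are read off the listing above.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    # Hypothetical thread record: article count, name of the thread's
    # strongest category, and the subject line.
    article_count: int
    category: str
    subject: str


def format_thread_line(thread: Thread) -> str:
    """Render one row of the thread view: count, 7-char category, subject."""
    count = f"{thread.article_count:3d}"      # right-aligned, width 3
    label = thread.category[:7].ljust(7)      # first seven characters, padded
    return f"{count} {label} {thread.subject}"


if __name__ == "__main__":
    # "Lighting" is an assumed full category name behind the "Lightin" label.
    print(format_thread_line(Thread(9, "Lighting", "Lighting Links and Logos")))
```

With the assumed category name "Lighting", this prints the same row as the `9 Lightin` line in the listing.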

The strongest category of a thread is simply the category at the shortest distance from any single article in the thread. This works very well for short threads (one or two articles) but becomes less accurate as threads grow longer.
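
The rule amounts to taking a minimum over every (article, category) pair. The text does not say how distance is measured, so this sketch assumes a bag-of-words representation with cosine distance; `bag_of_words`, `cosine_distance` and `strongest_category` are hypothetical names, and each category is assumed to be represented by a block of example text.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Optional


def bag_of_words(text: str) -> Counter:
    # Assumed representation: lower-cased whitespace tokens with counts.
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    # Assumed distance: 1 - cosine similarity; 1.0 when either vector is empty.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def strongest_category(articles: list[str], categories: dict[str, str]) -> Optional[str]:
    """Return the category closest to any single article in the thread.

    categories maps a category name to its example text. Returns None for
    an empty thread or an empty category set.
    """
    best_name, best_dist = None, math.inf
    for article in articles:
        vec = bag_of_words(article)
        for name, example_text in categories.items():
            dist = cosine_distance(vec, bag_of_words(example_text))
            if dist < best_dist:  # strict <: the first pair found wins ties
                best_name, best_dist = name, dist
    return best_name
```

Because the minimum is taken over articles, one close article labels the whole thread. One plausible reading of the accuracy drop on long threads is that every extra reply is another chance for an off-topic article to produce the closest match.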

Another consequence of selecting exactly one category for each thread is that articles with no strong resemblance to any existing category are still forced to take a label. This is typical of very short messages. Categories with short or few examples also tend to attract such messages, because matches on common words outweigh any topical keywords. As a result, newly created categories take on the appearance of default categories.
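
Both effects show up in a toy example under the same assumed bag-of-words and cosine-distance model. The category texts below are invented for illustration and `nearest_category` is a hypothetical helper: a message made only of common words still receives a label, and it lands in the category with a single short example rather than in the larger, topic-rich one.

```python
from __future__ import annotations

import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Assumed representation: lower-cased whitespace tokens with counts.
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    # Assumed distance: 1 - cosine similarity; 1.0 when either vector is empty.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_category(message: str, categories: dict[str, str]) -> tuple[str, float]:
    # Exactly one category is always chosen, however large the distance.
    vec = bag_of_words(message)
    return min(
        ((name, cosine_distance(vec, bag_of_words(text)))
         for name, text in categories.items()),
        key=lambda pair: pair[1],
    )


# Invented example texts: one long, topic-rich category and one newly
# created category with a single short example.
CATEGORIES = {
    "Lighting": "the dimmer and the lamp need a new gel for the focus and the lighting rig",
    "New category": "thanks for the tip",
}

if __name__ == "__main__":
    print(nearest_category("thanks for the help", CATEGORIES))
```

The message "thanks for the help" shares no topical word with either category, yet it is labelled "New category" at distance 0.25, against about 0.54 for "Lighting": the common words "for" and "the" make up most of the short category's vector.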

Finally, categories should have scope, i.e. each category should be used only in a small number of selected newsgroups. Using specialized categories in newsgroups where matching material is not expected may be amusing, but it is also a source of misclassification.
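
Scope can be enforced by filtering the candidate categories before the nearest one is chosen. In this sketch, `categories_in_scope` and the category-to-newsgroups table are hypothetical, the newsgroup names are placeholders, and treating a category with no scope entry as unusable everywhere is an assumption rather than something the text specifies.

```python
from __future__ import annotations


def categories_in_scope(
    newsgroup: str,
    categories: dict[str, str],
    scope: dict[str, set[str]],
) -> dict[str, str]:
    """Keep only the categories whose scope includes the given newsgroup.

    categories maps a category name to its example text; scope maps a
    category name to the set of newsgroups it may be used in. A category
    with no scope entry is treated as out of scope everywhere (assumption).
    """
    return {
        name: examples
        for name, examples in categories.items()
        if newsgroup in scope.get(name, set())
    }


if __name__ == "__main__":
    cats = {"Lighting": "dimmer lamp gel", "Python": "def class import"}
    # Placeholder newsgroup names, for illustration only.
    scope = {"Lighting": {"example.stagecraft"}, "Python": {"example.programming"}}
    print(categories_in_scope("example.stagecraft", cats, scope))
```

The filtered dictionary would then be handed to whatever nearest-category rule the client applies, so a specialized category can never be chosen in a group outside its scope.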