DrDubWiki | Smart Mailing List Reader

Version 0
Possible software to use
References

Keeping up with multiple high-traffic mailing lists can be a fool's errand. Machine learning and automatic text classification have promised a better solution to this problem for many years. This project seeks to put that promise to the test by building a custom model of the emails that interest a particular user.

In the long run, this can be turned into a multi-user website incorporating not only a trained model, but also hard constraints (e.g., show all messages that mention a particular person or project) and thread-pattern-based filtering (see Joey's famous blog post referenced below).
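A minimal sketch of how a hard constraint could override the trained model when deciding whether to show a message. The names `Message`, `matches_constraints` and `should_show`, the 0.5 threshold, and the whole-word, case-insensitive matching are illustrative assumptions, not part of the project.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal message record; a real reader would parse full RFC 5322 mail.
    sender: str
    subject: str
    body: str


def matches_constraints(msg: Message, watched_terms: list[str]) -> bool:
    """Hard constraint: True if any watched term appears as a whole word (case-insensitive)."""
    text = " ".join((msg.sender, msg.subject, msg.body))
    for term in watched_terms:
        # Lookarounds instead of \b so terms like "C++" still match as whole words.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            return True
    return False


def should_show(msg: Message, model_score: float, watched_terms: list[str],
                threshold: float = 0.5) -> bool:
    # Hard constraints always win; otherwise defer to the trained model's score.
    return matches_constraints(msg, watched_terms) or model_score >= threshold


if __name__ == "__main__":
    msg = Message("alice@example.org", "Re: release schedule",
                  "Can someone ping Bob about the Lurker import?")
    print(should_show(msg, 0.1, ["lurker"]))  # True: constraint matches despite low score
    print(should_show(msg, 0.1, ["dbacl"]))   # False: no constraint, score below threshold
```

Keeping the constraint check separate from the model score means a rule like "always show anything mentioning Bob" never depends on how well the model has been trained.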

In the future, this can become part of a personalized information dashboard incorporating RSS feeds, Twitter feeds, and user activity via http://zeitgeist-project.com/ (and others). While the focus here is 100% on mailing lists, I do want to live in such a future!

See a demo and consider joining the SourceForge project.

Version 0

This is a first version, meant to get things moving and start collecting training material.

Single user
Backend code in Perl
Re-purposed spam detection technology
~~Gmane for showing the message~~ (Gmane doesn't track all the mailing lists I follow)
SQLite3 backend
AJAX front-end written in Scala and NextApp Echo3
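Since Version 0 is mainly about collecting training material, here is a hedged sketch of how the SQLite3 backend might store messages and the user's interesting/not-interesting labels. It uses Python's standard `sqlite3` module rather than the Perl backend, and the table and function names (`message`, `label`, `store`, `set_label`, `training_set`) are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per message, plus the user's judgement once given.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    message_id TEXT PRIMARY KEY,  -- the Message-ID header
    list_name  TEXT NOT NULL,     -- mailing list the message arrived on
    subject    TEXT,
    body       TEXT
);
CREATE TABLE IF NOT EXISTS label (
    message_id  TEXT PRIMARY KEY REFERENCES message(message_id),
    interesting INTEGER NOT NULL CHECK (interesting IN (0, 1))
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, message_id, list_name, subject, body):
    # Ignore duplicates: the same message may be fetched more than once.
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO message VALUES (?, ?, ?, ?)",
                     (message_id, list_name, subject, body))


def set_label(conn, message_id, interesting: bool):
    # The user may change their mind, so the latest label replaces any earlier one.
    with conn:
        conn.execute("INSERT OR REPLACE INTO label VALUES (?, ?)",
                     (message_id, int(interesting)))


def training_set(conn, list_name):
    """Labelled (text, interesting) pairs for one list, ready for retraining."""
    rows = conn.execute(
        "SELECT COALESCE(m.subject, '') || ' ' || COALESCE(m.body, ''), l.interesting "
        "FROM message AS m JOIN label AS l ON l.message_id = m.message_id "
        "WHERE m.list_name = ? ORDER BY m.message_id",
        (list_name,),
    )
    return [(text, bool(flag)) for text, flag in rows]
```

Storing labels in their own table keeps unlabelled messages around, which would also allow semi-supervised approaches such as the EM method in the References.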

Details

An existing trainable spam classifier, trained once a week per mailing list.
~~A mailing list model, trained on the time of day and the number of interesting messages left~~ not for version 0
~~An existing ML model mixing the different scores, retrained every time a new email is classified.~~ not for version 0
A rule engine, with rules written in Perl
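The existing spam classifier is not named here; as a stand-in, this sketch uses a textbook multinomial naive Bayes classifier with add-one smoothing, one model per mailing list, where "interesting" plays the role of "ham". It is written in Python rather than Perl, and `NaiveBayes`, `prob_interesting` and `train_per_list` are hypothetical names; the weekly retraining schedule would live outside this code (e.g., in a cron job).

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Crude tokenizer: lowercase words and numbers.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes: interesting (True) vs. not interesting (False)."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()

    def train(self, examples) -> None:
        for text, interesting in examples:
            self.doc_counts[interesting] += 1
            self.word_counts[interesting].update(tokens(text))

    def prob_interesting(self, text: str) -> float:
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_p = {}
        for cls in (True, False):
            n_words = sum(self.word_counts[cls].values())
            # Add-one smoothed class prior.
            lp = math.log((self.doc_counts[cls] + 1) / (total_docs + 2))
            for w in tokens(text):
                # Add-one smoothing, with one extra slot for words never seen in training.
                lp += math.log((self.word_counts[cls][w] + 1) / (n_words + len(vocab) + 1))
            log_p[cls] = lp
        # P(True | text) = 1 / (1 + exp(log_p[False] - log_p[True])), computed stably.
        diff = log_p[False] - log_p[True]
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


def train_per_list(examples_by_list):
    """One independent model per mailing list."""
    models = {}
    for list_name, examples in examples_by_list.items():
        model = NaiveBayes()
        model.train(examples)
        models[list_name] = model
    return models
```

Training per list keeps each list's vocabulary separate, so a word that is noise on one list can still be a strong signal on another.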

Possible software to use

Lurker: http://lurker.sourceforge.net
- Try to collaborate with the author? http://www.dvs.tu-darmstadt.de/staff/terpstra/
- Lurker in action: https://lists.exim.org/lurker/message/20110911.031744.c6d23255.en.html
libbow: http://www.cs.cmu.edu/~mccallum/bow/
- Try to get it back into Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=525229
CRF++: http://crfpp.sourceforge.net/
dbacl, a digramic Bayesian classifier: http://dbacl.sourceforge.net/
- Also used as a chess player! http://dbacl.sourceforge.net/spam_chess-12.html
JMAP software: http://jmap.io/software.html
Apache James: https://james.apache.org/

References

Text Classification from Labeled and Unlabeled Documents using EM, by Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom Mitchell. Machine Learning, 1–34 (1999).

Experience with Rule Induction and k-Nearest Neighbor Methods for Interface Agents that Learn, by Terry R. Payne, Peter Edwards, and Claire L. Green. IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, March–April 1997.

Joey's thread patterns: http://joey.kitenet.net/blog/entry/thread_patterns/
