Springer, 2008. — 243 p.
As we enter the third decade of the World Wide Web (WWW), the textual revolution has seen a tremendous change in the availability of online information. Finding information for just about any need has never been more automatic—just a keystroke or mouse click away. While the digitalization and creation of textual materials continues at light speed, the ability to navigate, mine, or casually browse through documents too numerous to read (or print) lags far behind.
What approaches to text mining are available to efficiently organize, classify, label, and extract relevant information for today’s information-centric users? What algorithms and software should be used to detect emerging trends from both text streams and archives? These are just a few of the important questions addressed at the Text Mining Workshop held on April 28, 2007, in Minneapolis, MN. This workshop, the fifth in a series of annual workshops on text mining, was held on the final day of the Seventh SIAM International Conference on Data Mining (April 26–28, 2007).
With close to 60 applied mathematicians and computer scientists representing universities, industrial corporations, and government laboratories, the workshop featured both invited and contributed talks on important topics such as the application of machine learning techniques in conjunction with natural language processing, information extraction, and algebraic/mathematical approaches to computational information retrieval. The workshop’s program also included an Anomaly Detection/Text Mining competition. NASA Ames Research Center of Moffett Field, CA, and SAS Institute Inc. of Cary, NC, sponsored the workshop.
Most of the invited and contributed papers presented at the 2007 Text Mining Workshop have been compiled and expanded for this volume. Several others are revised papers from the first edition of the book.
Part I Clustering
Cluster-Preserving Dimension Reduction Methods for Document Classification
Automatic Discovery of Similar Words
Principal Direction Divisive Partitioning with Kernels and k-Means Steering
Hybrid Clustering with Divergences
Text Clustering with Local Semantic Kernels
Part II Document Retrieval and Representation
Vector Space Models for Search and Cluster Mining
Applications of Semidefinite Programming in XML Document Classification
Part III Email Surveillance and Filtering
Discussion Tracking in Enron Email Using PARAFAC
Spam Filtering Based on Latent Semantic Indexing
Part IV Anomaly Detection
A Probabilistic Model for Fast and Confident Categorization of Textual Documents
Anomaly Detection Using Nonnegative Matrix Factorization
Document Representation and Quality of Text: An Analysis
Appendix A: SIAM Text Mining Competition 2007