Boca Raton: CRC Press, 2009. — 300 p. — (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series). — ISBN: 978‑1‑4200‑5940‑3.
Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It examines methods to automatically cluster and classify text documents and applies these methods in a variety of areas, including adaptive information filtering, information distillation, and text search.
Marco Turchi, Alessia Mammone, and Nello Cristianini
Analysis of Text Patterns Using Kernel Methods
General Overview on Kernel Methods
Kernels for Text
Example
Conclusion and Further Reading
Blaz Fortuna, Carolina Galleguillos, and Nello Cristianini
Detection of Bias in Media Outlets with Statistical Learning Methods
Overview of the Experiments
Data Collection and Preparation
News Outlet Identification
Topic-Wise Comparison of Term Bias
News Outlets Map
Related Work
Appendix A: Support Vector Machines
Appendix B: Bag of Words and Vector Space Models
Appendix C: Kernel Canonical Correlation Analysis
Appendix D: Multidimensional Scaling
Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor
Collective Classification for Text Classification
Collective Classification: Notation and Problem Definition
Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers
Approximate Inference Algorithms for Approaches Based on Global Formulations
Learning the Classifiers
Experimental Comparison
Related Work
David M. Blei and John D. Lafferty
Topic Models
Latent Dirichlet Allocation
Posterior Inference for LDA
Dynamic Topic Models and Correlated Topic Models
Discussion
Brett W. Bader, Michael W. Berry, and Amy N. Langville
Nonnegative Matrix and Tensor Factorization for Discussion Tracking
Notation
Tensor Decompositions and Algorithms
Enron Subset
Observations and Results
Visualizing Results of the NMF Clustering
Future Work
Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, and Suvrit Sra
Text Clustering with Mixture of von Mises-Fisher Distributions
Related Work
Preliminaries
EM on a Mixture of vMFs (moVMF)
Handling High-Dimensional Text Datasets
Algorithms
Experimental Results
Discussion
Conclusions and Future Work
Sugato Basu and Ian Davidson
Constrained Partitional Clustering of Text Data: An Overview
Uses of Constraints
Text Clustering
Partitional Clustering with Constraints
Learning Distance Function with Constraints
Satisfying Constraints and Learning Distance Functions
Experiments
Conclusions
Yi Zhang
Adaptive Information Filtering
Standard Evaluation Measures
Standard Retrieval Models and Filtering Approaches
Collaborative Adaptive Filtering
Novelty and Redundancy Detection
Other Adaptive Filtering Topics
Yiming Yang and Abhimanyu Lad
Utility-Based Information Distillation
A Sample Task
Technical Cores
Evaluation Methodology
Data
Experiments and Results
Concluding Remarks
Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani
Text Search-Enhanced with Types and Entities
Entity-Aware Search Architecture
Understanding the Question
Scoring Potential Answer Snippets
Indexing and Query Processing