Springer, 2018. — 493 p. — ISBN: 978-3319735306.
Text analytics is a field that lies on the interface of information retrieval, machine learning, and natural language processing, and this textbook carefully covers a coherently organized framework drawn from these intersecting topics. The chapters of this textbook are organized into three categories:
- Basic algorithms: Chapters 1 through 7 discuss the classical algorithms for machine learning from text such as preprocessing, similarity computation, topic modeling, matrix factorization, clustering, classification, regression, and ensemble analysis.
- Domain-sensitive mining: Chapters 8 and 9 discuss the learning methods from text when combined with different domains such as multimedia and the Web. The problem of information retrieval and Web search is also discussed in the context of its relationship with ranking and machine learning methods.
- Sequence-centric mining: Chapters 10 through 14 discuss various sequence-centric and natural language applications, such as feature engineering, neural language models, deep learning, text summarization, information extraction, opinion mining, text segmentation, and event detection.
This textbook covers machine learning topics for text in detail. Since the coverage is extensive, multiple courses can be offered from the same book, depending on course level. Even though the presentation is text-centric, Chapters 3 to 7 cover machine learning algorithms that are often used in domains beyond text data. Therefore, the book can be used to offer courses not just in text analytics but also from the broader perspective of machine learning (with text as a backdrop).
This textbook targets graduate students in computer science, as well as researchers, professors, and industrial practitioners working in these related fields. It is accompanied by a solution manual for classroom teaching.
Machine Learning for Text: An Introduction
What Is Special About Learning from Text?
Analytical Models for Text
Bibliographic Notes
Exercises
Text Preparation and Similarity Computation
Raw Text Extraction and Tokenization
Extracting Terms from Tokens
Vector Space Representation and Normalization
Similarity Computation in Text
Bibliographic Notes
Exercises
Matrix Factorization and Topic Modeling
Singular Value Decomposition
Nonnegative Matrix Factorization
Probabilistic Latent Semantic Analysis
A Bird’s Eye View of Latent Dirichlet Allocation
Nonlinear Transformations and Feature Engineering
Bibliographic Notes
Exercises
Text Clustering
Feature Selection and Engineering
Topic Modeling and Matrix Factorization
Generative Mixture Models for Clustering
The k-Means Algorithm
Hierarchical Clustering Algorithms
Clustering Ensembles
Clustering Text as Sequences
Transforming Clustering into Supervised Learning
Clustering Evaluation
Bibliographic Notes
Exercises
Text Classification: Basic Models
Feature Selection and Engineering
The Naïve Bayes Model
Nearest Neighbor Classifier
Decision Trees and Random Forests
Rule-Based Classifiers
Bibliographic Notes
Exercises
Linear Classification and Regression for Text
Least-Squares Regression and Classification
Support Vector Machines
Logistic Regression
Nonlinear Generalizations of Linear Models
Bibliographic Notes
Exercises
Classifier Performance and Evaluation
The Bias-Variance Trade-Off
Implications of Bias-Variance Trade-Off on Performance
Systematic Performance Enhancement with Ensembles
Classifier Evaluation
Bibliographic Notes
Exercises
Joint Text Mining with Heterogeneous Data
The Shared Matrix Factorization Trick
Factorization Machines
Joint Probabilistic Modeling Techniques
Transformation to Graph Mining Techniques
Bibliographic Notes
Exercises
Information Retrieval and Search Engines
Indexing and Query Processing
Scoring with Information Retrieval Models
Web Crawling and Resource Discovery
Query Processing in Search Engines
Link-Based Ranking Algorithms
Bibliographic Notes
Exercises
Text Sequence Modeling and Deep Learning
Statistical Language Models
Kernel Methods
Word-Context Matrix Factorization Models
Graphical Representations of Word Distances
Neural Language Models
Recurrent Neural Networks
Bibliographic Notes
Exercises
Text Summarization
Topic Word Methods for Extractive Summarization
Latent Methods for Extractive Summarization
Machine Learning for Extractive Summarization
Multi-Document Summarization
Abstractive Summarization
Bibliographic Notes
Exercises
Information Extraction
Named Entity Recognition
Relationship Extraction
Bibliographic Notes
Exercises
Opinion Mining and Sentiment Analysis
Document-Level Sentiment Classification
Phrase- and Sentence-Level Sentiment Classification
Aspect-Based Opinion Mining as Information Extraction
Opinion Spam
Opinion Summarization
Bibliographic Notes
Exercises
Text Segmentation and Event Detection
Text Segmentation
Mining Text Streams
Event Detection
Bibliographic Notes
Exercises