Springer, 2009. Corrected 12th printing, 2017. — 745 p. — ISBN: 0387848576, 978-0387848570.
More than 100 typos found in the 2009 printing have been corrected.

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book).
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
Overview of Supervised Learning
Variable Types and Terminology, Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors, Statistical Decision Theory, Local Methods in High Dimensions, Statistical Models, Supervised Learning and Function Approximation, Structured Regression Models, Classes of Restricted Estimators, Model Selection and the Bias–Variance Tradeoff
Linear Methods for Regression
Linear Regression Models and Least Squares, Subset Selection, Shrinkage Methods, Methods Using Derived Input Directions, Discussion: A Comparison of the Selection and Shrinkage Methods, Multiple Outcome Shrinkage and Selection, More on the Lasso and Related Path Algorithms
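As an illustrative aside (not taken from the book): a minimal sketch of least squares versus the ridge and lasso shrinkage methods listed above, using scikit-learn on synthetic data; the dataset, penalty values, and coefficients are arbitrary demonstration choices.

```python
# Illustrative sketch only: least squares vs. ridge and lasso shrinkage
# on synthetic data (data and alpha values are arbitrary, not from the book).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only three truly active predictors
y = X @ beta + rng.normal(scale=0.5, size=100)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # lasso tends to set the irrelevant coefficients exactly to zero
    print(type(model).__name__, np.round(model.coef_, 2))
```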
Linear Methods for Classification
Linear Regression of an Indicator Matrix, Linear Discriminant Analysis, Logistic Regression, Separating Hyperplanes
Basis Expansions and Regularization
Piecewise Polynomials and Splines, Filtering and Feature Extraction, Smoothing Splines, Automatic Selection of the Smoothing Parameters, Nonparametric Logistic Regression, Multidimensional Splines, Regularization and Reproducing Kernel Hilbert Spaces, Wavelet Smoothing
Kernel Smoothing Methods
One-Dimensional Kernel Smoothers, Selecting the Width of the Kernel, Local Regression in ℝ^p, Structured Local Regression Models in ℝ^p, Local Likelihood and Other Models, Kernel Density Estimation and Classification, Radial Basis Functions and Kernels, Mixture Models for Density Estimation and Classification
Model Assessment and Selection
Bias, Variance and Model Complexity, The Bias–Variance Decomposition, Optimism of the Training Error Rate, Estimates of In-Sample Prediction Error, The Effective Number of Parameters, The Bayesian Approach and BIC, Minimum Description Length, Vapnik–Chervonenkis Dimension, Cross-Validation, Bootstrap Methods, Conditional or Expected Test Error?
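A hedged illustration of the cross-validation idea in this chapter (not the book's code): five-fold cross-validation used to compare ridge penalties on synthetic data; the alpha grid, fold count, and data are arbitrary choices.

```python
# Illustrative sketch: 5-fold cross-validation to compare ridge penalties.
# The synthetic data and the alpha grid are arbitrary demonstration choices.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=200)

for alpha in (0.01, 0.1, 1.0, 10.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"alpha={alpha:<5} mean CV MSE={-scores.mean():.3f}")
```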
Model Inference and Averaging
The Bootstrap and Maximum Likelihood Methods, Bayesian Methods, Relationship Between the Bootstrap and Bayesian Inference, The EM Algorithm, MCMC for Sampling from the Posterior, Bagging, Model Averaging and Stacking, Stochastic Search: Bumping
Additive Models, Trees, and Related Methods
Generalized Additive Models, Tree-Based Methods, PRIM: Bump Hunting, MARS: Multivariate Adaptive Regression Splines, Hierarchical Mixtures of Experts, Missing Data
Boosting and Additive Trees
Boosting Methods, Boosting Fits an Additive Model, Forward Stagewise Additive Modeling, Exponential Loss and AdaBoost, Why Exponential Loss?, Loss Functions and Robustness, Off-the-Shelf Procedures for Data Mining, Example: Spam Data, Boosting Trees, Numerical Optimization via Gradient Boosting, Right-Sized Trees for Boosting, Regularization
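An illustrative sketch, not the book's own code: gradient boosting with shallow trees and a shrinkage (learning-rate) parameter via scikit-learn; the simulated dataset and hyperparameters are arbitrary choices.

```python
# Illustrative sketch: gradient boosting with shallow ("right-sized") trees
# and shrinkage; data and hyperparameters are arbitrary demonstration choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_tr, y_tr)
print("test accuracy:", gbm.score(X_te, y_te))
```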
Neural Networks
Projection Pursuit Regression, Neural Networks, Fitting Neural Networks, Some Issues in Training Neural Networks, Example: Simulated Data, Example: ZIP Code Data, Discussion, Bayesian Neural Nets and the NIPS 2003 Challenge
Support Vector Machines and Flexible Discriminants
The Support Vector Classifier, Support Vector Machines and Kernels, Generalizing Linear Discriminant Analysis, Flexible Discriminant Analysis, Penalized Discriminant Analysis, Mixture Discriminant Analysis
Prototype Methods and Nearest-Neighbors
Prototype Methods, k-Nearest-Neighbor Classifiers, Adaptive Nearest-Neighbor Methods
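A minimal, illustrative k-nearest-neighbor example (not from the book), assuming scikit-learn and its bundled iris data; k = 5 is an arbitrary choice.

```python
# Illustrative sketch: a k-nearest-neighbor classifier on a toy dataset;
# the choice of k and the dataset are arbitrary demonstration choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```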
Unsupervised Learning
Association Rules, Cluster Analysis, Self-Organizing Maps, Principal Components, Curves and Surfaces, Non-negative Matrix Factorization, Independent Component Analysis and Exploratory Projection Pursuit, Multidimensional Scaling, Nonlinear Dimension Reduction and Local Multidimensional Scaling, The Google PageRank Algorithm
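For flavor only (not the book's code): principal components followed by k-means clustering with scikit-learn; the two-component projection, three clusters, and dataset are arbitrary demonstration choices.

```python
# Illustrative sketch: principal components followed by k-means clustering;
# the number of components/clusters and the dataset are arbitrary choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
Z = PCA(n_components=2).fit_transform(X)              # project onto two principal components
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
print("cluster sizes:", np.bincount(km.labels_))
```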
Random Forests
Definition of Random Forests, Details of Random Forests, Analysis of Random Forests
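An illustrative random-forest sketch (not from the book), using scikit-learn with an out-of-bag accuracy estimate; the dataset and hyperparameters are arbitrary choices.

```python
# Illustrative sketch: a random forest with out-of-bag error estimation;
# the dataset and hyperparameters are arbitrary demonstration choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print("out-of-bag accuracy:", round(rf.oob_score_, 3))
```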
Ensemble Learning
Boosting and Regularization Paths, Learning Ensembles
Undirected Graphical Models
Markov Graphs and Their Properties, Undirected Graphical Models for Continuous Variables, Undirected Graphical Models for Discrete Variables
High-Dimensional Problems
When p is Much Bigger than N, Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids, Linear Classifiers with Quadratic Regularization, Linear Classifiers with L1 Regularization, Classification When Features are Unavailable, High-Dimensional Regression: Supervised Principal Components, Feature Assessment and the Multiple-Testing Problem
The website for this book is located at web.stanford.edu/~hastie/ElemStatLearn/