Attewell P., Monaghan D. Data Mining for the Social Sciences: An Introduction

File format: PDF
Size: 28.31 MB

Added by user Евгений Машеров 22.08.2016 18:21
Description edited 09.11.2017 02:53

Los Angeles: University of California Press, 2015. 252 p. ISBN: 978-0-520-28097-7

We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining methodologies in their analytical toolkits. Data Mining for the Social Sciences demystifies the process by describing the diverse set of techniques available, discussing the strengths and weaknesses of various approaches, and giving practical demonstrations of how to carry out analyses using tools in various statistical software packages.

Concepts
What Is Data Mining?
The Goals of This Book
Software and Hardware for Data Mining
Basic Terminology
Contrasts with the Conventional Statistical Approach
Predictive Power in Conventional Statistical Modeling
Hypothesis Testing in the Conventional Approach
Heteroscedasticity as a Threat to Validity in Conventional Modeling
The Challenge of Complex and Nonrandom Samples
Bootstrapping and Permutation Tests
Nonlinearity in Conventional Predictive Models
Statistical Interactions in Conventional Models

Some General Strategies Used in Data Mining
Cross-Validation
Overfitting
Boosting
Calibrating
Measuring Fit: The Confusion Matrix and ROC Curves
Identifying Statistical Interactions and Effect Heterogeneity in Data Mining
Bagging and Random Forests
The Limits of Prediction
Big Data Is Never Big Enough
Important Stages in a Data Mining Project
When to Sample Big Data
Building a Rich Array of Features
Feature Selection
Feature Extraction
Constructing a Model
Worked Examples
Preparing Training and Test Datasets
The Logic of Cross-Validation
Cross-Validation Methods: An Overview
Variable Selection Tools
Stepwise Regression
The LASSO
VIF Regression
Creating New Variables Using Binning and Trees
Discretizing a Continuous Predictor
Continuous Outcomes and Continuous Predictors
Binning Categorical Predictors
Using Partition Trees to Study Interactions
Extracting Variables
Principal Component Analysis
Independent Component Analysis
Classifiers
K-Nearest Neighbors
Naive Bayes
Support Vector Machines
Optimizing Prediction across Multiple Classifiers
Classification Trees
Partition Trees
Boosted Trees and Random Forests
Neural Networks
Clustering
Hierarchical Clustering
K-Means Clustering
Normal Mixtures
Self-Organized Maps
Latent Class Analysis and Mixture Models
Latent Class Analysis
Latent Class Regression
Mixture Models
Association Rules

Notes

See also

Nisbet R., Elder J., Miner G. Handbook of Statistical Analysis and Data Mining Applications