Cary: SAS Institute, 2017. — 426 p.
About the Book
About the Author
Data Preparation for Survival Models
Step 1: Collect Three Raw Variables
Step 2: Create Three Required Variables
Step 3: Collect and Process Many Predictor Variables
Data Set Structure
Characteristics of the Data Set Used in This Book
Special Considerations for Survival Analysis Data
Input
Plan for the Collection of the Required Variables
Five Fundamental Questions for Time-to-Next-Purchase Survival Models
Data Set Creation
(Start, End) Calendar Timeline
Time and Censor Variables
Strata by Previous Purchases
Categorical Variables
Data Preparation with Base SAS, Macros, and SAS/STAT
Multiple Imputation
Winsorization
Binning Using Macros
Data Preparation with SAS Enterprise Miner
Exploratory Data Analysis and Basic Statistical Concepts
Overview of Continuous and Discrete Distributions
Deciding Whether to Use Continuous Time or Discrete Time
Understanding the Concept of a Customer Tie
Continuous Distribution
Survival Function for Continuous Time
Hazard Function for Continuous Time
Discrete Distribution
Data Formats
Customer-Level Data Set (Standard Data Format)
Calculation of the Discrete Hazard—An Example
Interpretation of the Results
Month Data Set (Expanded Data Format)
Creation of Dummy Variables
A Simple Example
A Time-to-Next-Purchase Example
PROC LOGISTIC to Compute Hazard
Odds Calculation (January)
Hazard Calculation (January)
Survival Function
Hazard and Survival Relationship
Summary of Distributions
Life Table Method
Step 1: PROC LIFETEST with METHOD = LIFE
Step 2: At-Risk Adjustment
Step 3: Empirical Hazard Calculation
Step 4: Graphics and Interpretation
Data Storage Formats
Customer-Level Data Set (Standard Format)
Month Data Set (Expanded Format)
Chapter 3: Stratified Sampling Method for Survival Analysis
The Sampling Process with PROC SURVEYSELECT
Step 1: Analyze Sample Proportion for Strata
Step 2: Create the Model_Shopping Sample
Step 3: Create the Score_Shopping Sample
Step 4: Compare the Results from the Model_Shopping and Score_Shopping Data Sets
SAS Enterprise Miner Nodes
The Sample Node
The Data Partition Node
The Flowchart
SAS Enterprise Miner Use from Source to Customer-Level Model Output
Creating the Process Flow
Step 1: Create a New Project
Step 2: Create a New Library
Step 3: Create Start Code
Step 4: Create a New Data Source
Step 5: Create a Process Flow Diagram
Step 6: Insert the Data Partition Node
Step 7: Create the Survival Process Flow
Running a Model and Examining Results
Survival Node Properties Panel
Results
The Cubic Spline Regression and Model Interpretation
SAS Enterprise Miner Strategy for Standard Data Format
The Problem: The Nonlinearity of the Hazard Function
The Solution: The Cubic Spline Basis Functions
Two Challenges: Number of Knots to Choose and Placement of the Knots
The Mathematical Definition of a Knot
The Default Five-Knots Cubic Spline Basis Method without Stepwise Selection
The Default Five-Knots Cubic Splines Basis Method Using the Stepwise Selection
Example 1: Interpretation of Numeric Variables
Understanding the Stepwise Regression Spline Model in SAS Enterprise Miner
Writing the Model
Understanding the Odds Ratio
Example 2: Interpretation of Categorical Variables
Odds Calculation of Categorical Variables
Interpretations
Model Output
Model Information
Strata Analysis
Survival and Hazard Functions Saved in a Data Set
The Fully Expanded Data Set Model
Saving the Expanded Data Set
Include a SAS Code Node
Rename the SAS Code Node
Create a Permanent Library
Explore the SURV_EXPCENDATA Data Set
Running the Expanded Model
Create a Second Diagram and Assign Variable Roles
Set the Properties and Run
Comparing the Results
Statistics for Performance and Validation
Twofold Focus
SAS Enterprise Miner for Practical Solutions
Comparison of Model Validation Approaches
Validation: An Explanation
Traditional Logistic Regression
Survival Regression (Steps 1 through 4)
Statistics for Performance and Validation
Step 5: Compute the Hazard for Only the First Month of the Period
Step 6: Create a Hit Variable Indicating Whether Repurchase Occurs in January
Step 7: Compute the Depth of the Hazard in Quantiles
Step 8: Compute the Sensitivity
Step 9: Create the Concentration Curve
Step 10: Compute Lift
Step 11: Create the Lift Curve
Step 12: Compute Benefit
Step 13: Create the Benefit Curve
Step 14: Select the Cutoff=Depth Where the Benefit Is Maximum
The Depth Measure and the Hazard Measure: Which to Choose
Two Business Situations
Business Situation 1: Use of Depth as Cutoff
Business Situation 2: Use of Average Hazard Function Value as Cutoff
Specificity and Suboptimal Cutoffs
Gini Concentration Ratio
Kolmogorov-Smirnov
Method 1—PROC NPAR1WAY
Method 2—PROC SQL
Density Analysis and PROC KDE
A Comprehensive Validation Approach
Scoring New and Old Customers
Business Situation 1: Scoring Customers
Preparing the Scoring Data Set
Scoring Using SAS Enterprise Miner
Analyzing the Optimized Score Code
Scoring Using SAS DATA Steps
Analyzing Customer ID = 4
Using PROC COMPARE for Validation
Business Situation 2: Calibrating Scoring Results
Step 1: Compute Actual and Forecast of Part_TRAIN
Step 2: Plot an Actual and Forecast Graph—Part_TRAIN
Step 3: Compute the Calibration (Prior Proportion—Part_TRAIN)
Step 4: Compute the Full-Year Forecast—Part_VALIDATE
Step 5: Calibrate the Forecast of Part_VALIDATE
Step 6: Plot an Actual by Calibrated Forecast Graph