1. Overview
Survival analysis studies the time until an event occurs while handling censored observations. In this project, customer tenure is used as the duration variable, and customer churn is treated as the event of interest. A record with Churn = Yes is an observed event, while Churn = No is right-censored.
Compared with ordinary churn classification, survival analysis is more informative because it uses both event occurrence and event timing.
2. Dataset and Preprocessing
The analysis follows the Spark survival-analysis tutorial with the Telco Customer Churn dataset. The cohort was restricted to month-to-month customers with internet service. The event indicator is defined as 1 for churned customers and 0 for censored customers.
| Item | Value |
|---|---|
| Rows in raw dataset | 7043 |
| Rows in analysis cohort | 3351 |
| Churned customers in cohort | 1556 |
| Non-churned customers in cohort | 1795 |
| Mean tenure in cohort | 19.43 months |
| Median tenure in cohort | 13 months |
| Mean monthly charges | 73.59 |

| Churn indicator | Customer count | Avg. tenure (months) | Avg. monthly charges |
|---|---|---|---|
| 0 (censored) | 1795 | 23.62 | 71.17 |
| 1 (churned) | 1556 | 14.60 | 76.38 |
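
The cohort filter and event coding described above can be sketched in a few lines. This is a minimal pure-Python illustration on made-up records, not the tutorial's Spark code. The column names (`Contract`, `InternetService`, `tenure`, `Churn`) and their values are assumed to follow the public Telco Customer Churn schema, and `build_cohort` is a hypothetical helper.

```python
# Toy records; column names and values are assumed to match the Telco schema.
customers = [
    {"Contract": "Month-to-month", "InternetService": "DSL", "tenure": 5, "Churn": "Yes"},
    {"Contract": "Month-to-month", "InternetService": "No", "tenure": 12, "Churn": "No"},
    {"Contract": "Two year", "InternetService": "Fiber optic", "tenure": 60, "Churn": "No"},
    {"Contract": "Month-to-month", "InternetService": "Fiber optic", "tenure": 20, "Churn": "No"},
]


def build_cohort(rows):
    """Keep month-to-month customers with internet service and encode the event."""
    cohort = []
    for row in rows:
        if row["Contract"] != "Month-to-month":
            continue  # other contract types are outside the cohort
        if row["InternetService"] == "No":
            continue  # customers without internet service are excluded
        cohort.append({
            "duration": row["tenure"],                  # months observed
            "event": 1 if row["Churn"] == "Yes" else 0  # 1 = churned, 0 = right-censored
        })
    return cohort


cohort = build_cohort(customers)
churn_rate = sum(r["event"] for r in cohort) / len(cohort)
print(cohort)
print(f"churn rate: {churn_rate:.2%}")
```

Applied to the counts in the table above, the same ratio is 1556 / 3351, or about 46.43%.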
3. Kaplan–Meier Survival Analysis
The Kaplan–Meier estimator was used to estimate the customer survival function over tenure months. The overall median survival time of the filtered cohort was 34 months.
| Result | Statistic / value | p-value |
|---|---|---|
| Overall median survival time | 34 months | — |
| Gender log-rank test | 2.0389 | 0.1533 |
| OnlineSecurity log-rank test | 141.6032 | 1.19 × 10⁻³² |
The gender difference was not statistically significant at the 0.05 level (p = 0.1533), while customers with and without OnlineSecurity had clearly different survival curves.
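
To make the estimator concrete, here is a minimal sketch of the Kaplan–Meier product-limit calculation in plain Python. It runs on toy data rather than the project cohort, and the function names `kaplan_meier` and `median_survival` are hypothetical. A survival library would normally be used in practice.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    churned = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # churned + censored customers leaving at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d > 0:
            # Customers censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5


# Toy data: tenure in months and event flag (1 = churned, 0 = censored).
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.4f}")
print("median survival:", median_survival(curve))
```

The median survival time reported above (34 months) corresponds to the first month at which the estimated curve for the real cohort falls to 0.5 or below.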
4. Cox Proportional Hazards Model
The Cox model estimates how covariates affect churn hazard. The hazard ratio is exp(coefficient); a value below 1 indicates lower churn risk relative to the reference level.
| Covariate | Coefficient | Hazard ratio | p-value |
|---|---|---|---|
| dependents_Yes | -0.33 | 0.72 | < 0.005 |
| internetService_DSL | -0.22 | 0.80 | < 0.005 |
| onlineBackup_Yes | -0.78 | 0.46 | < 0.005 |
| techSupport_Yes | -0.64 | 0.53 | < 0.005 |
The most protective variables in this Cox model were onlineBackup_Yes and techSupport_Yes.
The proportional hazards assumption check suggested possible non-proportional behavior for some variables, so the hazard ratios above should be interpreted with caution.
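
The link between the coefficient and hazard-ratio columns can be checked with a short calculation. The sketch below uses only the rounded coefficients from the table above; `hazard_ratio` is a hypothetical helper name, and the percentage reading assumes all other covariates are held fixed.

```python
import math

# Rounded Cox coefficients as reported in the table above.
cox_coefficients = {
    "dependents_Yes": -0.33,
    "internetService_DSL": -0.22,
    "onlineBackup_Yes": -0.78,
    "techSupport_Yes": -0.64,
}


def hazard_ratio(coef):
    """Hazard ratio implied by a Cox coefficient: exp(coef)."""
    return math.exp(coef)


for name, coef in cox_coefficients.items():
    hr = hazard_ratio(coef)
    reduction = (1.0 - hr) * 100.0  # percent lower hazard vs. the reference level
    print(f"{name}: HR = {hr:.2f}, hazard about {reduction:.0f}% lower")
```

Read this way, onlineBackup_Yes corresponds to a churn hazard roughly 54% lower than the reference level.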
5. Accelerated Failure Time Model
The log-logistic AFT model describes how covariates accelerate or decelerate time until churn. In this model, exp(coef) greater than 1 suggests longer time until churn.
| Covariate | Coefficient | exp(coef) | p-value |
|---|---|---|---|
| internetService_DSL | 0.38 | 1.47 | < 0.005 |
| onlineSecurity_Yes | 0.86 | 2.37 | < 0.005 |
| onlineBackup_Yes | 0.81 | 2.25 | < 0.005 |
| techSupport_Yes | 0.69 | 1.99 | < 0.005 |
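
One way to read exp(coef) in an AFT model is as a time ratio. In the standard log-logistic form, S(t) = 1 / (1 + (t / α)^β), the scale α equals the median survival time, and a covariate multiplies α by exp(coef). The sketch below illustrates this with the onlineSecurity_Yes coefficient from the table. The baseline α and the shape β are hypothetical values chosen only for illustration, not fitted parameters from this project.

```python
import math


def loglogistic_survival(t, alpha, beta):
    """Log-logistic survival function; alpha is the median survival time."""
    return 1.0 / (1.0 + (t / alpha) ** beta)


# Hypothetical baseline parameters (assumptions, not fitted values).
baseline_alpha = 20.0  # baseline median time until churn, in months
beta = 1.5             # shape parameter

# Rounded coefficient for onlineSecurity_Yes from the table above.
coef_online_security = 0.86
time_ratio = math.exp(coef_online_security)

# The covariate stretches the time scale by exp(coef).
alpha_with_security = baseline_alpha * time_ratio

for label, alpha in [("baseline", baseline_alpha),
                     ("onlineSecurity_Yes", alpha_with_security)]:
    s24 = loglogistic_survival(24, alpha, beta)
    print(f"{label}: median = {alpha:.1f} months, S(24) = {s24:.3f}")
```

With the rounded coefficient, exp(0.86) is about 2.36; the 2.37 in the table presumably comes from the unrounded estimate.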
6. Customer Lifetime Value
The survival model was also used to estimate expected customer lifetime value. Predicted survival probabilities were multiplied by monthly profit and discounted over time.
| Contract month | Survival probability | Discounted expected profit | Cumulative NPV |
|---|---|---|---|
| 1 | 0.9481 | 28.21 | 28.21 |
| 12 | 0.8137 | 22.10 | 297.51 |
| 24 | 0.7235 | 17.78 | 533.25 |
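
The calculation behind this table can be sketched as follows. The report does not state the monthly profit or the discount rate, so the values used here (a profit of 30 per month and a 10% annual rate compounded monthly) are assumptions. Only the month-1 survival probability is taken from the table; the other probabilities are placeholders, and `discounted_expected_profit` is a hypothetical helper name.

```python
def discounted_expected_profit(survival_probs, monthly_profit, annual_rate):
    """Return rows of (month, survival, discounted expected profit, cumulative NPV)."""
    rows = []
    cumulative = 0.0
    for month, survival in enumerate(survival_probs, start=1):
        discount = (1.0 + annual_rate / 12.0) ** month  # monthly compounding
        value = survival * monthly_profit / discount
        cumulative += value
        rows.append((month, survival, value, cumulative))
    return rows


# Month 1 is from the table; months 2 and 3 are illustrative placeholders.
survival_probs = [0.9481, 0.93, 0.92]

rows = discounted_expected_profit(
    survival_probs,
    monthly_profit=30.0,  # assumed, not stated in the report
    annual_rate=0.10,     # assumed, not stated in the report
)

for month, survival, value, cumulative in rows:
    print(f"month {month}: S = {survival:.4f}, "
          f"discounted profit = {value:.2f}, cumulative NPV = {cumulative:.2f}")
```

Under these assumed inputs, the month-1 value comes out close to the 28.21 shown in the table, but that agreement does not confirm the parameters actually used.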
7. Main Findings
- The filtered cohort has a high churn rate of 46.43%, so it is a suitable high-risk segment for survival modeling.
- Churned customers have shorter average tenure than censored customers.
- OnlineSecurity is strongly associated with better survival, while gender is not significant.
- TechSupport and OnlineBackup are associated with lower churn hazard.
- OnlineSecurity and TechSupport increase estimated time until churn in the AFT model.