Model

Framing the Problem

This project is framed as a binary classification problem, where the goal is to predict whether a power outage will last longer than 48 hours.

Target variable: severe_48h
Type: Classification

A key constraint is that only features available at the start of the outage are used. This avoids data leakage and ensures the model reflects a realistic prediction scenario.

Features Used

The model uses a combination of numerical and categorical features:

Numerical features:

start year
population
anomaly level

Categorical features:

cause category
cause category detail
climate category
NERC region
state
month, hour, and day of week

These features capture both environmental and infrastructure-related factors that may influence outage severity.

Baseline Model

The baseline model is a logistic regression model.

Preprocessing

Missing numerical values were filled using the median
Missing categorical values were filled using the most frequent value
Categorical variables were one-hot encoded
Numerical features were standardized

Performance

Accuracy ≈ 0.81
AUC ≈ 0.83

Because severe outages are less common, recall is especially important. A balanced version of the model improved recall for severe outages, helping reduce false negatives.

Model Improvements

Several improvements were explored to enhance performance:

1. Feature Engineering

Applied a log transformation to population to reduce skew
Created an interaction feature combining outage cause and climate

These changes aimed to provide the model with more meaningful patterns.

2. Hyperparameter Tuning

Cross-validation was used to tune logistic regression parameters:

Regularization strength (C)
Penalty type (L1 vs L2)

The best model used L2 regularization with default strength.

Final Model Performance

Final AUC ≈ 0.83

The improvements resulted in only a slight increase in performance, suggesting that the baseline model already captured most of the useful signal in the data.

Fairness Analysis

To evaluate fairness, model performance was compared across regions using recall.

West recall: 0.58
East recall: 0.79
Difference: -0.21
p-value = 0.026

Since the p-value is below 0.05, we reject the null hypothesis of equal performance.

This indicates that the model performs significantly worse at identifying severe outages in Western regions, suggesting potential regional bias.

Key Takeaways

Logistic regression performs well for this problem, achieving strong AUC
Feature engineering provided only marginal improvements
Recall is more important than accuracy due to class imbalance
The model shows evidence of regional bias, which would need to be addressed before real-world use