Model

Framing the Problem

This project is framed as a binary classification problem, where the goal is to predict whether a power outage will last longer than 48 hours. Outages above this threshold are referred to below as severe.

A key constraint is that only features available at the start of the outage are used. This avoids data leakage and ensures the model reflects a realistic prediction scenario.
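The framing above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the column names (`outage_duration_hours`, `month`, `climate_region`, `restoration_hour`) and the toy rows are hypothetical stand-ins, since the dataset's real column names are not listed here.

```python
import pandas as pd

SEVERE_THRESHOLD_HOURS = 48  # threshold stated in the problem framing


def make_target(df: pd.DataFrame, duration_col: str = "outage_duration_hours") -> pd.Series:
    """Return 1 where the outage lasted strictly longer than 48 hours, else 0.

    Rows with a missing duration compare as False, so they should be
    dropped before calling this function.
    """
    return (df[duration_col] > SEVERE_THRESHOLD_HOURS).astype(int)


def select_start_time_features(df: pd.DataFrame, allowed: list) -> pd.DataFrame:
    """Keep only columns that are known at the moment the outage begins."""
    return df[list(allowed)].copy()


# Toy data with hypothetical column names.
outages = pd.DataFrame({
    "outage_duration_hours": [12.0, 48.0, 72.5, 100.0],
    "month": [1, 7, 9, 12],
    "climate_region": ["West", "South", "West", "Northeast"],
    "restoration_hour": [3, 9, 14, 22],  # only known after the outage ends
})

y = make_target(outages)
# The duration and restoration columns are excluded: using them would leak the answer.
X = select_start_time_features(outages, allowed=["month", "climate_region"])
```

Keeping an explicit allow-list of start-of-outage columns, rather than dropping known leaky ones, makes it harder for a post-outage column to slip into the model unnoticed.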


Features Used

The model uses a combination of numerical and categorical features:

Numerical features:

Categorical features:

These features capture both environmental and infrastructure-related factors that may influence outage severity.


Baseline Model

The baseline model is a logistic regression model.

Preprocessing

Performance

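Because severe outages are less common than non-severe ones, overall accuracy can look good even when most severe outages are missed, so recall on the severe class is especially important. A class-balanced version of the model improved recall for severe outages, reducing false negatives.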
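The effect of class balancing on recall can be illustrated with scikit-learn. The sketch below is an assumption about how such a baseline might be built, not the project's implementation: the data is synthetic, the feature names `customers_served` and `region` are hypothetical, and the scaling and one-hot encoding steps are one common choice of preprocessing rather than the documented one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with an imbalanced positive ("severe") class.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "customers_served": rng.normal(size=n),                         # hypothetical numerical feature
    "region": rng.choice(["West", "South", "Northeast"], size=n),  # hypothetical categorical feature
})
logit = 1.5 * X["customers_served"] - 1.8
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)


def build_model(class_weight=None) -> Pipeline:
    """Scale numerical columns, one-hot encode categorical ones, then fit logistic regression."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["customers_served"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(class_weight=class_weight, max_iter=1000)),
    ])


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Compare recall on the severe class with and without class balancing.
recalls = {}
for name, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    model = build_model(class_weight).fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))
    print(f"{name}: recall = {recalls[name]:.2f}")
```

Setting `class_weight="balanced"` reweights each class inversely to its frequency, which pushes the decision boundary toward predicting the rarer class more often. That usually raises recall at the cost of some precision.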


Model Improvements

Several improvements were explored to enhance performance:

1. Feature Engineering

These changes aimed to provide the model with more meaningful patterns.


2. Hyperparameter Tuning

Cross-validation was used to tune logistic regression parameters:

The best model used L2 regularization with default strength.
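A cross-validated search of this kind might look like the following sketch. The grid values, the F1 scoring metric, the five-fold split, and the synthetic data are all assumptions made for illustration; the sketch keeps scikit-learn's default L2 penalty and tunes only the regularization strength `C` and the class weighting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the outage data (about 20% positives).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=0,
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # default penalty is L2
])

# Hypothetical grid: smaller C means stronger regularization; C=1.0 is the default.
param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],
    "clf__class_weight": [None, "balanced"],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1",  # assumed metric; balances precision and recall on the positive class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the validation folds never influence the preprocessing.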


Final Model Performance

The improvements resulted in only a slight increase in performance, suggesting that the baseline model already captured most of the useful signal in the data.


Fairness Analysis

To evaluate fairness, model performance was compared across regions using recall.

Since the p-value is below 0.05, the null hypothesis of equal performance across regions is rejected.

This indicates that the model's recall for severe outages is statistically significantly lower in Western regions, suggesting potential regional bias.
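One way to obtain such a p-value is a permutation test on the recall gap between regions. The sketch below assumes that approach, which may differ from the test actually used here; the data is synthetic, and the function names and the one-sided alternative (lower recall in the West) are illustrative choices.

```python
import numpy as np


def recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of true positives that the model predicted as positive."""
    positives = y_true == 1
    return float((y_pred[positives] == 1).mean())


def recall_gap_permutation_test(y_true, y_pred, in_group, n_permutations=2000, seed=0):
    """One-sided permutation test: is recall lower inside the group than outside it?

    Returns the observed gap (recall outside minus recall inside) and the
    share of label shuffles that produce a gap at least as large.
    """
    rng = np.random.default_rng(seed)

    def gap(mask):
        return recall(y_true[~mask], y_pred[~mask]) - recall(y_true[mask], y_pred[mask])

    observed = gap(in_group)
    # Shuffling group membership simulates the null hypothesis of equal recall.
    simulated = np.array([gap(rng.permutation(in_group)) for _ in range(n_permutations)])
    p_value = float((simulated >= observed).mean())
    return observed, p_value


# Synthetic example: the model detects 80% of severe outages outside the West
# but only 40% inside it.
rng = np.random.default_rng(1)
n = 1000
y_true = (rng.random(n) < 0.3).astype(int)
west = rng.random(n) < 0.3
hit_rate = np.where(west, 0.4, 0.8)
y_pred = np.where(y_true == 1, rng.random(n) < hit_rate, rng.random(n) < 0.1).astype(int)

observed, p_value = recall_gap_permutation_test(y_true, y_pred, west)
print(f"observed recall gap = {observed:.3f}, p-value = {p_value:.4f}")
```

A permutation test makes no distributional assumptions about recall, which is convenient when regional groups differ in size. It requires each group to contain at least some severe outages so that recall is defined.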


Key Takeaways