## Description

Introduction

My solution splits the problem into 3 parts:

1. Defining a non-paying household

2. Classifying them against future-paying households

3. Optimizing their power shut-offs

Part 1: Determining Thresholds

Given

Payment history: amounts and timestamps of transactions, billing, and contact with the power company

Duration of residency and service

Use

CUSUM model

Sample Probability Distributions

K Means Clustering

To

Determine a threshold for deciding when a customer is not going to pay

Probability Distribution

Later, we can use this as part of our cost calculations.

CUSUM

We can also try to build CUSUM models on the scaled data. For example, rather than looking at the raw time between payments in days, we can look at how far each payment falls from what the model expects for that household. That way, houses that are consistently very late but always pay will have leeway.

After building CUSUM models, we can decide which houses are to be considered non-paying. By adding a categorical variable for each of these houses, we can further our analysis by looking for patterns in the data in relation to this new variable. This can involve correlation analysis, information coefficients, or even simply performing the distribution method above to see how these non-paying houses compare to the rest of the data. This can reinforce the threshold from before or help create a new one.
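As a rough illustration of the CUSUM idea above, here is a minimal sketch of a one-sided CUSUM for detecting an increase in payment delay. The payment history, the eight-payment baseline window, and the values of `c` and `threshold` are all hypothetical and would need to be tuned on real data.

```python
from statistics import mean, stdev

def cusum_increase(values, mu, c, threshold):
    """One-sided CUSUM for an upward shift.

    S_t = max(0, S_{t-1} + (x_t - mu - c)); an alarm is raised the
    first time S_t reaches the threshold.
    """
    s = 0.0
    scores = []
    first_alarm = None
    for i, x in enumerate(values):
        s = max(0.0, s + (x - mu - c))
        scores.append(s)
        if first_alarm is None and s >= threshold:
            first_alarm = i
    return scores, first_alarm

# Hypothetical days between billing and payment for one household.
history = [32, 35, 33, 34, 31, 33, 36, 34, 50, 58, 65, 70]

# Scale against the household's own baseline (assumed: first 8 payments),
# so a consistently late but reliable payer is not penalized.
baseline = history[:8]
mu, sigma = mean(baseline), stdev(baseline)
scaled = [(x - mu) / sigma for x in history]

scores, alarm = cusum_increase(scaled, mu=0.0, c=0.5, threshold=5.0)

# Categorical flag that later analysis could use.
non_paying_flag = alarm is not None
print(alarm, non_paying_flag)
```

Because the data is scaled per household, the same `c` and `threshold` can be reused across houses with very different payment habits.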

K-Means Clustering

K-Means clustering can be used both in Part 1 and in Part 2. In Part 1, it can be used to reinforce the decision threshold for setting up our classification data either by refining the houses that should be deemed as non-paying or by adding houses to the list. Similar to the CUSUM model, it can group non-paying houses and the results can be used to determine non-paying vs future-paying.
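To make the clustering step concrete, the following is a small sketch of K-Means (Lloyd's algorithm) in plain Python. The two features (scaled payment delay and fraction of bills unpaid), the sample points, and the starting centroids are assumptions chosen for illustration only.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Basic Lloyd's algorithm: assign points to the nearest centroid,
    then move each centroid to the mean of its members."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids

    labels = [nearest(p) for p in points]
    return centroids, labels

# Hypothetical households: (scaled payment delay, fraction of bills unpaid)
points = [(0.1, 0.0), (0.2, 0.05), (0.15, 0.0),
          (0.9, 0.8), (1.0, 0.9), (0.85, 0.7)]

centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The cluster with the higher delay and unpaid fraction can then be compared against the houses flagged by CUSUM to refine or extend the non-paying list.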

Part 2: Classifying Non-Paying Customers

Given

Threshold found in Part 1

Same data as Part 1

Location (zip codes, streets, latitude/longitude), number of residents

Use

CUSUM

Logistic Regression

SVM

To

Classify Non-Paying vs Future-Paying Households

CUSUM

Classification Modeling

Another approach is to treat this as a true classification problem. Since we have a threshold that separates non-paying from future-paying households, we can train, validate, and test a basic machine learning model such as an SVM or logistic regression.
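As one possible sketch of this approach, the code below fits a logistic regression with plain gradient descent and checks it on held-out households. The features, labels, learning rate, and number of epochs are hypothetical; in practice a library implementation and a proper train/validation/test split would likely be used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical features: (scaled payment delay, fraction of bills unpaid).
# Label 1 = non-paying according to the Part 1 threshold, 0 = future-paying.
X_train = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.0),
           (0.8, 0.7), (0.9, 0.9), (1.0, 0.8)]
y_train = [0, 0, 0, 1, 1, 1]

# Held-out households used only for evaluation.
X_test = [(0.15, 0.05), (0.95, 0.85)]
y_test = [0, 1]

w, b = fit_logistic(X_train, y_train)
predictions = [int(predict_proba(w, b, x) >= 0.5) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

The predicted probabilities, not just the 0/1 labels, are useful later as inputs to the cost calculations in Part 3.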

Part 3: Optimizing Shut-Offs

Given

Location of shut-off houses

Number of workers

Time it takes for workers to shut off power in each home

Amount of resources available

Use

Optimization modeling

Simulation if required

To

Determine which houses to shut off

Optimizing

Producing an optimization model might be fairly complicated for this. We have to account for the cost of shutting off a house, the cost of re-installing based on probability of it needing to be reinstalled, the cost of travel including going between houses (non-memoryless) in the network, and the additional cost accrued as time passes for each household.

The objective function will be the sum, over all houses, of a binary decision variable (power on or off) multiplied by the cost for that house. The cost, however, will be a combination of the factors listed above. We are constrained by the number of workers available and the time it takes to shut off power in the homes. If the process takes weeks, our model might change the decision for several houses, and we must adjust our optimization model accordingly if the cost is worth it.
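A heavily simplified sketch of this model is shown below. It treats each house as a 0/1 decision, uses an assumed expected net saving per house, and limits total worker hours. Travel cost and cost accrued over time are deliberately left out, and all numbers and field names are hypothetical. Brute-force enumeration only works for a handful of houses; a real model would use an integer programming solver.

```python
from itertools import product

SHUTOFF_COST = 50.0     # assumed cost of sending a worker to shut off
RECONNECT_COST = 80.0   # assumed cost if the house must be reconnected

def net_saving(house):
    """Expected saving from shutting off one house.

    p_nonpay is the classifier's probability that the house will not pay;
    with probability (1 - p_nonpay) the house pays and is reconnected.
    """
    p = house["p_nonpay"]
    return p * house["loss"] - SHUTOFF_COST - (1 - p) * RECONNECT_COST

def best_shutoff_plan(houses, workers, hours_per_worker):
    """Enumerate every 0/1 plan and keep the best feasible one."""
    capacity = workers * hours_per_worker
    best_plan = tuple(0 for _ in houses)
    best_value = 0.0
    for plan in product((0, 1), repeat=len(houses)):
        hours = sum(x * h["hours"] for x, h in zip(plan, houses))
        if hours > capacity:
            continue  # violates the worker-time constraint
        value = sum(x * net_saving(h) for x, h in zip(plan, houses))
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value

# Hypothetical candidate houses.
houses = [
    {"p_nonpay": 0.9, "loss": 400.0, "hours": 2},
    {"p_nonpay": 0.6, "loss": 300.0, "hours": 1},
    {"p_nonpay": 0.2, "loss": 500.0, "hours": 1},
    {"p_nonpay": 0.8, "loss": 250.0, "hours": 2},
]

plan, value = best_shutoff_plan(houses, workers=1, hours_per_worker=4)
print(plan, round(value, 2))
```

Re-running the plan as probabilities are updated over the following weeks is one way to handle decisions that change while the shut-off process is still under way.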

Simulation
