Introduction
My solution splits the problem into 3 parts:
1. Defining a non-paying household
2. Classifying them against future-paying households
3. Optimizing their power shut-offs
Part 1: Determining Thresholds
Given
Payment history: amount and time stamps of transactions, billing, and contact with power company
Duration of residency and service
Use
Cusum model
Sample Probability Distributions
K Means Clustering
To
Determine a threshold for deciding when a customer should be considered non-paying
Probability Distribution
Later, we can use this distribution as part of our cost calculations.
CUSUM
We can also try to build CUSUM models on the scaled data. For example, rather than modeling the time between payments in days, we can model each payment's gap relative to that household's own usual gap. That way, houses that are consistently very late but always pay will have leeway.
After building CUSUM models, we can decide which houses are to be considered non-paying. By adding a categorical variable for each of these houses, we can further our analysis by looking for patterns in the data in relation to this new variable. This can involve correlation analysis, information coefficients, or even simply applying the distribution method above to see how these non-paying houses compare to the rest of the data. This can reinforce the threshold from before or help create a new one.
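The scaled CUSUM idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the baseline is taken as the median of a household's first few payment gaps, and the function names, `slack`, `threshold`, and the sample gaps are all hypothetical values chosen for the example, not parameters fitted to real data.

```python
from statistics import median

def scaled_gaps(gaps_days, baseline_n=4):
    """Divide each payment gap by the household's own baseline gap.

    Assumption: the baseline is the median of the first `baseline_n` gaps.
    """
    base = median(gaps_days[:baseline_n])
    return [g / base for g in gaps_days]

def cusum_alarm(values, target=1.0, slack=0.25, threshold=2.0):
    """One-sided upper CUSUM: S_t = max(0, S_{t-1} + (x_t - target - slack)).

    Returns the index of the first value at which S_t exceeds `threshold`,
    or None if the statistic never crosses it.
    """
    s = 0.0
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None

# Hypothetical gaps (in days) between consecutive payments.
late_but_steady = [45, 44, 46, 45, 47, 45, 44, 46]  # always late, always pays
drifting = [30, 31, 29, 30, 45, 60, 75, 90]         # gaps keep growing

steady_alarm = cusum_alarm(scaled_gaps(late_but_steady))
drift_alarm = cusum_alarm(scaled_gaps(drifting))
print(steady_alarm, drift_alarm)
```

Because each household is compared with its own baseline, the consistently late household never raises an alarm, while the household whose gaps keep growing is flagged once the accumulated excess passes the threshold.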
K-Means Clustering
K-Means clustering can be used both in Part 1 and in Part 2. In Part 1, it can be used to reinforce the decision threshold for setting up our classification data, either by refining the set of houses that should be deemed non-paying or by adding houses to the list. Similar to the CUSUM model, it can group non-paying houses, and the resulting clusters can be used to separate non-paying from future-paying households.
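As a rough illustration of the clustering step, the sketch below runs plain Lloyd's algorithm on two hypothetical features per household (mean scaled payment gap and fraction of bills left unpaid). The features, the sample values, and the choice of initial centroids are assumptions made for the example; a library implementation would normally be used on real data.

```python
def nearest(point, centers):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
    )

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm starting from the given initial centroids."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical households: (mean scaled payment gap, fraction of bills unpaid).
households = [
    (1.0, 0.00), (1.1, 0.05), (0.9, 0.00),   # look like regular payers
    (2.5, 0.60), (3.0, 0.80), (2.8, 0.70),   # look like non-payers
]
centers, labels = kmeans(households, centers=[households[0], households[3]])
print(labels)
```

The cluster labels can then be compared with the CUSUM-based flags: households on which both methods agree are stronger candidates for the non-paying label, and disagreements are worth a closer look.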
Part 2: Classifying Non-Paying Customers
Given
Threshold found in Part 1
Same data as Part 1
Location (zip codes, streets, latitude/longitude), number of residents
Use
CUSUM
Logistic Regression
SVM
To
Classify Non-Paying vs Future-Paying Households
CUSUM
Classification Modeling
Another approach is to treat this as a true classification problem. Since the threshold from Part 1 labels each household as non-paying or future-paying, we can train, validate, and test a basic machine learning model such as SVM or Logistic Regression on those labels.
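A minimal sketch of the classification step, using logistic regression trained by batch gradient descent. The two features, the labels (assumed to come from the Part 1 threshold), the learning rate, and the held-out examples are all hypothetical; in practice a library model with a proper train/validation/test split would be the more likely choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (non-paying) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical training data: (mean scaled payment gap, fraction of bills unpaid),
# labelled 1 = non-paying, 0 = future-paying using the Part 1 threshold.
X_train = [(1.0, 0.00), (1.1, 0.05), (0.9, 0.00),
           (2.5, 0.60), (3.0, 0.80), (2.8, 0.70)]
y_train = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X_train, y_train)

# Hypothetical held-out households.
print(predict(w, b, (1.05, 0.00)), predict(w, b, (2.9, 0.75)))
```

The predicted probability itself, not just the 0/1 label, may be useful later as an input to the cost calculations in Part 3.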
Part 3: Optimizing Shut-Offs
Given
Location of shut-off houses
Number of workers
Time it takes for workers to shut off power in each home
Amount of resources available
Use
Optimization modeling
Simulation if required
To
Determine which houses to shut off
Optimizing
Producing an optimization model for this might be fairly complicated. We have to account for the cost of shutting off a house, the expected cost of re-installing service (weighted by the probability that it will need to be re-installed), the cost of travel between houses in the network (which depends on the route taken so far, so it is not memoryless), and the additional cost that accrues for each household as time passes.
The objective function will be the sum, over all houses, of a binary decision variable (power on or off) multiplied by that house's cost. The cost, however, will be a combination of the factors listed above. We are constrained by the number of workers available and the time it takes to shut off power in each home. If the process takes weeks, the best decision for several houses might change along the way, and we must adjust our optimization model accordingly if the cost is worth it.
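To make the binary formulation concrete, here is a small sketch that enumerates every subset of houses and keeps the one with the highest expected net saving that fits within the available crew hours. The cost model in `net_saving`, the field names, and all numbers are hypothetical, and travel cost between houses is deliberately left out to keep the example short. Brute-force enumeration only works for a handful of houses; a real instance would call for an integer programming solver.

```python
from itertools import combinations

def net_saving(house):
    """Expected saving from shutting off one house (hypothetical cost model):
    unpaid usage avoided, minus shut-off cost, minus expected re-install cost."""
    return (house["unpaid_per_week"] * house["weeks"]
            - house["shutoff_cost"]
            - house["p_reinstall"] * house["reinstall_cost"])

def best_shutoffs(houses, crew_hours):
    """Try every subset; each house is either shut off (in the subset) or not."""
    best_value, best_set = 0.0, ()
    for r in range(1, len(houses) + 1):
        for subset in combinations(range(len(houses)), r):
            hours = sum(houses[i]["hours"] for i in subset)
            if hours > crew_hours:
                continue  # violates the worker-time constraint
            value = sum(net_saving(houses[i]) for i in subset)
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value

# Hypothetical houses flagged as non-paying.
houses = [
    {"unpaid_per_week": 50, "weeks": 4, "shutoff_cost": 40,
     "p_reinstall": 0.5, "reinstall_cost": 60, "hours": 2},
    {"unpaid_per_week": 30, "weeks": 4, "shutoff_cost": 40,
     "p_reinstall": 0.2, "reinstall_cost": 60, "hours": 1},
    {"unpaid_per_week": 80, "weeks": 4, "shutoff_cost": 40,
     "p_reinstall": 0.9, "reinstall_cost": 60, "hours": 3},
    {"unpaid_per_week": 10, "weeks": 4, "shutoff_cost": 40,
     "p_reinstall": 0.1, "reinstall_cost": 60, "hours": 1},
]

chosen, value = best_shutoffs(houses, crew_hours=4)
print(chosen, value)
```

Note that the last house has a negative net saving under these numbers, so it is never selected even when crew time is left over: shutting it off would cost more than it recovers.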
Simulation




