This is a dimensionality reduction algorithm whose goal is to maintain interpretability, i.e., we directly eliminate from potential models those variables that don't seem to add any predictive power. This is accomplished by using decision trees to approximate a function between two variables. It is a modified version of the *Predictive Power Score*, inspired by Florian Wetschoreck's article [1].

We'll start with a set of observations, which can be further split into a set of features (**things we want to use to predict**) and targets (**things we want to predict**). The elements

Decision trees are universal function approximators, which basically means that we can split a two-dimensional subset of our data into different bins chosen by minimizing a cost function. In this case, the boundaries of the bins are chosen so as to minimize the error the tree model makes when making predictions. Splitting the data into different bins amounts to constructing a function, but we need to understand how well this function does compared to a more naive model of prediction: taking the median of the target.
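To make the binning idea concrete, here is a minimal sketch using only the Python standard library. It is not the implementation used for the score: it fits a depth-1 tree (a single split, often called a stump) on one feature, chooses the split that minimizes the total absolute error, and compares it with the naive median model. The helper names (`mae`, `fit_stump`, `predict_stump`) and the toy data are made up for illustration, and the errors are measured on the training data for brevity, whereas a real evaluation would use held-out data.

```python
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(x, y):
    """Fit a depth-1 regression tree on a single feature.

    Returns (threshold, left_value, right_value): one boundary on x and the
    median of the target in each of the two bins. The boundary is the one
    that minimizes the total absolute error on the training data.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a boundary between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        l_val, r_val = median(left), median(right)
        cost = (sum(abs(t - l_val) for t in left)
                + sum(abs(t - r_val) for t in right))
        if best is None or cost < best[0]:
            best = (cost, threshold, l_val, r_val)
    if best is None:
        # All x values are identical: fall back to a single bin.
        m = median(y)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    """Predict the bin median for each value in x."""
    threshold, l_val, r_val = stump
    return [l_val if v <= threshold else r_val for v in x]


# Toy data (invented for this example): y jumps when x crosses 5.
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [10, 11, 9, 10, 20, 21, 19, 20]

stump = fit_stump(x, y)
smart_pred = predict_stump(stump, x)   # "smart" model: the fitted stump
naive_pred = [median(y)] * len(y)      # "naive" model: median of the target

mae_smart = mae(y, smart_pred)
mae_naive = mae(y, naive_pred)
print(stump, mae_smart, mae_naive)
```

A full decision tree repeats this split recursively inside each bin; the single split is kept here only to keep the sketch short.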

If we have two different models

We can compare how well the "smart model" does relative to the "naive model" by looking at the ratio of $\text{MAE}_{\text{smart}}$ to $\text{MAE}_{\text{naive}}$, which is defined as

As the smart model does better, this ratio becomes smaller, and as the smart model starts doing as well as or worse than the naive model, this ratio becomes larger. Up to this point, this is pretty much just the predictive power score. If our smart model is doing better than the naive model, then we have at least established that constructing a function between

A number of features would be nice to have when judging how well one variable does at predicting another

The main thing we can do is to use a Gaussian to map

The main advantage of doing this, besides bounding our score between

which implies that

Low Score | Intermediate | High Score |
---|---|---|

$$r=\frac{\text{MAE}_{\text{smart}}}{\text{MAE}_{\text{naive}}}$$
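As a small illustration of the ratio $r$, the sketch below computes it from two error values. The function name `mae_ratio` is hypothetical, and raising an error when the naive MAE is zero is an assumption made here, since the text does not say how that case should be handled.

```python
def mae_ratio(mae_smart, mae_naive):
    """Return r = MAE_smart / MAE_naive.

    A zero naive MAE means the median already predicts the target exactly,
    so the ratio is undefined; rejecting that case is an assumption of this
    sketch, not something stated in the text.
    """
    if mae_naive == 0:
        raise ValueError("naive MAE is zero; the ratio is undefined")
    return mae_smart / mae_naive


# Illustrative values only.
r_good = mae_ratio(0.5, 5.0)   # smart model clearly beats the median
r_equal = mae_ratio(5.0, 5.0)  # smart model is no better than the median
print(r_good, r_equal)
```

A ratio well below 1 means the tree beats the median baseline, while a ratio of 1 or more means it does no better.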

[1]
Wetschoreck, Florian. (Apr 23, 2020).
*RIP correlation. Introducing the Predictive Power Score.*
https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598

[2]
Mathonline
*The Simple Function Approximation Theorem.*
http://mathonline.wikidot.com/the-simple-function-approximation-theorem

[3]
kenndanielso Blog
*Universal Function Approximation.*
https://kenndanielso.github.io/mlrefined/blog_posts/12_Nonlinear_intro/12_5_Universal_approximation.html