A package providing composite models wrapping class imbalance algorithms from Imbalance.jl with classifiers from MLJ.
```julia
import Pkg;
Pkg.add("MLJBalancing")
```

This package allows chaining of resampling methods from Imbalance.jl with classification models from MLJ. Simply construct a `BalancedModel` object, specifying the model (classifier) and an arbitrary number of resamplers (also called balancers; typically oversamplers and/or undersamplers).
```julia
using MLJ, MLJBalancing

SMOTENC = @load SMOTENC pkg=Imbalance verbosity=0
TomekUndersampler = @load TomekUndersampler pkg=Imbalance verbosity=0
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0

oversampler = SMOTENC(k=5, ratios=1.0, rng=42)
undersampler = TomekUndersampler(min_ratios=0.5, rng=42)
logistic_model = LogisticClassifier()

balanced_model = BalancedModel(model=logistic_model, balancer1=oversampler, balancer2=undersampler)
```

Here the training data will be passed to `balancer1`, then to `balancer2`, whose output is used to train the classifier `model`. When `balanced_model` is used for prediction, the resamplers `balancer1` and `balancer2` are bypassed.
In general, any number of balancers can be passed to `BalancedModel`, and the balancers may be given arbitrary keyword names, as in the sketch below.
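For instance, the same composite could equally be constructed under custom balancer names (a sketch, reusing `oversampler`, `undersampler` and `logistic_model` from above):

```julia
# equivalent composite with custom balancer names (illustrative variable name)
balanced_model_alt = BalancedModel(model=logistic_model,
                                   smote=oversampler,
                                   tomek=undersampler)
```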
You can fit, predict, cross-validate and hyperparameter-tune it like any other MLJ model.
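The tuning example below assumes imbalanced training data `X` (a table with both continuous and categorical features, as SMOTENC expects) and labels `y`. Here is one hypothetical setup; any imbalanced table works:

```julia
using DataFrames, Random

rng = Random.Xoshiro(42)
n = 1000
X = DataFrame(
    f1   = randn(rng, n),
    f2   = randn(rng, n),
    kind = coerce(rand(rng, ["a", "b", "c"], n), Multiclass),  # categorical feature for SMOTENC
)
y = coerce([rand(rng) < 0.1 ? "minority" : "majority" for _ in 1:n], Multiclass)  # roughly 90/10 split
```

Here is an example of hyperparameter tuning: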
```julia
r1 = range(balanced_model, :(balancer1.k), lower=3, upper=10)
r2 = range(balanced_model, :(balancer2.min_ratios), lower=0.1, upper=0.9)
tuned_balanced_model = TunedModel(
    model=balanced_model,
    tuning=Grid(goal=4),
    resampling=CV(nfolds=4),
    range=[r1, r2],
    measure=cross_entropy
);
mach = machine(tuned_balanced_model, X, y);
fit!(mach, verbosity=0);
fitted_params(mach).best_model
```

The package also offers an implementation of bagging over probabilistic classifiers, where the majority class is repeatedly undersampled `T` times down to the size of the minority class. This undersampling scheme was proposed as part of the EasyEnsemble algorithm in the paper "Exploratory Undersampling for Class-Imbalance Learning" by Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou, where an AdaBoost model was used and the output scores were averaged.
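In rough terms, the scheme works as in the following illustrative sketch (not the package's actual implementation; `train` and `predict_proba` are hypothetical stand-ins for fitting a probabilistic classifier and obtaining its class scores):

```julia
using Random, Statistics

function easy_ensemble(X, y, T; rng=Random.default_rng())
    classes  = unique(y)
    minority = classes[argmin([count(==(c), y) for c in classes])]
    min_idx  = findall(==(minority), y)
    maj_idx  = findall(!=(minority), y)
    models = map(1:T) do _
        # undersample the majority class down to the minority size
        sub = shuffle(rng, maj_idx)[1:length(min_idx)]
        idx = vcat(min_idx, sub)
        train(X[idx, :], y[idx])               # hypothetical training call
    end
    # at prediction time, average the T probability estimates
    Xnew -> mean(predict_proba(m, Xnew) for m in models)
end
```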
Here you must specify some probabilistic model, and may optionally specify the number of bags `T` and the random number generator `rng`. If `T` is not specified, it is set to the ratio between the majority and minority class counts. If `rng` is not specified, `default_rng()` is used.
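For instance, both defaults kick in with a construction like the following (a sketch; `clf` stands for any probabilistic MLJ classifier):

```julia
bagging_model = BalancedBaggingClassifier(model=clf)  # T and rng take their defaults
```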
```julia
using Random

LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
logistic_model = LogisticClassifier()
bagging_model = BalancedBaggingClassifier(model=logistic_model, T=10, rng=Random.Xoshiro(42))
```

You can fit, predict, cross-validate and hyperparameter-tune it like any other probabilistic MLJ model, where `X` must be a table (e.g., a dataframe).
```julia
mach = machine(bagging_model, X, y)
fit!(mach)
pred = predict(mach, X)
```
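Cross-validation, for example, goes through the usual MLJ `evaluate` interface (the folds and measure here are illustrative):

```julia
evaluate(bagging_model, X, y,
         resampling=CV(nfolds=5, shuffle=true, rng=42),
         measure=cross_entropy)
```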