# Bayesian Linear Regression in Julia

This is a simple package that does one thing: Bayesian Linear Regression, in around 100 lines of code.

It *is* actively maintained, although it may appear inactive: it is simple enough that it requires very little maintenance.

## Intended Use and Functionality

The interface sits at roughly the same level as that of Distributions.jl. This means that while you won't find a scikit-learn-style `fit` function, you will find all of the primitives that you need to construct such a function to suit your particular problem (see the sketch after the list below). In particular, one can:

- Construct a `BayesianLinearRegressor` (BLR) object by providing a mean vector and precision matrix for the weights of said regressor. This object represents a distribution over (linear) functions.
- Use a `BayesianLinearRegressor` as an `AbstractGP`, as it implements the primary AbstractGP API.
- Think of an instance of `BayesianLinearRegressor` as a very restricted GP, where the time complexity of inference scales linearly in the number of observations `N`.
- Draw function samples from a `BayesianLinearRegressor` using `rand`.
- Construct a `BasisFunctionRegressor` object, which is a thin wrapper around a `BayesianLinearRegressor` that allows a non-linear feature mapping `ϕ` to act on the input.
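
For example, here is a minimal sketch of what such a `fit`-style wrapper could look like, built from these primitives. The names `fit_blr` and `predict_blr` are hypothetical and not part of the package; only `BayesianLinearRegressor`, `posterior`, `mean`, and `var` come from the packages themselves.

```
using AbstractGPs, BayesianLinearRegressors, LinearAlgebra

# Fit: condition a BLR with an isotropic prior on data, returning the posterior regressor.
function fit_blr(X::ColVecs, y::AbstractVector{<:Real}, σ²::Real)
    D = size(X.X, 1)
    prior = BayesianLinearRegressor(zeros(D), Diagonal(ones(D)))
    return posterior(prior(X, σ²), y)
end

# Predict: posterior-predictive mean and variance at new inputs.
predict_blr(f_post, X::ColVecs, σ²::Real) = (mean(f_post(X, σ²)), var(f_post(X, σ²)))
```

A fuller interface would presumably also expose the prior mean and precision as hyperparameters rather than hard-coding them.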

## Conventions

`BayesianLinearRegressors` is consistent with `AbstractGPs`. Consequently, a `BayesianLinearRegressor` in `D` dimensions can work with the following input types:

- `ColVecs` -- a wrapper around a `D x N` matrix of `Real`s, saying that each column should be interpreted as an input.
- `RowVecs` -- a wrapper around an `N x D` matrix of `Real`s, saying that each row should be interpreted as an input.
- `Matrix{<:Real}` -- must be `D x N`. Prefer using `ColVecs` or `RowVecs` for the sake of being explicit.
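
For concreteness, here is a small sketch showing the same three 2-dimensional inputs under these conventions (the data are made up):

```
using AbstractGPs, BayesianLinearRegressors  # makes ColVecs and RowVecs available

A = [1.0 2.0 3.0;
     4.0 5.0 6.0]  # a D x N = 2 x 3 matrix, one input per column

x_col = ColVecs(A)   # 3 inputs, one per column of `A`
x_row = RowVecs(A')  # the same 3 inputs, one per row of `A'`
length(x_col) == length(x_row) == 3  # both behave as length-N vectors of inputs
```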

Consult the `Design` section of the KernelFunctions.jl docs for more info on these conventions.

Outputs for a `BayesianLinearRegressor` should be an `AbstractVector{<:Real}` of length `N`.

## Example Usage

```
# Install the packages if you don't already have them installed (from the Pkg REPL):
# ] add AbstractGPs BayesianLinearRegressors LinearAlgebra Random Plots Zygote
using AbstractGPs, BayesianLinearRegressors, LinearAlgebra, Random, Plots, Zygote
# Fix seed for reproducibility.
rng = MersenneTwister(123456)
# Construct a BayesianLinearRegressor prior over linear functions of `X`.
mw, Λw = zeros(2), Diagonal(ones(2))
f = BayesianLinearRegressor(mw, Λw)
# Index into the regressor and assume heteroscedastic observation noise `Σ_noise`.
N = 10
X = ColVecs(hcat(range(-5.0, 5.0, length=N), ones(N))')
Σ_noise = Diagonal(exp.(randn(N)))
fX = f(X, Σ_noise)
# Generate some toy data by sampling from the prior.
y = rand(rng, fX)
# Compute the adjoint of `rand` w.r.t. everything given random sensitivities of y′.
_, back_rand = Zygote.pullback(
(X, Σ_noise, mw, Λw)->rand(rng, BayesianLinearRegressor(mw, Λw)(X, Σ_noise), 5),
X, Σ_noise, mw, Λw,
)
back_rand(randn(N, 5))
# Compute the `logpdf`, read as: the log probability of observing `y` at `X` under `f` and
# Gaussian observation noise with zero mean and covariance `Σ_noise`.
logpdf(fX, y)
# Compute the gradient of the `logpdf` w.r.t. everything.
Zygote.gradient(
(X, Σ_noise, y, mw, Λw)->logpdf(BayesianLinearRegressor(mw, Λw)(X, Σ_noise), y),
X, Σ_noise, y, mw, Λw,
)
# Perform posterior inference. Note that `f′` has the same type as `f`.
f′ = posterior(fX, y)
# Compute `logpdf` of the observations under the posterior predictive.
logpdf(f′(X, Σ_noise), y)
# Construct a dense grid of inputs at which to evaluate the posterior predictive for plotting.
N_plt = 1000
X_plt = ColVecs(hcat(range(-6.0, 6.0, length=N_plt), ones(N_plt))')
# Compute some posterior marginal statistics.
normals = marginals(f′(X_plt, eps()))
m′X_plt = mean.(normals)
σ′X_plt = std.(normals)
# Plot the posterior. This uses the default AbstractGPs plotting recipes.
posterior_plot = plot();
plot!(posterior_plot, X_plt.X[1, :], f′(X_plt, eps()); color=:blue, ribbon_scale=3);
sampleplot!(posterior_plot, X_plt.X[1, :], f′(X_plt, eps()); color=:blue, samples=10);
scatter!(posterior_plot, X.X[1, :], y; # Observations.
markercolor=:red,
markershape=:circle,
markerstrokewidth=0.0,
markersize=4,
markeralpha=0.7,
label="",
);
display(posterior_plot);
```

## Basis Function Regression

Any instance of a `BayesianLinearRegressor` can be replaced by a `BasisFunctionRegressor` (BFR). A `BasisFunctionRegressor` is a thin wrapper around a `BayesianLinearRegressor` that includes a potentially non-linear feature mapping `ϕ`, which is applied to the input before it is passed to the underlying BLR. It is essentially defined as `bfr(X) = blr(ϕ(X))`.

```
using AbstractGPs, BayesianLinearRegressors, LinearAlgebra
X = RowVecs(hcat(range(-1.0, 1.0, length=5)))
blr = BayesianLinearRegressor(zeros(2), Diagonal(ones(2)))
# N.B. ϕ must accept one of the allowed input types and
# must return the same type (in this case RowVecs)
ϕ(x::RowVecs) = RowVecs(hcat(ones(length(x)), prod.(x)))
bfr = BasisFunctionRegressor(blr, ϕ)
# These are equivalent
var(bfr(X)) == var(blr(ϕ(X)))
```

## Drawing Function Samples

There are two ways of drawing samples from `f::Union{BayesianLinearRegressor,BasisFunctionRegressor}`. The first is to use the `AbstractGPs` API (as in the examples above): a `FiniteBLR` projection is first produced at a fixed set of input locations `X` with some specified observation noise `Σy`, via `fx = f(X, Σy)::FiniteBLR`. Observations (including noise) at `X` can then be sampled directly with `rand(rng, fx)`.

The other way is to draw a sample of an entire function from `f` using `g = rand(rng, f)`. This samples a value for the weights `w ~ N(mw, inv(Λw))` (recall that `Λw` is a precision matrix) and produces a function `g(X) = w'X`, which can be evaluated at any input locations. Note that this method corresponds to drawing samples of noiseless observations.

```
using AbstractGPs, BayesianLinearRegressors, LinearAlgebra, Random, Plots
# Fix seed for reproducibility.
rng = MersenneTwister(123456)
f = BayesianLinearRegressor(zeros(2), Diagonal(ones(2)))
## The first method of drawing samples - using the AbstractGPs API:
# Index into the regressor at fixed inputs X and assume homoscedastic observation noise `Σ_noise`.
N = 10
X = ColVecs(hcat(range(-5.0, 5.0, length=N), ones(N))')
Σ_noise = 0.1
fX = f(X, Σ_noise)
rand(rng, fX)
## The second method - sampling an entire function:
g = rand(rng, f)
# This sample can now be evaluated at any input locations
g(X)
X′ = ColVecs(hcat(range(10.0, 15.0, length=N), ones(N))')
g(X′)
# Sample multiple functions for plotting
gs = rand(rng, f, 100)
N_plt = 1000
X_plt = ColVecs(hcat(range(-6.0, 6.0, length=N_plt), ones(N_plt))')
y_plt = reduce(hcat, [g(X_plt) for g in gs])
sample_plot = plot();
plot!(sample_plot, X_plt.X[1,:], y_plt;
label="",
color=:black,
linealpha=0.4,
);
display(sample_plot)
```

## Up For Grabs

- Scikit-learn style interface: it wouldn't be too hard to implement a scikit-learn-style interface to handle basic regression tasks, so please feel free to make a PR that implements this.
- Monte Carlo VI (MCVI): i.e. variational inference using the reparametrisation trick. This could be very useful when working with large data sets and applying big non-linear transformations, such as neural networks, to the inputs, as it would enable mini-batching. I would envisage supporting at least a dense approximate posterior covariance and a diagonal (i.e. mean-field) one, where the former is for small-to-moderate dimensionalities and the latter for very high-dimensional problems. A rough sketch of the mean-field variant appears below.
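
To make the MCVI item concrete, here is a rough, hypothetical sketch of the mean-field variant (not part of the package). It assumes a standard-normal prior over weights `w`, a Gaussian likelihood with noise variance `σ²`, and a variational posterior `q(w) = N(μ, Diagonal(exp.(2ρ)))`; the function name `elbo_estimate` is made up for illustration.

```
using LinearAlgebra, Random, Zygote

# One-sample ELBO estimate via the reparametrisation trick.
function elbo_estimate(μ, ρ, ε, X, y, σ²)
    w = μ .+ exp.(ρ) .* ε  # reparametrise: w ~ q(w) as a differentiable function of (μ, ρ)
    r = y .- X' * w        # residuals; X is D x N, as in the ColVecs convention
    loglik = -0.5 * (sum(abs2, r) / σ² + length(y) * log(2π * σ²))
    # Analytic KL divergence from q(w) to the standard-normal prior.
    kl = 0.5 * sum(exp.(2 .* ρ) .+ μ .^ 2 .- 1 .- 2 .* ρ)
    return loglik - kl
end

rng = MersenneTwister(1)
D, N, σ² = 2, 50, 0.1
X = randn(rng, D, N)
y = X' * randn(rng, D) .+ sqrt(σ²) .* randn(rng, N)
μ, ρ = zeros(D), zeros(D)
ε = randn(rng, D)  # sample ε outside the gradient call
∇μ, ∇ρ = Zygote.gradient((μ, ρ) -> -elbo_estimate(μ, ρ, ε, X, y, σ²), μ, ρ)
```

For mini-batching, one would rescale the log-likelihood term by `N / batch_size` on each batch; the dense-covariance variant would replace `exp.(ρ)` with a lower-triangular scale matrix.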

## Bugs, Issues, and PRs

Please do report any bugs you find by raising an issue, and please feel free to raise PRs, especially for one of the above `Up For Grabs` items. If you would prefer to discuss an extension in detail before opening a PR, raise an issue first.

## Related Work

BayesianLinearRegression.jl is closely related, but appears to be a WIP and hasn't been touched in around a year (as of 27-03-2019).