## GLFixedEffectModels.jl

Fast estimation of generalized linear models with high dimensional categorical variables in Julia

Author: jmboehm. Started in January 2019.

This package estimates generalized linear models with high dimensional categorical variables. It builds on Matthieu Gomez's FixedEffects.jl, Amrei Stammann's Alpaca, and Sergio Correia's ppmlhdfe.

## Installation

```
] add GLFixedEffectModels
```

## Example use

```julia
using GLFixedEffectModels, GLM, Distributions
using RDatasets

df = dataset("datasets", "iris")
# Binary outcome: 1.0 if SepalLength exceeds 5.0, otherwise 0.0
df.binary = zeros(Float64, size(df,1))
df[df.SepalLength .> 5.0,:binary] .= 1.0
# String copy of Species and a random three-level label, used for clustering below
df.SpeciesStr = string.(df.Species)
idx = rand(1:3,size(df,1),1)
a = ["A","B","C"]
df.Random = vec([a[i] for i in idx])

# Logit model with Species fixed effects
m = @formula binary ~ SepalWidth + fe(Species)
x = nlreg(df, m, Binomial(), LogitLink(), start = [0.2] )

# Two regressors, standard errors clustered by SpeciesStr and Random
m = @formula binary ~ SepalWidth + PetalLength + fe(Species)
nlreg(df, m, Binomial(), LogitLink(), Vcov.cluster(:SpeciesStr,:Random) , start = [0.2, 0.2] )
```

## Documentation

The main function is `nlreg()`, which returns a `GLFixedEffectModel <: RegressionModel`.

```julia
nlreg(df, formula::FormulaTerm,
    distribution::Distribution,
    link::GLM.Link,
    vcov::CovarianceEstimator; ...)
```

The required arguments are:

• `df`: a Table
• `formula`: A formula created using `@formula`.
• `distribution`: A `Distribution`. See the documentation of GLM.jl for valid distributions.
• `link`: A `GLM.Link` function. See the documentation of GLM.jl for valid link functions.
• `vcov`: A `CovarianceEstimator` to compute the variance-covariance matrix. It may be omitted, as the first `nlreg` call in the example above does.

The optional arguments are:

• `save::Union{Bool, Symbol} = false`: Should residuals and any estimated fixed effects be saved in a data frame? Use `save = :residuals` to save only residuals and `save = :fe` to save only fixed effects.
• `method::Symbol`: Where the computation runs. Default is `:cpu`. Alternatively, `:gpu` requires `CuArrays`; in that case, set `double_precision = false` to use `Float32`. This option is the same as in the FixedEffectModels.jl package.
• `double_precision::Bool = true`: Uses 64-bit floats (`Float64`) in the demeaning operation if `true`, otherwise 32-bit floats (`Float32`).
• `drop_singletons = true`: Drop observations that are perfectly classified.
• `contrasts::Dict = Dict()`: An optional Dict of contrast codings for each categorical variable in the `formula`. Any unspecified variables will have `DummyCoding`.
• `maxiter::Integer = 1000`: Maximum number of iterations in the Newton-Raphson routine.
• `maxiter_center::Integer = 10000`: Maximum number of iterations for centering procedure.
• `dev_tol::Real`: Tolerance level for the first stopping condition of the maximization routine.
• `rho_tol::Real`: Tolerance level for the step-halving in the maximization routine.
• `step_tol::Real`: Tolerance level that accounts for rounding errors inside the step-halving routine.
• `center_tol::Real`: Tolerance level for the stopping condition of the centering algorithm. Defaults to 1e-8 if `double_precision = true`, 1e-6 otherwise.
• `separation::Vector{Symbol} = Symbol[]`: Methods used to detect and deal with separation. Supported elements are `:mu`, `:fe`, `:ReLU`, and in the future, `:simplex`. `:mu` truncates mu at `separation_mu_lbound` or `separation_mu_ubound`. `:fe` finds categories of the fixed effects that only exist when y is at the separation point. `:ReLU` detects separation using ReLU, with at most `separation_ReLU_maxiter` iterations and a tolerance of `separation_ReLU_tol`.
• `separation_mu_lbound::Real = -Inf` : Lower bound for the separation detection/correction heuristic (on mu). What a reasonable value would be depends on the model that you're trying to fit.
• `separation_mu_ubound::Real = Inf` : Upper bound for the separation detection/correction heuristic.
• `separation_ReLU_tol::Real = 1e-4` : Tolerance level for the ReLU algorithm.
• `separation_ReLU_maxiter::Integer = 1000` : Maximal number of iterations for the ReLU algorithm.
• `verbose::Bool = false` : If `true`, prints output on each iteration.
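
The `drop_singletons` and `separation` options above all concern observations that the fixed effects can fit perfectly. The following is a minimal plain-Julia sketch of the ideas, not the package's implementation. It assumes that a singleton is a fixed-effect level observed only once, and that the separation point is `y == 0` (as in a count model). The function names `singleton_mask`, `fe_separated_mask` and `truncate_mu` are hypothetical.

```julia
# Illustrative sketch only; these helpers are hypothetical and not part of the package.

# Flag observations whose fixed-effect level occurs exactly once.
function singleton_mask(fe::AbstractVector)
    counts = Dict{eltype(fe), Int}()
    for level in fe
        counts[level] = get(counts, level, 0) + 1
    end
    return [counts[level] == 1 for level in fe]
end

# Flag observations whose fixed-effect level only ever occurs with y at `boundary`
# (the idea described for `:fe`).
function fe_separated_mask(y::AbstractVector{<:Real}, fe::AbstractVector; boundary = 0)
    all_at_boundary = Dict{eltype(fe), Bool}()
    for (yi, level) in zip(y, fe)
        all_at_boundary[level] = get(all_at_boundary, level, true) && (yi == boundary)
    end
    return [all_at_boundary[level] for level in fe]
end

# Truncate fitted means to [lbound, ubound] (the idea described for `:mu`).
truncate_mu(mu, lbound, ubound) = clamp.(mu, lbound, ubound)

y  = [0, 0, 3, 1, 0, 2]
fe = ["a", "a", "b", "b", "c", "d"]

singleton_mask(fe)          # levels "c" and "d" occur once
fe_separated_mask(y, fe)    # levels "a" and "c" only ever see y == 0
truncate_mu([1e-12, 0.3, 1.0], 1e-8, 1 - 1e-8)
```

In this toy data, level `"c"` is flagged by both masks, which shows why the two checks are related but not identical: `"a"` is separated without being a singleton, and `"d"` is a singleton without being separated.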

The function returns a `GLFixedEffectModel` object which supports the `StatsBase.RegressionModel` abstraction. It can be displayed in table form by using RegressionTables.jl.
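
As a rough sketch of how the keyword arguments and the returned object fit together, the call below reuses the iris setup from the example section and passes a few of the documented options. It assumes that `Vcov.robust()` (from Vcov.jl) is accepted in the same position as `Vcov.cluster(...)`, and that the generic `coef`, `vcov` and `nobs` accessors are available for the returned model through the `RegressionModel` interface; the values chosen for `maxiter` and `save` are arbitrary.

```julia
using GLFixedEffectModels, GLM, Distributions
using RDatasets

df = dataset("datasets", "iris")
df.binary = Float64.(df.SepalLength .> 5.0)   # same binary outcome as in the example section

m = @formula binary ~ SepalWidth + fe(Species)

# Assumption: Vcov.robust() can be passed where the README uses Vcov.cluster(...)
res = nlreg(df, m, Binomial(), LogitLink(), Vcov.robust();
            start = [0.2],      # starting value for the single slope coefficient
            save = :fe,         # keep the estimated fixed effects
            maxiter = 200,      # cap on Newton-Raphson iterations
            verbose = false)

coef(res)    # coefficient vector (one entry, for SepalWidth)
vcov(res)    # variance-covariance matrix of the coefficients
nobs(res)    # number of observations used in estimation
```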

## Bias correction methods

The package experimentally supports bias correction methods for the following models:

• Binomial regression, Logit link, Two-way, Classic (Fernández-Val and Weidner (2016, 2018))
• Binomial regression, Probit link, Two-way, Classic (Fernández-Val and Weidner (2016, 2018))
• Binomial regression, Logit link, Two-way, Network (Hinz, Stammann and Wanner (2020) & Fernández-Val and Weidner (2016))
• Binomial regression, Probit link, Two-way, Network (Hinz, Stammann and Wanner (2020) & Fernández-Val and Weidner (2016))
• Binomial regression, Logit link, Three-way, Network (Hinz, Stammann and Wanner (2020))
• Binomial regression, Probit link, Three-way, Network (Hinz, Stammann and Wanner (2020))
• Poisson regression, Log link, Three-way, Network (Weidner and Zylkin (2021))
• Poisson regression, Log link, Two-way, Network (Weidner and Zylkin (2021))

## Things that still need to be implemented

• Better default starting values
• Weights
• Better StatsBase interface & prediction
• Better benchmarking

## Related Julia packages

• FixedEffectModels.jl estimates linear models with high dimensional categorical variables (and with or without endogenous regressors).
• FixedEffects.jl is a package for fast pseudo-demeaning operations using LSMR. Both this package and FixedEffectModels.jl build on this.
• Alpaca.jl is a wrapper to the Alpaca R package, which solves the same tasks as this package.
• GLM.jl estimates generalized linear models, but without explicit support for categorical regressors.
• Econometrics.jl provides routines to estimate multinomial logit and other models.
• RegressionTables.jl supports pretty printing of results from this package.

## References

Correia, S., Guimarães, P. and Zylkin, T., 2019. Verifying the existence of maximum likelihood estimates for generalized linear models. Working paper, https://arxiv.org/abs/1903.01633

Fernández-Val, I. and Weidner, M., 2016. Individual and time effects in nonlinear panel models with large N, T. Journal of Econometrics, 192(1), pp.291-312.

Fernández-Val, I. and Weidner, M., 2018. Fixed effects estimation of large-T panel data models. Annual Review of Economics, 10, pp.109-138.

Fong, D.C. and Saunders, M., 2011. LSMR: An iterative algorithm for sparse least-squares problems. SIAM Journal on Scientific Computing.

Hinz, J., Stammann, A. and Wanner, J., 2021. State dependence and unobserved heterogeneity in the extensive margin of trade.

Stammann, A., 2018. Fast and feasible estimation of generalized linear models with high-dimensional k-way fixed effects. Mimeo, Heinrich-Heine University Düsseldorf.

Weidner, M. and Zylkin, T., 2021. Bias and consistency in three-way gravity models. Journal of International Economics, 132, p.103513.
