Simple Julia library to fetch R datasets from CRAN
Alternative to RDatasets.jl which simply grabs datasets directly from CRAN.

The basic usage is similar to RDatasets.jl. You can install it as follows:


After installing the RDataGet package, you can load datasets using the dataset() function, which takes the name of a package and the name of a dataset as arguments:

using RDataGet
harman_political = dataset("psych", "Harman.political")
neuro = dataset("boot", "neuro")


This package currently just downloads source packages from CRAN and loads their datasets into memory in Julia. It does not depend on R itself.
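To make the download step concrete, here is a minimal sketch of how the latest version of a package could be looked up in a CRAN-style PACKAGES index and turned into a source tarball URL. It is not taken from RDataGet's source: the names `latest_version`, `tarball_url` and `CRAN_SRC` are hypothetical, the choice of the main CRAN mirror is an assumption, and the version numbers in the sample index are made up for illustration.

```julia
# Hypothetical helpers for illustration only; not RDataGet's actual internals.

# Assumed base URL: the source package directory on the main CRAN mirror.
const CRAN_SRC = "https://cran.r-project.org/src/contrib"

# Look up the "Version" field of `pkg` in the text of a PACKAGES index.
# The index is a series of blank-line-separated records of "Key: value" lines.
function latest_version(index_text::AbstractString, pkg::AbstractString)
    for record in split(index_text, r"\n\s*\n")
        fields = Dict{String,String}()
        for line in split(record, '\n')
            m = match(r"^([A-Za-z0-9/_.-]+):\s*(.*)$", line)
            m === nothing && continue  # skip blank and continuation lines
            fields[String(m[1])] = String(strip(m[2]))
        end
        if get(fields, "Package", "") == pkg
            return get(fields, "Version", nothing)
        end
    end
    return nothing  # package not present in the index
end

# Build the URL of the source tarball for a given package and version.
tarball_url(pkg, version) = "$CRAN_SRC/$(pkg)_$(version).tar.gz"

# Sample index text with made-up version numbers.
sample_index = """
Package: boot
Version: 1.3-30
Depends: R (>= 3.0.0), graphics, stats

Package: psych
Version: 2.4.3
License: GPL (>= 2)
"""

version = latest_version(sample_index, "boot")
url = tarball_url("boot", version)
println(url)
```

In a real run the index text would be downloaded rather than hard-coded, and the tarball's data directory would then be searched for files in the supported formats.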

The package has a few limitations, some of which are caused by this design, while others could be addressed in the future:

  • Does not support R's built-in datasets, including those in the datasets package; only datasets from packages that can be downloaded from CRAN are supported
  • Can only load rda/RData/csv.gz files in the data directory
    • As such it does not support packages which generate their data using a build script
  • Cannot get any descriptions or further documentation related to the datasets from Julia (maybe TODO but needs .Rd parsing)
  • Only supports getting the latest version of each package (TODO)
  • Fixed, very-limited caching strategy
    • The package index is re-downloaded every time any package needs to be downloaded, in order to find the latest version number (TODO: should be cached per session by default, with longer caching allowed)
    • Packages are downloaded exactly once per session, after which the same data is reused until Julia is restarted (TODO: should be customisable for longer caching)
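
The "downloaded exactly once per session" behaviour described above can be sketched as a simple in-memory cache. This is an illustrative assumption about how such caching might work, not RDataGet's actual code: `SESSION_CACHE`, `cached` and `fake_download` are hypothetical names, and `fake_download` stands in for a real download.

```julia
# Hypothetical per-session cache; it lives only as long as the Julia process.
const SESSION_CACHE = Dict{String,Any}()

# Return the cached value for `pkg`, calling `download_fn(pkg)` only on first use.
function cached(download_fn::Function, pkg::AbstractString)
    get!(SESSION_CACHE, pkg) do
        download_fn(pkg)
    end
end

# Stand-in for a real download that counts how often it is called.
download_count = Ref(0)
function fake_download(pkg)
    download_count[] += 1
    return "contents of $pkg"
end

first_result = cached(fake_download, "boot")
second_result = cached(fake_download, "boot")  # served from the cache
println(download_count[])
```

Because the dictionary is held in memory, restarting Julia empties it, which matches the limitation noted in the list. Longer-lived caching would need the data to be stored on disk instead.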