ReadWriteDlm2 functions readdlm2(), writedlm2(), readcsv2() and writecsv2() are similar to those of stdlib.DelimitedFiles, but with additional support for Dates formats, Complex, Rational, Missing types and special decimal marks. ReadWriteDlm2 supports the Tables.jl interface.
- For "decimal dot" users the functions readcsv2() and writecsv2() have the respective defaults: Delimiter is ',' (fixed) and decimal='.'.
- The basic idea of readdlm2() and writedlm2() is to support the decimal comma countries. These functions use ';' as default delimiter and ',' as default decimal mark. "Decimal dot" users of these functions need to define decimal='.'.
- Alternative package: CSV (supports also special decimal marks)
This package is registered and can be installed within the Pkg REPL-mode: Type ] in the REPL and then:
pkg> add ReadWriteDlm2
Basic Example(-> more): How To Use ReadWriteDlm2
julia> using ReadWriteDlm2, Dates # activate modules ReadWriteDlm2, Dates
julia> a = ["text" 1.2; Date(2017,1,1) 1]; # create array with: String, Date, Float64 and Int eltype
julia> writedlm2("test.csv", a) # test.csv(decimal comma): "text;1,2\n2017-01-01;1\n"
julia> readdlm2("test.csv") # read `CSV` data: All four eltypes are parsed correctly!
2×2 Array{Any,2}:
"text" 1.2
2017-01-01 1
julia> using DataFrames # Tables interface: auto Types for DataFrame columns
julia> DataFrame(readdlm2("test.csv", tables=true))
2×2 DataFrame
│ Row │ Column1 │ Column2 │
│ │ Any │ Real │
├─────┼────────────┼─────────┤
│ 1 │ text │ 1.2 │
│ 2 │ 2017-01-01 │ 1 │
Read a matrix from source. The source can be a text file, stream or byte array.
Each line, separated by eol (default is '\n'), gives one row.
The columns are separated by ';', another delim can be defined.
readdlm2(source; options...)
readdlm2(source, T::Type; options...)
readdlm2(source, delim::Char; options...)
readdlm2(source, delim::Char, T::Type; options...)
readdlm2(source, delim::Char, eol::Char; options...)
readdlm2(source, delim::Char, T::Type, eol::Char; options...)
Pre-processing of source with regex substitution changes the decimal marks
from d,d to d.d. For the default rs, the keyword argument decimal=',' sets
the decimal Char in the r-string of rs. When a special regex substitution
tuple rs=(r.., s..) is defined, the argument decimal is not used
(-> Example). Pre-processing can be switched off with: rs=().
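The effect of the default substitution can be reproduced with Base Julia alone. The following is a minimal sketch, not the package's internal code: it applies the default rs tuple quoted in this README with replace() to a single line, and the variable names (line, converted, rs_euro) are chosen only for illustration.

```julia
# Default (r, s) tuple as described above: turn d,d into d.d
rs = (r"(\d),(\d)", s"\1.\2")

line = "text;1,2;3,75;a,b"

# Only commas standing between two digits are rewritten; "a,b" is left alone
converted = replace(line, rs[1] => rs[2])
println(converted)

# A custom tuple works the same way, here for '€' as decimal mark
rs_euro = (r"(\d)€(\d)", s"\1.\2")
println(replace("1€1;1€2", rs_euro[1] => rs_euro[2]))
```

Because the pattern requires a digit on both sides of the mark, text fields that contain a comma without surrounding digits pass through unchanged.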
In addition to stdlib readdlm(), data is also parsed for Dates formats (ISO),
the Time format HH:MM[:SS[.s{1,9}]] and for complex and rational numbers.
To deactivate parsing dates/time set: dfs="", dtfs="".
locale defines the language of day (E, e) and month (U, u) names.
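To make the idea of extended parsing concrete, here is a rough sketch built only on the Dates stdlib. The helper parse_cell is hypothetical and is not how ReadWriteDlm2 is implemented; it covers numbers, Date and DateTime only (no Time, Complex or Rational) and assumes that an empty format string means "skip this step", mirroring dfs="" and dtfs="".

```julia
using Dates

# Hypothetical helper (not part of ReadWriteDlm2): try numbers first,
# then Date, then DateTime, and fall back to the raw string.
function parse_cell(s::AbstractString;
                    dfs = "yyyy-mm-dd",
                    dtfs = "yyyy-mm-ddTHH:MM:SS.s")
    for T in (Int, Float64)
        v = tryparse(T, s)
        v === nothing || return v
    end
    # An empty format string switches the corresponding parsing step off
    if !isempty(dfs)
        d = tryparse(Date, s, DateFormat(dfs))
        d === nothing || return d
    end
    if !isempty(dtfs)
        dt = tryparse(DateTime, s, DateFormat(dtfs))
        dt === nothing || return dt
    end
    return String(s)
end

cells = ["text", "1.2", "2017-01-01", "2017-01-01T05:59:01.898", "1"]
parsed = [parse_cell(c) for c in cells]
foreach(p -> println(typeof(p), ": ", p), parsed)
```

Numbers are tried before dates in this sketch because the Dates parser accepts shortened input, so a bare "1" would otherwise be read as a date in year 1.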
The result will be a (heterogeneous) array of default element type Any. If
header=true it will be a tuple containing the data array and a vector for
the column names. Other (abstract) types for the data array elements can be
defined. If data is empty, a 0×0 Array{T,2} is returned.
With tables=true[, header=true] option[s] a Tables interface compatible
MatrixTable with individual column types is returned, which for example can
be used as argument for DataFrame().
- decimal=',': Decimal mark Char used by default rs, irrelevant if rs-tuple is not the default one
- rs=(r"(\d),(\d)", s"\1.\2"): Regex (r,s)-tuple, the default changes d,d to d.d if decimal=','
- dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string for DateTime parsing
- dfs="yyyy-mm-dd": Format string for Date parsing
- locale="english": Language for parsing date names, default is english
- tables=false: Return Tables interface compatible MatrixTable if true
- dfheader=false: dfheader=true is shortform for tables=true, header=true
- missingstring="na": How missing values are represented, default is "na"
readcsv2(source, T::Type=Any; opts...)
Equivalent to readdlm2() with delimiter ',' and decimal='.'.
More information about the underlying readdlm() functionality and its (keyword) arguments, which are also
supported by readdlm2() and readcsv2(), is available in the stdlib DelimitedFiles documentation for readdlm().
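The stated equivalence can be checked directly. The sketch below assumes ReadWriteDlm2 is installed and relies only on what this README states: that the source may be a stream and that readdlm2() accepts a delim::Char positional argument and the decimal keyword. The sample data is made up for illustration.

```julia
using ReadWriteDlm2

csv = "1.5,2\n3,4.25\n"   # illustrative "decimal dot" CSV text

# readcsv2() with its fixed defaults ...
a = readcsv2(IOBuffer(csv))

# ... and readdlm2() with delimiter and decimal mark set explicitly
b = readdlm2(IOBuffer(csv), ','; decimal='.')

println(a == b)
```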
| Module | Function | Delimiter | Dec. Mark | Element Type | Ext. Parsing |
|---|---|---|---|---|---|
| DelimitedFiles | readdlm() | ' ' | '.' | Float64/Any | No (String) |
| ReadWriteDlm2 | readdlm2() | ';' | ',' | Any | Yes |
| ReadWriteDlm2 | readcsv2() | ',' | '.' | Any | Yes |
| ReadWriteDlm2 | readdlm2(opt:tables=true) | ';' | ',' | Column spec. | Yes, + col T |
| ReadWriteDlm2 | readcsv2(opt:tables=true) | ',' | '.' | Column spec. | Yes, + col T |
Write A (a vector, matrix, or an iterable collection of iterable rows, a
Tables source) as text to f (either a filename or an IO stream). The columns
are separated by ';', another delim (Char or String) can be defined.
writedlm2(f, A; options...)
writedlm2(f, A, delim; options...)
By default, a pre-processing of values takes place. Before writing as strings,
decimal marks are changed from '.' to ','.
With a keyword argument another decimal mark can be defined.
To switch off this pre-processing set: decimal='.'.
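As an illustration of this step, the following sketch mimics the described behaviour with Base Julia only. The helper format_value is hypothetical and not part of ReadWriteDlm2; it assumes that only floating point values carry a decimal mark and that all other values are written via string() unchanged.

```julia
# Hypothetical helper: turn one value into its text form, replacing the '.'
# of floating point output with the chosen decimal mark.
format_value(x::AbstractFloat; decimal = ',') = replace(string(x), '.' => decimal)
format_value(x; decimal = ',') = string(x)   # other values are written unchanged

row = [format_value(v) for v in (1.2, 1, "text")]
line = join(row, ';')                        # ';' is the default delimiter
println(line)

# Another decimal mark, or '.' to leave the number as it is
println(format_value(1.1; decimal = '€'))
println(format_value(1.1; decimal = '.'))
```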
In writedlm2() the output format for Date and DateTime data can be
defined with format strings. Defaults are the ISO formats. Day (E, e) and
month (U, u) names are written in the locale language. For writing
Complex numbers the imaginary component suffix can be selected with the
imsuffix= keyword argument.
- decimal=',': Character for writing decimal marks, default is a comma
- dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string, DateTime write format
- dfs="yyyy-mm-dd": Format string, Date write format
- locale="english": Language for writing date names, default is english
- imsuffix="im": Complex - imaginary component suffix "im" (=default), "i" or "j"
- missingstring="na": How missing values are written, default is "na"
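The dfs and dtfs format strings use the directive characters of the Dates stdlib, so their effect can be previewed with Dates.format() before writing a file. This is a small sketch using only the stdlib; the format strings are the ones used elsewhere in this README, shown here with the built-in "english" locale.

```julia
using Dates

d  = Date(2017, 1, 1)
dt = DateTime(2017, 1, 1, 5, 59, 1, 898)

# ISO default for Date
iso_date = Dates.format(d, "yyyy-mm-dd")

# Full day (E) and month (U) names
long_date = Dates.format(d, "E, d.U yyyy"; locale = "english")

# Abbreviated day (e) and month (u) names, ',' before the milliseconds
short_dt = Dates.format(dt, "e, d.u yyyy H:M:S,s"; locale = "english")

println(iso_date)
println(long_date)
println(short_dt)
```

Any other locale name has to be registered in Dates.LOCALES first, as the "french" example in this README does.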
writecsv2(f, A; opts...)
Equivalent to writedlm2() with fixed delimiter ',' and decimal='.'.
| Module | Function | Delimiter | Decimal Mark |
|---|---|---|---|
| DelimitedFiles | writedlm() | '\t' | '.' |
| ReadWriteDlm2 | writedlm2() | ';' | ',' |
| ReadWriteDlm2 | writecsv2() | ',' | '.' |
julia> using ReadWriteDlm2
julia> a = Any[1 complex(1.5,2.7);1.0 1//3]; # create array with: Int, Complex, Float64 and Rational type
julia> writecsv2("test.csv", a) # test.csv(decimal dot): "1,1.5+2.7im\n1.0,1//3\n"
julia> readcsv2("test.csv") # read CSV data: All four types are parsed correctly!
2×2 Array{Any,2}:
1 1.5+2.7im
1.0 1//3
julia> using ReadWriteDlm2
julia> a = Float64[1.1 1.2;2.1 2.2]
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> writedlm2("test.csv", a; decimal='€') # '€' is decimal Char in 'test.csv'
julia> readdlm2("test.csv", Float64; decimal='€') # a) standard: use keyword argument
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> readdlm2("test.csv", Float64; rs=(r"(\d)€(\d)", s"\1.\2")) # b) more flexible: rs-Regex-Tuple
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> using ReadWriteDlm2
julia> a = Union{Missing, Float64}[1.1 0/0;missing 2.2;1/0 -1/0]
3×2 Array{Union{Missing, Float64},2}:
1.1 NaN
missing 2.2
Inf -Inf
julia> writedlm2("test.csv", a; missingstring="???") # use "???" for missing data
julia> read("test.csv", String)
"1,1;NaN\n???;2,2\nInf;-Inf\n"
julia> readdlm2("test.csv", Union{Missing, Float64}; missingstring="???")
3×2 Array{Union{Missing, Float64},2}:
1.1 NaN
missing 2.2
Inf -Inf
julia> using ReadWriteDlm2, Dates
julia> Dates.LOCALES["french"] = Dates.DateLocale(
["janvier", "février", "mars", "avril", "mai", "juin",
"juillet", "août", "septembre", "octobre", "novembre", "décembre"],
["janv", "févr", "mars", "avril", "mai", "juin",
"juil", "août", "sept", "oct", "nov", "déc"],
["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"],
["lu", "ma", "me", "je", "ve", "sa", "di"],
);
julia> a = hcat([Date(2017,1,1), DateTime(2017,1,1,5,59,1,898), 1, 1.0, "text"])
5×1 Array{Any,2}:
2017-01-01
2017-01-01T05:59:01.898
1
1.0
"text"
julia> writedlm2("test.csv", a; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
julia> read("test.csv", String) # to see what has been written to the "test.csv" file
"dimanche, 1.janvier 2017\ndi, 1.janv 2017 5:59:1,898\n1\n1,0\ntext\n"
julia> readdlm2("test.csv"; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
5×1 Array{Any,2}:
2017-01-01
2017-01-01T05:59:01.898
1
1.0
"text"
See -> DataFrames for installation and more information.
julia> using ReadWriteDlm2, Dates, DataFrames, Statistics
julia> df = DataFrame( # Create DataFrame `df`
date = [Date(2017,1,1), Date(2017,1,2), nothing],
value_1 = [1.4, 1.8, missing],
value_2 = [2, 3, 4]
)
3×3 DataFrame
│ Row │ date │ value_1 │ value_2 │
│ │ Union… │ Float64⍰ │ Int64 │
├─────┼────────────┼──────────┼─────────┤
│ 1 │ 2017-01-01 │ 1.4 │ 2 │
│ 2 │ 2017-01-02 │ 1.8 │ 3 │
│ 3 │ │ missing │ 4 │
julia> writedlm2("testdf_com.csv", df) # decimal comma: write DataFrame df
julia> read("testdf_com.csv", String) # check csv data
"date;value_1;value_2\n2017-01-01;1,4;2\n2017-01-02;1,8;3\nnothing;na;4\n"
julia> df2 = DataFrame(readdlm2("testdf_com.csv", header=true, tables=true))
3×3 DataFrame
│ Row │ date │ value_1 │ value_2 │
│ │ Union… │ Float64⍰ │ Int64 │
├─────┼────────────┼──────────┼─────────┤
│ 1 │ 2017-01-01 │ 1.4 │ 2 │
│ 2 │ 2017-01-02 │ 1.8 │ 3 │
│ 3 │ │ missing │ 4 │
julia> mean(skipmissing(df2[!, :value_1]))
1.6
julia> mean(df2[!, :value_2])
3.0