CSV Files
CSV files are one of the most common ways to exchange data. Loading and writing CSV files is therefore quite customizable in GreyCat, to support the many possible format combinations.
CsvReader
CSV files can be processed (read) using GreyCat's CsvReader. The CsvReader constructor takes the path to the file, along with a CsvFormat specifying how the content is formatted.
Reading a CSV file with common values for the various delimiters is as simple as:
fn run() {
  var format = CsvFormat {}; // Default format, comma-separated values
  // Custom CSV format
  format = CsvFormat {
    separator: ';',
    string_delimiter: '\'',
    decimal_separator: ',',
  };
  var reader = CsvReader { path: "./data.csv", format: format };
  while (reader.can_read()) {
    // Do read
  }
}
You can also store the position and restart from where you left off:
var prevPos: node<int?>;

fn run() {
  var reader = CsvReader { path: "./data.csv", pos: *prevPos };
  while (reader.can_read()) {
    // Do stuff
  }
  prevPos.set(reader.pos);
}
The CsvReader offers several utilities:
- available(): int returns the number of bytes remaining to read
- can_read(): bool returns whether there is content left to read (use it as the condition of a while loop)
- lastLine(): String? returns the last line read, as a String
- read(): any? | T returns the line as an instance of T if a type is specified, or as an Array<any?> otherwise
- pos: int is the current position of the reader within the file, counted from its beginning
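A minimal sketch combining these utilities (the ./data.csv path is illustrative):
fn run() {
  var reader = CsvReader { path: "./data.csv" };
  while (reader.can_read()) {
    reader.read();               // consume one line
    println(reader.lastLine());  // the raw line that was just read
    println(reader.available()); // bytes left in the file
  }
}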
CsvReader offers two means to read a file:
Reading without a type
Allows reading one line at a time. The line is parsed according to the specified CsvFormat, and each cell is parsed according to the column format if one is specified; otherwise its type is inferred, or it is read as a string if inference is deactivated.
With this method, typing is weak, because the method returns an Array<any?>. It is also a bit costly in memory, because an array is allocated for each line. Finally, it requires updating all column indexes if the file format evolves.
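A minimal sketch of untyped reading (the file path is illustrative):
fn run() {
  var reader = CsvReader { path: "./data.csv" };
  while (reader.can_read()) {
    var cells = reader.read(); // Array<any?>, since no type is specified
    println(cells);
  }
}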
Reading using types
The CsvReader can also be typed, and will then directly read lines as the declared type.
The template object's fields (attributes) are filled in order, from top to bottom, with the cells of the line, from left to right. The object in the example is reused, saving memory and increasing speed, and the fields are directly typed.
@volatile
type Entry {
  id: int;
  name: String;
  values: Array<int>;
}

fn main() {
  var reader = CsvReader<Entry> {
    path: "files/entries.csv",
    format: CsvFormat {
      header_lines: 1,
    },
  };
  while (reader.can_read()) {
    var entry = reader.read();
    println(entry); // Entry { id: 0, name: "aaa", values: [1, 2, 3] }
  }
}
Which would work with a CSV file like the following:
id,name,value_0,value_1,value_2
0,aaa,1,2,3
1,bbb,4,5,6
2,ccc,7,8,9
Note that if the custom type includes an Array attribute, the array is consumed greedily. Hence, there should be only one Array, and it must target the trailing columns of the CSV definition.
Notice the @volatile pragma on top of the type; it is used to facilitate upgrades when changing the underlying type's attributes and types. For more information, see Volatile.
Special types
geo
type Record {
  position: geo;
}
- consumes two columns
- order is important: it consumes the latitude first (float), then the longitude (float)
Note that the following would also work:
type Record {
  lat: float;
  lng: float;
}
As well as:
type Record {
  position: Tuple<float, float>;
}
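For illustration, with the first variant (position: geo) and a hypothetical positions.csv containing lines such as 49.61,6.13, reading is straightforward:
fn run() {
  var reader = CsvReader<Record> { path: "./positions.csv" };
  while (reader.can_read()) {
    println(reader.read()); // a Record holding a geo built from lat 49.61, lng 6.13
  }
}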
null
You can also declare a field of type null, in which case the corresponding column is skipped; leverage this to speed up reads.
@volatile
type Entry {
  id: int;
  name: null; // will be skipped
  values: Array<int>;
}
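With the same entries.csv as above, a sketch of the effect (assuming a skipped field simply stays null):
fn main() {
  var reader = CsvReader<Entry> {
    path: "files/entries.csv",
    format: CsvFormat { header_lines: 1 },
  };
  while (reader.can_read()) {
    println(reader.read()); // e.g. Entry { id: 0, name: null, values: [1, 2, 3] }
  }
}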
T “nesting”
type Record {
  column_0: int;
  child: RecordChild; // nested parsing from the flat columns
  column_3: float;
}

type RecordChild {
  name: String; // column 1
  value: int;   // column 2
}
Nesting types works as expected, consuming the columns to produce the intermediate instances. For example, the line:
0,a,1000,0.1
will yield this instance:
Record {
  column_0: 0,
  child: RecordChild {
    name: "a",
    value: 1000,
  },
  column_3: 0.1,
}
Enum
Enums are matched by field key first, then by the associated value. Given a CSV file containing:
foo
by_value
baz
type Record {
  value: MyEnum;
}

enum MyEnum {
  foo;
  bar("by_value");
  baz;
}

fn main() {
  var reader = CsvReader<Record> { /*...*/ };
  println(reader.read()); // Record { value: MyEnum::foo }
  println(reader.read()); // Record { value: MyEnum::bar }
  println(reader.read()); // Record { value: MyEnum::baz }
}
time
The @format annotation can be used on time fields to fine-tune the behavior of the parser:
type Record {
  @format("%d/%m/%y %H:%M") // only accept strings like: "01/11/25 15:42"
  date: time;
}
It accepts the following signatures, where the date format follows the GNU libc standard:
// interprets the time (String) respecting the given dateformat
@format("%d/%m/%y %H:%M")
// interprets the time (String) respecting the given dateformat in the given timezone
@format("%d/%m/%y %H:%M", TimeZone::"Europe/Luxembourg")
// interprets the time (String) in the given timezone
@format(TimeZone::"Europe/Luxembourg")
// interprets the time (int) as a UNIX epoch in milliseconds
@format(DurationUnit::milliseconds)
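For example, to read an integer column holding epoch milliseconds (the field name is illustrative):
type Record {
  @format(DurationUnit::milliseconds) // parse the int cell as a UNIX epoch in ms
  created: time;
}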
duration
type Record {
  @format(DurationUnit::hours) // interprets the parsed int as hours
  elapsed: duration;
}
The @format annotation can be used on duration fields to fine-tune the behavior of the parser.
It accepts the following signature:
// interprets the duration (int) as seconds
@format(DurationUnit::seconds)
Other
type Record {
  a: Tuple<int, String>; // consumes 2 columns, an `int` and a `String`
  b: bool; // TRUEISH: "true", "1", "yes", "y", "t" (ignores case)
           // FALSEISH: "false", "0", "no", "n", "f" (ignores case)
  d: t2;   // consumes 2 `int` columns, `t2f` consumes 2 `float`
  e: t3;   // consumes 3 `int` columns, `t3f` consumes 3 `float`
  f: t4;   // consumes 4 `int` columns, `t4f` consumes 4 `float`
}
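For illustration, a hypothetical line matching this Record consumes 12 columns in total: 2 for a, 1 for b, 2 for d, 3 for e and 4 for f:
1,hello,true,10,20,1,2,3,40,50,60,70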
CsvFormat
Reading and writing CSV files rely on the format of the file to read or write. The CsvFormat object makes it possible to describe the internal format of the file through various attributes.
| Attribute | Type | Description |
|---|---|---|
| `header_lines` | `int?` | Specifies how many of the top lines of the file are to be considered header lines, and therefore ignored when reading the content |
| `separator` | `char?` | Specifies the character used to separate the fields/columns within a line. Usually `,` (default) or `;` |
| `decimal_separator` | `char?` | Specifies the character used to separate the integer and decimal parts of numbers (defaults to `.`) |
| `thousands_separator` | `char?` | Defines the character used to separate thousands in big numbers, if any |
| `string_delimiter` | `char?` | Defines the character used to delimit strings, which allows ignoring separators that may appear inside strings |
| `format` | `String?` | The format to parse dates in; defaults to ISO 8601/epoch timestamp in milliseconds |
| `tz` | `TimeZone?` | The timezone to interpret times in; defaults to the host global timezone |
var format_a = CsvFormat {
  header_lines: 2, // the first 2 lines are headers, to be ignored
  separator: ',',
  decimal_separator: '.',
};

var format_b = CsvFormat {
  separator: ';',
  string_delimiter: '"',
  decimal_separator: ',',
  thousands_separator: '_',
};
CsvWriter
The CsvWriter works quite similarly to the CsvReader, expecting the path of the file you want to write (or append to), and a definition of the internal format of the CSV file you want to produce. You can then call the write(data: any?) function to push data to the file.
In the following example, we write a CSV file whose fields are separated with ';' and whose strings are delimited with '"'. The third column holds a time, which is serialized using the default format (epoch timestamp in milliseconds).
fn run() {
  var format = CsvFormat {
    separator: ';',
    string_delimiter: '"', // optional, default value
    decimal_separator: ',',
  };
  var writer = CsvWriter { path: "./data/myFile.csv", format: format };
  if (writer != null) {
    writer.write(["John", "Doe", time::now(), 56]);
    writer.write(["Jane", "Doe", time::now(), 34]);
  }
}
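The resulting file would look something like this (the epoch values depend on the current time):
"John";"Doe";1700000000000;56
"Jane";"Doe";1700000000000;34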
The string_delimiter attribute of CsvFormat works as follows:
- when set, all strings are enclosed with the delimiter;
- when not set, strings are not enclosed, unless the field to be written requires it in order to conform to parsing rules, such as when a column separator (if set with separator) is part of the field. In this case, the default " character is used.
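A minimal sketch of the second case (the file path is illustrative):
fn run() {
  var format = CsvFormat { separator: ';' }; // string_delimiter not set
  var writer = CsvWriter { path: "./out.csv", format: format };
  // only fields containing the separator are enclosed, using the default '"':
  writer.write(["plain", "with;separator", 42]);
  // resulting line: plain;"with;separator";42
}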