Loading data into a model

At this point, we assume you followed the guide to have GreyCat running on your system (here).

The goal

Our goal for this section is to load data from a JSON file into a typed graph structure. We will open the file, read its content, create the typed graph structure (the model) that organizes the data, then load the data into it.

The data

The data at hand have been collected from an open data service by JCDecaux, providing the live state of their bike stations worldwide (for more details see here). The live data have been polled every minute, then compacted in daily JSON files per station.

Bruxelles-20180301.json contains the information of all stations in Bruxelles, Belgium, throughout the day of March 1st, 2018. This is what we will load.

Reading the file

Let’s have a peek in the file. Here are the first lines.

It is a succession of JSON arrays, each containing a single JSON object. This object represents a station with its static information.
Each station has a records attribute that contains the dynamic data collected during the day for this station, including the total number of bike stands, the number of available bikes, and the number of available stands.
Note that the total number of stands does not always equal the number of available bikes plus the number of available stands, since some stands might be out of order.

Let’s open the file in GreyCat. Open your favorite editor (we recommend VSCode) and direct it to an empty folder for your project. Create a project.gcl file. Create a data folder and place the JSON file of Bruxelles in the data folder. You should end up with a structure as presented below.

To manipulate files, we will need to use the io package of GreyCat. Then in a main function, we open the file using the JsonReader, verify its existence and size, then close the file. The full project.gcl file should look like this.

use io;

fn main() {
    var reader = JsonReader::new("data/Bruxelles-20180301.json");
    if(reader != null) {
        println("File opened. Size is ${reader.available()} chars.");
        reader = null;
    } else {
        println("Could not read the file.");
    }
}

Now, open a Terminal, cd into your project folder, and execute greycat run. It should display something like this:

% greycat run
File opened. Size is 6896197 chars.
% 

Working with JSON

Now, let's dive into the JSON file and loop over all the stations. We have seen that stations are stored as objects in a succession of arrays in the file. To make sure we read them properly, we will display the name of each station.
Replace the println in the previous code snippet with this loop. You should see the names of the stations displayed in your Terminal.

//While the reader is not empty
while(reader.available() > 0) {

    //Read the content as a JSON Array
    var jsonArray = reader.read() as Array;

    //Loop over all the elements in the array
    for(positionInArray, stationObject in jsonArray) {
        //Retrieve the name
        var stationName = stationObject.get("name") as String;
        //Print the content
        println("${stationName}");
    }
}

It is time to model

It is time to create a structure in which we will store the stations. In GreyCat, the collection of types that you create for your project is referred to as the Model. It is like a database schema, with the notable difference that GreyCat types can also have functions!
By convention, we create a model folder that will host our types. Inside, create a file station.gcl in which we will define what a Station is in our case. The content of the file model/station.gcl should look something like this.

type Station {
    name: String;
    number: int;
    address: String;
    position: geo;
    last_update: time;
    bikes_stands: nodeTime<int>;
    available_bikes: nodeTime<int>;
    available_stands: nodeTime<int>;
    status: nodeTime<StationStatus>;
}

Here we declare a type Station with a name of type String, a number of type int, an address of type String, a geographical position of type geo, and a last_update of type time that records when the last update was received from the station.
This information is mostly static, and we are not interested in tracking its evolution over time. The following attributes are different: we want them as time series. The bikes_stands attribute is stored in a nodeTime to track the total number of stands of the station, as a primitive integer, over time. The same goes for available_bikes and available_stands.

Last, the status of the station can be open or closed, which we choose to model as a StationStatus enum. We will also create a companion utility type to help with the parsing. Here is the code for StationStatus, which we place in the same file since it is closely linked to Station.

enum StationStatus {
    OPEN; CLOSE;
}

abstract type StationStatusUtil {
    static fn parse(val: String): StationStatus {
        if(val == "OPEN") {
            return StationStatus::OPEN;
        } else {
            return StationStatus::CLOSE;
        }
    }
}

The enum defines only two possible states, OPEN and CLOSE. The StationStatusUtil abstract type cannot be instantiated, but provides a utility function that parses the String from the JSON file into a StationStatus enum value. Any String other than "OPEN" is mapped to CLOSE.
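
As a quick check of this behavior, the helper can be called from any function in which the types of station.gcl are visible. The following is only a sketch: the function name check_status is hypothetical, and it assumes that enum values can be compared with ==.

```
// Hypothetical helper, for illustration only
fn check_status() {
    // "OPEN" is the only String mapped to StationStatus::OPEN
    var open = StationStatusUtil::parse("OPEN");
    // Any other String falls into the else branch of parse
    var other = StationStatusUtil::parse("CLOSED");

    // Assumption: enum values support the == comparison
    if(open == StationStatus::OPEN) {
        println("OPEN is parsed as OPEN.");
    }
    if(other == StationStatus::CLOSE) {
        println("Any other value is parsed as CLOSE.");
    }
}
```

Because the fallback is silent, a typo or an unexpected status in the data would also end up as CLOSE. This is acceptable for this tutorial, but worth keeping in mind for real data.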

From JSON Object to Model instance

Everything is now in place to transform the JSON objects we get from the file into instances of our model.
First of all, we need to indicate that the model folder contains .gcl files that are part of our project. This is done with an @include directive.
We also need to declare that we want to use the types contained in the station.gcl file. This is achieved by adding a use station instruction.
The header of the project.gcl file at the root of our project should look as follows:

@include("model");

use io;
use station;

fn main() {
...

We can now use and instantiate Stations.
In our main function, where we previously printed the names of the stations, we will now create instances of Station and fill the fields with the data from JSON. Here it goes:

var stationName = stationObject.get("name") as String;
var stationNumber = stationObject.get("number") as int;
var position = stationObject.get("position");
var station = Station{
    number: stationNumber,
    name: stationName,
    address: stationObject.get("address") as String,
    position: geo::new(position.get("lat") as float, position.get("lng") as float),
    last_update: time::new(0, DurationUnit::milliseconds),
    bikes_stands: nodeTime<int>::new(),
    available_bikes: nodeTime<int>::new(),
    available_stands: nodeTime<int>::new(),
    status: nodeTime<StationStatus>::new(),
};
println("Processed station: ${station.name}");

We first collect the station name, number and position because we will reuse these elements later, and getting the values from JSON again is costly.
We then create an instance of Station and initialize all its fields. Data from JSON are used when available. last_update is initialized at time 0 (January 1st, 1970, the Unix epoch) since it will be updated later, each time we get data.
The time series of primitive int and of StationStatus are also initialized, with nodeTime<T>::new().
We finally keep a print to see what is happening, but we pick the value from the instantiated object rather than from JSON.

At this stage, a greycat run should display this in your console:

% greycat run
Processed station: 001 - LEOPOLD II
Processed station: 002 - ELOY/ELOY
Processed station: 003 - PORTE DE FLANDRE / VLAAMSEPOORT
Processed station: 004 - JARDIN AUX FLEURS / BLOEMENHOF
Processed station: 005 - BOURSE / BEURS
...
%

Insert dynamic data

It is now time to loop over the records field of our JSON stations, which provides the dynamic status of the station. Each object in this array represents an update sent by the station about its available bikes and stands.
Therefore, we will iterate over the array of records and fill the time series of the current station with this information. We insert the following code after the creation of the station, but before the print.

var stationRecords = stationObject.get("records") as Array;
for( _, record in stationRecords) {
    var lastUpdate = time::new(record.get("last_update") as int, DurationUnit::milliseconds);
    var status = StationStatusUtil::parse(record.get("status") as String);
    station.bikes_stands.setAt(lastUpdate, record.get("bike_stands") as int);
    station.available_bikes.setAt(lastUpdate, record.get("available_bikes") as int);
    station.available_stands.setAt(lastUpdate, record.get("available_bike_stands") as int);
    station.status.setAt(lastUpdate, status);
    station.last_update = lastUpdate;
}

Here, we first parse and transform some data into GreyCat types. This is the case for last_update, which is converted from milliseconds to a time, and for status, which is parsed from its String representation.
We then insert, in each time series, the value collected from JSON, at the last_update time.
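
The earlier remark about stands being out of order can be checked while looping over the records. The sketch below is meant to sit inside the for loop over stationRecords. The variable names total, bikes, free and unavailable are hypothetical, and the snippet assumes the usual integer subtraction and comparison operators.

```
// Inside the loop over stationRecords
var total = record.get("bike_stands") as int;
var bikes = record.get("available_bikes") as int;
var free = record.get("available_bike_stands") as int;
// Stands counted in the total but reported neither as an available bike
// nor as an available stand
var unavailable = total - bikes - free;
if(unavailable > 0) {
    println("${station.name}: ${unavailable} stands unavailable");
}
```

Since the data were polled every minute, this prints a lot of lines for a full day. It is better suited to a one-off sanity check than to the final loading script.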

You can now run. This main script will read through the file, create the Stations and fill their static data, then fill the time series of each station for the day.

Graph entry points

So far, the script we built loads the JSON data of the entire file, which represents one day, into model instances. But when we process the next day, we should not create the stations again; we should rather get them from the graph and continue filling their time series.

To this end, we will declare three nodes as entry points to our graph.
A station is uniquely identified by its name and by its number, and it has a geographical position. We therefore create entry points to look up stations by any of these means.

We add them as three module variables. Module variables are variables not contained in any type or function. They are placed in the project.gcl file, before the main function.

[...]
var stations_by_name: nodeIndex<String, node<Station>>;
var stations_by_number: nodeList<node<Station>>;
var stations_locations: nodeGeo<node<Station>>;

fn main() {
    if(stations_by_name == null) {
        stations_by_name = nodeIndex<String, node<Station>>::new();
        stations_by_number = nodeList<node<Station>>::new();
        stations_locations = nodeGeo<node<Station>>::new();
    }
[...]

stations_by_name will provide a means to store and retrieve stations by their name (of type String).
stations_by_number will provide a means to store and retrieve stations by their number (of type int).
stations_locations will provide a means to store and retrieve stations by their geographical position (of type geo).

These nodes declared as module variables must be initialized once. To this end, we test, at the very first line of the main function, whether one of them is null. If that is the case, we initialize them. This prevents the indexes from being overwritten each time we launch the script (to load each day, for instance).

Because an instance of a type can only be contained in one place, we cannot directly store the Station instance we create in our loop in several indexes. To be able to reference one station from several places, we first have to encapsulate each station in a node. The reference to this node is comparable to a pointer in C and can be used to point to a station from multiple places in the graph.

In the code snippet below, we adapt the code that creates stations to first check whether the station exists and, if not, create the station and place it in a node. Additionally, the station now has to be resolved (loaded from the node) before we can manipulate its fields and values with the dotted notation we used.

//Loop over all the elements in the array
for(positionInArray, stationObject in  jsonArray) {   
    var stationName = stationObject.get("name") as String;
    //Look for the station in the entrypoint (global index by name)
    var stationNode = stations_by_name.get(stationName);
    if(stationNode == null) {
        //If null, station is not found, and therefore created
        var stationNumber = stationObject.get("number") as int;
        var position = stationObject.get("position");
        //Station is wrapped in a node
        stationNode = node<Station>::new(Station{
            [...]
        });
        //Station is added to the index by its name
        stations_by_name.set(stationName, stationNode);
    }
    //Station is resolved (loaded) from its node container
    var station = *stationNode;
    [...]
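
The snippet above only fills stations_by_name. The two other entry points could be filled at the same place, right after the station is added to the name index. This is only a sketch: it assumes that nodeList and nodeGeo expose a set method analogous to the one used on nodeIndex, which this guide does not show. Check the signatures against the GreyCat library reference for your version before relying on it.

```
// Inside the if(stationNode == null) block, after stations_by_name.set(...)

// Assumption: nodeList offers set(index, value), with an int index
stations_by_number.set(stationNumber, stationNode);

// Assumption: nodeGeo offers set(position, value), with a geo position
var stationPosition = geo::new(position.get("lat") as float, position.get("lng") as float);
stations_locations.set(stationPosition, stationNode);
```

All three indexes store the same node reference, so the station itself exists only once in the graph.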

Wrap up of this section

Here we are. This section has shown how to read data from a JSON file, how to define the types for your use case, how to instantiate the elements to be stored in the graph, and how to index and store them.
As a reminder, here is the full content of the files we created.

project.gcl

@include("model");

use io;
use station;

var stations_by_name: nodeIndex<String, node<Station>>;
var stations_by_number: nodeList<node<Station>>;
var stations_locations: nodeGeo<node<Station>>;

fn main() {
    if(stations_by_name == null) {
        stations_by_name = nodeIndex<String, node<Station>>::new();
        stations_by_number = nodeList<node<Station>>::new();
        stations_locations = nodeGeo<node<Station>>::new();
    }

    var reader = JsonReader::new("data/Bruxelles-20180301.json");
    if(reader != null) {
        //While the reader is not empty
        while(reader.available() > 0) {

            //Read the content as a JSON Array
            var jsonArray = reader.read() as Array;

            //Loop over all the elements in the array
            for(positionInArray, stationObject in  jsonArray) {   
                var stationName = stationObject.get("name") as String;
                //Look for the station in the entrypoint (global index by name)
                var stationNode = stations_by_name.get(stationName);
                if(stationNode == null) {
                    //If null, station is not found, and therefore created
                    var stationNumber = stationObject.get("number") as int;
                    var position = stationObject.get("position");
                    //Station is wrapped in a node
                    stationNode = node<Station>::new(Station{
                        number: stationNumber,
                        name: stationName,
                        address: stationObject.get("address") as String,
                        position: geo::new(position.get("lat") as float, position.get("lng") as float),
                        last_update: time::new(0, DurationUnit::milliseconds),
                        bikes_stands: nodeTime<int>::new(),
                        available_bikes: nodeTime<int>::new(),
                        available_stands: nodeTime<int>::new(),
                        status: nodeTime<StationStatus>::new(),
                    });
                    //Station is added to the index by its name
                    stations_by_name.set(stationName, stationNode);
                }
                //Station is resolved (loaded) from its node container
                var station = *stationNode;

                var stationRecords = stationObject.get("records") as Array;
                for( _, record in stationRecords) {
                    var lastUpdate = time::new(record.get("last_update") as int, DurationUnit::milliseconds);
                    var status = StationStatusUtil::parse(record.get("status") as String);
                    station.bikes_stands.setAt(lastUpdate, record.get("bike_stands") as int);
                    station.available_bikes.setAt(lastUpdate, record.get("available_bikes") as int);
                    station.available_stands.setAt(lastUpdate, record.get("available_bike_stands") as int);
                    station.status.setAt(lastUpdate, status);
                    station.last_update = lastUpdate;
                }

                println("Processed station: ${station.name}");

            }
        }
        reader = null;
    } else {
        println("Could not read the file.");
    }
}

model/station.gcl

type Station {
    name: String;
    number: int;
    address: String;
    position: geo;
    last_update: time;
    bikes_stands: nodeTime<int>;
    available_bikes: nodeTime<int>;
    available_stands: nodeTime<int>;
    status: nodeTime<StationStatus>;
}

enum StationStatus {
    OPEN; CLOSE;
}

abstract type StationStatusUtil {
    static fn parse(val: String): StationStatus {
        if(val == "OPEN") {
            return StationStatus::OPEN;
        } else {
            return StationStatus::CLOSE;
        }
    }
}