7.0.1685-testing

Data Structures

GreyCat offers a selection of data structures to support organizing and computing on large data.

Tuple

A simple association data structure that holds a pair of values. It can be specialized with the generic types T and U for the left-hand and right-hand values respectively.

  var tupleA = Tuple{x:0.5,y:"a"}; // Tuple{x:0.5,y:"a"}
  var tupleB = (0.5,"b"); // Tuple{x:0.5,y:"b"}
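
As a minimal sketch of how the two values can be read back, the example below builds an explicitly typed tuple and prints each side. It assumes that the generic parameters are written in the same way as for Array<float> and Map<String, int>, and that the fields are named x and y as in the printed form above.

```gcl
fn main() {
    // Assumption: Tuple accepts its generic types like Array<float> does
    var pair = Tuple<float, String>{x: 0.5, y: "a"};
    println(pair.x); // left-hand value
    println(pair.y); // right-hand value
}
```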

Arrays and maps

GreyCat provides Array and Map, which are held in memory and meant for small amounts of data.

fn main() {
    var arrayA = Array<float>{1.2, 3.4, 5.0, 4.1};
    for (k, v in arrayA) {
        println("Index: ${k}, value ${v}");
    }
    // short notation; the drawback is that the element type is not declared
    var arrayB = [1.2, 3.4, 5.0, 4.1];
    for (k, v in arrayB) {
        println("Index: ${k}, value ${v}");
    }
    // Index: 0, value 1.2
    // Index: 1, value 3.4
    // Index: 2, value 5.0
    // Index: 3, value 4.1
    // Index: 0, value 1.2
    // Index: 1, value 3.4
    // Index: 2, value 5.0
    // Index: 3, value 4.1
}

Similarly:

fn main() {
    var map = Map<String, int>{};
    map.set("Hello", 5);
    map.set("Test", 2);

    println(map.get("Test"));
}

Arrays and Maps are useful for small amounts of data. For large datasets, use nodeList and nodeIndex respectively.


Windows

Windows are FIFO (First In First Out) structures with a fixed size. They are used to collect a number of numerical values, and provide handy methods to get statistics on this set of values.

GreyCat has two types of windows: TimeWindow, whose size is defined as a time span (so the number of values it holds can vary), and SlidingWindow, whose number of elements is fixed.

Both are presented next.

TimeWindow

Time windows are convenient for collecting values within a given period of time. Create a TimeWindow and specify the maximum time separating the first and last value of the set. The TimeWindow automatically discards the oldest elements once the duration between the oldest and newest element exceeds that maximum.

In the following example, a TimeWindow is used to compute the average of a value by periods of 5 seconds:

fn main() {
  var tw = TimeWindow<float> {span: 5s };

  for (var t = 0; t < 51; t++) {
    // add the value to the time window.
    tw.add(time::new(t, DurationUnit::seconds), t as float);

    // every five seconds
    if (t != 0 && t % 5 == 0) {
      // displays structure of TimeWindow:
      var t_start = tw.values.get_cell(0, 0);
      var t_start_s = t_start.to(DurationUnit::seconds);
      var t_end = tw.values.get_cell(tw.size() - 1, 0);
      var t_end_s = t_end.to(DurationUnit::seconds);
      println("window size: ${tw.size()}, range: [${t_start_s}, ${t_end_s}], ");

      // displays the average computed over all values from the last 5 seconds
      println("average: ${tw.avg()}");
      // display the minimum and maximum elements in the window (they match the first and last here because the values only increase)
      println("first: ${tw.min()}, last: ${tw.max()}");
    }
  }
}

// window size: 6, range: [0, 5],
// average: 2.5
// first: Tuple{x:'1970-01-01T00:00:00Z',y:0.0}, last: Tuple{x:'1970-01-01T00:00:05Z',y:5.0}
// window size: 6, range: [5, 10],
// average: 7.5
// first: Tuple{x:'1970-01-01T00:00:05Z',y:5.0}, last: Tuple{x:'1970-01-01T00:00:10Z',y:10.0}
// window size: 6, range: [10, 15],
// average: 12.5
// first: Tuple{x:'1970-01-01T00:00:10Z',y:10.0}, last: Tuple{x:'1970-01-01T00:00:15Z',y:15.0}
// window size: 6, range: [15, 20],
// average: 17.5
// first: Tuple{x:'1970-01-01T00:00:15Z',y:15.0}, last: Tuple{x:'1970-01-01T00:00:20Z',y:20.0}
// window size: 6, range: [20, 25],
// average: 22.5
// first: Tuple{x:'1970-01-01T00:00:20Z',y:20.0}, last: Tuple{x:'1970-01-01T00:00:25Z',y:25.0}
// window size: 6, range: [25, 30],
// average: 27.5
// first: Tuple{x:'1970-01-01T00:00:25Z',y:25.0}, last: Tuple{x:'1970-01-01T00:00:30Z',y:30.0}
// window size: 6, range: [30, 35],
// average: 32.5
// first: Tuple{x:'1970-01-01T00:00:30Z',y:30.0}, last: Tuple{x:'1970-01-01T00:00:35Z',y:35.0}
// window size: 6, range: [35, 40],
// average: 37.5
// first: Tuple{x:'1970-01-01T00:00:35Z',y:35.0}, last: Tuple{x:'1970-01-01T00:00:40Z',y:40.0}
// window size: 6, range: [40, 45],
// average: 42.5
// first: Tuple{x:'1970-01-01T00:00:40Z',y:40.0}, last: Tuple{x:'1970-01-01T00:00:45Z',y:45.0}
// window size: 6, range: [45, 50],
// average: 47.5
// first: Tuple{x:'1970-01-01T00:00:45Z',y:45.0}, last: Tuple{x:'1970-01-01T00:00:50Z',y:50.0}
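
Because the window is bounded by time rather than by count, the number of values it holds depends on how densely the timestamps are spaced. The sketch below illustrates this with irregular timestamps, using only the calls shown in the example above. The chosen timestamps and values are arbitrary.

```gcl
fn main() {
    var tw = TimeWindow<float>{ span: 5s };

    // three values one second apart all fit within the 5s span
    tw.add(time::new(0, DurationUnit::seconds), 1.0);
    tw.add(time::new(1, DurationUnit::seconds), 2.0);
    tw.add(time::new(2, DurationUnit::seconds), 3.0);
    println("size after burst: ${tw.size()}");

    // a value arriving much later than the span
    tw.add(time::new(60, DurationUnit::seconds), 4.0);
    println("size after gap: ${tw.size()}");
}
```

If eviction behaves as described above, the first line should report three values. The second should report a single value, since the earlier ones are more than 5 seconds older than the newest element.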

Sliding Window

Sliding windows are convenient for collecting a fixed number of values. Create a SlidingWindow and specify the maximum number of values in the window. Once the maximum size is reached, the SlidingWindow automatically discards the oldest element each time a new one is added.

In the following example, a SlidingWindow is used to compute the average over 5 values:

fn main() {
    var sw = SlidingWindow<float>{ span: 5 };

    for (var i = 0; i < 51; i++) {
        // add the value to the SlidingWindow (as floats)
        sw.add(i as float);

        // every five values
        if (i != 0 && i % 5 == 0) {
            // displays the average computed over the last 5 values
            println("average over ${sw.size()}: ${sw.avg()}");

            // displays the same average again, without the window size
            println("average: ${sw.avg()}");
            // display the minimum and maximum values in the window (they match the first and last here because the values only increase)
            println("first: ${sw.min()}, last: ${sw.max()}");
        }
    }
}

// average over 5: 3.0
// average: 3.0
// first: 1.0, last: 5.0
// average over 5: 8.0
// average: 8.0
// first: 6.0, last: 10.0
// average over 5: 13.0
// average: 13.0
// first: 11.0, last: 15.0
// average over 5: 18.0
// average: 18.0
// first: 16.0, last: 20.0
// average over 5: 23.0
// average: 23.0
// first: 21.0, last: 25.0
// average over 5: 28.0
// average: 28.0
// first: 26.0, last: 30.0
// average over 5: 33.0
// average: 33.0
// first: 31.0, last: 35.0
// average over 5: 38.0
// average: 38.0
// first: 36.0, last: 40.0
// average over 5: 43.0
// average: 43.0
// first: 41.0, last: 45.0
// average over 5: 48.0
// average: 48.0
// first: 46.0, last: 50.0
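
To make the eviction behavior easier to follow, here is a smaller sketch that prints the window state after every insertion. It only uses the calls from the example above, and the span of 3 is an arbitrary choice for illustration.

```gcl
fn main() {
    var sw = SlidingWindow<float>{ span: 3 };

    for (var i = 1; i < 6; i++) {
        sw.add(i as float);
        // the size grows up to the span, then stays constant
        println("size: ${sw.size()}, average: ${sw.avg()}");
    }
}
```

The size should stop growing once it reaches 3. After the last insertion the window should hold 3.0, 4.0 and 5.0, in line with the output of the larger example above.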

Table

Table is a core GreyCat type that serves as a generic two-dimensional container. It is typically used to return a result set. For example, web components can handle Table objects returned by the GreyCat backend, and the Explorer can display Table objects.

Sampling results can also be expressed as Tables.

Data elements

Tables are populated one cell at a time. Not all cells need to contain a value; a cell that has not been set holds null.

fn main() {
    var t = Table{}; // creates empty table
    t.init(2,4); // initializes the table with 2 rows and 4 columns

    t.set_cell(0, 1, "onetwothree...");  // 1st row, 2nd column
    t.set_cell(0, 2, time::now());
    info(t.get_cell(0,0));
    var row = ["...threefive", 0.0, time::now()];
    t.set_row(1,row);

    info(t.rows());  // 2

    t.remove_row(0);  // removes row 0

    info(t.get_cell(0, 0));  // "...threefive"
}

Calling Table::init() is not mandatory; however, without it you may get out-of-bounds exceptions when setting cells.

A Table can be sorted along one column.

t.sort(1,SortOrder::asc);  // sorts the rows by the values of column 1, in ascending order
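
The sort call can be placed in a complete program as follows. This is a minimal sketch that uses only the Table methods shown above, with made-up cell values.

```gcl
fn main() {
    var t = Table{};
    t.init(3, 2); // 3 rows, 2 columns

    t.set_cell(0, 0, "c");
    t.set_cell(0, 1, 3.0);
    t.set_cell(1, 0, "a");
    t.set_cell(1, 1, 1.0);
    t.set_cell(2, 0, "b");
    t.set_cell(2, 1, 2.0);

    t.sort(1, SortOrder::asc); // order the rows by the values in column 1

    info(t.get_cell(0, 0)); // label of the row that now comes first
}
```

Assuming that whole rows move together when sorting, the first row after the call should be the one labelled "a", since it has the smallest value in column 1.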

Applying mappings

A Table can be transformed by applying mappings to its columns.

A mapping is a series of extractors to apply to a specific column on a Table.

type MyObject {
    a: String;
    c: NestedObject;
}
type NestedObject {
    d: String;
}
fn main() {
    var t = Table {};
    t.init(0, 3);
    var mappings = Array<TableColumnMapping>{
        TableColumnMapping { column: 0, extractors: Array<any> {"*", "a"} }, // resolve the node, then get attribute a
        TableColumnMapping { column: 1, extractors: Array<any> {"a"} }, // resolve the field a
        TableColumnMapping { column: 1, extractors: Array<any> {"c", "d"} }, // resolve the nested field c.d
        TableColumnMapping { column: 2, extractors: Array<any> {0} } // resolve the array offset 0
    };

    var nestedObj = NestedObject{d: "nested value"};

    var obj = MyObject{
        a: "attribute a",
        c: nestedObj
    };

    t.set_cell(0, 0, node<MyObject>{obj});
    t.set_cell(0, 1, obj);
    t.set_cell(0, 2, ["array index 0"]);


    var newTable = Table::applyMappings(t, mappings);
    info(newTable);
}

The new Table will contain 4 new columns, one per mapping, appended after the 3 original columns.

[
  {
    "_type": "core.node",
    "ref": "0440000000000000"
  },
  {
    "a": "attribute a",
    "c": {
      "d": "nested value"
    }
  },
  [
    "array index 0"
  ],
  "attribute a", // resolved from the node
  "attribute a", // resolved from the object
  "nested value", // resolved from the nested object
  "array index 0" // resolved from the array
]

Tensor

One powerful feature of GreyCat is its ability to run complex computations on big datasets with a limited amount of RAM. To achieve this, the data is split into small chunks that fit in RAM and are processed one at a time. In machine learning, this is called batch processing. GreyCat re-implements the most useful machine learning algorithms in a streamable, batchable way, so that billions of observations can be processed without requiring a large IT infrastructure.

Since most machine learning algorithms deal with multidimensional numerical data, the most suitable structure to organize such data is a Tensor. You can view a Tensor as a compact multidimensional array.

Creating a tensor

This is the code to create and initialize a 2D Tensor, with 4 rows and 3 columns. The data in the Tensor will be of type float 64 bits.



fn main(){
    var t = Tensor{};
    t.init(TensorType::f64,Array<int> {4, 3});  //Creates a 2 dimensional tensor of 4 rows and 3 columns = 12 elements in total

    Assert::equals(t.dim(),2);
    Assert::equals(t.size(),12);

    println(t);
}

The supported data types for Tensors are: i32 (32-bit integer), i64 (64-bit integer), f32 (32-bit float), f64 (64-bit float), c64 (64-bit complex number: 32 bits for the real part and 32 for the imaginary part), and c128 (128-bit complex number: 64 bits for the real part and 64 for the imaginary part).
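
Selecting another element type should only require changing the first argument of init. The sketch below assumes that the enum value is named TensorType::i64, following the same pattern as TensorType::f64. That name is inferred from the type list above and is not confirmed by an example in this document.

```gcl
fn main() {
    var t = Tensor{};
    // Assumption: TensorType::i64 is the enum value for 64-bit integers
    t.init(TensorType::i64, Array<int> {2, 2}); // 2 x 2 = 4 integer elements

    Assert::equals(t.dim(), 2);
    Assert::equals(t.size(), 4);
}
```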

Set and get

In this example, we create a 3D Tensor of 5 x 4 x 3 = 60 elements. We set the first element to 42.3, then get the value back to verify it. The last line fills the whole Tensor with 50.3.



fn main(){
    var t = Tensor{};
    t.init(TensorType::f64,Array<int> {5, 4, 3}); //Creates a 3 dimensional tensor of 5 x 4 x 3 = 60 elements in total

    t.set(Array<int> {0, 0, 0}, 42.3);
    Assert::equals(t.get(Array<int> {0, 0, 0}),42.3);

    t.fill(50.3);
}

To iterate over all the elements of a multidimensional Tensor, use initPos and incPos as follows:



fn main() {
  var t = Tensor{};
  t.init(TensorType::f64, Array<int> {2, 2, 3}); //Creates a 3 dimensional tensor of 2 x 2 x 3 = 12 elements in total
  var random = Random{};

  var index = t.initPos();  //init the array to the correct shape of the tensor, in this case [0,0,0]
  do {
      t.set(index, random.uniformf(-5.0, 5.0));
      println(index);  // to see how the N dimensional index follows the shape of the tensor
  } while (t.incPos(index)); // the incPos will increase the ND array 1 step
}

Utility methods

A first useful method of Tensor is append. Since data often arrives as a stream, append lets us add it to the Tensor as it arrives. If the Tensor has 1 dimension, append takes as its argument a number, an array of numbers, or a 1D Tensor.



fn main() {
    var t = Tensor{};
    t.init(TensorType::f64, Array<int> {0}); //Creates a 1 dimensional tensor with 0 elements in it

    var t2 = Tensor{};
    t2.init(TensorType::f64, Array<int> {3});  //T2 is a 1D tensor of 3 elements filled with 5.0
    t2.fill(5.0);

    t.append(3.0);          //Appends 1 value
    t.append([4.0,4.0]);    //Appends 2 values coming from an array
    t.append(t2);           //appends 3 values coming from a 1D tensor

    println(t.toTable());
}

For a Tensor with more than one dimension, say N = 4, we can only append a Tensor of N-1 dimensions. For example:



fn main() {
    var t = Tensor{};
    // Creates a 4 dimensional tensor of 0 x 2 x 3 x 4 = 0 elements in total;
    // the shape of the last 3 dimensions is now fixed to 2 x 3 x 4
    t.init(TensorType::f64, Array<int> {0, 2, 3, 4});

    var t2 = Tensor{};
    t2.init(TensorType::f64, Array<int> {2, 3, 4});

    t2.fill(5.0);
    t.append(t2);           //appends the t2 to t

    t2.fill(9.0);
    t.append(t2);           //appends the t2 to t
}

Notice that the first dimension of the Tensor can always be initialized to 0. It is the only dimension that is allowed to change with each append.

To avoid re-allocating the Tensor each time its size increases, use the setCapacity method. It sets the capacity of a Tensor even if the first dimension is 0. If we add this line after init in the previous example, the Tensor will directly have the capacity to hold 1000 elements before the appends happen.

t.setCapacity(1000);
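
Put together with the append example, the capacity hint would be placed as in the sketch below. It only combines calls already shown in this section, and the value 1000 is simply the figure used above.

```gcl
fn main() {
    var t = Tensor{};
    t.init(TensorType::f64, Array<int> {0, 2, 3, 4});
    t.setCapacity(1000); // reserve room up front, before any append

    var t2 = Tensor{};
    t2.init(TensorType::f64, Array<int> {2, 3, 4});
    t2.fill(5.0);

    t.append(t2); // the first dimension grows from 0 to 1
    println(t.size());
}
```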

Finally, to reuse the memory of a Tensor with a different shape, the reset method allows the Tensor to be initialized again. As an example:



fn main() {
    var t = Tensor{};
    t.init(TensorType::f64, Array<int> {2, 3, 2});
    t.fill(5.0);

    t.reset();
    t.init(TensorType::f64, Array<int> {1, 2, 3});
}

The Tensor reuses the same memory space with a different shape.


Buffer

Buffer is an efficient string buffer: it lets you build a String incrementally by appending data to it.



fn main() {
  var b = Buffer{};
  b.add(1);
  b.add(" one ");
  b.add([1, 2]);
  println(b.toString()); //   1 one Array{1,2}
}
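
A typical use is to assemble a String inside a loop instead of concatenating at every step. The sketch below relies only on the add and toString methods shown above, and the ";" separator is an arbitrary choice.

```gcl
fn main() {
    var b = Buffer{};
    for (var i = 0; i < 3; i++) {
        b.add(i);   // append the number
        b.add(";"); // append a separator
    }
    println(b.toString());
}
```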