Tensor

One powerful feature of Greycat is its ability to run complex computations on large datasets with a limited amount of RAM. To achieve this, the data is split into small chunks that fit in RAM and are processed one at a time. In machine learning, this is called batch processing. Greycat re-implements the most useful machine learning algorithms in a streamable, batchable way, so that billions of observations can be processed without a large IT infrastructure.

Since most machine learning algorithms deal with multidimensional numerical data, the most suitable structure to organize such data is a Tensor. You can think of a tensor as a compact multidimensional array.

Creating a tensor

The following code creates and initializes a 2D tensor with 4 rows and 3 columns. The elements of the tensor are 64-bit floats.

use util;

fn main(){
    var t = Tensor::new();
    t.init(TensorType::f64,[4,3]);  //Creates a 2 dimensional tensor of 4 rows and 3 columns = 12 elements in total

    Assert::equals(t.dim(),2);
    Assert::equals(t.size(),12);

    println(t);
}

The supported element types for tensors are: i32 (32-bit integer), i64 (64-bit integer), f32 (32-bit float), f64 (64-bit float), c64 (64-bit complex number: 32 bits for the real part and 32 for the imaginary part), and c128 (128-bit complex number: 64 bits for the real part and 64 for the imaginary part).

Set and get

In this example, we create a 3D tensor of 5 x 4 x 3 = 60 elements. We set the first element to 42.3, then read it back to verify the value. In the last line, we fill the whole tensor with 50.3.

use util;

fn main(){
    var t = Tensor::new();
    t.init(TensorType::f64,[5,4,3]); //Creates a 3 dimensional tensor of 5 x 4 x 3 = 60 elements in total

    t.set([0,0,0], 42.3);
    Assert::equals(t.get([0,0,0]),42.3);
    
    t.fill(50.3);
}

To iterate over all the elements of a multidimensional tensor, use initPos and incPos as follows:

use util;

fn main() {
  var t = Tensor::new();
  t.init(TensorType::f64, [2, 2, 3]); //Creates a 3 dimensional tensor of 2 x 2 x 3 = 12 elements in total
  var random = Random::new();

  var index = t.initPos();  // N-dimensional index with one entry per dimension, here [0,0,0]
  do {
      t.set(index, random.uniformf(-5.0, 5.0));
      println(index);  // shows how the N-dimensional index follows the shape of the tensor
  } while (t.incPos(index)); // incPos advances the index by one step; the loop stops once it returns false
  
  println("Final tensor: ${t}");
}

The execution prints each index, followed by the final tensor. The values are random, so they may differ from the ones shown here:

Final tensor: {
    "_type":"core.Tensor",
    "dim":3,"shape":[2,2,3],
    "type":{"_type":"core.TensorType","field":"f64"},
    "data":[[[0.759629615,0.248973051,-3.768835556],[-2.246558586,4.141566827,1.888580251]],[[2.939783301,-1.110272969,-0.218231997],[4.150688923,-3.332956027,-3.502008323]]]
}

Utility methods

A first useful method of the tensor is append. Since data often arrives as a stream, we can append it to the tensor as it arrives. If the tensor has 1 dimension, append takes as its argument a single number, an array of numbers, or a 1D tensor.

use util;

fn main() {
    var t = Tensor::new(); 
    t.init(TensorType::f64, [0]); // Creates a 1-dimensional tensor with 0 elements in it
 
    var t2 = Tensor::new();
    t2.init(TensorType::f64, [3]);  //T2 is a 1D tensor of 3 elements filled with 5.0
    t2.fill(5.0);

    t.append(3.0);          //Appends 1 value
    t.append([4.0,4.0]);    //Appends 2 values coming from an array
    t.append(t2);           //appends 3 values coming from a 1D tensor

    println(t);    
}

This code generates the following tensor:

{"_type":"core.Tensor","dim":1,"shape":[6],"type":{"_type":"core.TensorType","field":"f64"},"data":[3.0,4.0,4.0,5.0,5.0,5.0]}

For a tensor with more than one dimension, say N = 4, we can only append a tensor of N-1 dimensions whose shape matches the last N-1 dimensions of the target tensor. For example:

use util;

fn main() {
    var t = Tensor::new();
    // Creates a 4-dimensional tensor of 0 x 2 x 3 x 4 = 0 elements in total.
    // The shape of the last 3 dimensions (2 x 3 x 4) is now fixed.
    t.init(TensorType::f64, [0,2,3,4]);

    var t2 = Tensor::new();
    t2.init(TensorType::f64, [2,3,4]);

    t2.fill(5.0);
    t.append(t2);           //appends the t2 to t

    t2.fill(9.0);
    t.append(t2);           //appends the t2 to t

    println(t);    
}

This code generates the following tensor:

{
    "_type":"core.Tensor",
    "dim":4,
    "shape":[2,2,3,4],
    "type":{"_type":"core.TensorType","field":"f64"},
    "data":[[[[5.0,5.0,5.0,5.0],[5.0,5.0,5.0,5.0],[5.0,5.0,5.0,5.0]],[[5.0,5.0,5.0,5.0],[5.0,5.0,5.0,5.0],[5.0,5.0,5.0,5.0]]],[[[9.0,9.0,9.0,9.0],[9.0,9.0,9.0,9.0],[9.0,9.0,9.0,9.0]],[[9.0,9.0,9.0,9.0],[9.0,9.0,9.0,9.0],[9.0,9.0,9.0,9.0]]]]
}

Notice that the first dimension of the tensor can always be initialized to 0. It is the only dimension that is allowed to change with each append.
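The same rule applies to the common case of streaming rows into a 2D tensor. The following is a minimal sketch that only uses the calls already shown on this page; the names t and row are illustrative, and the resulting shape of [2,3] is what the N-1 rule above leads us to expect rather than a captured output.

```gcl
use util;

fn main() {
    var t = Tensor::new();
    t.init(TensorType::f64, [0, 3]); // 2D tensor with 0 rows so far; every row must hold 3 values

    var row = Tensor::new();
    row.init(TensorType::f64, [3]);  // 1D tensor (N-1 dimensions), reused for each incoming row

    row.fill(1.0);
    t.append(row);                   // the first dimension is expected to grow from 0 to 1

    row.fill(2.0);
    t.append(row);                   // the first dimension is expected to grow from 1 to 2

    println(t);
}
```

Only the number of rows changes here; the row length of 3 stays fixed from the moment init is called.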

For performance, and to avoid re-allocating the tensor each time its size increases, there is the setCapacity method. It sets the capacity of a tensor even if the first dimension is 0. If we add the following line after init in the previous example, the tensor will have the capacity to hold 1000 elements before any append happens.

t.setCapacity(1000);
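Put in context, the previous example could look like the sketch below. It only combines calls already shown on this page. The remark about how many appends fit is plain arithmetic (2 x 3 x 4 = 24 elements per append), under the assumption that the capacity is counted in elements as described above.

```gcl
use util;

fn main() {
    var t = Tensor::new();
    t.init(TensorType::f64, [0, 2, 3, 4]);
    t.setCapacity(1000);   // reserve room for 1000 elements before any append

    var t2 = Tensor::new();
    t2.init(TensorType::f64, [2, 3, 4]);   // each append adds 2 x 3 x 4 = 24 elements

    t2.fill(5.0);
    t.append(t2);          // 24 of the 1000 reserved elements are now in use

    t2.fill(9.0);
    t.append(t2);          // 48 of the 1000 reserved elements are now in use

    println(t);
}
```

With these numbers, 41 such appends (984 elements) would fit in the reserved capacity before the tensor needs to grow.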

Finally, to reuse the memory of a tensor with a different shape, the reset method allows the shape of the tensor to be changed. For example:

use util;

fn main() {
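    var t = Tensor::new();
    t.init(TensorType::f64, [2,3,2]);  // 2 x 3 x 2 = 12 elements
    t.fill(5.0);

    t.reset();                         // allows the tensor to be initialized again
    t.init(TensorType::f64, [1,2,3]);  // new shape: 1 x 2 x 3 = 6 elements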
}

The tensor reuses the same memory space, now with a different shape.