7.0.1685-testing

Statistics

This page presents GreyCat types which can be used to gather general distribution statistics. The following types are available and will be explained below:

  • Gaussian
  • Quantizer<T>
  • Histogram<T>
  • GaussianProfile<T>

Gaussian

The Gaussian type is a specialized container, exposed by the utils module, which:

  • accumulates data points and maintains their sum, sum of squares, min and max
  • computes basic statistics on the collected points: average and standard deviation
  • normalizes and standardizes values, and applies the inverse of each transformation
  • computes probability and confidence, treating the collected data as samples from a normal distribution.
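
To make the bookkeeping in the list above concrete, here is a minimal sketch in Python rather than GreyCat. The class name RunningGaussian and its methods are hypothetical stand-ins, not the GreyCat API. The min-max form of normalize matches the numbers printed in the example below; the population form of the standard deviation and the z-score form of standardize are assumptions.

```python
import math


class RunningGaussian:
    """Hypothetical stand-in for the accumulator described above (not the GreyCat type)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0      # running sum of the values
        self.total_sq = 0.0   # running sum of the squared values
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def avg(self):
        return self.total / self.count

    def std(self):
        # Population standard deviation (assumption: GreyCat may divide by n - 1 instead).
        mean = self.avg()
        return math.sqrt(max(self.total_sq / self.count - mean * mean, 0.0))

    def normalize(self, x):
        # Min-max scaling: maps [min, max] onto [0, 1].
        return (x - self.min) / (self.max - self.min)

    def inverse_normalize(self, y):
        return y * (self.max - self.min) + self.min

    def standardize(self, x):
        # z-score (assumed definition).
        return (x - self.avg()) / self.std()

    def inverse_standardize(self, z):
        return z * self.std() + self.avg()


g = RunningGaussian()
for x in [2.0, 4.0, 4.0, 6.0]:
    g.add(x)

print(g.avg(), g.std(), g.normalize(4.0), g.standardize(6.0))
```

Only five numbers are stored no matter how many points are added, which is why such a container can summarize large streams cheaply.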

Example



fn main() {
    var g = Gaussian {};  // default object construction
    var rng = Random {};
    var avg = 5.0;
    var std = 1.0;

    for (var i = 0; i < 1000; i++) {
        // Add a random value to the Gaussian
        // rng.normal: Generates a random float from the normal distribution with given avg and std
        g.add(rng.normal(avg, std));
    }

    println("min=${g.min}, max=${g.max}, avg=${g.avg()}, std=${g.std()}");
    // Evaluates the probability density function (PDF) at the given value
    println("pdf(4.0) = ${g.pdf(4.0)}");
    println("conf(4.0) = ${g.confidence(4.0)}");
    var normalizedValue = g.normalize(4.0)!!;
    println("normalize(4.0) = ${normalizedValue}");
    println("inverse_normalize(${normalizedValue}) = ${g.inverse_normalize(normalizedValue)}");
}

// min=2.3036208769, max=7.5829338665, avg=4.983027888, std=0.9555194504
// pdf(4.0) = 0.2459463726
// conf(4.0) = 0.303578876
// normalize(4.0) = 0.3213257343
// inverse_normalize(0.3213257343) = 4.0
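
As a cross-check of the printed figures, the following Python sketch recomputes them from the min, max, avg and std shown above. The formulas are assumptions inferred from those numbers, not taken from the GreyCat source: pdf is the normal density, confidence is the two-sided tail probability 2 * (1 - Φ(|z|)), and normalize is min-max scaling.

```python
import math

# Values copied from the example output above.
lo, hi = 2.3036208769, 7.5829338665
avg, std = 4.983027888, 0.9555194504


def pdf(x):
    # Density of a normal distribution with the collected avg and std.
    z = (x - avg) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def confidence(x):
    # Two-sided tail probability: chance of a draw at least as far from avg as x.
    z = abs(x - avg) / std
    return math.erfc(z / math.sqrt(2.0))


def normalize(x):
    # Min-max scaling onto [0, 1].
    return (x - lo) / (hi - lo)


print(f"pdf(4.0) = {pdf(4.0):.4f}")
print(f"conf(4.0) = {confidence(4.0):.4f}")
print(f"normalize(4.0) = {normalize(4.0):.4f}")
```

Under these assumptions the three values should agree with the GreyCat output to about four decimal places.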

Quantizer

A Quantizer<T>, once configured with index values in multiple dimensions, allows you to transform one multi-dimensional index point into a single integer representation.

Quantizer<T> is an abstract type. GreyCat provides several concrete types to initialize a quantizer:

  • LinearQuantizer
  • LogQuantizer
  • CustomQuantizer
  • MultiQuantizer

LinearQuantizer and LogQuantizer expect a min, a max and a number of bins as input and automatically generate the bins on initialization. CustomQuantizer also uses a min and a max, but requires the user to set the starting points of the bins as an Array<float>. The extrema bins (first and last) can optionally be defined as open. By default the bins are closed, which means that values landing outside the given min and max are ignored.

MultiQuantizer extends Quantizer<Array<T>> and combines a set of quantizers, one per dimension.

Quantizer<T> is also used to instantiate the bins of Histogram<T> and GaussianProfile<T>.
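
The idea of turning a multi-dimensional point into a single integer can be sketched as follows, in Python rather than GreyCat. The function names are hypothetical, and several details are assumptions: that open extrema bins absorb out-of-range values into the first or last slot, that a rejected value is reported as a negative index (as the histogram example further down suggests), and that per-dimension slots are combined in row-major order.

```python
def linear_slot(x, lo, hi, bins, open_bins=False):
    """Return the slot index of x, or -1 if x is rejected by closed bins."""
    if x < lo or x > hi:
        if not open_bins:
            return -1                       # closed bins: value is ignored
        return 0 if x < lo else bins - 1    # open bins: clamp to an extrema slot (assumed)
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)  # x == hi falls in the last slot


def multi_slot(point, axes):
    """Combine per-dimension slots into one integer (row-major order, assumed)."""
    index = 0
    for x, (lo, hi, bins, open_bins) in zip(point, axes):
        slot = linear_slot(x, lo, hi, bins, open_bins)
        if slot < 0:
            return -1                       # one rejected dimension rejects the point
        index = index * bins + slot
    return index


# Two dimensions, each split into 3 closed bins over [0, 100].
axes = [(0.0, 100.0, 3, False), (0.0, 100.0, 3, False)]

print(multi_slot([50.0, 50.0], axes))   # centre cell of the 3 x 3 grid
print(multi_slot([10.0, 90.0], axes))
print(multi_slot([120.0, 50.0], axes))  # outside the first dimension
```

With 3 bins per dimension the centre cell receives index 4, which is consistent with the position of the largest count in the two-dimensional histogram output later on this page.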

Example



fn main(){
    var min = 10;
    var max = 100;
    var bins = 10;

    var linearQuantizer = LinearQuantizer<int> { min: min, max: max, bins: bins, open: true };
    var logQuantizer = LogQuantizer<int> { min: min, max: max, bins: bins, open: true };
    var customQuantizer = CustomQuantizer<int> { min: min, max: max, step_starts: Array<int> {0, 50, 100}, open: true };


    // Print the min, max and center of the first, middle and last bin of each quantizer
    println("Linear First: ${linearQuantizer.bounds(0)}, Mid: ${linearQuantizer.bounds(4)}, Last: ${linearQuantizer.bounds(9)}");
    println("Log First: ${logQuantizer.bounds(0)}, Mid: ${logQuantizer.bounds(4)}, Last: ${logQuantizer.bounds(9)}");
    println("Custom First: ${customQuantizer.bounds(0)}, Mid: ${customQuantizer.bounds(1)}, Last: ${customQuantizer.bounds(2)}");
}
// Linear First: Table<QuantizerSlotBound<int>>{[10,19,14]}, Mid: Table<QuantizerSlotBound<int>>{[46,55,50]}, Last: Table<QuantizerSlotBound<int>>{[91,100,95]}
// Log First: Table<QuantizerSlotBound<int>>{[10,12,11]}, Mid: Table<QuantizerSlotBound<int>>{[25,31,28]}, Last: Table<QuantizerSlotBound<int>>{[79,100,89]}
// Custom First: Table<QuantizerSlotBound<int>>{[10,0,5]}, Mid: Table<QuantizerSlotBound<int>>{[0,50,25]}, Last: Table<QuantizerSlotBound<int>>{[50,100,75]}
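
The linear and logarithmic bounds printed above can be reproduced with a few lines of arithmetic. The Python sketch below is an assumption inferred from those numbers, not the GreyCat implementation: linear bins have equal width, logarithmic bins have equal ratio, and integer quantizers truncate both the bounds and the center. The custom quantizer output is not covered.

```python
def linear_bounds(lo, hi, bins, slot):
    """(min, max, center) of a linear slot, truncated to integers."""
    width = (hi - lo) / bins
    start = int(lo + slot * width)
    end = int(lo + (slot + 1) * width)
    return (start, end, (start + end) // 2)


def log_bounds(lo, hi, bins, slot):
    """(min, max, center) of a logarithmic slot, truncated to integers."""
    ratio = hi / lo
    start = int(lo * ratio ** (slot / bins))
    end = int(lo * ratio ** ((slot + 1) / bins))
    return (start, end, (start + end) // 2)


for slot in (0, 4, 9):
    print(slot, linear_bounds(10, 100, 10, slot), log_bounds(10, 100, 10, slot))
```

The logarithmic bins are narrow near min and wide near max, which is useful when most values cluster at the low end of the range.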

Histogram

The Histogram<T> type, exposed in the utils module, is used to compute statistics on a set of numeric values.

To create a new histogram, we first need to create a Quantizer<T> that defines the bounds of the histogram. The example below shows how to create a Histogram<float> using a LinearQuantizer<float> to define the bounds.

The Histogram<T> type offers two functions to record data: add(value: T) adds a single value, and addx(value: T, count: int) increases the count of a value directly by the indicated number. A HistogramStats object can be accessed via the native stats(dim:int) function of the histogram object.
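
The relationship between add and addx can be sketched in Python as follows. SketchHistogram is a hypothetical class, not the GreyCat type. It assumes closed linear bins, and that a rejected addx increases nb_rejected by the full count; the field names nb_accepted and nb_rejected are borrowed from the example output below.

```python
class SketchHistogram:
    """Hypothetical fixed-width histogram with closed bins (not the GreyCat type)."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.hi = hi
        self.bins = [0] * bins
        self.nb_accepted = 0
        self.nb_rejected = 0

    def addx(self, value, count):
        # Values outside [lo, hi] are counted as rejected and touch no bin.
        if value < self.lo or value > self.hi:
            self.nb_rejected += count
            return
        width = (self.hi - self.lo) / len(self.bins)
        slot = min(int((value - self.lo) / width), len(self.bins) - 1)
        self.bins[slot] += count
        self.nb_accepted += count

    def add(self, value):
        # Adding one value is the same as addx with a count of 1.
        self.addx(value, 1)


h = SketchHistogram(0.0, 10.0, 5)
h.add(1.0)
h.addx(9.5, 3)
h.add(42.0)  # outside the range, so rejected
print(h.bins, h.nb_accepted, h.nb_rejected)
```

addx is convenient when the data is already aggregated, for example when merging pre-counted values, because it avoids one call per occurrence.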

Example

Let’s create a histogram of float values and add some random, normally distributed values to it.

fn main() {
  var rng = Random {};
  var avg = 5.0;
  var std = 1.0;

  // Create a new histogram using linear quantizer
  var quantizer = LinearQuantizer<float> { min: avg-3*std, max: avg+3*std, bins: 20, open:true };
  var histogram = Histogram<float> { quantizer: quantizer };

  for (var i = 0; i < 1000; i++) {
      // Add a random value to the histogram
      // rng.normal: Generates a random float from the normal distribution with given avg and std
      histogram.add(rng.normal(avg, std));
  }

  println(histogram.stats());
  // Histogram<float>{quantizer:LinearQuantizer<float>{min:2.0,max:8.0,bins:20,open:true},
  // bins:Array<int?>{3,6,10,27,33,42,69,81,122,116,122,97,99,67,44,36,15,4,5,2},nb_rejected:null,nb_accepted:1000}
}

We can also create a multidimensional histogram using the MultiQuantizer<T>:


fn main() {
    var rng = Random{seed:42};
    var xQuantizer = LinearQuantizer<float>{min:0.0, max:100.0, bins:3,open:false};
    var yQuantizer = LinearQuantizer<float>{min:0.0, max:100.0, bins:3, open:false};
    var multQuant = MultiQuantizer<float>{quantizers:Array<Quantizer<float>>{xQuantizer,yQuantizer}};
    var hist = Histogram<Array<float>>{quantizer:multQuant};

    for (var i=0; i<1e6; i++){
        var xVal = rng.normal(50.0, 10.0);
        var yVal = rng.normal(50.0, 10.0);
        // Quantize the point first so that values rejected by the closed bins can be skipped
        var binIdx = multQuant.quantize(Array<float>{xVal,yVal});
        if(binIdx<0){
            continue;
        }
        hist.add(Array<float>{xVal,yVal});
    }
    println(hist.stats());
    // Histogram<Array<float>>{quantizer:MultiQuantizer<float>
    // {quantizers:Array<Quantizer<float>>{LinearQuantizer<float>{min:0.0,max:100.0,bins:3,open:false},
    // LinearQuantizer<float>{min:0.0,max:100.0,bins:3,open:false}}},
    // bins:Array<int?>{2300,43186,2268,43580,816772,43712,2325,43573,2281},nb_rejected:null,nb_accepted:999997}
}


Gaussian Profile

A GaussianProfile<T> represents a collection of Gaussian distributions. It uses a Quantizer<T> to define the slots, i.e. the indices of the distributions. The type also requires a FloatPrecision to be set when the object is created.

To update the individual distributions or get statistics of them we can use the following functions:

  • add(key: T, value: float) - Add the value to the distribution given by key
  • avg(key: T):float - Returns the average of the distribution given by key
  • std(key: T):float - Returns the standard deviation of the distribution given by key
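
Conceptually, the three functions above keep one running distribution per slot. The Python sketch below illustrates that idea only. SketchProfile is a hypothetical class, keys are used directly as slot indices instead of going through a quantizer, FloatPrecision is ignored, and the population form of the standard deviation is an assumption, so it is not meant to reproduce the figures printed by the GreyCat example.

```python
import math
from collections import defaultdict


class SketchProfile:
    """Hypothetical per-slot accumulator (not the GreyCat type)."""

    def __init__(self):
        # slot -> [count, sum, sum of squares]
        self.slots = defaultdict(lambda: [0, 0.0, 0.0])

    def add(self, key, value):
        slot = self.slots[key]
        slot[0] += 1
        slot[1] += value
        slot[2] += value * value

    def avg(self, key):
        count, total, _ = self.slots[key]
        return total / count

    def std(self, key):
        count, total, total_sq = self.slots[key]
        mean = total / count
        return math.sqrt(max(total_sq / count - mean * mean, 0.0))


profile = SketchProfile()
for x in [4.0, 5.0, 6.0]:
    profile.add(0, x)
    profile.add(1, 2 * x)

for key in (0, 1):
    print(key, profile.avg(key), profile.std(key))
```

Scaling every value of a slot by 2 doubles both its average and its standard deviation, which the two slots in this sketch show.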

Example



fn main() {
    var quantizer = LinearQuantizer<int>{min:0, max:2, bins:3};
    var gprof = GaussianProfile<int>{quantizer:quantizer, precision: FloatPrecision::p1000000};
    var rng = Random{};
    var avg = 5.0;
    var std = 1.0;

    for (var i = 0; i < 10000; i++) {
        var x = rng.normal(avg, std);
        gprof.add(0, x );
        gprof.add(1, 2*x);
        gprof.add(2, 3*x);
    }

    for (var i = 0; i < gprof.quantizer.size(); i++) {
        println("${i}: avg=${gprof.avg(i)}, std=${gprof.std(i)}");
    }
}

// 0: avg=5.0244880593, std=5.1089283511
// 1: avg=10.0489766223, std=10.2449037714
// 2: avg=15.0734651803, std=15.3688562146