Statistics
This page presents the GreyCat types that can be used to gather general distribution statistics. The following types are available and are explained below:
- Gaussian
- Quantizer<T>
- Histogram<T>
- GaussianProfile<T>
Gaussian
The Gaussian type is a specialized container, exposed by the utils module, which:
- accumulates data points and maintains their sum, sum of squares, min and max
- computes basic statistics on the collected points: average and standard deviation
- normalizes and standardizes values, and applies the inverse of each transformation
- computes probability and confidence, treating the collected data as part of a normal distribution.
Example
fn main() {
    var g = Gaussian {}; // default object construction
    var rng = Random {};
    var avg = 5.0;
    var std = 1.0;
    for (var i = 0; i < 1000; i++) {
        // Add a random value to the Gaussian
        // rng.normal: generates a random float from the normal distribution with the given avg and std
        g.add(rng.normal(avg, std));
    }
    println("min=${g.min}, max=${g.max}, avg=${g.avg()}, std=${g.std()}");
    // pdf: returns the probability density function (PDF) at a given `value`
    println("pdf(4.0) = ${g.pdf(4.0)}");
    println("conf(4.0) = ${g.confidence(4.0)}");
    var normalizedValue = g.normalize(4.0)!!;
    println("normalize(4.0) = ${normalizedValue}");
    // inverse_normalize maps the normalized value back to the original 4.0
    println("inverse_normalize(${normalizedValue}) = ${g.inverse_normalize(normalizedValue)}");
}
// min=2.3036208769, max=7.5829338665, avg=4.983027888, std=0.9555194504
// pdf(4.0) = 0.2459463726
// conf(4.0) = 0.303578876
// normalize(4.0) = 0.3213257343
// inverse_normalize(0.3213257343) = 4.0
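To make the list of capabilities above concrete, here is a minimal Python sketch of the idea behind such a container: it keeps only the count, sum, sum of squares, min and max, and derives everything else from them. This is not GreyCat's implementation. The class name RunningGaussian is hypothetical, and the use of the sample (n - 1) variance is an assumption. The min-max form of normalize is consistent with the example output above, since (4.0 - 2.3036) / (7.5829 - 2.3036) is approximately 0.3213.

```python
import math


class RunningGaussian:
    """Hypothetical stand-in illustrating the idea; not GreyCat's Gaussian."""

    def __init__(self):
        self.count = 0
        self.total = 0.0      # running sum of the values
        self.total_sq = 0.0   # running sum of the squared values
        self.min = None
        self.max = None

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.min = x if self.min is None else min(self.min, x)
        self.max = x if self.max is None else max(self.max, x)

    def avg(self):
        return self.total / self.count

    def std(self):
        # Assumption: sample standard deviation (divides by n - 1).
        n = self.count
        var = (self.total_sq - self.total * self.total / n) / (n - 1)
        return math.sqrt(max(var, 0.0))

    def normalize(self, x):
        # Min-max scaling: min maps to 0.0 and max maps to 1.0.
        return (x - self.min) / (self.max - self.min)

    def inverse_normalize(self, y):
        return y * (self.max - self.min) + self.min

    def pdf(self, x):
        # Density of a normal distribution with the collected avg and std.
        s = self.std()
        z = (x - self.avg()) / s
        return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))


g = RunningGaussian()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    g.add(value)
print(g.avg(), g.std(), g.normalize(4.0), g.pdf(4.0))
```

Because only five running quantities are stored, memory use stays constant no matter how many points are added, at the cost of not being able to recover individual values.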
Quantizer
A Quantizer<T>, once configured with index values in multiple dimensions, allows you to transform one multi-dimensional index point into a single integer representation.
Quantizer<T> is an abstract type in GreyCat; several concrete types are available to initialize a quantizer:
- LinearQuantizer
- LogQuantizer
- CustomQuantizer
- MultiQuantizer
LinearQuantizer and LogQuantizer expect a min, a max and a number of bins as input and automatically generate the bins on initialization.
CustomQuantizer also uses a min and max, but requires the user to set the starting points of the bins as an Array<float>.
The extreme bins (first and last) can optionally be defined as open. By default the bins are closed, so values that land outside the given min and max are ignored.
MultiQuantizer extends Quantizer<Array<T>> and lets you define a set of quantizers.
Quantizer<T> is also used to instantiate the bins of Histogram<T> and GaussianProfile<T>.
Example
fn main() {
    var min = 10;
    var max = 100;
    var bins = 10;
    var linearQuantizer = LinearQuantizer<int> { min: min, max: max, bins: bins, open: true };
    var logQuantizer = LogQuantizer<int> { min: min, max: max, bins: bins, open: true };
    var customQuantizer = CustomQuantizer<int> { min: min, max: max, step_starts: Array<int> {0, 50, 100}, open: true };
    // Print the bounds (min, max, center) of the first, middle and last bin of each quantizer
    println("Linear First: ${linearQuantizer.bounds(0)}, Mid: ${linearQuantizer.bounds(4)}, Last: ${linearQuantizer.bounds(9)}");
    println("Log First: ${logQuantizer.bounds(0)}, Mid: ${logQuantizer.bounds(4)}, Last: ${logQuantizer.bounds(9)}");
    println("Custom First: ${customQuantizer.bounds(0)}, Mid: ${customQuantizer.bounds(1)}, Last: ${customQuantizer.bounds(2)}");
}
// Linear First: Table<QuantizerSlotBound<int>>{[10,19,14]}, Mid: Table<QuantizerSlotBound<int>>{[46,55,50]}, Last: Table<QuantizerSlotBound<int>>{[91,100,95]}
// Log First: Table<QuantizerSlotBound<int>>{[10,12,11]}, Mid: Table<QuantizerSlotBound<int>>{[25,31,28]}, Last: Table<QuantizerSlotBound<int>>{[79,100,89]}
// Custom First: Table<QuantizerSlotBound<int>>{[10,0,5]}, Mid: Table<QuantizerSlotBound<int>>{[0,50,25]}, Last: Table<QuantizerSlotBound<int>>{[50,100,75]}
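The linear and logarithmic bounds printed above can be reproduced with equally spaced and geometrically spaced bin edges. The Python sketch below illustrates that reading; it is not GreyCat's implementation. The function names are hypothetical, and returning -1 for a value outside closed bins is an assumption, suggested by the binIdx < 0 check in the multidimensional histogram example on this page.

```python
def linear_edges(lo, hi, bins):
    """Equally spaced edges: every bin has the same width."""
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]


def log_edges(lo, hi, bins):
    """Geometrically spaced edges: every bin has the same ratio hi/lo."""
    ratio = (hi / lo) ** (1.0 / bins)
    return [lo * ratio ** i for i in range(bins + 1)]


def slot_of(value, edges, open_ends=False):
    """Return the bin index of value, or -1 if it is rejected (assumption)."""
    last = len(edges) - 2
    if value < edges[0]:
        return 0 if open_ends else -1      # open first bin absorbs low values
    if value > edges[-1]:
        return last if open_ends else -1   # open last bin absorbs high values
    for i in range(last + 1):
        if value < edges[i + 1]:
            return i
    return last  # value equals the upper bound of the last bin


linear = linear_edges(10, 100, 10)
print(linear[0], linear[1])    # bounds of the first linear bin
print(linear[4], linear[5])    # bounds of the middle linear bin
print([int(e) for e in log_edges(10, 100, 10)])
print(slot_of(50, linear), slot_of(5, linear), slot_of(5, linear, open_ends=True))
```

With min 10, max 100 and 10 bins, the linear edges start at 10, 19, 28 and so on, and the logarithmic edges truncated to integers start at 10, 12, 15, which agrees with the first, middle and last bounds in the output above.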
Histogram
The Histogram<T> type, exposed in the utils module, is used to compute statistics on a set of numeric values.
To create a new histogram, first create a Quantizer<T> that defines the bounds of the histogram.
The example below shows how to create a Histogram<float> using a LinearQuantizer<float> to define the bounds.
The Histogram<T> type offers add(value: T) to add a single value and addx(value: T, count: int) to increase the count of a value directly by the given number.
A HistogramStats object can be accessed via the native stats(dim:int) function of the histogram object.
Example
Let’s create a histogram of float values and add some normally distributed random values to it.
fn main() {
    var rng = Random {};
    var avg = 5.0;
    var std = 1.0;
    // Create a new histogram using a linear quantizer covering avg +/- 3 std
    var quantizer = LinearQuantizer<float> { min: avg - 3 * std, max: avg + 3 * std, bins: 20, open: true };
    var histogram = Histogram<float> { quantizer: quantizer };
    for (var i = 0; i < 1000; i++) {
        // Add a random value to the histogram
        // rng.normal: generates a random float from the normal distribution with the given avg and std
        histogram.add(rng.normal(avg, std));
    }
    println(histogram.stats());
    // Histogram<float>{quantizer:LinearQuantizer<float>{min:2.0,max:8.0,bins:20,open:true},
    // bins:Array<int?>{3,6,10,27,33,42,69,81,122,116,122,97,99,67,44,36,15,4,5,2},nb_rejected:null,nb_accepted:1000}
}
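The relationship between add and addx described above can be illustrated with a small Python sketch: adding a single value is the same as adding it with a count of 1. This is a conceptual model only, not GreyCat's implementation. The class name LinearHistogram is hypothetical, the bins are closed, and counting rejected values by their count is an assumption; the field names nb_accepted and nb_rejected are borrowed from the printed output above.

```python
class LinearHistogram:
    """Hypothetical closed-bin histogram with equally wide bins."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.hi = hi
        self.bins = [0] * bins
        self.nb_accepted = 0
        self.nb_rejected = 0

    def addx(self, value, count):
        # Closed bins: anything outside [lo, hi] is rejected, not clamped.
        if value < self.lo or value > self.hi:
            self.nb_rejected += count
            return
        width = (self.hi - self.lo) / len(self.bins)
        # value == hi would index one past the end, so clamp to the last bin.
        idx = min(int((value - self.lo) / width), len(self.bins) - 1)
        self.bins[idx] += count
        self.nb_accepted += count

    def add(self, value):
        # A single observation is an observation with a count of 1.
        self.addx(value, 1)


h = LinearHistogram(0.0, 10.0, 5)
h.add(1.0)
h.addx(9.5, 3)
h.add(10.0)
h.add(-1.0)
print(h.bins, h.nb_accepted, h.nb_rejected)
```

A counted insert such as addx is useful when the input is already aggregated, for example when replaying pre-counted values, because it avoids one call per observation.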
We can also create a multidimensional histogram using the MultiQuantizer<T>:
fn main() {
    var rng = Random { seed: 42 };
    var xQuantizer = LinearQuantizer<float> { min: 0.0, max: 100.0, bins: 3, open: false };
    var yQuantizer = LinearQuantizer<float> { min: 0.0, max: 100.0, bins: 3, open: false };
    var multQuant = MultiQuantizer<float> { quantizers: Array<Quantizer<float>> {xQuantizer, yQuantizer} };
    var hist = Histogram<Array<float>> { quantizer: multQuant };
    for (var i = 0; i < 1e6; i++) {
        var xVal = rng.normal(50.0, 10.0);
        var yVal = rng.normal(50.0, 10.0);
        var binIdx = multQuant.quantize(Array<float> {xVal, yVal});
        // Skip points for which the quantizer returns a negative bin index
        if (binIdx < 0) {
            continue;
        }
        hist.add(Array<float> {xVal, yVal});
    }
    println(hist.stats());
    // Histogram<Array<float>>{quantizer:MultiQuantizer<float>
    // {quantizers:Array<Quantizer<float>>{LinearQuantizer<float>{min:0.0,max:100.0,bins:3,open:false},
    // LinearQuantizer<float>{min:0.0,max:100.0,bins:3,open:false}}},
    // bins:Array<int?>{2300,43186,2268,43580,816772,43712,2325,43573,2281},nb_rejected:null,nb_accepted:999997}
}
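The output above contains 9 bins for a 3 x 3 grid, so each pair of per-dimension slots is mapped to one integer. One common way to do this is row-major flattening, sketched below in Python. Whether GreyCat orders dimensions this way is an assumption, and the function name flatten_index is hypothetical; the only thing the output confirms is that the center cell lands at index 4, which holds the largest count.

```python
def flatten_index(slots, sizes):
    """Combine per-dimension slot indices into one integer (row-major).

    slots: slot index in each dimension, -1 meaning "rejected"
    sizes: number of bins in each dimension
    Returns -1 if any dimension rejected the value.
    """
    index = 0
    for slot, size in zip(slots, sizes):
        if slot < 0 or slot >= size:
            return -1
        index = index * size + slot
    return index


# A 3 x 3 grid: indices run from 0 (first cell) to 8 (last cell).
for x_slot in range(3):
    print([flatten_index([x_slot, y_slot], [3, 3]) for y_slot in range(3)])
```

The same loop works unchanged for three or more dimensions, since each step multiplies the running index by the size of the next dimension.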
Gaussian Profile
A GaussianProfile<T> represents a collection of Gaussian distributions.
It uses a Quantizer<T> to define the slots, i.e. the indices of the distributions.
The type also requires a FloatPrecision to be set at definition.
To update the individual distributions or read their statistics, use the following functions:
- add(key: T, value: float) adds the value to the distribution given by key
- avg(key: T): float returns the average of the distribution given by key
- std(key: T): float returns the standard deviation of the distribution given by key
Example
fn main() {
    var quantizer = LinearQuantizer<int> { min: 0, max: 2, bins: 3 };
    var gprof = GaussianProfile<int> { quantizer: quantizer, precision: FloatPrecision::p1000000 };
    var rng = Random {};
    var avg = 5.0;
    var std = 1.0;
    for (var i = 0; i < 10000; i++) {
        var x = rng.normal(avg, std);
        gprof.add(0, x);
        gprof.add(1, 2 * x);
        gprof.add(2, 3 * x);
    }
    for (var i = 0; i < gprof.quantizer.size(); i++) {
        println("${i}: avg=${gprof.avg(i)}, std=${gprof.std(i)}");
    }
}
// 0: avg=5.0244880593, std=5.1089283511
// 1: avg=10.0489766223, std=10.2449037714
// 2: avg=15.0734651803, std=15.3688562146
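Conceptually, a profile of this kind is one set of running sums per slot, with the quantizer deciding which slot a key belongs to. The Python sketch below shows that structure under stated assumptions; it is not GreyCat's implementation. The class name SlotProfile and the slot_of callback are hypothetical, the sample (n - 1) variance is an assumption, and FloatPrecision is not modeled at all.

```python
import math


class SlotProfile:
    """Hypothetical per-slot running statistics keyed through a quantizer."""

    def __init__(self, size, slot_of):
        self.slot_of = slot_of            # maps a key to a slot index
        self.count = [0] * size
        self.total = [0.0] * size         # per-slot sum of values
        self.total_sq = [0.0] * size      # per-slot sum of squared values

    def add(self, key, value):
        s = self.slot_of(key)
        self.count[s] += 1
        self.total[s] += value
        self.total_sq[s] += value * value

    def avg(self, key):
        s = self.slot_of(key)
        return self.total[s] / self.count[s]

    def std(self, key):
        # Assumption: sample standard deviation (divides by n - 1).
        s = self.slot_of(key)
        n = self.count[s]
        var = (self.total_sq[s] - self.total[s] ** 2 / n) / (n - 1)
        return math.sqrt(max(var, 0.0))


# Keys 0, 1, 2 map directly to slots 0, 1, 2 in this toy setup.
profile = SlotProfile(3, lambda key: key)
for x in [1.0, 2.0, 3.0]:
    profile.add(0, x)
    profile.add(1, 2 * x)
print(profile.avg(0), profile.std(0), profile.avg(1), profile.std(1))
```

In this sketch, scaling every value by 2 doubles both the average and the standard deviation of the slot, which is the pattern the three slots in the GreyCat example are built to show.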