[![Arduino CI](https://github.com/RobTillaart/Histogram/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![Arduino-lint](https://github.com/RobTillaart/Histogram/actions/workflows/arduino-lint.yml/badge.svg)](https://github.com/RobTillaart/Histogram/actions/workflows/arduino-lint.yml)
[![JSON check](https://github.com/RobTillaart/Histogram/actions/workflows/jsoncheck.yml/badge.svg)](https://github.com/RobTillaart/Histogram/actions/workflows/jsoncheck.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/Histogram/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/Histogram.svg?maxAge=3600)](https://github.com/RobTillaart/Histogram/releases)


# Histogram

Arduino library for creating histograms.

## Description

One of the main applications for the Arduino board is reading and logging sensor data.
We often want to make a histogram of this data to get insight into the distribution of the
measurements. This is where the Histogram library comes in.

The Histogram distributes the values added to it into buckets and keeps a count per bucket.

If you need more quantitative analysis, you might need the Statistic library:
- https://github.com/RobTillaart/Statistic


#### Related

- https://github.com/RobTillaart/Correlation
- https://github.com/RobTillaart/GST - Golden standard test metrics
- https://github.com/RobTillaart/Histogram
- https://github.com/RobTillaart/RunningAngle
- https://github.com/RobTillaart/RunningAverage
- https://github.com/RobTillaart/RunningMedian
- https://github.com/RobTillaart/statHelpers - combinations & permutations
- https://github.com/RobTillaart/Statistic


#### Working

When the class is initialized, an array of boundaries that defines the borders of the
buckets is passed to the constructor. This array should be declared global, as the
Histogram class does not copy the values in order to keep memory usage low. This also allows
the boundaries to be changed at runtime: after a **clear()**, a new histogram can be built.

The values in the boundary array do not need to be equidistant (equal in size),
but they need to be in ascending order.

Internally the library does not record the individual values, only the count per bucket.
If a new value is added with **add(value)**, the class checks to which bucket it
belongs and that bucket's counter is increased.
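
The bucket lookup described above can be sketched in a few lines of plain C++. This is a minimal stand-alone illustration, not the library source: the names `findBucket` and `addValue` are hypothetical, and the sketch assumes that N ascending boundaries define N + 1 buckets, with bucket i holding the values up to and including `bounds[i]`. The library's exact convention may differ.

```cpp
#include <cstdint>

// Hypothetical stand-alone sketch, not the library source.
// Assumption: N ascending boundaries define N + 1 buckets;
// bucket i counts values <= bounds[i], the last bucket counts the rest.
uint16_t findBucket(const float *bounds, uint16_t boundCount, float value)
{
  for (uint16_t i = 0; i < boundCount; i++)
  {
    if (value <= bounds[i]) return i;
  }
  return boundCount;  // above the highest boundary
}

// Only the counter of the matching bucket changes; the value itself is not stored.
void addValue(int32_t *buckets, const float *bounds, uint16_t boundCount, float value)
{
  buckets[findBucket(bounds, boundCount, value)]++;
}
```

With boundaries `{ 0, 10, 20, 50 }` a value of 15 lands in bucket 2 and a value of 99 in bucket 4, so the counter array needs one element more than the boundary array in this sketch.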

The **sub(value)** function decreases the count of a bucket and it can
cause the count to drop below zero.
Although seldom used, it can be useful depending on the application.
E.g. when you want to compare two value-generating streams, you let
one stream **add()** and the other **sub()**. If the histograms of both streams are
similar, they should cancel each other out (more or less), and the value of all buckets
should be around 0. \[not tried\].
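
The stream comparison idea might look as follows in plain C++. This is an untested-on-hardware sketch under stated assumptions: `bucketOf` and `streamDifference` are hypothetical names, the bucket convention (bucket i counts values up to and including `bounds[i]`) is an assumption, and the "sum of absolute bucket counts" is just one possible way to express how well the two streams cancel out.

```cpp
#include <cstdint>

// Hypothetical stand-alone sketch; the names below are not part of the library.
// Assumption: bucket i counts values <= bounds[i]; the last bucket counts everything above.
static uint16_t bucketOf(const float *bounds, uint16_t boundCount, float value)
{
  uint16_t i = 0;
  while (i < boundCount && value > bounds[i]) i++;
  return i;
}

// Stream A increments, stream B decrements the matching bucket.
// Returns the sum of the absolute bucket counts: 0 means the two
// histograms cancel out exactly. buckets must hold boundCount + 1 elements.
int32_t streamDifference(const float *bounds, uint16_t boundCount,
                         const float *streamA, const float *streamB, uint16_t n,
                         int32_t *buckets)
{
  for (uint16_t i = 0; i <= boundCount; i++) buckets[i] = 0;
  for (uint16_t i = 0; i < n; i++)
  {
    buckets[bucketOf(bounds, boundCount, streamA[i])]++;  // like add()
    buckets[bucketOf(bounds, boundCount, streamB[i])]--;  // like sub()
  }
  int32_t residual = 0;
  for (uint16_t i = 0; i <= boundCount; i++)
  {
    residual += (buckets[i] < 0) ? -buckets[i] : buckets[i];
  }
  return residual;
}
```

A small residual relative to the number of samples suggests similar distributions; what counts as "small" depends on the application.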

The **frequency()** function may be removed to reduce the footprint, as it can be calculated
with the formula **(1.0 \* bucket(i))/count()**.
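
As an illustration, the formula can be wrapped in a small helper. The name `relativeFrequency` is hypothetical, and returning 0 for an empty histogram is an assumption of this sketch, made only to avoid a division by zero.

```cpp
#include <cstdint>

// Relative frequency of one bucket: (1.0 * bucket(i)) / count().
// Assumption of this sketch: an empty histogram yields 0 instead of dividing by zero.
float relativeFrequency(int32_t bucketCount, uint32_t totalCount)
{
  if (totalCount == 0) return 0.0f;
  return (1.0f * bucketCount) / totalCount;
}
```

A bucket holding 25 of 100 values gives 0.25; a bucket that was driven negative with **sub()** gives a negative frequency.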


#### Experimental: Histogram8 Histogram16

Histogram8 and Histogram16 are derived classes with the same interface but smaller buckets.
Histogram can count to ± 2^31, while often ± 2^15 or even ± 2^7 is sufficient.
The smaller counters save substantial memory.

| class name | length | count/bucket | max memory |
|:--------------|---------:|---------------:|-------------:|
| Histogram | 65534 | ± 2147483647 | 260 KB |
| Histogram8 | 65534 | ± 127 | 65 KB |
| Histogram16 | 65534 | ± 32767 | 130 KB |

The difference is the type of the **\_data** array, which reduces the memory footprint.

Note: max memory is without the boundary array.

Performance optimizations are possible too, however they are not essential for
the experimental version.
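
The max memory column follows from the length times the size of one counter. The sketch below works this out, assuming the counters are `int32_t`, `int8_t` and `int16_t` respectively (inferred from the count/bucket ranges in the table, not confirmed from the source code). The helper `bucketBytes` and the constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for the bucket counters only (boundary array not included).
constexpr uint32_t bucketBytes(uint16_t length, size_t counterSize)
{
  return static_cast<uint32_t>(length) * static_cast<uint32_t>(counterSize);
}

// Assumed counter types, inferred from the count/bucket ranges in the table.
constexpr uint32_t BYTES_HISTOGRAM   = bucketBytes(65534, sizeof(int32_t));  // 262136 bytes
constexpr uint32_t BYTES_HISTOGRAM8  = bucketBytes(65534, sizeof(int8_t));   //  65534 bytes
constexpr uint32_t BYTES_HISTOGRAM16 = bucketBytes(65534, sizeof(int16_t));  // 131068 bytes
```

The figures in the table are these byte counts rounded to a convenient number of KB.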


## Interface

```cpp
#include "histogram.h"
```


#### Constructor

- **Histogram(uint16_t length, float \*bounds)** constructor, takes an array of boundary values and the array length.
Length should be less than 65534.
- **Histogram8(uint16_t length, float \*bounds)** idem as above.
- **Histogram16(uint16_t length, float \*bounds)** idem as above.
- **~Histogram()** destructor.
- **~Histogram8()** destructor.
- **~Histogram16()** destructor.


#### maxBucket

By default the maxBucket size is 255 (8 bit), 65535 (16 bit) or
2147483647 (32 bit), depending on the class used.
The functions below set and get the maxBucket, so the **add()** and
**sub()** functions will reach **FULL** faster.
This is useful in some applications, e.g. games.

- **void setMaxBucket(uint32_t value)** sets a user defined maxBucket level, e.g. 25.
- **uint32_t getMaxBucket()** returns the current maxBucket.

Please note it makes no sense to set maxBucket to a value larger than
the histogram type can handle.
Setting maxBucket to 300 for **Histogram8** will always fail, as its data can only
handle values between 0 .. 255.
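
To make the effect of maxBucket concrete, here is a minimal sketch of a bounded increment. It is not the library source: `boundedIncrement` is a hypothetical name, and the exact moment at which the library reports **HISTO_FULL** versus **HISTO_ERR_FULL** is an assumption based on the descriptions in the status table of this README. The status values themselves are taken from that table.

```cpp
#include <cstdint>

// Status values as listed in the status table of this README.
const uint8_t HISTO_OK       = 0x00;
const uint8_t HISTO_FULL     = 0x01;
const uint8_t HISTO_ERR_FULL = 0xFF;

// Hypothetical sketch of a bounded increment; not the library source.
// Assumption: the increment that reaches maxBucket reports HISTO_FULL,
// any further increment is refused with HISTO_ERR_FULL.
uint8_t boundedIncrement(int32_t &counter, int32_t maxBucket)
{
  if (counter >= maxBucket) return HISTO_ERR_FULL;        // already full, counter unchanged
  counter++;
  return (counter == maxBucket) ? HISTO_FULL : HISTO_OK;  // report when the limit is hit
}
```

With maxBucket set to 25, a game could for example treat the first **HISTO_FULL** as "this bucket has won" without ever counting further.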


#### Base

- **uint8_t clear(float value = 0)** resets all bucket counters to value (default 0).
Returns status, see below.
- **uint8_t setBucket(const uint16_t index, int32_t value = 0)** stores / overwrites the value of a bucket.
Returns status, see below.
- **uint8_t add(float value)** adds a value, increasing the count of its bucket.
Returns status, see below.
- **uint8_t sub(float value)** 'adds' a value by decreasing (subtracting from) the count of its bucket.
This is less used and has some side effects, see **frequency()**.
Returns status, see below.


| Status | Value | Description |
|:------------------:|:-------:|:------------:|
| HISTO_OK | 0x00 | all is well |
| HISTO_FULL | 0x01 | add() / sub() caused bucket full ( + or - ) |
| HISTO_ERR_FULL | 0xFF | cannot add() / sub(), overflow / underflow |
| HISTO_ERR_LENGTH | 0xFE | length = 0 error (constructor) |


- **uint16_t size()** returns the number of buckets.
- **uint32_t count()** returns the total number of values added (or subtracted).
- **int32_t bucket(uint16_t index)** returns the count of a single bucket.
Can be negative if one uses **sub()**.
- **float frequency(uint16_t index)** returns the relative frequency of a bucket.
This is always between -1.0 and 1.0.

Some notes about **frequency()**
- can return a negative value if an application uses **sub()**
- the sum over all buckets will not add up to 1.0 if one uses **sub()**
- value (and thus sum) will deviate if **HISTO_ERR_FULL** has occurred.


#### Helper functions

- **uint16_t find(float value)** returns the index of the bucket for value.
- **uint16_t findMin()** returns the (first) index of the bucket with the minimum value.
- **uint16_t findMax()** returns the (first) index of the bucket with the maximum value.
- **uint16_t countLevel(int32_t level)** returns the number of buckets with exactly that level (count).
- **uint16_t countAbove(int32_t level)** returns the number of buckets above level.
- **uint16_t countBelow(int32_t level)** returns the number of buckets below level.
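
The helpers above are simple scans over the bucket counters. The following stand-alone sketch shows two of them on a plain array; the names `firstMaxIndex` and `bucketsAbove` are hypothetical, and treating "above" as strictly greater than the level is an assumption.

```cpp
#include <cstdint>

// Index of the first bucket holding the highest count.
uint16_t firstMaxIndex(const int32_t *buckets, uint16_t bucketCount)
{
  uint16_t index = 0;
  for (uint16_t i = 1; i < bucketCount; i++)
  {
    if (buckets[i] > buckets[index]) index = i;  // strict '>' keeps the first maximum
  }
  return index;
}

// Number of buckets whose count is strictly above level (assumed meaning of "above").
uint16_t bucketsAbove(const int32_t *buckets, uint16_t bucketCount, int32_t level)
{
  uint16_t n = 0;
  for (uint16_t i = 0; i < bucketCount; i++)
  {
    if (buckets[i] > level) n++;
  }
  return n;
}
```

For the counters `{ 3, 7, 7, -2, 0 }` the first maximum is at index 1, and three buckets are above level 0.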


#### Probability Distribution Functions

There are three experimental functions:

- **float PMF(float value)** Probability Mass Function.
Quite similar to **frequency()**, but uses a value as parameter.
- **float CDF(float value)** Cumulative Distribution Function.
Returns the sum of frequencies <= value. Always between 0.0 and 1.0.
- **float VAL(float probability)** Value Function, is **CDF()** inverted.
Returns the value of the original array for which the CDF is at least probability.
- **int32_t sum()** returns the sum of all buckets. (not experimental).
Just as with **frequency()** it is affected by the use of **sub()**,
including returning a negative value.
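
The relation between **CDF()** and **VAL()** can be illustrated at bucket level. The sketch below is not the library implementation: it works on bucket indices instead of values, the names `cumulativeFraction` and `bucketForProbability` are hypothetical, and returning 0 for an empty histogram is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-alone sketch working on bucket indices, not on values.

// Fraction of all counted values in buckets 0..index (a bucket-level CDF).
float cumulativeFraction(const int32_t *buckets, uint16_t bucketCount, uint16_t index)
{
  int32_t total = 0;
  int32_t partial = 0;
  for (uint16_t i = 0; i < bucketCount; i++)
  {
    total += buckets[i];
    if (i <= index) partial += buckets[i];
  }
  if (total == 0) return 0.0f;  // assumption: empty histogram yields 0
  return (1.0f * partial) / total;
}

// Inverse lookup: index of the first bucket at which the cumulative
// fraction reaches the requested probability.
uint16_t bucketForProbability(const int32_t *buckets, uint16_t bucketCount, float probability)
{
  for (uint16_t i = 0; i < bucketCount; i++)
  {
    if (cumulativeFraction(buckets, bucketCount, i) >= probability) return i;
  }
  return bucketCount - 1;
}
```

Because the result can only jump from one bucket to the next, the resolution of both functions is limited by the number of buckets.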

As most Arduino sketches typically use a small number of buckets, **PMF()**, **CDF()** and **VAL()**
are quite coarse and/or inaccurate, so indicative at best.
Linear interpolation within the "last" bucket needs to be investigated, however it
introduces its own uncertainty. An alternative is to add the last bucket for 50%.

Note **PDF()** is a continuous function and therefore not applicable to a discrete histogram.

- https://en.wikipedia.org/wiki/Probability_mass_function PMF()
- https://en.wikipedia.org/wiki/Cumulative_distribution_function CDF() + VAL()
- https://en.wikipedia.org/wiki/Probability_density_function PDF()


## Future

#### Must

- improve documentation

#### Should

- investigate performance - **find()** the right bucket.
  - Binary search is faster (above 20)
  - need testing.
  - mixed search, last part (< 20) linear?
- improve accuracy - linear interpolation for **PMF()**, **CDF()** and **VAL()**
- performance - merge loops in **PMF()**
- performance - reverse loops - compare to zero.
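
For the binary search idea, a possible starting point is sketched below. It is an untested proposal, not library code: `findBucketBinary` is a hypothetical name, and it assumes that bucket i holds the values up to and including `bounds[i]`, with one extra bucket above the highest boundary.

```cpp
#include <cstdint>

// Hypothetical binary-search variant of the bucket lookup; not the library source.
// Returns the index of the first boundary >= value, or boundCount if value
// lies above all boundaries (the last bucket). bounds must be ascending.
uint16_t findBucketBinary(const float *bounds, uint16_t boundCount, float value)
{
  uint16_t low = 0;
  uint16_t high = boundCount;   // search window is [low, high)
  while (low < high)
  {
    uint16_t mid = low + (high - low) / 2;
    if (bounds[mid] < value) low = mid + 1;   // value lies to the right of this boundary
    else high = mid;                          // boundary mid is still a candidate
  }
  return low;
}
```

The number of comparisons grows with the logarithm of the boundary count instead of linearly, which is why it is only expected to pay off above a certain number of buckets.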


#### Could

- **saturation()** indication of the whole histogram
  - count / nr of bins?
- percentage readOut == frequency()
  - **float getBucketPercent(idx)**
- template class <bucketsizeType>.


#### Wont

- merge bins
- 2D histograms ? e.g. positions on a grid.
  - see SparseMatrix