[![Arduino CI](https://github.com/RobTillaart/Histogram/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/Histogram/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/Histogram.svg?maxAge=3600)](https://github.com/RobTillaart/Histogram/releases)
# Histogram

Arduino library for creating histograms.

## Description

One of the main applications of an Arduino board is reading and logging sensor data.
We often want to make a histogram of this data to get insight into the distribution of the
measurements. This is where the Histogram library comes in.

The Histogram distributes the values added to it into buckets and keeps a count per bucket.

If you need more quantitative analysis, you might need the Statistic library:
- https://github.com/RobTillaart/Statistic

## Interface
### Constructor
- **Histogram(uint8_t len, float \*bounds)** constructor, takes the length of the boundary array and the array of boundary values
- **~Histogram()** destructor
### Base
- **void clear()** reset all counters
- **void add(float val)** add a value, increases the count of the matching bucket
- **void sub(float val)** like **add()**, but decreases the count of the matching bucket
- **uint8_t size()** number of buckets
- **unsigned long count()** total number of values added
- **long bucket(uint8_t idx)** count of single bucket, can be negative due to **sub()**
- **float frequency(uint8_t idx)** the relative frequency of a bucket
- **uint8_t find(float f)** find the bucket for value f

When the class is initialized, an array of boundaries that defines the borders of the
buckets is passed to the constructor. This array should be declared global (it has to
outlive the Histogram object), as the Histogram class does not copy the values in order
to keep memory usage low. This also allows the boundaries to be changed at runtime, so
after a **clear()** a new histogram can be built with the changed boundaries.
The values in the boundary array do not need to be equidistant (the buckets need not be equal in size).
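
To illustrate how a non-equidistant boundary array partitions the value range, here is a minimal stand-alone sketch. It is not the library's code: `findBucket` is a hypothetical helper, and it assumes that `len` boundaries define `len + 1` buckets and that a value equal to a boundary falls into the lower bucket. The actual behaviour of **find()** may differ.

```cpp
#include <stdint.h>

// Boundaries live in a global array, as recommended above: only a pointer
// is handed around, nothing is copied.
// The intervals are deliberately not equal in size.
float bounds[] = { 0.0f, 10.0f, 50.0f, 200.0f };
const uint8_t BOUNDS_LEN = sizeof(bounds) / sizeof(bounds[0]);

// Hypothetical helper (an assumption, not part of the Histogram API):
// returns the index of the first boundary that is >= value, or len when
// the value lies above all boundaries, so len boundaries give len + 1 buckets.
uint8_t findBucket(const float *b, uint8_t len, float value)
{
  for (uint8_t i = 0; i < len; i++)
  {
    if (value <= b[i]) return i;
  }
  return len;
}
```

Under these assumptions, 7.5 lands in bucket 1 (above 0, up to 10) and 120 lands in bucket 3 (above 50, up to 200). Changing an entry of `bounds` at runtime changes the mapping immediately, which is why the counters should be cleared afterwards.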

Internally the library does not record the individual values, only the count per bucket.
When a value is passed to **add()** or **sub()**, the class determines which bucket it
belongs to and increases or decreases that bucket's counter.

The **sub()** function decreases the count of a bucket, which can cause the count
to drop below zero. Although seldom needed, it can be useful depending on the application.
E.g. when you want to compare two value-generating streams, you let one stream
**add()** and the other **sub()**. If the histograms of both streams are similar, they should cancel
each other out (more or less), and the counts of all buckets should be around 0. \[not tried\].
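
The stream comparison described above can be sketched without the library. The code below is an illustration only: `bucketOf`, `addValue`, `subValue` and `maxAbsCount` are hypothetical helpers, and the bucket rule (first boundary >= value) is an assumption.

```cpp
#include <stdint.h>

const uint8_t LEN = 3;                        // number of boundaries
float limits[LEN] = { 10.0f, 20.0f, 30.0f };  // bucket borders
int32_t counts[LEN + 1] = { 0, 0, 0, 0 };     // signed: subtracting may go below zero

// Assumed bucket rule: first boundary >= value, else the last bucket.
uint8_t bucketOf(float value)
{
  for (uint8_t i = 0; i < LEN; i++)
  {
    if (value <= limits[i]) return i;
  }
  return LEN;
}

void addValue(float value) { counts[bucketOf(value)]++; }  // stream A
void subValue(float value) { counts[bucketOf(value)]--; }  // stream B

// Largest absolute bucket count; stays close to 0 when both streams
// are distributed alike.
int32_t maxAbsCount()
{
  int32_t m = 0;
  for (uint8_t i = 0; i <= LEN; i++)
  {
    int32_t a = counts[i] < 0 ? -counts[i] : counts[i];
    if (a > m) m = a;
  }
  return m;
}
```

Feeding every sample of one stream to `addValue` and every sample of the other to `subValue` leaves only the per-bucket differences in `counts`. How small `maxAbsCount()` has to be for the streams to count as "similar" depends on the application.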

The **frequency()** function may be removed to reduce the footprint, as it can be calculated with
the formula **(1.0 \* bucket(i)) / count()**.
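
As a sketch of that formula, the relative frequency can be derived from a bucket count and the total count. `relativeFrequency` is a hypothetical helper, not a library function, and the guard against a zero total is an addition that the formula itself does not include.

```cpp
#include <stdint.h>

// Hypothetical helper: mirrors the formula (1.0 * bucket(i)) / count().
// bucketCount may be negative when sub() has been used.
float relativeFrequency(int32_t bucketCount, uint32_t total)
{
  if (total == 0) return 0.0f;   // added guard: avoid division by zero
  return (1.0f * bucketCount) / total;
}
```

With a Histogram object named `hist` (a hypothetical name), the call would look like `relativeFrequency(hist.bucket(i), hist.count())`.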
### Experimental

- **float PMF(float val)** Probability Mass Function
- **float CDF(float val)** Cumulative Distribution Function
- **float VAL(float prob)** Value Function

There are three experimental functions:
- **PMF()** is quite similar to **frequency()**, but takes a value as parameter instead of a bucket index.
- **CDF()** gives the sum of the frequencies of all values <= the given value.
- **VAL()** is the inverse of **CDF()**: it takes a probability and returns a value.

As an Arduino sketch typically uses a small number of buckets, these functions are quite
coarse/inaccurate (linear interpolation within a bucket is still to be investigated).
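
One possible reading of these three functions, written against plain arrays, is sketched below. This is an assumption about their meaning, not the library's implementation. All names are hypothetical, the counts are made-up example data, and returning the upper boundary of a bucket from `valSketch` is a choice made here.

```cpp
#include <stdint.h>

const uint8_t EDGE_COUNT = 3;                          // number of boundaries
const float edges[EDGE_COUNT] = { 10.0f, 20.0f, 30.0f };
const uint32_t hits[EDGE_COUNT + 1] = { 2, 5, 2, 1 };  // example counts
const uint32_t TOTAL = 10;                             // sum of hits

// Assumed bucket rule: first boundary >= value, else the last bucket.
uint8_t bucketFor(float value)
{
  for (uint8_t i = 0; i < EDGE_COUNT; i++)
  {
    if (value <= edges[i]) return i;
  }
  return EDGE_COUNT;
}

// PMF sketch: relative frequency of the bucket that contains value.
float pmfSketch(float value)
{
  return (1.0f * hits[bucketFor(value)]) / TOTAL;
}

// CDF sketch: summed relative frequency of all buckets up to and
// including the one that contains value.
float cdfSketch(float value)
{
  uint32_t sum = 0;
  uint8_t last = bucketFor(value);
  for (uint8_t i = 0; i <= last; i++) sum += hits[i];
  return (1.0f * sum) / TOTAL;
}

// VAL sketch: upper boundary of the first bucket at which the cumulative
// frequency reaches prob. The top bucket has no upper boundary, so the
// highest boundary is returned for it.
float valSketch(float prob)
{
  uint32_t sum = 0;
  for (uint8_t i = 0; i < EDGE_COUNT; i++)
  {
    sum += hits[i];
    if ((1.0f * sum) / TOTAL >= prob) return edges[i];
  }
  return edges[EDGE_COUNT - 1];
}
```

In this sketch every value above 10 and up to 20 gives the same PMF and CDF result, and `valSketch` can only return one of the three boundaries. That is the coarseness mentioned above; interpolating within a bucket would be one way to refine it.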
## Todo list

- Copy the boundaries array?
- Additional values per bucket.
  - Sum, Min, Max (average can be derived)
- separate bucket-array for **sub()**
- improve the strategy of **find()** for locating the right bucket.
- investigate linear interpolation for **PMF()**, **CDF()** and **VAL()** functions to improve accuracy.
- explain **PMF()**, **CDF()** and **VAL()** functions
- clear individual buckets
- merge buckets
- bucket full / overflow warning.
- make GitHub issues of the above...

## Operation
See examples