[![Arduino CI](https://github.com/RobTillaart/Correlation/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![Arduino-lint](https://github.com/RobTillaart/Correlation/actions/workflows/arduino-lint.yml/badge.svg)](https://github.com/RobTillaart/Correlation/actions/workflows/arduino-lint.yml)
[![JSON check](https://github.com/RobTillaart/Correlation/actions/workflows/jsoncheck.yml/badge.svg)](https://github.com/RobTillaart/Correlation/actions/workflows/jsoncheck.yml)
[![GitHub issues](https://img.shields.io/github/issues/RobTillaart/Correlation.svg)](https://github.com/RobTillaart/Correlation/issues)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/Correlation/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/Correlation.svg?maxAge=3600)](https://github.com/RobTillaart/Correlation/releases)
[![PlatformIO Registry](https://badges.registry.platformio.org/packages/robtillaart/library/Correlation.svg)](https://registry.platformio.org/libraries/robtillaart/Correlation)

# Correlation

Arduino Library to determine linear correlation between X and Y datasets.

## Description

This library calculates the coefficients of the linear correlation
between two (relatively small) datasets. The size of these datasets is
20 by default. The size can be set in the constructor.

Please note that the correlation uses about 50 bytes per instance,
plus 2 floats == 8 bytes per pair of elements.
So ~120 elements will use up about 50% of the RAM of an UNO.
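
The arithmetic behind that estimate can be sketched as follows. This is an illustration only: the helper names are hypothetical and not part of the library, the 50 byte overhead is the approximate figure quoted above, and 2048 bytes is the SRAM size of the ATmega328P on an UNO.

```cpp
#include <cstddef>
#include <cstdint>

// Rough RAM estimate for one Correlation object (hypothetical helpers).
constexpr std::size_t kInstanceOverhead = 50;    // approximate, per instance
constexpr std::size_t kBytesPerPair     = 8;     // 2 floats of 4 bytes each
constexpr std::size_t kUnoRamBytes      = 2048;  // ATmega328P SRAM

// Estimated bytes used for a given array size.
constexpr std::size_t estimatedBytes(std::uint8_t size)
{
  return kInstanceOverhead + kBytesPerPair * size;
}

// Estimated share of the UNO RAM in percent, rounded down.
constexpr std::size_t estimatedUnoRamPercent(std::uint8_t size)
{
  return (estimatedBytes(size) * 100) / kUnoRamBytes;
}
```

With these numbers, 120 pairs come to 50 + 8 \* 120 = 1010 bytes, which is just under half of the 2048 bytes available.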

The formula of the correlation is expressed as **Y = A + B \* X**.

If all points are on a vertical line, the parameter B will be NAN.
This happens when **sumXi2** is zero or very small.

Use with care.
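
For reference, the textbook least-squares expressions for a line **Y = A + B \* X** are given below. This is a sketch of the standard formulas, not a transcript of the library's source. It assumes that **sumXi2** corresponds to the denominator of B, the sum of squared deviations of X from its mean, which is zero when all X values are equal (a vertical line).

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y}  = \frac{1}{n}\sum_{i=1}^{n} y_i \\
B &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
A &= \bar{y} - B\,\bar{x} \\
R &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{aligned}
```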

#### Related

- https://github.com/RobTillaart/Correlation
- https://github.com/RobTillaart/GST - Golden standard test metrics
- https://github.com/RobTillaart/Histogram
- https://github.com/RobTillaart/RunningAngle
- https://github.com/RobTillaart/RunningAverage
- https://github.com/RobTillaart/RunningMedian
- https://github.com/RobTillaart/statHelpers - combinations & permutations
- https://github.com/RobTillaart/Statistic

## Interface

```cpp
#include "Correlation.h"
```

#### Constructor

- **Correlation(uint8_t size = 20)** allocates the arrays needed and resets the internal administration.
Size should be between 1 and 255. Size = 0 will set the size to 20.
- **~Correlation()** frees the allocated arrays.

#### Base functions

- **bool add(float x, float y)** adds a pair of **floats** to the internal storage arrays.
Returns true if the pair is added, returns false when the internal arrays are full.
When running correlation is set, **add()** will replace the oldest element and return true.
Warning: **add()** does not check if the floats are NAN or INFINITE.
- **uint8_t count()** returns the number of items in the internal arrays.
This number is always between 0 and **size()**.
- **uint8_t size()** returns the size of the internal arrays.
- **void clear()** resets the data structures to the start condition (zero elements added).
- **bool calculate()** does the math to calculate the correlation parameters A, B and R.
This function will be called automatically when needed.
You can call it at a more convenient time.
Returns false if there is nothing to calculate (**count == 0**).
- **void setR2Calculation(bool)** enables / disables the calculation of Rsquared.
- **bool getR2Calculation()** returns the flag set.
- **void setE2Calculation(bool)** enables / disables the calculation of Esquared.
- **bool getE2Calculation()** returns the flag set.

After the calculation the following functions can be called to return the core values.

- **float getA()** returns the A parameter of formula **Y = A + B \* X**
- **float getB()** returns the B parameter of formula **Y = A + B \* X**
- **float getR()** returns the correlation coefficient R which is always between -1 .. 1.
The closer to 0 the less correlation there is between X and Y.
Correlation can be positive or negative.
Most often the Rsquared **R x R** is used.
- **float getRsquare()** returns **R x R** which is always between 0 .. 1.
- **float getEsquare()** returns the error squared to get an indication of the
quality of the correlation.
- **float getAverageX()** returns the average of all elements in the X dataset.
- **float getAverageY()** returns the average of all elements in the Y dataset.
- **float getEstimateX(float y)** calculates the estimated X for a given Y.
- **float getEstimateY(float x)** calculates the estimated Y for a given X.
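
To make the meaning of A, B, R and the two estimates concrete, here is a minimal stand-alone sketch in plain C++. It uses the textbook least-squares formulas and is not the library's implementation; `LineFit`, `fitLine`, `estimateY` and `estimateX` are hypothetical names chosen for this illustration.

```cpp
#include <cmath>
#include <cstddef>

// Result of a least-squares fit Y = A + B * X (hypothetical type).
struct LineFit
{
  float A;  // intercept
  float B;  // slope
  float R;  // correlation coefficient, -1 .. 1
};

// Textbook least-squares fit over n pairs, n must be > 0.
// B (and therefore A) is NAN when all x values are equal;
// R is NAN when all x values or all y values are equal.
LineFit fitLine(const float *x, const float *y, std::size_t n)
{
  float avgX = 0, avgY = 0;
  for (std::size_t i = 0; i < n; i++)
  {
    avgX += x[i];
    avgY += y[i];
  }
  avgX /= static_cast<float>(n);
  avgY /= static_cast<float>(n);

  // Sums of products of the deviations from the averages.
  float sumXY = 0, sumX2 = 0, sumY2 = 0;
  for (std::size_t i = 0; i < n; i++)
  {
    float dx = x[i] - avgX;
    float dy = y[i] - avgY;
    sumXY += dx * dy;
    sumX2 += dx * dx;
    sumY2 += dy * dy;
  }

  LineFit fit;
  fit.B = (sumX2 == 0) ? NAN : sumXY / sumX2;
  fit.A = avgY - fit.B * avgX;
  float denominator = std::sqrt(sumX2 * sumY2);
  fit.R = (denominator == 0) ? NAN : sumXY / denominator;
  return fit;
}

// Estimated Y for a given X, straight from the formula.
float estimateY(const LineFit &fit, float x) { return fit.A + fit.B * x; }

// Estimated X for a given Y, the formula solved for X.
float estimateX(const LineFit &fit, float y) { return (y - fit.A) / fit.B; }
```

Feeding in points that lie exactly on Y = 1 + 2 \* X gives A = 1, B = 2 and R = 1, while a dataset with identical X values gives B = NAN, matching the vertical line case mentioned in the description.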

#### Correlation Coefficient R

Indicative description of the correlation value.

|  R             |  correlation  |
|:--------------:|:--------------|
| +1.0           |  Perfect      |
| +0.8 to +1.0   |  Very strong  |
| +0.6 to +0.8   |  Strong       |
| +0.4 to +0.6   |  Moderate     |
| +0.2 to +0.4   |  Weak         |
|  0.0 to +0.2   |  Very weak    |
|  0.0 to -0.2   |  Very weak    |
| -0.2 to -0.4   |  Weak         |
| -0.4 to -0.6   |  Moderate     |
| -0.6 to -0.8   |  Strong       |
| -0.8 to -1.0   |  Very strong  |
| -1.0           |  Perfect      |

#### Running correlation

- **void setRunningCorrelation(bool rc)** sets the internal variable runningMode
which allows **add()** to overwrite old elements in the internal arrays.
- **bool getRunningCorrelation()** returns the runningMode flag.

The running correlation will be calculated over the last **count** elements.
If the arrays are full, **count** equals **size**.
This running correlation allows for more adaptive formula finding, e.g. to find the
relation between temperature and humidity per hour, and how it changes over time.
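
The behaviour of **add()** with and without running mode can be illustrated with a simplified stand-in. `PairBuffer` below is a hypothetical class written for this README, not the library's code: its capacity is a compile-time template parameter (which must be larger than 0), whereas the library allocates its arrays in the constructor.

```cpp
#include <cstdint>

// Fixed-size store of (x, y) pairs that mimics the documented add() behaviour.
template <std::uint8_t N>
class PairBuffer
{
public:
  void setRunning(bool on) { _running = on; }

  // Returns false when the buffer is full and running mode is off.
  // In running mode a full buffer overwrites its oldest pair instead.
  bool add(float x, float y)
  {
    if (_count >= N && !_running) return false;
    _x[_index] = x;
    _y[_index] = y;
    // Wrap around, so that the next write lands on the oldest pair.
    _index = static_cast<std::uint8_t>((_index + 1) % N);
    if (_count < N) _count++;
    return true;
  }

  std::uint8_t count() const { return _count; }
  std::uint8_t size() const { return N; }
  float getX(std::uint8_t i) const { return _x[i]; }
  float getY(std::uint8_t i) const { return _y[i]; }

private:
  float _x[N] = {};
  float _y[N] = {};
  std::uint8_t _index = 0;
  std::uint8_t _count = 0;
  bool _running = false;
};
```

With a capacity of 3, the fourth **add()** fails in normal mode. After switching running mode on, the same call succeeds, the count stays at 3 and the first pair that was added is the one that gets replaced.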

#### Statistical

These functions give an indication of the "trusted interval" for estimations.
The idea is that for **getEstimateX()** the further outside the range defined
by **getMinX()** and **getMaxX()**, the less the result can be trusted.
It also depends on **R** of course. The same applies to **getEstimateY()**.

- **float getMinX()** returns the smallest X value in the dataset.
- **float getMaxX()** returns the largest X value in the dataset.
- **float getMinY()** returns the smallest Y value in the dataset.
- **float getMaxY()** returns the largest Y value in the dataset.
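
One way to act on the "trusted interval" idea is to measure how far a value lies outside the observed range. The helper below is a hypothetical sketch, not part of the library; the bounds `lo` and `hi` are assumed to come from **getMinX()** / **getMaxX()** or **getMinY()** / **getMaxY()**.

```cpp
// Returns how far value lies outside [lo, hi].
// 0 means the value is inside the observed range (interpolation);
// larger results mean more extrapolation and therefore less trust.
float distanceOutsideRange(float value, float lo, float hi)
{
  if (value < lo) return lo - value;
  if (value > hi) return value - hi;
  return 0.0f;
}
```

How large a distance is still acceptable depends on the data and on **R**, so no threshold is suggested here.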

#### Debugging / educational

Normally not used. For all these functions index should be < count!

- **bool setXY(uint8_t index, float x, float y)** overwrites a pair of values.
Returns true if succeeded.
- **bool setX(uint8_t index, float x)** overwrites a single X value.
- **bool setY(uint8_t index, float y)** overwrites a single Y value.
- **float getX(uint8_t index)** returns a single X value.
- **float getY(uint8_t index)** returns a single Y value.
- **float getSumXY()** returns sum(Xi \* Yi).
- **float getSumX2()** returns sum(Xi \* Xi).
- **float getSumY2()** returns sum(Yi \* Yi).

#### Obsolete since 0.3.0

To improve readability the following functions are replaced.

- **float getAvgX()** ==> **getAverageX()**
- **float getAvgY()** ==> **getAverageY()**
- **float getSumXiYi()** ==> **getSumXY()**
- **float getSumXi2()** ==> **getSumX2()**
- **float getSumYi2()** ==> **getSumY2()**

## Future

#### Must

- improve documentation

#### Should

- examples
  - real world if possible.

#### Could

- Template version?
The constructor should get a TYPE parameter, as this
allows smaller data types to be analysed taking less memory.
- move code from .h to .cpp

#### Won't

## Support

If you appreciate my libraries, you can support the development and maintenance.
Improve the quality of the libraries by providing issues and Pull Requests, or
donate through PayPal or GitHub sponsors.

Thank you,