RunningMedian
Arduino library to determine the running median by means of a circular buffer.
Description
Running Median looks like a running average with a small but important twist. Running average averages the last N samples while the running median takes the last N samples, sort them and take the middle one, or the average of the middle two in case the internal buffer size is even.
Important differences between running average and running median:
- Running median will return real data (e.g. a real sample from a sensor) if one uses an odd size of the buffer (therefore preferred). Running average may return a value that is never sampled.
- Running median will give zero weight to outliers, and 100% to the middle sample, whereas running average gives the same weight to all samples.
- Running median will give often constant values for some time.
- As one knows the values in the buffer one can predict the maximum change of the running median in the next steps in advance.
- Running median is slower as one needs to keep the values in timed order to remove the oldest and keep them sorted to be able to select the median.
Note: MEDIAN_MAX_SIZE
The maximum size of the internal buffer is defined by MEDIAN_MAX_SIZE and is set to 255 (since version 0.3.1). The memory allocated currently is in the order of 5 bytes per element plus some overhead, so 255 elements take ~1300 bytes. For an UNO this is quite a bit.
With larger sizes the performance penalty to keep the internal array sorted is large. For most applications a value much lower e.g. 19 is working well, and is performance wise O(100x) faster in sorting than 255 elements.
Note: Configurable Options
There are several options that can be configured via defines at compile time, those being:
- RUNNING_MEDIAN_USE_MALLOC: bool
- true (default): Dynamic memory allocation is used for the buffer.
- false: Static buffers of size MEDIAN_MAX_SIZE are used.
- MEDIAN_MIN_SIZE: uint8_t
- Dynamic / Static: The buffer stores at least this many items.
- should be minimal 3.
- MEDIAN_MAX_SIZE: uint8_t
- Dynamic: Not used.
- Static: The buffer stores at most this many items.
Related
- https://github.com/RobTillaart/Correlation
- https://github.com/RobTillaart/GST - Golden standard test metrics
- https://github.com/RobTillaart/Histogram
- https://github.com/RobTillaart/RunningAngle
- https://github.com/RobTillaart/RunningAverage
- https://github.com/RobTillaart/RunningMedian
- https://github.com/RobTillaart/statHelpers - combinations & permutations
- https://github.com/RobTillaart/Statistic
Interface
Constructor
- RunningMedian(const uint8_t size) Constructor, dynamically allocates memory.
- ~RunningMedian() Destructor.
- uint8_t getSize() returns size of internal array.
- uint8_t getCount() returns current used elements, getCount() <= getSize().
- bool isFull() returns true if the internal buffer is 100% filled.
Base functions
- clear() resets internal buffer and variables, effectively empty the buffer.
- add(const float value) adds a new value to internal buffer, optionally replacing the oldest element if the buffer is full.
- float getMedian() returns the median == middle element.
- float getAverage() returns average of all the values in the internal buffer.
- float getAverage(uint8_t nMedian) returns average of the middle n values. This effectively removes noise from the outliers in the samples. The function is improved in 0.3.8 to correct a bias, see #22.
- float getMedianAverage(uint8_t nMedian) almost same as above, except it compensates for alignment bias, see #22. This is done by adjusting the nMedian parameter (-1 or +1) if needed.
- float getHighest() get the largest values in the buffer.
- float getLowest() get the smallest value in the buffer.
- float getQuantile(const float quantile) returns the Quantile value from the buffer. This value is often interpolated.
getMedianAverage(nMedian)
getAverage(nMedian) and getMedianAverage(uint8_t nMedian) differ. When nMedian is odd and count is even or vice versa, the middle N are not perfectly in the middle. By auto-adjusting nMedian (-1 +1) this balance is restored.
Assume an internal size of 7 elements [0..6] then
- getAverage(4) will average element 1, 2, 3, 4
- getMedianAverage(4) will adjust nMedian and average element 2, 3, 4.
The example RunningMedian_getMedianAverage.ino shows the difference.
The implementation of getMedianAverage(uint8_t nMedian) is experimental and might change in the future. Idea is taking top and bottom elements only for 50% if needed, however that implies at least 2 extra float multiplications.
It is possible that the name getMedianAverage(uint8_t nMedian) will change in the future to be more descriptive.
Less used functions
- float getElement(const uint8_t n) returns the n'th element from the values in time order.
- float getSortedElement(const uint8_t n) returns the n'th element from the values in size order (sorted ascending).
- float predict(const uint8_t n) predict the maximum change of median after n additions, n must be smaller than getSize()/2.
SearchMode optimization
Since 0.3.7 the internal sort has been optimized. It is now possible to select between LINEAR (=0) and BINARY (=1) insertion sort. Pre-0.3.7 used linear insertion sort, and the new linear version is slightly optimized. For larger internal arrays the performance gain of BINARY mode is substantial.
- void setSearchMode(uint8_t searchMode = 0) 0 = linear, 1 = binary - see table below. Other values will set the searchMode to linear.
- uint8_t getSearchMode() returns the set mode
searchMode | value | notes |
---|---|---|
LINEAR | 0 | fastest for smaller internal buffers (default) |
BINARY | 1 | faster for larger internal buffers |
Depends on the board / clock used where the methods are equally fast.
Give it a try, and let me know your.
Operation
See examples.
Future
- improve documentation.
- check for optimizations.
- get the median without (full) sorting. QuickSelect()
- move all code to .cpp file