[![Arduino CI](https://github.com/RobTillaart/float16/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![Arduino-lint](https://github.com/RobTillaart/float16/actions/workflows/arduino-lint.yml/badge.svg)](https://github.com/RobTillaart/float16/actions/workflows/arduino-lint.yml)
[![JSON check](https://github.com/RobTillaart/float16/actions/workflows/jsoncheck.yml/badge.svg)](https://github.com/RobTillaart/float16/actions/workflows/jsoncheck.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/float16/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/float16.svg?maxAge=3600)](https://github.com/RobTillaart/float16/releases)
# float16
Arduino library to implement float16 data type.
## Description
This **experimental** library defines the float16 (2 byte) data type, including conversion
functions to and from the float32 type. It is definitely **work in progress**.

The library implements the **Printable** interface, so one can print float16 values
directly to any stream, e.g. Serial.

The primary use of the float16 data type is to store and transport floating point
numbers efficiently. It uses only 2 bytes, where float and double typically use
4 and 8 bytes, so memory and bandwidth are saved at the price of range and precision.

## Specifications
| attribute | value        | notes   |
|:----------|:-------------|:--------|
| size      | 2 bytes      | layout s eeeee mmmmmmmmmm |
| sign      | 1 bit        | |
| exponent  | 5 bit        | |
| mantissa  | 10 bit       | 11 bits of precision with the implicit leading bit, ~ 3 digits |
| minimum   | 5.96046 E-8  | smallest positive number (subnormal). |
|           | 1.0009765625 | 1 + 2^-10 = smallest number larger than 1. |
| maximum   | 65504        | |

#### example values
```cpp
/*
  SIGN EXP MANTISSA
  0 01111 0000000000 = 1
  0 01111 0000000001 = 1 + 2^-10 = 1.0009765625 (smallest float larger than 1)
  1 10000 0000000000 = -2
  0 11110 1111111111 = 65504 (max half precision)
  0 00001 0000000000 = 2^-14 ≈ 6.10352 × 10^-5 (minimum positive normal)
  0 00000 1111111111 = 2^-14 - 2^-24 ≈ 6.09756 × 10^-5 (maximum subnormal)
  0 00000 0000000001 = 2^-24 ≈ 5.96046 × 10^-8 (minimum positive subnormal)
  0 00000 0000000000 = 0
  1 00000 0000000000 = -0
  0 11111 0000000000 = infinity
  1 11111 0000000000 = -infinity
  0 01101 0101010101 = 0.333251953125 ≈ 1/3
*/
```
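
The bit patterns listed above can be checked by decoding them by hand. The following is a minimal sketch in plain C++, assuming the standard IEEE 754 half-precision layout (exponent bias 15). **halfBitsToFloat()** is a hypothetical helper written for this illustration; it is not part of the library, which may decode differently.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the float16 library):
// decode a half-precision bit pattern, layout s eeeee mmmmmmmmmm.
float halfBitsToFloat(uint16_t bits)
{
  const int sign     = (bits >> 15) & 0x01;
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x03FF;
  float value;

  if (exponent == 0)
  {
    // zero or subnormal: no implicit leading 1, value = mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else if (exponent == 31)
  {
    // all exponent bits set: infinity (mantissa == 0) or NaN (mantissa != 0)
    value = (mantissa == 0) ? INFINITY : NAN;
  }
  else
  {
    // normal number: (1024 + mantissa) * 2^(exponent - 15 - 10)
    value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -value : value;
}
```

For example, `halfBitsToFloat(0x3C01)` yields 1.0009765625 and `halfBitsToFloat(0x7BFF)` yields 65504, matching the values listed above.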
## Interface
to elaborate
#### Constructors
- **float16(void)** defaults to zero.
- **float16(double f)** constructor.
- **float16(const float16 &f)** copy constructor.
#### Conversion
- **double toDouble(void)** convert to double (or float).
- **uint16_t getBinary()** get the 2 byte binary representation.
- **void setBinary(uint16_t u)** set the 2 byte binary representation.
- **size_t printTo(Print& p) const** Printable interface.
- **void setDecimals(uint8_t d)** set the number of decimals used by printTo.
- **uint8_t getDecimals()** get the number of decimals used by printTo.

Note that the decimals setting takes one extra byte per object, which is not efficient for arrays of float16.
See the array example for efficient storage using the set/getBinary() functions.

#### Compare
Standard comparison operators. Since 0.1.5 these are quite optimized,
so comparing e.g. two measurements is fast.

- **bool operator == (const float16& f)**
- **bool operator != (const float16& f)**
- **bool operator > (const float16& f)**
- **bool operator >= (const float16& f)**
- **bool operator < (const float16& f)**
- **bool operator <= (const float16& f)**
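
How the library optimizes these operators is not described here. One common technique is to compare the 2 byte patterns directly instead of converting to double first. The sketch below shows that idea in plain C++; **halfOrderKey()**, **halfLess()** and **halfEqual()** are hypothetical names, and NaN patterns are deliberately not handled.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of the float16 library).
// Map a half-precision bit pattern to an integer that sorts in the same
// order as the encoded value. +0 and -0 both map to 0. NaN is not handled.
int32_t halfOrderKey(uint16_t bits)
{
  const int32_t magnitude = bits & 0x7FFF;          // exponent + mantissa
  return (bits & 0x8000) ? -magnitude : magnitude;  // apply the sign bit
}

bool halfLess(uint16_t a, uint16_t b)
{
  return halfOrderKey(a) < halfOrderKey(b);
}

bool halfEqual(uint16_t a, uint16_t b)
{
  return halfOrderKey(a) == halfOrderKey(b);
}
```

With these helpers `halfLess(0xC000, 0x3C00)` is true (-2 < 1) and `halfEqual(0x0000, 0x8000)` is true (+0 equals -0).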
#### Math (basic)
Math is done by converting to double, doing the math and converting back.
These operators are added for convenience only.
There are no plans to optimize them.

- **float16 operator + (const float16& f)**
- **float16 operator - (const float16& f)**
- **float16 operator \* (const float16& f)**
- **float16 operator / (const float16& f)**
- **float16& operator += (const float16& f)**
- **float16& operator -= (const float16& f)**
- **float16& operator \*= (const float16& f)**
- **float16& operator /= (const float16& f)**
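
Because every result is converted back to 2 bytes, each operation loses whatever does not fit in float16 precision. A minimal sketch of that effect in plain C++, assuming round to nearest even on the way back (the rounding the library actually applies is not specified here). **roundToHalfPrecision()** and **addAsHalf()** are hypothetical helpers, not library functions.

```cpp
#include <cmath>

// Hypothetical helper (not part of the float16 library):
// round a double to the nearest value representable in half precision.
double roundToHalfPrecision(double x)
{
  if (x == 0.0 || std::isnan(x) || std::isinf(x)) return x;

  int exponent;
  std::frexp(x, &exponent);             // |x| = f * 2^exponent, 0.5 <= f < 1
  if (exponent < -13) exponent = -13;   // subnormal range: fixed spacing 2^-24

  // distance between neighbouring half-precision values around x
  const double spacing = std::ldexp(1.0, exponent - 11);
  const double rounded = std::nearbyint(x / spacing) * spacing;

  // beyond the largest finite value the result becomes infinite
  if (std::fabs(rounded) > 65504.0) return std::copysign(INFINITY, x);
  return rounded;
}

// "convert to double, do the math, convert back" for an addition
double addAsHalf(double a, double b)
{
  return roundToHalfPrecision(a + b);
}
```

For example, `addAsHalf(2048.0, 1.0)` returns 2048, because neighbouring float16 values are 2 apart in that range.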

Negation operator.

- **float16 operator - ()** fast negation.

- **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
- **bool isZero()** returns true if zero. slightly faster than **sign()**.
- **bool isInf()** returns true if value is (-)infinite.
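
Negation and the three tests above only need the bit pattern, which is why they can be fast. The sketch below shows how such checks can look on a raw 16 bit value in plain C++. The function names are hypothetical and the library's own implementation may differ.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of the float16 library),
// working on the raw pattern s eeeee mmmmmmmmmm.

// zero: all exponent and mantissa bits clear, sign bit ignored (+0 and -0)
bool halfIsZero(uint16_t bits)
{
  return (bits & 0x7FFF) == 0;
}

// infinity: all exponent bits set, mantissa clear, sign bit ignored
bool halfIsInf(uint16_t bits)
{
  return (bits & 0x7FFF) == 0x7C00;
}

// 1 == positive, 0 == zero, -1 == negative (NaN is not treated specially)
int halfSign(uint16_t bits)
{
  if (halfIsZero(bits)) return 0;
  return (bits & 0x8000) ? -1 : 1;
}

// negation only flips the sign bit
uint16_t halfNegate(uint16_t bits)
{
  return bits ^ 0x8000;
}
```

For example, `halfNegate(0x3C00)` gives `0xBC00`, the pattern for -1.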
## Notes
## Future
#### 0.1.x
- update documentation.
- unit tests of the above.
- isNan().
#### later
- update documentation.
- error handling.
- divide by zero errors.
- look for optimizations.
- rewrite **f16tof32()** with bit magic.
- add storage example - with SD card, FRAM or EEPROM
- add communication example - serial or Ethernet?