[![Arduino CI](https://github.com/RobTillaart/float16/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![Arduino-lint](https://github.com/RobTillaart/float16/actions/workflows/arduino-lint.yml/badge.svg)](https://github.com/RobTillaart/float16/actions/workflows/arduino-lint.yml)
[![JSON check](https://github.com/RobTillaart/float16/actions/workflows/jsoncheck.yml/badge.svg)](https://github.com/RobTillaart/float16/actions/workflows/jsoncheck.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/float16/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/float16.svg?maxAge=3600)](https://github.com/RobTillaart/float16/releases)

# float16

Arduino library that implements the float16 data type.

## Description

This **experimental** library defines the float16 (2 byte) data type, including conversion
functions to and from the float32 type. It is definitely **work in progress**.

The library implements the **Printable** interface, so float16 values can be printed
directly to any stream, e.g. Serial.

The primary use of the float16 data type is to store and transport floating point
numbers efficiently. As it uses only 2 bytes where float and double typically use
4 and 8 bytes, memory and bandwidth can be saved at the price of range and precision.

## Specifications

| attribute | value        | notes |
|:----------|:-------------|:------|
| size      | 2 bytes      | layout s eeeee mmmmmmmmmm |
| sign      | 1 bit        | |
| exponent  | 5 bit        | |
| mantissa  | 10 bit       | 11 bits of precision including the implicit leading 1, ~ 3 decimal digits |
| minimum   | 5.96046 E−8  | smallest positive (subnormal) number |
|           | 1.0009765625 | 1 + 2^−10 = smallest number larger than 1 |
| maximum   | 65504        | |

#### example values

```cpp
/*
  SIGN EXP MANTISSA
  0 01111 0000000000 = 1
  0 01111 0000000001 = 1 + 2^−10 = 1.0009765625 (smallest float larger than 1)
  1 10000 0000000000 = −2

  0 11110 1111111111 = 65504 (max half precision)

  0 00001 0000000000 = 2^−14 ≈ 6.10352 × 10^−5 (minimum positive normal)
  0 00000 1111111111 = 2^−14 − 2^−24 ≈ 6.09756 × 10^−5 (maximum subnormal)
  0 00000 0000000001 = 2^−24 ≈ 5.96046 × 10^−8 (minimum positive subnormal)

  0 00000 0000000000 = 0
  1 00000 0000000000 = −0

  0 11111 0000000000 = infinity
  1 11111 0000000000 = −infinity

  0 01101 0101010101 = 0.333251953125 ≈ 1/3
*/
```
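
The bit layout above can be checked with a small standalone decoder. This is a minimal sketch in plain C++, assuming the standard IEEE 754 half precision rules (exponent bias 15, implicit leading 1 for normal numbers). The function name **halfBitsToFloat()** is hypothetical and this is not the library's own conversion routine.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a half precision bit pattern (s eeeee mmmmmmmmmm) into a float.
// Illustrative helper only, not part of the float16 library.
float halfBitsToFloat(uint16_t bits)
{
  const int sign     = (bits >> 15) & 0x01;
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x03FF;

  float value;
  if (exponent == 0)
  {
    // zero or subnormal: no implicit leading 1, value = mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else if (exponent == 31)
  {
    // all exponent bits set: infinity (mantissa == 0) or NaN
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  }
  else
  {
    // normal number: (1024 + mantissa) * 2^(exponent - 15 - 10)
    value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -value : value;
}
```

For instance, the pattern `0 01111 0000000001` is 0x3C01 and decodes to 1.0009765625, and `0 11110 1111111111` is 0x7BFF and decodes to 65504.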

## Interface

To be elaborated.

#### Constructors

- **float16(void)** defaults to zero.
- **float16(double f)** constructs from a double (or float) value.
- **float16(const float16 &f)** copy constructor.

#### Conversion

- **double toDouble(void)** convert to double (or float).
- **uint16_t getBinary()** get the 2 byte binary representation.
- **void setBinary(uint16_t u)** set the 2 byte binary representation.
- **size_t printTo(Print& p) const** Printable interface.
- **void setDecimals(uint8_t d)** set the number of decimals used by **printTo()**.
- **uint8_t getDecimals()** get the number of decimals used by **printTo()**.

Note that the decimals setting takes one extra byte per object, which is not efficient for arrays of float16.
See the array example for efficient storage using the **setBinary()** and **getBinary()** functions.
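
To illustrate the storage idea, the sketch below packs raw 16 bit patterns into a plain byte buffer, as one might do before writing to EEPROM or sending over a serial link. It is plain C++ and does not use the library itself. The patterns are assumed to come from **getBinary()** and to be handed back to **setBinary()**. The helper names and the low-byte-first order are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Store a raw 16 bit pattern in slot `index` of a byte buffer, low byte first.
// The pattern is assumed to come from a call such as getBinary().
void putHalfBits(uint8_t* buffer, size_t index, uint16_t bits)
{
  buffer[2 * index]     = static_cast<uint8_t>(bits & 0xFF);
  buffer[2 * index + 1] = static_cast<uint8_t>(bits >> 8);
}

// Read the pattern in slot `index` back; the result could be passed to setBinary().
uint16_t getHalfBits(const uint8_t* buffer, size_t index)
{
  return static_cast<uint16_t>(buffer[2 * index] |
                               (buffer[2 * index + 1] << 8));
}
```

With this layout N values occupy exactly 2 × N bytes, with no per-value overhead for the decimals setting.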

#### Compare

Standard comparison operators. Since 0.1.5 these are optimized,
so comparing e.g. two measurements is fast.

- **bool operator == (const float16& f)**
- **bool operator != (const float16& f)**
- **bool operator > (const float16& f)**
- **bool operator >= (const float16& f)**
- **bool operator < (const float16& f)**
- **bool operator <= (const float16& f)**
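
One way such comparisons can be made cheap is to compare the raw bit patterns as integers instead of converting to double first. The sketch below shows that idea in plain C++. It is an assumption about a possible technique, not a description of how the library implements its operators, and the function names are hypothetical. NaN patterns are not treated specially here.

```cpp
#include <cstdint>

// Map a half precision bit pattern to an integer key that sorts in numeric order.
// The format is sign-magnitude, so negative values get a negated magnitude.
// +0 and -0 both map to key 0. NaN is not handled in this sketch.
int32_t orderedKey(uint16_t bits)
{
  const int32_t magnitude = bits & 0x7FFF;
  return (bits & 0x8000) ? -magnitude : magnitude;
}

// Compare two raw patterns without any floating point arithmetic.
bool halfLess(uint16_t a, uint16_t b)
{
  return orderedKey(a) < orderedKey(b);
}

bool halfEqual(uint16_t a, uint16_t b)
{
  return orderedKey(a) == orderedKey(b);
}
```

This works because, for a fixed sign, a larger exponent or mantissa field always means a larger magnitude.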

#### Math (basic)

Math is done by converting to double, doing the math, and converting back.
These operators are added for convenience only;
there are no plans to optimize them.

- **float16 operator + (const float16& f)**
- **float16 operator - (const float16& f)**
- **float16 operator \* (const float16& f)**
- **float16 operator / (const float16& f)**
- **float16& operator += (const float16& f)**
- **float16& operator -= (const float16& f)**
- **float16& operator \*= (const float16& f)**
- **float16& operator /= (const float16& f)**
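
The convert, compute, convert back pattern can be sketched in plain C++ on raw bit patterns. The example below is an illustration only: **halfToDouble()**, **doubleToHalf()** and **halfAdd()** are hypothetical names, the rounding assumes the default round-to-nearest mode, and none of this is the library's actual code.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Half precision bit pattern -> double (exact, every half value fits in a double).
double halfToDouble(uint16_t bits)
{
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x03FF;
  double value;
  if (exponent == 0)                      // zero or subnormal
    value = std::ldexp(static_cast<double>(mantissa), -24);
  else if (exponent == 31)                // infinity or NaN
    value = mantissa ? std::numeric_limits<double>::quiet_NaN()
                     : std::numeric_limits<double>::infinity();
  else                                    // normal, implicit leading 1
    value = std::ldexp(static_cast<double>(1024 + mantissa), exponent - 25);
  return (bits & 0x8000) ? -value : value;
}

// double -> nearest half precision bit pattern.
uint16_t doubleToHalf(double d)
{
  const int sign = std::signbit(d) ? 0x8000 : 0x0000;
  const double a = std::fabs(d);
  if (std::isnan(a)) return static_cast<uint16_t>(sign | 0x7E00);
  // 65520 is halfway between 65504 and the next step, so it rounds to infinity
  if (a >= 65520.0)  return static_cast<uint16_t>(sign | 0x7C00);
  if (a < std::ldexp(1.0, -14))           // subnormal range: units of 2^-24
    return static_cast<uint16_t>(sign | static_cast<int>(std::nearbyint(std::ldexp(a, 24))));

  int e;
  const double fraction = std::frexp(a, &e);  // a = fraction * 2^e, fraction in [0.5, 1)
  int m = static_cast<int>(std::nearbyint(std::ldexp(fraction, 11)));  // 1024..2048
  if (m == 2048) { m = 1024; ++e; }           // rounding carried into the exponent
  return static_cast<uint16_t>(sign | ((e + 14) << 10) | (m - 1024));
}

// Addition done the convenient way: widen, add, narrow.
uint16_t halfAdd(uint16_t a, uint16_t b)
{
  return doubleToHalf(halfToDouble(a) + halfToDouble(b));
}
```

Adding 1 (0x3C00) and the smallest subnormal (0x0001) gives 0x3C00 again, which shows the precision that is lost when the result is narrowed back to 2 bytes.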

Negation and inspection functions.

- **float16 operator - ()** fast negation.
- **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
- **bool isZero()** returns true if zero. Slightly faster than **sign()**.
- **bool isInf()** returns true if value is (-)infinite.
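
Checks like these need no conversion, because the answer can be read directly from the bit pattern. The following plain C++ sketch shows how that could look on a raw 16 bit value. The function names are hypothetical and the library's own implementation may differ.

```cpp
#include <cstdint>

// Flip the sign bit; everything else stays the same.
uint16_t halfNegate(uint16_t bits)
{
  return static_cast<uint16_t>(bits ^ 0x8000);
}

// Zero means exponent and mantissa are all 0; the sign bit may be set (-0).
bool halfIsZero(uint16_t bits)
{
  return (bits & 0x7FFF) == 0;
}

// Infinity means exponent all 1 and mantissa all 0, with either sign.
bool halfIsInf(uint16_t bits)
{
  return (bits & 0x7FFF) == 0x7C00;
}

// 1 == positive, 0 == zero, -1 == negative.
int halfSign(uint16_t bits)
{
  if (halfIsZero(bits)) return 0;
  return (bits & 0x8000) ? -1 : 1;
}
```

The zero test is a single mask and compare, while the sign needs one extra check on the top bit.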

## Notes

## Future

#### 0.1.x

- update documentation.
- unit tests of the above.
- isNan().

#### later

- update documentation.
- error handling.
- divide by zero errors.
- look for optimizations.
- rewrite **f16tof32()** with bit magic.
- add storage example - with SD card, FRAM or EEPROM.
- add communication example - serial or Ethernet?