mirror of
https://github.com/RobTillaart/Arduino.git
synced 2024-09-19 16:46:11 -04:00
0.3.0 float16
This commit is contained in:
parent
27d550815d
commit
c6851e9ae3
@ -6,6 +6,19 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/)
|
|||||||
and this project adheres to [Semantic Versioning](http://semver.org/).
|
and this project adheres to [Semantic Versioning](http://semver.org/).
|
||||||
|
|
||||||
|
|
||||||
|
## [0.3.0] - 2024-04-17
|
||||||
|
- Fix #12, breaking change. Thanks to Andyjbm for the measurements.
|
||||||
|
- remove Printable interface as it makes the effective footprint larger!
|
||||||
|
- remove getDecimals() and setDecimals().
|
||||||
|
- patch examples and unit test for the above.
|
||||||
|
- add example **float16_sizeof_array.ino**.
|
||||||
|
- add **isPosInf()** and **isNegInf()**
|
||||||
|
- add link to **float16ext** class with a larger range than float16.
|
||||||
|
- update readme.md.
|
||||||
|
- update unit-tests.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
## [0.2.0] - 2024-03-05
|
## [0.2.0] - 2024-03-05
|
||||||
- **warning: breaking changes!**
|
- **warning: breaking changes!**
|
||||||
- Fix #10, mantissa overflow
|
- Fix #10, mantissa overflow
|
||||||
|
@ -16,15 +16,68 @@ Arduino library to implement float16 data type.
|
|||||||
## Description
|
## Description
|
||||||
|
|
||||||
This **experimental** library defines the float16 (2 byte) data type, including conversion
|
This **experimental** library defines the float16 (2 byte) data type, including conversion
|
||||||
function to and from float32 type. It is definitely **work in progress**.
|
function to and from float32 type.
|
||||||
|
|
||||||
The library implements the **Printable** interface so one can directly print the
|
|
||||||
float16 values in any stream e.g. Serial.
|
|
||||||
|
|
||||||
The primary usage of the float16 data type is to efficiently store and transport
|
The primary usage of the float16 data type is to efficiently store and transport
|
||||||
a floating point number. As it uses only 2 bytes where float and double have typical
|
a floating point number. As it uses only 2 bytes where float and double have typical
|
||||||
4 and 8 bytes, gains can be made at the price of range and precision.
|
4 and 8 bytes, gains can be made at the price of range and precision.
|
||||||
|
|
||||||
|
Note that float16 only has ~3 significant digits.
|
||||||
|
|
||||||
|
To print a float16, one need to convert it with toFloat(), toDouble() or toString(decimals).
|
||||||
|
The latter allows concatenation and further conversion to an char array.
|
||||||
|
|
||||||
|
In pre 0.3.0 version the Printable interface was implemented, but it has been removed
|
||||||
|
as it caused excessive memory usage when declaring arrays of float16.
|
||||||
|
|
||||||
|
|
||||||
|
#### ARM alternative half-precision
|
||||||
|
|
||||||
|
-https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision
|
||||||
|
|
||||||
|
_ARM processors support (via a floating point control register bit)
|
||||||
|
an "alternative half-precision" format, which does away with the
|
||||||
|
special case for an exponent value of 31 (111112).[10] It is almost
|
||||||
|
identical to the IEEE format, but there is no encoding for infinity or NaNs;
|
||||||
|
instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._
|
||||||
|
|
||||||
|
Implemented in https://github.com/RobTillaart/float16ext class.
|
||||||
|
|
||||||
|
|
||||||
|
#### Difference with float16 and float16ext
|
||||||
|
|
||||||
|
The float16ext library has an extended range as it supports values from +- 65504
|
||||||
|
to +- 131008.
|
||||||
|
|
||||||
|
The float16ext does not support INF, -INF and NAN. These values are mapped upon
|
||||||
|
the largest positive, the largest negative and the largest positive number.
|
||||||
|
|
||||||
|
The -0 and 0 values will both exist.
|
||||||
|
|
||||||
|
|
||||||
|
Although they share a lot of code float16 and float16ext should not be mixed.
|
||||||
|
In the future these libraries might merge / derive one from the other.
|
||||||
|
|
||||||
|
|
||||||
|
#### Breaking change 0.3.0
|
||||||
|
|
||||||
|
Version 0.3.0 has a breaking change. The **Printable** interface is removed as
|
||||||
|
it causes larger than expected arrays of float 16 (See #16). On ESP8266 every
|
||||||
|
float16 object was 8 bytes and on AVR it was 5 bytes instead of the expected 2 bytes.
|
||||||
|
|
||||||
|
To support printing the class added two new conversion functions:
|
||||||
|
```cpp
|
||||||
|
f16.toFloat();
|
||||||
|
f16.toString(decimals);
|
||||||
|
|
||||||
|
Serial.println(f16.toFloat(), 4);
|
||||||
|
Serial.println(f16.toString(4));
|
||||||
|
```
|
||||||
|
This keeps printing relative easy.
|
||||||
|
|
||||||
|
The footprint of the library is now smaller and one can now create compact array's
|
||||||
|
of float16 elements using only 2 bytes per element.
|
||||||
|
|
||||||
|
|
||||||
#### Breaking change 0.2.0
|
#### Breaking change 0.2.0
|
||||||
|
|
||||||
@ -34,9 +87,9 @@ For some specific values the mantissa overflowed when the float 16 was
|
|||||||
assigned a value to. This overflow was not detected / corrected.
|
assigned a value to. This overflow was not detected / corrected.
|
||||||
|
|
||||||
During the analysis of this bug it became clear that the sub-normal numbers
|
During the analysis of this bug it became clear that the sub-normal numbers
|
||||||
were also implemented correctly. This is fixed too in 0.2.0.
|
were also not implemented correctly. This is fixed too in 0.2.0.
|
||||||
|
|
||||||
There is still an issue 0 versus -0
|
There is still an issue with 0 versus -0 (sign gets lost in conversion).
|
||||||
|
|
||||||
**This makes all pre-0.2.0 version obsolete.**
|
**This makes all pre-0.2.0 version obsolete.**
|
||||||
|
|
||||||
@ -44,16 +97,18 @@ There is still an issue 0 versus -0
|
|||||||
## Specifications
|
## Specifications
|
||||||
|
|
||||||
|
|
||||||
| attribute | value | notes |
|
| Attribute | Value | Notes |
|
||||||
|:----------|:-------------|:--------|
|
|:------------|:----------------|:--------|
|
||||||
| size | 2 bytes | layout s eeeee mmmmmmmmmm (1,5,10)
|
| size | 2 bytes | layout s eeeee mmmmmmmmmm (1, 5, 10)
|
||||||
| sign | 1 bit |
|
| sign | 1 bit |
|
||||||
| exponent | 5 bit |
|
| exponent | 5 bit |
|
||||||
| mantissa | 10 bit | ~ 3 digits
|
| mantissa | 10 bit | 3 - 4 digits
|
||||||
| minimum | 5.96046 E−8 | smallest positive number.
|
| minimum | ±5.96046 E−8 | smallest number.
|
||||||
| | 1.0009765625 | 1 + 2^−10 = smallest number larger than 1.
|
| | ±1.0009765625 | 1 + 2^−10 = smallest number larger than 1.
|
||||||
| maximum | 65504 |
|
| maximum | ±65504 |
|
||||||
| | |
|
| | |
|
||||||
|
|
||||||
|
± = ALT 0177
|
||||||
|
|
||||||
|
|
||||||
#### Example values
|
#### Example values
|
||||||
@ -87,6 +142,10 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
|
|||||||
#### Related
|
#### Related
|
||||||
|
|
||||||
- https://wokwi.com/projects/376313228108456961 (demo of its usage)
|
- https://wokwi.com/projects/376313228108456961 (demo of its usage)
|
||||||
|
- https://github.com/RobTillaart/float16
|
||||||
|
- https://github.com/RobTillaart/float16ext
|
||||||
|
- https://github.com/RobTillaart/fraction
|
||||||
|
- https://en.wikipedia.org/wiki/Half-precision_floating-point_format
|
||||||
|
|
||||||
|
|
||||||
## Interface
|
## Interface
|
||||||
@ -97,28 +156,35 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
|
|||||||
|
|
||||||
#### Constructors
|
#### Constructors
|
||||||
|
|
||||||
- **float16(void)** defaults to zero.
|
- **float16(void)** defaults value to zero.
|
||||||
- **float16(double f)** constructor.
|
- **float16(double f)** constructor.
|
||||||
- **float16(const float16 &f)** copy constructor.
|
- **float16(const float16 &f)** copy constructor.
|
||||||
|
|
||||||
|
|
||||||
#### Conversion
|
#### Conversion
|
||||||
|
|
||||||
- **double toDouble(void)** convert to double (or float).
|
- **double toDouble(void)** convert value to double or float (if the same e.g. UNO).
|
||||||
|
- **float toFloat(void)** convert value to float.
|
||||||
|
- **String toString(unsigned int decimals = 2)** convert value to a String with decimals.
|
||||||
|
Please note that the accuracy is only 3-4 digits for the whole number so use decimals
|
||||||
|
with care.
|
||||||
|
|
||||||
|
|
||||||
|
#### Export and store
|
||||||
|
|
||||||
|
To serialize the internal format e.g. to disk, two helper functions are available.
|
||||||
|
|
||||||
- **uint16_t getBinary()** get the 2 byte binary representation.
|
- **uint16_t getBinary()** get the 2 byte binary representation.
|
||||||
- **void setBinary(uint16_t u)** set the 2 bytes binary representation.
|
- **void setBinary(uint16_t u)** set the 2 bytes binary representation.
|
||||||
- **size_t printTo(Print& p) const** Printable interface.
|
|
||||||
- **void setDecimals(uint8_t d)** idem, used for printTo.
|
|
||||||
- **uint8_t getDecimals()** idem.
|
|
||||||
|
|
||||||
Note the setDecimals takes one byte per object which is not efficient for arrays of float16.
|
|
||||||
See array example for efficient storage using set/getBinary() functions.
|
|
||||||
|
|
||||||
|
|
||||||
#### Compare
|
#### Compare
|
||||||
|
|
||||||
Standard compare functions. Since 0.1.5 these are quite optimized,
|
The library implement the standard compare functions.
|
||||||
so it is fast to compare e.g. 2 measurements.
|
These are optimized, so it is fast to compare 2 float16 values.
|
||||||
|
|
||||||
|
Note: comparison with a float or double always include a conversion.
|
||||||
|
You can improve performance by converting e.g. a threshold only once before comparison.
|
||||||
|
|
||||||
- **bool operator == (const float16& f)**
|
- **bool operator == (const float16& f)**
|
||||||
- **bool operator != (const float16& f)**
|
- **bool operator != (const float16& f)**
|
||||||
@ -143,20 +209,16 @@ Not planned to optimize these.
|
|||||||
- **float16& operator \*= (const float16& f)**
|
- **float16& operator \*= (const float16& f)**
|
||||||
- **float16& operator /= (const float16& f)**
|
- **float16& operator /= (const float16& f)**
|
||||||
|
|
||||||
negation operator.
|
Negation operator.
|
||||||
- **float16 operator - ()** fast negation.
|
- **float16 operator - ()** fast negation.
|
||||||
|
|
||||||
|
Math helpers.
|
||||||
- **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
|
- **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
|
||||||
- **bool isZero()** returns true if zero. slightly faster than **sign()**.
|
- **bool isZero()** returns true if zero. slightly faster than **sign()**.
|
||||||
- **bool isInf()** returns true if value is (-)infinite.
|
|
||||||
|
|
||||||
|
|
||||||
#### Experimental 0.1.8
|
|
||||||
|
|
||||||
- **bool isNaN()** returns true if value is not a number.
|
- **bool isNaN()** returns true if value is not a number.
|
||||||
|
- **bool isInf()** returns true if value is ± infinite.
|
||||||
|
- **bool isPosInf()** returns true if value is + infinite.
|
||||||
## Notes
|
- **bool isNegInf()** returns true if value is - infinite.
|
||||||
|
|
||||||
|
|
||||||
## Future
|
## Future
|
||||||
@ -167,26 +229,19 @@ negation operator.
|
|||||||
|
|
||||||
#### Should
|
#### Should
|
||||||
|
|
||||||
- unit tests of the above.
|
|
||||||
- how to handle 0 == -0 (0x0000 == 0x8000)
|
- how to handle 0 == -0 (0x0000 == 0x8000)
|
||||||
- investigate ARM alternative half-precision
|
|
||||||
_ARM processors support (via a floating point control register bit)
|
|
||||||
an "alternative half-precision" format, which does away with the
|
|
||||||
special case for an exponent value of 31 (111112).[10] It is almost
|
|
||||||
identical to the IEEE format, but there is no encoding for infinity or NaNs;
|
|
||||||
instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._
|
|
||||||
|
|
||||||
|
|
||||||
#### Could
|
#### Could
|
||||||
|
|
||||||
- copy constructor?
|
- unit tests.
|
||||||
- update documentation.
|
|
||||||
- error handling.
|
- error handling.
|
||||||
- divide by zero errors.
|
- divide by zero errors.
|
||||||
- look for optimizations.
|
- look for optimizations.
|
||||||
- rewrite **f16tof32()** with bit magic.
|
- rewrite **f16tof32()** with bit magic.
|
||||||
- add storage example - with SD card, FRAM or EEPROM
|
- add examples
|
||||||
- add communication example - serial or Ethernet?
|
- persistent storage e.g. SD card, FRAM or EEPROM.
|
||||||
|
- communication e.g. Serial or Ethernet (XML, JSON)?
|
||||||
|
- sorting an array of float16?
|
||||||
|
|
||||||
#### Wont
|
#### Wont
|
||||||
|
|
||||||
|
@ -0,0 +1,44 @@
|
|||||||
|
//
|
||||||
|
// FILE: float16_sizeof_array.ino
|
||||||
|
// AUTHOR: Rob Tillaart
|
||||||
|
// PURPOSE: test float16 size
|
||||||
|
// URL: https://github.com/RobTillaart/float16
|
||||||
|
// See #12
|
||||||
|
|
||||||
|
#include "Arduino.h"
|
||||||
|
#include "float16.h"
|
||||||
|
|
||||||
|
|
||||||
|
float16 test16[100];
|
||||||
|
float test32[100];
|
||||||
|
|
||||||
|
void setup()
|
||||||
|
{
|
||||||
|
Serial.begin(115200);
|
||||||
|
|
||||||
|
Serial.println("FLOAT16");
|
||||||
|
Serial.println(sizeof(test16) / sizeof(test16[0]));
|
||||||
|
Serial.println(sizeof(test16));
|
||||||
|
Serial.println(sizeof(test16[0]));
|
||||||
|
Serial.println();
|
||||||
|
|
||||||
|
Serial.println("FLOAT32");
|
||||||
|
Serial.println(sizeof(test32) / sizeof(test32[0]));
|
||||||
|
Serial.println(sizeof(test32));
|
||||||
|
Serial.println(sizeof(test32[0]));
|
||||||
|
Serial.println();
|
||||||
|
|
||||||
|
// set some values to make sure the compiler doesn't optimise out the arrays.
|
||||||
|
test16[5] = 32;
|
||||||
|
test32[4] = 32;
|
||||||
|
|
||||||
|
// Serial.println(test16[5].toDouble(), 3);
|
||||||
|
// Serial.println(test16[5].toFloat(), 3);
|
||||||
|
// Serial.println(test16[5].toString());
|
||||||
|
// Serial.println(test16[5].toString(1));
|
||||||
|
// Serial.println(test16[5].toString(3));
|
||||||
|
};
|
||||||
|
|
||||||
|
void loop()
|
||||||
|
{
|
||||||
|
};
|
@ -29,8 +29,6 @@ void setup()
|
|||||||
Serial.println(FLOAT16_LIB_VERSION);
|
Serial.println(FLOAT16_LIB_VERSION);
|
||||||
Serial.println("\nStart ");
|
Serial.println("\nStart ");
|
||||||
|
|
||||||
f16.setDecimals(6);
|
|
||||||
|
|
||||||
test_1();
|
test_1();
|
||||||
test_2();
|
test_2();
|
||||||
test_3();
|
test_3();
|
||||||
|
@ -24,8 +24,6 @@ void setup()
|
|||||||
Serial.print("FLOAT16_LIB_VERSION: ");
|
Serial.print("FLOAT16_LIB_VERSION: ");
|
||||||
Serial.println(FLOAT16_LIB_VERSION);
|
Serial.println(FLOAT16_LIB_VERSION);
|
||||||
|
|
||||||
f16.setDecimals(6);
|
|
||||||
|
|
||||||
test_all();
|
test_all();
|
||||||
|
|
||||||
Serial.println("\ndone");
|
Serial.println("\ndone");
|
||||||
@ -96,7 +94,7 @@ void test_0()
|
|||||||
f16 = x;
|
f16 = x;
|
||||||
Serial.print(x);
|
Serial.print(x);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16);
|
Serial.print(f16.toString(2));
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16.toDouble(), 2);
|
Serial.print(f16.toDouble(), 2);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
|
@ -5,7 +5,7 @@
|
|||||||
// URL: https://github.com/RobTillaart/float16
|
// URL: https://github.com/RobTillaart/float16
|
||||||
|
|
||||||
|
|
||||||
// show different storage needs
|
// show storage needs (fixed in 0.3.0)
|
||||||
|
|
||||||
#include "float16.h"
|
#include "float16.h"
|
||||||
|
|
||||||
|
@ -0,0 +1,32 @@
|
|||||||
|
|
||||||
|
float16_test_array.ino
|
||||||
|
FLOAT16_LIB_VERSION: 0.3.0
|
||||||
|
|
||||||
|
0 5.07
|
||||||
|
1 -0.51
|
||||||
|
2 -2.27
|
||||||
|
3 3.58
|
||||||
|
4 6.30
|
||||||
|
5 -0.28
|
||||||
|
6 2.44
|
||||||
|
7 5.78
|
||||||
|
8 6.23
|
||||||
|
9 4.09
|
||||||
|
0.30
|
||||||
|
|
||||||
|
0 5.07
|
||||||
|
1 -0.51
|
||||||
|
2 -2.27
|
||||||
|
3 3.58
|
||||||
|
4 6.30
|
||||||
|
5 -0.28
|
||||||
|
6 2.44
|
||||||
|
7 5.78
|
||||||
|
8 6.23
|
||||||
|
9 4.09
|
||||||
|
0.30
|
||||||
|
|
||||||
|
SIZE: 20
|
||||||
|
SIZE: 20
|
||||||
|
|
||||||
|
done
|
@ -1,7 +1,7 @@
|
|||||||
//
|
//
|
||||||
// FILE: float16_test_performance.ino
|
// FILE: float16_test_performance.ino
|
||||||
// AUTHOR: Rob Tillaart
|
// AUTHOR: Rob Tillaart
|
||||||
// PURPOSE: test float16
|
// PURPOSE: test float16 performance
|
||||||
// URL: https://github.com/RobTillaart/float16
|
// URL: https://github.com/RobTillaart/float16
|
||||||
|
|
||||||
|
|
||||||
@ -162,7 +162,7 @@ void setup()
|
|||||||
delay(10);
|
delay(10);
|
||||||
Serial.println();
|
Serial.println();
|
||||||
|
|
||||||
Serial.println(f16);
|
Serial.println(f16.toString(4));
|
||||||
|
|
||||||
Serial.println("MATH III - negation");
|
Serial.println("MATH III - negation");
|
||||||
start = micros();
|
start = micros();
|
||||||
@ -173,7 +173,7 @@ void setup()
|
|||||||
delay(10);
|
delay(10);
|
||||||
Serial.println();
|
Serial.println();
|
||||||
|
|
||||||
Serial.println(f18);
|
Serial.println(f18.toString(4));
|
||||||
|
|
||||||
Serial.println("\ndone");
|
Serial.println("\ndone");
|
||||||
}
|
}
|
||||||
|
@ -0,0 +1,53 @@
|
|||||||
|
|
||||||
|
//
|
||||||
|
// test: UNO
|
||||||
|
// IDE: 1.8.19
|
||||||
|
//
|
||||||
|
|
||||||
|
FLOAT16_LIB_VERSION: 0.3.0
|
||||||
|
|
||||||
|
CONSTRUCTORS
|
||||||
|
Constructor: 28
|
||||||
|
a = b: 4
|
||||||
|
|
||||||
|
CONVERSION
|
||||||
|
toDouble(): 360
|
||||||
|
|
||||||
|
COMPARE
|
||||||
|
compare == : 12
|
||||||
|
compare != : 8
|
||||||
|
compare < : 4
|
||||||
|
compare <= : 4
|
||||||
|
compare >= : 8
|
||||||
|
compare > : 4
|
||||||
|
|
||||||
|
MATH I
|
||||||
|
math + : 756
|
||||||
|
math - : 692
|
||||||
|
math * : 692
|
||||||
|
math / : 688
|
||||||
|
|
||||||
|
MATH II
|
||||||
|
math += : 756
|
||||||
|
math -= : 692
|
||||||
|
math *= : 692
|
||||||
|
math /= : 704
|
||||||
|
|
||||||
|
17.8125
|
||||||
|
MATH III - negation
|
||||||
|
negation : 4
|
||||||
|
|
||||||
|
-17.8125
|
||||||
|
|
||||||
|
done
|
||||||
|
|
||||||
|
|
||||||
|
------------------------------------------
|
||||||
|
|
||||||
|
//
|
||||||
|
// test: RP2040
|
||||||
|
// IDE: 1.8.13
|
||||||
|
//
|
||||||
|
|
||||||
|
|
||||||
|
-------------------------------------------
|
@ -21,7 +21,7 @@ void setup()
|
|||||||
float16 f16(n);
|
float16 f16(n);
|
||||||
Serial.print(n);
|
Serial.print(n);
|
||||||
Serial.print('\t');
|
Serial.print('\t');
|
||||||
Serial.print(f16);
|
Serial.print(f16.toString());
|
||||||
Serial.print('\t');
|
Serial.print('\t');
|
||||||
Serial.print(abs(f16.toDouble() / n), 8);
|
Serial.print(abs(f16.toDouble() / n), 8);
|
||||||
Serial.println();
|
Serial.println();
|
||||||
|
@ -31,14 +31,12 @@ void setup()
|
|||||||
Serial.print("FLOAT16_LIB_VERSION: ");
|
Serial.print("FLOAT16_LIB_VERSION: ");
|
||||||
Serial.println(FLOAT16_LIB_VERSION);
|
Serial.println(FLOAT16_LIB_VERSION);
|
||||||
|
|
||||||
f16.setDecimals(6);
|
|
||||||
|
|
||||||
for (int i = 0; i < 32; i++)
|
for (int i = 0; i < 32; i++)
|
||||||
{
|
{
|
||||||
f16.setBinary(value[i]);
|
f16.setBinary(value[i]);
|
||||||
Serial.print(value[i], HEX);
|
Serial.print(value[i], HEX);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16);
|
Serial.print(f16.toString());
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16.toDouble(), 6);
|
Serial.print(f16.toDouble(), 6);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
@ -52,7 +50,7 @@ void setup()
|
|||||||
f16 = x;
|
f16 = x;
|
||||||
Serial.print(f16.getBinary(), HEX);
|
Serial.print(f16.getBinary(), HEX);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16);
|
Serial.print(f16.toString());
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
Serial.print(f16.toDouble(), 6);
|
Serial.print(f16.toDouble(), 6);
|
||||||
Serial.print("\t");
|
Serial.print("\t");
|
||||||
|
@ -1,7 +1,7 @@
|
|||||||
//
|
//
|
||||||
// FILE: float16.cpp
|
// FILE: float16.cpp
|
||||||
// AUTHOR: Rob Tillaart
|
// AUTHOR: Rob Tillaart
|
||||||
// VERSION: 0.2.0
|
// VERSION: 0.3.0
|
||||||
// PURPOSE: library for Float16s for Arduino
|
// PURPOSE: library for Float16s for Arduino
|
||||||
// URL: http://en.wikipedia.org/wiki/Half-precision_floating-point_format
|
// URL: http://en.wikipedia.org/wiki/Half-precision_floating-point_format
|
||||||
|
|
||||||
@ -15,18 +15,26 @@ float16::float16(double f)
|
|||||||
_value = f32tof16(f);
|
_value = f32tof16(f);
|
||||||
}
|
}
|
||||||
|
|
||||||
// PRINTING
|
|
||||||
size_t float16::printTo(Print& p) const
|
|
||||||
{
|
|
||||||
double d = this->f16tof32(_value);
|
|
||||||
return p.print(d, _decimals);
|
|
||||||
}
|
|
||||||
|
|
||||||
|
//////////////////////////////////////////////////////////
|
||||||
|
//
|
||||||
|
// CONVERTING & PRINTING
|
||||||
|
//
|
||||||
double float16::toDouble() const
|
double float16::toDouble() const
|
||||||
{
|
{
|
||||||
return f16tof32(_value);
|
return f16tof32(_value);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
float float16::toFloat() const
|
||||||
|
{
|
||||||
|
return f16tof32(_value);
|
||||||
|
}
|
||||||
|
|
||||||
|
String float16::toString(unsigned int decimals) const
|
||||||
|
{
|
||||||
|
return String((double)f16tof32(_value), decimals);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
//////////////////////////////////////////////////////////
|
//////////////////////////////////////////////////////////
|
||||||
//
|
//
|
||||||
@ -164,6 +172,16 @@ bool float16::isInf()
|
|||||||
return ((_value == 0x7C00) || (_value == 0xFC00));
|
return ((_value == 0x7C00) || (_value == 0xFC00));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
bool float16::isPosInf()
|
||||||
|
{
|
||||||
|
return (_value == 0x7C00);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool float16::isNegInf()
|
||||||
|
{
|
||||||
|
return (_value == 0xFC00);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
//////////////////////////////////////////////////////////
|
//////////////////////////////////////////////////////////
|
||||||
//
|
//
|
||||||
|
@ -2,7 +2,7 @@
|
|||||||
//
|
//
|
||||||
// FILE: float16.h
|
// FILE: float16.h
|
||||||
// AUTHOR: Rob Tillaart
|
// AUTHOR: Rob Tillaart
|
||||||
// VERSION: 0.2.0
|
// VERSION: 0.3.0
|
||||||
// PURPOSE: Arduino library to implement float16 data type.
|
// PURPOSE: Arduino library to implement float16 data type.
|
||||||
// half-precision floating point format,
|
// half-precision floating point format,
|
||||||
// used for efficient storage and transport.
|
// used for efficient storage and transport.
|
||||||
@ -11,10 +11,10 @@
|
|||||||
|
|
||||||
#include "Arduino.h"
|
#include "Arduino.h"
|
||||||
|
|
||||||
#define FLOAT16_LIB_VERSION (F("0.2.0"))
|
#define FLOAT16_LIB_VERSION (F("0.3.0"))
|
||||||
|
|
||||||
|
|
||||||
class float16: public Printable
|
class float16
|
||||||
{
|
{
|
||||||
public:
|
public:
|
||||||
// Constructors
|
// Constructors
|
||||||
@ -22,21 +22,18 @@ public:
|
|||||||
float16(double f);
|
float16(double f);
|
||||||
float16(const float16 &f) { _value = f._value; };
|
float16(const float16 &f) { _value = f._value; };
|
||||||
|
|
||||||
// Conversion
|
// Conversion and printing
|
||||||
double toDouble(void) const;
|
double toDouble(void) const;
|
||||||
|
float toFloat() const;
|
||||||
|
String toString(unsigned int decimals = 2) const; // keep esp32 happy.
|
||||||
|
|
||||||
// access the 2 byte representation.
|
// access the 2 byte representation.
|
||||||
uint16_t getBinary() { return _value; };
|
uint16_t getBinary() { return _value; };
|
||||||
void setBinary(uint16_t u) { _value = u; };
|
void setBinary(uint16_t u) { _value = u; };
|
||||||
|
|
||||||
// Printable
|
|
||||||
size_t printTo(Print& p) const;
|
|
||||||
void setDecimals(uint8_t d) { _decimals = d; };
|
|
||||||
uint8_t getDecimals() { return _decimals; };
|
|
||||||
|
|
||||||
// equalities
|
// equalities
|
||||||
bool operator == (const float16& f);
|
bool operator == (const float16& f);
|
||||||
bool operator != (const float16& f);
|
bool operator != (const float16& f);
|
||||||
|
|
||||||
bool operator > (const float16& f);
|
bool operator > (const float16& f);
|
||||||
bool operator >= (const float16& f);
|
bool operator >= (const float16& f);
|
||||||
bool operator < (const float16& f);
|
bool operator < (const float16& f);
|
||||||
@ -61,7 +58,8 @@ public:
|
|||||||
bool isZero();
|
bool isZero();
|
||||||
bool isNaN();
|
bool isNaN();
|
||||||
bool isInf();
|
bool isInf();
|
||||||
|
bool isPosInf();
|
||||||
|
bool isNegInf();
|
||||||
|
|
||||||
// CORE CONVERSION
|
// CORE CONVERSION
|
||||||
// should be private, needed for testing.
|
// should be private, needed for testing.
|
||||||
@ -70,7 +68,6 @@ public:
|
|||||||
|
|
||||||
|
|
||||||
private:
|
private:
|
||||||
uint8_t _decimals = 4;
|
|
||||||
uint16_t _value;
|
uint16_t _value;
|
||||||
|
|
||||||
};
|
};
|
||||||
|
@ -7,16 +7,19 @@ float16 KEYWORD1
|
|||||||
|
|
||||||
# Methods and Functions (KEYWORD2)
|
# Methods and Functions (KEYWORD2)
|
||||||
toDouble KEYWORD2
|
toDouble KEYWORD2
|
||||||
|
toFloat KEYWORD2
|
||||||
|
toString KEYWORD2
|
||||||
getBinary KEYWORD2
|
getBinary KEYWORD2
|
||||||
setBinary KEYWORD2
|
setBinary KEYWORD2
|
||||||
|
|
||||||
setDecimals KEYWORD2
|
|
||||||
getDecimals KEYWORD2
|
|
||||||
|
|
||||||
sign KEYWORD2
|
sign KEYWORD2
|
||||||
isZero KEYWORD2
|
isZero KEYWORD2
|
||||||
isNan KEYWORD2
|
isNan KEYWORD2
|
||||||
isInf KEYWORD2
|
isInf KEYWORD2
|
||||||
|
isPosInf KEYWORD2
|
||||||
|
isNegInf KEYWORD2
|
||||||
|
|
||||||
|
|
||||||
# Constants (LITERAL1)
|
# Constants (LITERAL1)
|
||||||
FLOAT16_LIB_VERSION LITERAL1
|
FLOAT16_LIB_VERSION LITERAL1
|
||||||
|
@ -15,7 +15,7 @@
|
|||||||
"type": "git",
|
"type": "git",
|
||||||
"url": "https://github.com/RobTillaart/float16.git"
|
"url": "https://github.com/RobTillaart/float16.git"
|
||||||
},
|
},
|
||||||
"version": "0.2.0",
|
"version": "0.3.0",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"frameworks": "*",
|
"frameworks": "*",
|
||||||
"platforms": "*",
|
"platforms": "*",
|
||||||
|
@ -1,5 +1,5 @@
|
|||||||
name=float16
|
name=float16
|
||||||
version=0.2.0
|
version=0.3.0
|
||||||
author=Rob Tillaart <rob.tillaart@gmail.com>
|
author=Rob Tillaart <rob.tillaart@gmail.com>
|
||||||
maintainer=Rob Tillaart <rob.tillaart@gmail.com>
|
maintainer=Rob Tillaart <rob.tillaart@gmail.com>
|
||||||
sentence=Arduino library to implement float16 data type.
|
sentence=Arduino library to implement float16 data type.
|
||||||
|
@ -85,6 +85,16 @@ unittest(test_constructor)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
unittest(test_sizeof)
|
||||||
|
{
|
||||||
|
float16 value(1);
|
||||||
|
float16 arr[10];
|
||||||
|
|
||||||
|
assertEqual(2, sizeof(value));
|
||||||
|
assertEqual(20, sizeof(arr));
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
unittest(test_compare_equal)
|
unittest(test_compare_equal)
|
||||||
{
|
{
|
||||||
float16 a(1);
|
float16 a(1);
|
||||||
@ -142,6 +152,21 @@ unittest(test_negation)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
unittest(test_infinity)
|
||||||
|
{
|
||||||
|
float16 f16;
|
||||||
|
f16.setBinary(0x7C00);
|
||||||
|
assertTrue(f16.isInf());
|
||||||
|
assertTrue(f16.isPosInf());
|
||||||
|
assertFalse(f16.isNegInf());
|
||||||
|
|
||||||
|
f16.setBinary(0xFC00);
|
||||||
|
assertTrue(f16.isInf());
|
||||||
|
assertFalse(f16.isPosInf());
|
||||||
|
assertTrue(f16.isNegInf());
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
unittest(test_conversion)
|
unittest(test_conversion)
|
||||||
{
|
{
|
||||||
for (int i = 0; i < 20; i++)
|
for (int i = 0; i < 20; i++)
|
||||||
@ -153,19 +178,12 @@ unittest(test_conversion)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
unittest(test_printable)
|
unittest(test_toString)
|
||||||
{
|
{
|
||||||
float16 f16(123.456);
|
float16 f16(-123.456);
|
||||||
// test default value.
|
fprintf(stderr, "note the limited accuracy (~4 digits).\n");
|
||||||
assertEqual(4, f16.getDecimals());
|
assertNotEqual("-123.456", f16.toString(3));
|
||||||
for (int i = 0; i < 6; i++)
|
assertEqual("-123.4", f16.toString(1));
|
||||||
{
|
|
||||||
f16.setDecimals(i);
|
|
||||||
assertEqual(i, f16.getDecimals());
|
|
||||||
}
|
|
||||||
|
|
||||||
// TODO
|
|
||||||
// printable? how to test?
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@ -184,21 +202,24 @@ unittest(test_all_values)
|
|||||||
b = a.toDouble();
|
b = a.toDouble();
|
||||||
if (x != b.getBinary()) // assert would give 65K lines output!
|
if (x != b.getBinary()) // assert would give 65K lines output!
|
||||||
{
|
{
|
||||||
fprintf(stderr, "fail at %d\n", x);
|
fprintf(stderr, "fail at %d != %d\n", x, b.getBinary());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fprintf(stderr, "test all negative patterns\n");
|
fprintf(stderr, "test all negative patterns\n");
|
||||||
|
fprintf(stderr, "only fails -0\n");
|
||||||
for (uint32_t x = 0x8000; x < 0xFC01; x++)
|
for (uint32_t x = 0x8000; x < 0xFC01; x++)
|
||||||
{
|
{
|
||||||
a.setBinary(x);
|
a.setBinary(x);
|
||||||
b = a.toDouble();
|
b = a.toDouble();
|
||||||
if (x != b.getBinary())
|
if (x != b.getBinary())
|
||||||
{
|
{
|
||||||
fprintf(stderr, "fail at %d\n", x);
|
fprintf(stderr, "fail at %d != %d\n", x, b.getBinary());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
unittest_main()
|
unittest_main()
|
||||||
|
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user