0.1.5 Soundex

Rob Tillaart 2023-11-22 10:00:38 +01:00
parent 1e6689d826
commit b4c59098b0
6 changed files with 43 additions and 22 deletions


@ -6,6 +6,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).
## [0.1.5] - 2023-11-21
- update readme.md
## [0.1.4] - 2023-02-02
- update readme.md
- update GitHub actions
@ -13,7 +17,6 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- allow **SOUNDEX_MAX_LENGTH** be defined from command line.
- move code to .cpp
## [0.1.3] - 2022-11-24
- Add RP2040 support to build-CI.
- Add CHANGELOG.md


@ -2,8 +2,11 @@
[![Arduino CI](https://github.com/RobTillaart/Soundex/workflows/Arduino%20CI/badge.svg)](https://github.com/marketplace/actions/arduino_ci)
[![Arduino-lint](https://github.com/RobTillaart/Soundex/actions/workflows/arduino-lint.yml/badge.svg)](https://github.com/RobTillaart/Soundex/actions/workflows/arduino-lint.yml)
[![JSON check](https://github.com/RobTillaart/Soundex/actions/workflows/jsoncheck.yml/badge.svg)](https://github.com/RobTillaart/Soundex/actions/workflows/jsoncheck.yml)
[![GitHub issues](https://img.shields.io/github/issues/RobTillaart/Soundex.svg)](https://github.com/RobTillaart/Soundex/issues)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/RobTillaart/Soundex/blob/master/LICENSE)
[![GitHub release](https://img.shields.io/github/release/RobTillaart/Soundex.svg?maxAge=3600)](https://github.com/RobTillaart/Soundex/releases)
[![PlatformIO Registry](https://badges.registry.platformio.org/packages/robtillaart/library/Soundex.svg)](https://registry.platformio.org/libraries/robtillaart/Soundex)
# Soundex
@ -24,14 +27,16 @@ followed by 3 digits replacing the consonants.
The base Soundex has 26 x 7 x 7 x 7 = 8918 possible outcomes,
this could be easily encoded in an uint16_t.
This insight triggered the experimental functions.
This insight triggered the experimental functions **soundex16()** and **soundex32()**.
These experimental functions can be used e.g. to optimize word-searching
as fewer bytes need to be compared / stored.
#### 0.1.2 Experimental
The library has two experimental functions, **soundex16()** and **soundex32()**.
These functions pack a Soundex length 5 hash in a uint16_t and a length 10 in a uint32_t.
These compress soundex() results.
They effectively compress the **soundex()** results.
Advantages (16 bit version):
- better hash as it adds 1 extra character
@ -56,11 +61,14 @@ The hash codes of these new SoundexNN() are a continuous numeric range.
Note that soundex16() and soundex32() compress info much better than
the standard soundex().
A soundex64() is possible and uses 8 bytes.
It would allow to compress very long soundex() results (up to 22 chars) in 8 bytes.
A **soundex64()** is technically possible and would use 8 bytes (not implemented).
It would allow compressing very long **soundex()** results (up to 22 chars) in 8 bytes.
A possible application is chemical formulas.
It could use the **printHelper** library to print the uint64_t as HEX.
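
The sizes mentioned above can be checked with a little arithmetic: a code of length n has 26 x 7^(n-1) possible values. The sketch below assumes digits 0..6 and uses a hypothetical helper **soundexCodes()** that is not part of the library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the library):
// number of distinct Soundex codes of a given length,
// 26 letters for the first character, 7 digits for every other one.
constexpr uint64_t soundexCodes(unsigned length)
{
  return (length <= 1) ? 26 : 7 * soundexCodes(length - 1);
}

// base Soundex, 4 characters
static_assert(soundexCodes(4) == 8918, "26 x 7 x 7 x 7");

// length 5 fits in 16 bits, length 6 does not
static_assert(soundexCodes(5) <= 0x10000ULL, "fits in uint16_t");
static_assert(soundexCodes(6) >  0x10000ULL, "too large for uint16_t");

// length 10 fits in 32 bits, length 11 does not
static_assert(soundexCodes(10) <= 0x100000000ULL, "fits in uint32_t");
static_assert(soundexCodes(11) >  0x100000000ULL, "too large for uint32_t");

// length 22 fits in 64 bits (one more factor 7 does not overflow),
// length 23 would overflow
static_assert(soundexCodes(21) <= UINT64_MAX / 7, "length 22 fits in uint64_t");
static_assert(soundexCodes(22) >  UINT64_MAX / 7, "length 23 does not fit");
```

All checks are evaluated at compile time, so the sketch adds no code to a build.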
#### Links
#### Related
- https://en.wikipedia.org/wiki/Soundex
- https://en.wikipedia.org/wiki/Metaphone (not implemented)
@ -94,7 +102,7 @@ Note: preferably printed in HEX.
Not tested on other platforms.
First numbers of **.soundex("Trichloroethylene")** measured with
First numbers of **soundex("Trichloroethylene")** measured with
a test sketch show the following timing per word.
| Checksum | digits | UNO 16 MHz | ESP32 240 MHz | notes |
@ -109,14 +117,13 @@ a test sketch shows the following timing per word.
See examples.
## Future ideas
## Future
#### Must
- improve documentation
- add examples
#### Should
- more testing
@ -125,11 +132,11 @@ See examples.
- string lengths
- performance
#### Could
- use spare bits of soundex16/32 as parity / checksum.
- add String interface e.g.
- **String soundex(String str)**
#### Won't
@ -143,3 +150,12 @@ See examples.
- Beider-Morse Soundex
- Metaphone
## Support
If you appreciate my libraries, you can support the development and maintenance.
Improve the quality of the libraries by providing issues and Pull Requests, or
donate through PayPal or GitHub sponsors.
Thank you,


@ -1,7 +1,7 @@
//
// FILE: Soundex.cpp
// AUTHOR: Rob Tillaart
// VERSION: 0.1.4
// VERSION: 0.1.5
// DATE: 2022-02-05
// PURPOSE: Arduino Library for calculating Soundex hash
// URL: https://github.com/RobTillaart/Soundex
@ -39,7 +39,7 @@ uint8_t Soundex::getLength()
char * Soundex::soundex(const char * str)
{
uint8_t i = 0; // index for the buffer.
uint8_t i = 0; // index for the buffer.
// fill buffer with zeros
for (i = 0; i < _length; i++) _buffer[i] = '0';
@ -53,13 +53,15 @@ char * Soundex::soundex(const char * str)
// handle first character
i = 0;
_buffer[i++] = toupper(*p);
uint8_t last = sdx[_buffer[0] - 'A']; // remember last code
// remember last code
uint8_t last = sdx[_buffer[0] - 'A'];
p++;
// process the remainder of the string
while ((*p != 0) && (i < _length))
{
if (isalpha(*p)) // skip non ASCII
// skip non-alphabetic characters
if (isalpha(*p))
{
uint8_t current = sdx[toupper(*p) - 'A'];
// new code?


@ -2,7 +2,7 @@
//
// FILE: Soundex.h
// AUTHOR: Rob Tillaart
// VERSION: 0.1.4
// VERSION: 0.1.5
// DATE: 2022-02-05
// PURPOSE: Arduino Library for calculating Soundex hash
// URL: https://github.com/RobTillaart/Soundex
@ -11,7 +11,7 @@
#include "Arduino.h"
#define SOUNDEX_LIB_VERSION (F("0.1.4"))
#define SOUNDEX_LIB_VERSION (F("0.1.5"))
#define SOUNDEX_MIN_LENGTH 4


@ -1,7 +1,7 @@
{
"name": "Soundex",
"keywords": "Soundex,hash,Soundex16,Soundex32",
"description": "Arduino Library for soundex.",
"description": "Arduino Library for soundex.\nExperimental Soundex16, Soundex32.",
"authors":
[
{
@ -15,9 +15,9 @@
"type": "git",
"url": "https://github.com/RobTillaart/Soundex.git"
},
"version": "0.1.4",
"version": "0.1.5",
"license": "MIT",
"frameworks": "arduino",
"frameworks": "*",
"platforms": "*",
"headers": "Soundex.h"
}


@ -1,9 +1,9 @@
name=Soundex
version=0.1.4
version=0.1.5
author=Rob Tillaart <rob.tillaart@gmail.com>
maintainer=Rob Tillaart <rob.tillaart@gmail.com>
sentence=Arduino Library for calculating Soundex hash.
paragraph=Experimental Soundex16, Soundex32
paragraph=Experimental Soundex16, Soundex32.
category=Signal Input/Output
url=https://github.com/RobTillaart/Soundex
architectures=*