Richard Allen f45c9f14c3 change(mbedtls/port): optimize gcm_mult()
1) pre-shift GCM last4 to use 32-bit shift

On 32-bit architectures like Aarch32, RV32, Xtensa,
shifting a 64-bit variable by 32-bits is free,
since it changes the register representing half of the 64-bit var.
Pre-shift the last4 array to take advantage of this.

2) unroll first GCM iteration

The first loop of gcm_mult() is different from
the others. By unrolling it separately from the
others, the other iterations may take advantage
of the zero-overhead loop construct, in addition
to saving a conditional branch in the loop.
2024-08-21 18:26:31 +05:30
..