1) pre-shift GCM last4 to use 32-bit shift
On 32-bit architectures such as AArch32, RV32 and Xtensa,
shifting a 64-bit variable by 32 bits is essentially free,
since it only moves data between the two registers that hold the 64-bit value.
Pre-shift the last4 array to take advantage of this.
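A minimal sketch of the idea (table names, sizes and values below are illustrative, not copied from mbedtls): pre-shifting the entries by 16 turns the hot-path shift of 48 into a shift of exactly 32, which a 32-bit target implements as a plain register selection.

    #include <stdint.h>

    /* Illustrative two-entry tables; the real last4 table has 16 entries. */
    static const uint64_t tbl_orig[2]     = { 0x0000, 0x1c20 };
    static const uint64_t tbl_preshift[2] = { 0x0000ULL << 16, 0x1c20ULL << 16 };

    /* Original form: 48-bit shift on the hot path (rem assumed in range). */
    uint64_t reduce_orig(uint64_t zh, unsigned rem)
    {
        return zh ^ (tbl_orig[rem] << 48);
    }

    /* Pre-shifted form: same result, but the run-time shift is 32 bits,
     * which on AArch32/RV32/Xtensa is just a register move. */
    uint64_t reduce_preshift(uint64_t zh, unsigned rem)
    {
        return zh ^ (tbl_preshift[rem] << 32);
    }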
2) unroll first GCM iteration
The first iteration of the loop in gcm_mult() differs from
the others. By peeling it out and unrolling it separately,
the remaining iterations can use the zero-overhead loop
construct, and a conditional branch inside the loop is saved.
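A sketch of the loop-peeling pattern under assumed names (this is not the actual gcm_mult() body): the first pass initialises the accumulator directly from the table, so the remaining loop has a single uniform body that the compiler can map onto a zero-overhead loop.

    #include <stdint.h>

    static uint32_t table[16];   /* stand-in for the GCM lookup table */

    uint32_t fold_sketch(const uint8_t nibbles[16])
    {
        uint32_t z = table[nibbles[0] & 0xF];       /* peeled first iteration    */
        for (int i = 1; i < 16; i++) {              /* uniform body, no special- */
            z = (z >> 4) ^ table[nibbles[i] & 0xF]; /* case branch in the loop   */
        }
        return z;
    }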
When buffer_needs_realloc is true in the AES driver, check the location of the
original buffer only when SOC_AXI_DMA_EXT_MEM_ENC_ALIGNMENT is set and use it to
decide where to allocate the new buffer; otherwise allocate generic
DMA-capable memory (as was done earlier).
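A rough sketch of that allocation policy, assuming heap_caps_aligned_alloc()/heap_caps_malloc() are used; the helper name, the capability flags and the use of the alignment macro as the alignment value are assumptions, not the driver's actual code.

    #include <stdbool.h>
    #include <stddef.h>
    #include "esp_heap_caps.h"
    #include "soc/soc_caps.h"

    static void *realloc_dma_buffer_sketch(size_t len, bool orig_buf_in_ext_mem)
    {
        (void)orig_buf_in_ext_mem;   /* only relevant in the AXI-DMA case below */
    #if SOC_AXI_DMA_EXT_MEM_ENC_ALIGNMENT
        /* Only in this case does the original buffer's location matter: keep
         * external buffers in external RAM with the required alignment. */
        if (orig_buf_in_ext_mem) {
            return heap_caps_aligned_alloc(SOC_AXI_DMA_EXT_MEM_ENC_ALIGNMENT, len,
                                           MALLOC_CAP_SPIRAM | MALLOC_CAP_DMA);
        }
    #endif
        /* Otherwise: generic DMA-capable memory, as was done earlier. */
        return heap_caps_malloc(len, MALLOC_CAP_DMA);
    }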
- In case of AXI-DMA, the DMA descriptors need to be 8-byte aligned;
lldesc_t does not satisfy this condition, so it is replaced with
dma_descriptor_t (align(4) and align(8)) in esp_crypto_shared_gdma.
- Added a new shared GDMA start API that supports the dma_descriptor_t
DMA descriptor.
- Added some generic DMA descriptor macros and helper functions.
- Replaced lldesc_t with dma_descriptor_t.
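For reference, a minimal sketch of the descriptor shape assumed in these notes (the real layout is dma_descriptor_t in hal/dma_types.h; the field split below is simplified). The point is the 8-byte alignment that AXI-DMA requires and that the legacy lldesc_t cannot guarantee.

    #include <stdint.h>

    typedef struct dma_desc_sketch {
        struct {
            uint32_t size     : 12;  /* buffer capacity in bytes           */
            uint32_t length   : 12;  /* number of valid bytes              */
            uint32_t reserved : 6;
            uint32_t suc_eof  : 1;   /* marks the last descriptor          */
            uint32_t owner    : 1;   /* who owns the descriptor (CPU/DMA)  */
        } dw0;
        void *buffer;                        /* pointer to the data buffer */
        struct dma_desc_sketch *next;        /* next link, NULL terminates */
    } __attribute__((aligned(8))) dma_desc_sketch_t;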
- Even if the config MBEDTLS_HARDWARE_AES is enabled, we now support falling back
to the software implementation of GCM operations when non-AES ciphers are used.
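A toy dispatch illustrating the fallback, with hypothetical helper names (the real decision is made in the esp-idf mbedtls GCM port based on the cipher id supplied at key-setup time):

    #include <stdio.h>

    /* Hypothetical stand-ins for the two code paths. */
    static int software_gcm(void)     { puts("software GCM");     return 0; }
    static int hardware_aes_gcm(void) { puts("hardware AES-GCM"); return 0; }

    static int gcm_dispatch_sketch(int cipher_is_aes)
    {
        /* Hardware acceleration only applies to AES; anything else falls
         * back to the generic mbedtls software implementation. */
        return cipher_is_aes ? hardware_aes_gcm() : software_gcm();
    }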
For certain data lengths, the last input descriptor was not appended
correctly, and hence the EOF flag in the DMA descriptor linked list was
set at an incorrect location. This resulted in the peripheral stalling
while expecting more data, and the code eventually timed out
waiting for the AES completion interrupt.
Required configs for this issue:
CONFIG_MBEDTLS_HARDWARE_AES
CONFIG_SOC_AES_SUPPORT_DMA
This observation is similar to the issue reported in:
https://github.com/espressif/esp-idf/issues/10647
To reproduce this issue, start an AES-GCM DMA operation with a data length of
12280 bytes; this stalls the operation forever.
The fix traces the entire descriptor list and then appends the
extra-bytes descriptor at the correct position (as the last node).
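A self-contained sketch of that fix using a simplified node type (not the driver's real descriptor struct): walk to the true tail of the chain, append the extra-bytes descriptor there, and make sure EOF is set only on that last node.

    #include <stdbool.h>
    #include <stddef.h>

    /* Minimal stand-in for a DMA descriptor node. */
    typedef struct desc_node {
        bool suc_eof;              /* end-of-frame marker              */
        struct desc_node *next;    /* next descriptor, NULL terminates */
    } desc_node_t;

    static void append_extra_desc_sketch(desc_node_t *head, desc_node_t *extra)
    {
        desc_node_t *tail = head;
        while (tail->next != NULL) {   /* trace the entire descriptor list */
            tail = tail->next;
        }
        tail->suc_eof = false;         /* previous tail no longer marks EOF */
        tail->next = extra;
        extra->next = NULL;
        extra->suc_eof = true;         /* EOF set only on the last node     */
    }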
DMA operation completion must wait until ownership of the last DMA descriptor
has been changed by the hardware, that is, until the hardware has completed
the write operation for the entire data. Earlier, for the hardware GCM case,
only the first DMA descriptor was checked, which could result in a race
condition in the non-interrupt (MBEDTLS_AES_USE_INTERRUPT disabled) case.
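A sketch of the completion check under assumed names; the owner-field value below is a placeholder, so consult the driver for the exact comparison it performs.

    #include <stdint.h>

    /* Minimal stand-in for the descriptor's ownership status. */
    typedef struct {
        volatile uint32_t owner;   /* who currently owns the descriptor */
    } desc_status_sketch_t;

    #define OWNER_DONE 0u          /* placeholder for "hardware finished" */

    /* Completion is judged on the *last* output descriptor, not the first. */
    static void wait_dma_done_sketch(desc_status_sketch_t *last_out_desc)
    {
        while (last_out_desc->owner != OWNER_DONE) {
            /* busy-wait; with MBEDTLS_AES_USE_INTERRUPT enabled the driver
             * blocks on the AES completion interrupt instead of polling */
        }
    }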
- Earlier, some intermediate return values were not stored and returned,
so incorrect return values were propagated to the upper API layers.
- Also zeroised the output buffer in case of an error condition.
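An illustrative pattern for both points, with hypothetical step functions standing in for the hardware calls whose results were previously dropped:

    #include <string.h>
    #include <stddef.h>

    static int step_one(void) { return 0; }   /* hypothetical intermediate step */
    static int step_two(void) { return 0; }   /* hypothetical intermediate step */

    static int do_operation_sketch(unsigned char *output, size_t olen)
    {
        int ret = step_one();            /* store every intermediate result */
        if (ret != 0) {
            goto cleanup;
        }
        ret = step_two();
    cleanup:
        if (ret != 0) {
            memset(output, 0, olen);     /* zeroise the output on error     */
        }
        return ret;                      /* propagate the real status       */
    }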
The number of DMA descriptors allocated for certain lengths (e.g.,
8176 bytes) was not sufficient (off-by-one error). This resulted in
dynamic memory corruption, as the region was modified beyond the
allocated range.
This change fixes the DMA descriptor count calculation and allocates
sufficient DMA descriptors based on data length alignment considerations.
A test has also been added to cover this specific scenario in CI.
Closes https://github.com/espressif/esp-idf/issues/11310
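A sketch of a count calculation that accounts for the unaligned tail; the per-descriptor limit and the helper name below are illustrative, not the driver's actual constants:

    #include <stddef.h>

    #define MAX_DESC_BUF_LEN 4092   /* illustrative per-descriptor payload limit */

    static size_t dma_desc_count_sketch(size_t len, size_t align)
    {
        size_t aligned_len = (len / align) * align;   /* bytes in full chunks  */
        size_t count = (aligned_len + MAX_DESC_BUF_LEN - 1) / MAX_DESC_BUF_LEN;
        if (len % align) {
            count += 1;               /* one extra descriptor for the tail bytes */
        }
        return count;
    }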
- The GCM operation in mbedtls used ECB mode, which processes only 16 bytes of data at a time.
- When processing a large amount of data, the hardware therefore had to be reconfigured
  frequently, and the AES DMA capability could not be used to improve efficiency.
- Hence, the GCM implementation is replaced with a CTR-based calculation that uses AES DMA
  to improve efficiency.
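A conceptual sketch of the new structure (encrypt direction, hypothetical helper names): the bulk of GCM is a single CTR pass that can be handed to the AES DMA engine as one job, with GHASH run over the resulting ciphertext.

    #include <stddef.h>

    /* Hypothetical stand-ins for the DMA-backed CTR pass and the GHASH update. */
    static int aes_ctr_dma_crypt(unsigned char counter[16], const unsigned char *in,
                                 unsigned char *out, size_t len)
    {
        (void)counter; (void)in; (void)out; (void)len;
        return 0;
    }

    static void ghash_update(const unsigned char *data, size_t len)
    {
        (void)data; (void)len;
    }

    static int gcm_update_sketch(unsigned char counter[16], const unsigned char *in,
                                 unsigned char *out, size_t len)
    {
        /* One DMA job covers the whole buffer instead of one ECB call per 16 bytes. */
        int ret = aes_ctr_dma_crypt(counter, in, out, len);
        if (ret == 0) {
            ghash_update(out, len);   /* authenticate the produced ciphertext */
        }
        return ret;
    }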
This commit removes the following legacy atomic CAS functions:
From compare_set.h (file removed):
- compare_and_set_native()
- compare_and_set_extram()
From portmacro.h:
- uxPortCompareSet()
- uxPortCompareSetExtram()
Users should call esp_cpu_compare_and_set() instead, as this function hides the details
of performing atomic CAS on internal and external RAM addresses.
Due to the removal of compare_set.h, some missing header includes are also fixed in this commit.
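A minimal migration sketch, assuming the esp_cpu_compare_and_set() prototype in esp_cpu.h takes a volatile uint32_t pointer, a compare value and a new value, and returns whether the swap happened; check the header for your IDF version.

    #include <stdbool.h>
    #include <stdint.h>
    #include "esp_cpu.h"

    static volatile uint32_t s_lock = 0;   /* 0 = free, 1 = taken; may live in
                                              internal or external RAM */

    static bool try_take_lock(void)
    {
        /* Atomically set s_lock to 1 only if it is currently 0, replacing the
         * removed uxPortCompareSet()/compare_and_set_native() calls. */
        return esp_cpu_compare_and_set(&s_lock, 0, 1);
    }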