mirror of
https://github.com/espressif/esp-idf.git
synced 2024-10-05 20:47:46 -04:00
doc: Add performance guides for execution speed, binary size, RAM usage
Closes https://github.com/espressif/esp-idf/issues/7007 Closes https://github.com/espressif/esp-idf/issues/6715 Closes https://github.com/espressif/esp-idf/issues/3781 Closes https://github.com/espressif/esp-idf/issues/2566
parent
2a27e46cd9
commit
dc6b950257
1
Kconfig
@ -361,6 +361,7 @@ mainmenu "Espressif IoT Development Framework Configuration"
|
||||
|
||||
- coverage: NORMAL < STRONG < OVERALL
|
||||
|
||||
The performance impact includes increasing the amount of stack memory required for each task.
|
||||
|
||||
config COMPILER_STACK_CHECK_MODE_NONE
|
||||
bool "None"
|
||||
|
@ -31,7 +31,7 @@ config BTDM_CTRL_BR_EDR_MAX_ACL_CONN
|
||||
range 1 7
|
||||
help
|
||||
BR/EDR ACL maximum connections of bluetooth controller.
|
||||
Each connection uses 1.2KB static DRAM whenever the BT controller is enabled.
|
||||
Each connection uses 1.2 KB DRAM whenever the BT controller is enabled.
|
||||
|
||||
config BTDM_CTRL_BR_EDR_MAX_SYNC_CONN
|
||||
int "BR/EDR Sync(SCO/eSCO) Max Connections"
|
||||
@ -40,7 +40,7 @@ config BTDM_CTRL_BR_EDR_MAX_SYNC_CONN
|
||||
range 0 3
|
||||
help
|
||||
BR/EDR Synchronize maximum connections of bluetooth controller.
|
||||
Each connection uses 2KB static DRAM whenever the BT controller is enabled.
|
||||
Each connection uses 2 KB DRAM whenever the BT controller is enabled.
|
||||
|
||||
|
||||
|
||||
|
@ -21,6 +21,8 @@
|
||||
* 4. If the configMAX_PRIORITIES is modified, please make all priority are
|
||||
* greater than 0
|
||||
* 5. Make sure esp_task.h is consistent between wifi lib and idf
|
||||
* 6. If changing system task priorities, please check the values documented in /api-guides/performance/speed.rst
|
||||
* are up to date
|
||||
*/
|
||||
|
||||
#ifndef _ESP_TASK_H_
|
||||
|
@ -136,11 +136,7 @@
|
||||
#define configTICK_RATE_HZ ( CONFIG_FREERTOS_HZ )
|
||||
|
||||
/* This has impact on speed of search for highest priority */
|
||||
#ifdef SMALL_TEST
|
||||
#define configMAX_PRIORITIES ( 7 )
|
||||
#else
|
||||
#define configMAX_PRIORITIES ( 25 )
|
||||
#endif
|
||||
|
||||
/* Various things that impact minimum stack sizes */
|
||||
|
||||
|
@ -102,6 +102,8 @@
|
||||
|
||||
#define SOC_CPU_WATCHPOINT_SIZE 64 // bytes
|
||||
|
||||
#define SOC_CPU_HAS_FPU 1
|
||||
|
||||
/*-------------------------- DAC CAPS ----------------------------------------*/
|
||||
#define SOC_DAC_PERIPH_NUM 2
|
||||
#define SOC_DAC_RESOLUTION 8 // DAC resolution ratio 8 bit
|
||||
|
@ -18,3 +18,5 @@
|
||||
#define SOC_CPU_WATCHPOINTS_NUM 2
|
||||
|
||||
#define SOC_CPU_WATCHPOINT_SIZE 64 // bytes
|
||||
|
||||
#define SOC_CPU_HAS_FPU 1
|
||||
|
@ -93,15 +93,13 @@ The autocomplete support for PowerShell is planned in the future.
|
||||
|
||||
.. note:: The environment variables ``ESPPORT`` and ``ESPBAUD`` can be used to set default values for the ``-p`` and ``-b`` options, respectively. Providing these options on the command line overrides the default.
|
||||
|
||||
.. _idf.py-size:
|
||||
|
||||
Advanced Commands
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
- ``idf.py app``, ``idf.py bootloader``, ``idf.py partition_table`` can be used to build only the app, bootloader, or partition table from the project as applicable.
|
||||
- There are matching commands ``idf.py app-flash``, etc. to flash only that single part of the project to the target.
|
||||
- ``idf.py -p PORT erase_flash`` will use esptool.py to erase the target's entire flash chip.
|
||||
- ``idf.py size`` prints some size information about the app. ``size-components`` and ``size-files`` are similar commands which print more detailed per-component or per-source-file information, respectively. If you define variable ``-DOUTPUT_JSON=1`` when running CMake (or ``idf.py``), the output will be formatted as JSON not as human readable text.
|
||||
- ``idf.py size`` prints some size information about the app. ``size-components`` and ``size-files`` are similar commands which print more detailed per-component or per-source-file information, respectively. If you define variable ``-DOUTPUT_JSON=1`` when running CMake (or ``idf.py``), the output will be formatted as JSON not as human readable text. See ``idf.py-size`` for more information.
|
||||
- ``idf.py reconfigure`` re-runs CMake_ even if it doesn't seem to need re-running. This isn't necessary during normal usage, but can be useful after adding/removing files from the source tree, or when modifying CMake cache variables. For example, ``idf.py -DNAME='VALUE' reconfigure`` can be used to set variable ``NAME`` in CMake cache to value ``VALUE``.
|
||||
- ``idf.py python-clean`` deletes generated Python byte code from the IDF directory which may cause issues when switching between IDF and Python versions. It is advised to run this target after switching versions of Python.
|
||||
|
||||
|
@ -29,6 +29,7 @@ API Guides
|
||||
Memory Types <memory-types>
|
||||
lwIP TCP/IP Stack <lwip>
|
||||
Partition Tables <partition-tables>
|
||||
Performance <performance/index>
|
||||
RF Calibration <RF_calibration>
|
||||
:esp32: Secure Boot <../security/secure-boot-v1>
|
||||
Secure Boot V2 <../security/secure-boot-v2>
|
||||
|
@ -337,6 +337,8 @@ Calling ``send()`` or ``sendto()`` repeatedly on a UDP socket may eventually fai
|
||||
|
||||
Increasing the number of TX buffers in the :ref:`Wi-Fi <CONFIG_ESP32_WIFI_TX_BUFFER>` project configuration may also help.
|
||||
|
||||
.. _lwip-performance:
|
||||
|
||||
Performance Optimization
|
||||
------------------------
|
||||
|
||||
@ -351,10 +353,12 @@ The :example_file:`wifi/iperf/sdkconfig.defaults` file for the iperf example con
|
||||
|
||||
.. important:: Suggest applying changes a few at a time and checking the performance each time with a particular application workload.
|
||||
|
||||
- If a lot of tasks are competing for CPU time on the system, consider that the lwIP task has configurable CPU affinity (:ref:`CONFIG_LWIP_TCPIP_TASK_AFFINITY`) and runs at fixed priority ``ESP_TASK_TCPIP_PRIO`` (18). Configure competing tasks to be pinned to a different core, or to run at a lower priority.
|
||||
- If a lot of tasks are competing for CPU time on the system, consider that the lwIP task has configurable CPU affinity (:ref:`CONFIG_LWIP_TCPIP_TASK_AFFINITY`) and runs at fixed priority ``ESP_TASK_TCPIP_PRIO`` (18). Configure competing tasks to be pinned to a different core, or to run at a lower priority. See also :ref:`built-in-task-priorities`.
|
||||
|
||||
- If using ``select()`` function with socket arguments only, setting :ref:`CONFIG_LWIP_USE_ONLY_LWIP_SELECT` will make ``select()`` calls faster.
|
||||
|
||||
- If there is enough free IRAM, select :ref:`CONFIG_LWIP_IRAM_OPTIMIZATION` to improve TX/RX throughput.
|
||||
|
||||
If using a Wi-Fi network interface, please also refer to :ref:`wifi-buffer-usage`.
|
||||
|
||||
Minimum latency
|
||||
|
24
docs/en/api-guides/performance/index.rst
Normal file
@ -0,0 +1,24 @@
|
||||
Performance
|
||||
===========
|
||||
|
||||
ESP-IDF ships with default settings that are designed for a trade-off between performance, resource usage, and available functionality.
|
||||
|
||||
These guides describe how to optimize a firmware application for a particular aspect of performance. Usually this involves some trade-off in terms of limiting available functions, or swapping one aspect of performance (such as execution speed) for another (such as RAM usage).
|
||||
|
||||
How to Optimize Performance
|
||||
---------------------------
|
||||
|
||||
1. Decide what the performance-critical aspects of your application are (for example: a particular response time to a certain network operation, a particular startup time limit, particular peripheral data throughput, etc.).
|
||||
2. Find a way to measure this performance (some methods are outlined in the guides below).
|
||||
3. Modify the code and project configuration and compare the new measurement to the old measurement.
|
||||
4. Repeat step 3 until the performance meets the requirements set out in step 1.
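Steps 2 and 3 above can be sketched as a small timing harness. This is only an illustration under stated assumptions: ``example_workload`` and ``measure_avg_us`` are hypothetical names, and the sketch uses the standard C ``clock()`` function so that it runs on any host. On a target, a microsecond timer such as ``esp_timer_get_time()`` would typically be a better time source.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the code path being optimized. */
static volatile unsigned long workload_sink;

void example_workload(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000UL; i++) {
        acc += i * i;
    }
    workload_sink = acc; /* volatile store keeps the loop from being optimized away */
}

/* Average CPU time per call of fn, in microseconds, over `iterations` runs. */
double measure_avg_us(void (*fn)(void), int iterations)
{
    assert(fn != NULL && iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    clock_t end = clock();
    return ((double)(end - start) * 1e6 / CLOCKS_PER_SEC) / iterations;
}
```

Record the value before a change, apply the change, and measure again with the same workload and iteration count so the two numbers are comparable.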
|
||||
|
||||
Guides
|
||||
------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
Execution Speed <speed>
|
||||
Binary Size <size>
|
||||
RAM Usage <ram-usage>
|
144
docs/en/api-guides/performance/ram-usage.rst
Normal file
@ -0,0 +1,144 @@
|
||||
Minimizing RAM Usage
|
||||
====================
|
||||
|
||||
{IDF_TARGET_STATIC_MEANS_HEAP:default="Wi-Fi library, Bluetooth controller", esp32s2="Wi-Fi library"}
|
||||
|
||||
In some cases, a firmware application's available RAM may run low or run out entirely. In these cases, it's necessary to tune the memory usage of the firmware application.
|
||||
|
||||
In general, firmware should aim to leave some "headroom" of free internal RAM in order to deal with extraordinary situations or changes in RAM usage in future updates.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
Before optimizing ESP-IDF RAM usage, it's necessary to understand the basics of {IDF_TARGET_NAME} memory types, the difference between static and dynamic memory usage in C, and the way ESP-IDF uses stack and heap. This information can all be found in :doc:`/api-reference/system/mem_alloc`.
|
||||
|
||||
Measuring Static Memory Usage
|
||||
-----------------------------
|
||||
|
||||
The :ref:`idf.py` tool can be used to generate reports about the static memory usage of an application. Refer to :ref:`the Binary Size chapter for more information <idf.py-size>`.
|
||||
|
||||
Measuring Dynamic Memory Usage
|
||||
------------------------------
|
||||
|
||||
ESP-IDF contains a range of heap APIs for measuring free heap at runtime. See :doc:`/api-reference/system/heap_debug`.
|
||||
|
||||
.. note::
|
||||
|
||||
In embedded systems, heap fragmentation can be a significant issue alongside total RAM usage. The heap measurement APIs provide ways to measure the "largest free block". Monitoring this value along with the total number of free bytes can give a quick indication of whether heap fragmentation is becoming an issue.
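One way to turn the two values mentioned in the note into a single indicator is sketched below. The metric and the name ``heap_fragmentation_percent`` are assumptions made for illustration, not an ESP-IDF API. On a target, the inputs could be obtained from ``heap_caps_get_free_size()`` and ``heap_caps_get_largest_free_block()``.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Percentage of the free heap that cannot be obtained in one contiguous
 * allocation. 0 means all free memory is in a single block.
 * Illustrative metric only; not part of ESP-IDF.
 */
unsigned heap_fragmentation_percent(size_t free_bytes, size_t largest_free_block)
{
    assert(largest_free_block <= free_bytes);
    if (free_bytes == 0) {
        return 0; /* nothing free, so nothing to fragment */
    }
    unsigned long long contiguous_pct =
        ((unsigned long long)largest_free_block * 100ULL) / free_bytes;
    return (unsigned)(100ULL - contiguous_pct);
}
```

A value that keeps rising while the total free byte count stays roughly constant suggests that fragmentation, rather than overall usage, is the problem.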
|
||||
|
||||
Reducing Static Memory Usage
|
||||
----------------------------
|
||||
|
||||
- Reducing the static memory usage of the application increases the amount of RAM available for heap at runtime, and vice versa.
|
||||
- Generally speaking, minimizing static memory usage requires monitoring the .data and .bss sizes. For tools to do this, see :ref:`idf.py-size`.
|
||||
- Internal ESP-IDF functions do not make heavy use of static RAM allocation in C. In many instances (including: {IDF_TARGET_STATIC_MEANS_HEAP}) "static" buffers are still allocated from heap, but the allocation is done once when the feature is initialized and will be freed if the feature is deinitialized. This is done in order to maximize the amount of free memory at different points in the application life-cycle.
|
||||
|
||||
To minimize static memory use:
|
||||
|
||||
.. list::
|
||||
|
||||
- Declare structures, buffers, or other variables ``const`` whenever possible. Constant data can be stored in flash not RAM. This may require changing functions in the firmware to take ``const *`` arguments instead of mutable pointer arguments. These changes can also reduce the stack usage of some functions.
|
||||
:esp32: - If using Bluedroid, setting the option :ref:`CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY` will cause Bluedroid to allocate memory on initialization and free it on deinitialization. This doesn't necessarily reduce the peak memory usage, but changes it from static memory usage to runtime memory usage.
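The ``const`` advice above can be illustrated with a minimal sketch. The table contents and the function names ``sum_bytes`` and ``sum_squares_lut`` are hypothetical. The point is only that a ``const`` table can be placed in read-only data, and that a ``const`` pointer parameter lets callers pass such data without first copying it into RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read-only lookup table. Because it is const, the toolchain can place it
 * in .rodata (flash on ESP targets) instead of .data (RAM).
 * The values are illustrative. */
static const uint8_t squares_lut[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

/* Taking a const pointer allows read-only buffers to be passed directly. */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}

/* Example caller: operates on the table in place, no RAM copy needed. */
uint32_t sum_squares_lut(void)
{
    return sum_bytes(squares_lut, sizeof(squares_lut));
}
```

If ``sum_bytes`` took a mutable ``uint8_t *`` instead, the table would either need a cast or have to be declared non-const, which would move it into RAM.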
|
||||
|
||||
.. _optimize-stack-sizes:
|
||||
|
||||
Reducing Stack Sizes
|
||||
--------------------
|
||||
|
||||
In FreeRTOS, task stacks are usually allocated from the heap. The stack size for each task is fixed (passed as an argument to :cpp:func:`xTaskCreate`). Each task can use up to its allocated stack size, but using more than this will cause an otherwise valid program to crash with a stack overflow or heap corruption.
|
||||
|
||||
Therefore, determining the optimum sizes of each task stack can substantially reduce RAM usage.
|
||||
|
||||
To determine optimum task stack sizes:
|
||||
|
||||
- Combine tasks. The best task stack size is 0 bytes, achieved by combining a task with another existing task. Anywhere that the firmware can be structured to perform multiple functions sequentially in a single task will increase free memory. In some cases, using a "worker task" pattern where jobs are serialized into a FreeRTOS queue (or similar) and then processed by generic worker tasks may help.
|
||||
- Consolidate task functions. String formatting functions (like ``printf``) are particularly heavy users of stack, so any task which doesn't ever call these can usually have its stack size reduced.
|
||||
- Enabling :ref:`newlib-nano-formatting` will reduce the stack usage of any task that calls ``printf()`` or other C string formatting functions.
|
||||
- Avoid allocating large variables on the stack. In C, any large struct or array allocated as an "automatic" variable (i.e. default scope of a C declaration) will use space on the stack. Minimize the sizes of these, allocate them statically and/or see if you can save memory by allocating them from the heap only when they are needed.
|
||||
- Avoid deep recursive function calls. Individual recursive function calls don't always add a lot of stack usage each time they are called, but if each function includes large stack-based variables then the overhead can get quite high.
|
||||
- At runtime, call the function :cpp:func:`uxTaskGetStackHighWaterMark` with the handle of any task where you think there is unused stack memory. This function returns the minimum lifetime free stack memory in bytes. The easiest time to call this is from the task itself: call ``uxTaskGetStackHighWaterMark(NULL)`` to get the current task's high water mark after the time that the task has achieved its peak stack usage (i.e. if there is a main loop, execute the main loop a number of times with all possible states and then call :cpp:func:`uxTaskGetStackHighWaterMark`). Often, it's possible to subtract almost the entire value returned here from the total stack size of a task, but allow some safety margin to account for unexpected small increases in stack usage at runtime.
|
||||
- Call :cpp:func:`uxTaskGetSystemState` at runtime to get a summary of all tasks in the system. This includes their individual stack "high watermark" values.
|
||||
- When debugger watchpoints are not being used, set the :ref:`CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK` option to trigger an immediate panic if a task writes the word at the end of its assigned stack. This is slightly more reliable than the default :ref:`CONFIG_FREERTOS_CHECK_STACKOVERFLOW` option of "Check using canary bytes", because the panic happens immediately, not on the next RTOS context switch. Neither option is perfect: in some cases the stack pointer can skip the watchpoint or canary bytes and corrupt another region of RAM instead.
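The high water mark arithmetic described in the list above can be sketched as follows. ``suggest_stack_size`` is a hypothetical helper, not an ESP-IDF or FreeRTOS function, and the 4-byte rounding is an assumption made here for word alignment. The high water mark input is the value that ``uxTaskGetStackHighWaterMark`` would report, in bytes as described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Suggest a reduced stack size from the currently allocated size, the
 * measured high water mark (minimum free stack seen so far) and a safety
 * margin. All values are in bytes. Hypothetical helper for illustration.
 */
uint32_t suggest_stack_size(uint32_t allocated_bytes,
                            uint32_t high_water_mark_bytes,
                            uint32_t margin_bytes)
{
    assert(high_water_mark_bytes <= allocated_bytes);

    /* Peak usage is whatever was never left free. */
    uint32_t peak_used = allocated_bytes - high_water_mark_bytes;
    uint32_t suggested = peak_used + margin_bytes;

    /* Round up to a multiple of 4 bytes (assumed word alignment). */
    suggested = (suggested + 3u) & ~3u;

    /* Never suggest more than what is already allocated. */
    return suggested < allocated_bytes ? suggested : allocated_bytes;
}
```

For example, a task created with 4096 bytes that reports a high water mark of 2000 bytes has used at most 2096 bytes, so with a 512 byte margin the suggestion is 2608 bytes. The measurement is only meaningful after the task has been through its worst-case code paths.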
|
||||
|
||||
Internal Stack Sizes
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
ESP-IDF allocates a number of internal tasks for housekeeping purposes or operating system functions. Some are created during the startup process, and some are created at runtime when particular features are initialized.
|
||||
|
||||
The default stack sizes for these tasks are usually set conservatively high, to allow all common usage patterns. Many of the stack sizes are configurable, and it may be possible to reduce them to match the real runtime stack usage of the task.
|
||||
|
||||
.. important::
|
||||
|
||||
If internal task stack sizes are set too small, ESP-IDF will crash unpredictably. Even if the root cause is task stack overflow, this is not always clear when debugging. It is recommended that internal stack sizes are only reduced carefully (if at all), with close attention to "high water mark" free space under load. If reporting an issue that occurs when internal task stack sizes have been reduced, please always include this information and the specific configuration that is being used.
|
||||
|
||||
.. list::
|
||||
|
||||
- :ref:`Main task that executes app_main function <app-main-task>` has stack size :ref:`CONFIG_ESP_MAIN_TASK_STACK_SIZE`.
|
||||
- :doc:`/api-reference/system/esp_timer` system task which executes callbacks has stack size :ref:`CONFIG_ESP_TIMER_TASK_STACK_SIZE`.
|
||||
- FreeRTOS Timer Task to handle FreeRTOS timer callbacks has stack size :ref:`CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH`.
|
||||
- :doc:`/api-guides/event-handling` system task to execute callbacks for the default system event loop has stack size :ref:`CONFIG_ESP_SYSTEM_EVENT_TASK_STACK_SIZE`.
|
||||
- :doc:`/api-guides/lwip` TCP/IP task has stack size :ref:`CONFIG_LWIP_TCPIP_TASK_STACK_SIZE`.
|
||||
:esp32: - :doc:`Bluedroid Bluetooth Host </api-reference/bluetooth/index>` tasks have stack sizes :ref:`CONFIG_BT_BTC_TASK_STACK_SIZE` and :ref:`CONFIG_BT_BTU_TASK_STACK_SIZE`.
|
||||
:SOC_BT_SUPPORTED: - :doc:`NimBLE Bluetooth Host </api-reference/bluetooth/nimble/index>` has task stack size :ref:`CONFIG_BT_NIMBLE_TASK_STACK_SIZE`.
|
||||
- The Ethernet driver creates a task for the MAC to receive Ethernet frames. If using the default config ``ETH_MAC_DEFAULT_CONFIG`` then the task stack size is 4 KB. This setting can be changed by passing a custom :cpp:class:`eth_mac_config_t` struct when initializing the Ethernet MAC.
|
||||
- FreeRTOS idle task stack size is configured by :ref:`CONFIG_FREERTOS_IDLE_TASK_STACKSIZE`.
|
||||
- If using the :doc:`mDNS </api-reference/protocols/mdns>` and/or :doc:`MQTT </api-reference/protocols/mqtt>` components, they create tasks with stack sizes configured by :ref:`CONFIG_MDNS_TASK_STACK_SIZE` and :ref:`CONFIG_MQTT_TASK_STACK_SIZE`, respectively. MQTT stack size can also be configured using ``task_stack`` field of :cpp:class:`esp_mqtt_client_config_t`.
|
||||
|
||||
.. note::
|
||||
|
||||
Aside from built-in system features such as esp-timer, if an ESP-IDF feature is not initialized by the firmware then no associated task is created. In those cases, the stack usage is zero and the stack size configuration for the task is not relevant.
|
||||
|
||||
Reducing Heap Usage
|
||||
-------------------
|
||||
|
||||
For functions that assist in analyzing heap usage at runtime, see :doc:`/api-reference/system/heap_debug`.
|
||||
|
||||
Normally, optimizing heap usage consists of analyzing the usage and removing calls to ``malloc()`` that aren't being used, reducing the corresponding sizes, or freeing previously allocated buffers earlier.
|
||||
|
||||
There are some ESP-IDF configuration options that can reduce heap usage at runtime:
|
||||
|
||||
.. list::
|
||||
|
||||
- lwIP documentation has a section to configure :ref:`lwip-ram-usage`.
|
||||
- :ref:`wifi-buffer-usage` describes options to either reduce numbers of "static" buffers or reduce the maximum number of "dynamic" buffers in use, in order to minimize memory usage at possible cost of performance. Note that "static" Wi-Fi buffers are still allocated from heap when Wi-Fi is initialized and will be freed if Wi-Fi is deinitialized.
|
||||
:esp32: - The Ethernet driver allocates DMA buffers for the internal Ethernet MAC when it is initialized - configuration options are :ref:`CONFIG_ETH_DMA_BUFFER_SIZE`, :ref:`CONFIG_ETH_DMA_RX_BUFFER_NUM`, :ref:`CONFIG_ETH_DMA_TX_BUFFER_NUM`.
|
||||
- mbedTLS TLS session memory usage can be minimized by enabling the ESP-IDF feature :ref:`CONFIG_MBEDTLS_DYNAMIC_BUFFER`.
|
||||
:esp32: - In single core mode only, it's possible to use IRAM as byte accessible memory (added to the regular heap) by enabling :ref:`CONFIG_ESP32_IRAM_AS_8BIT_ACCESSIBLE_MEMORY`. Note that this option carries a performance penalty and the risk of security issues caused by executable data. If this option is enabled then it's possible to set other options to prefer certain buffers be allocated from this memory: :ref:`mbedTLS <CONFIG_MBEDTLS_MEM_ALLOC_MODE>`, :ref:`NimBLE <CONFIG_BT_NIMBLE_MEM_ALLOC_MODE>`.
|
||||
:esp32: - Reduce :ref:`CONFIG_BTDM_CTRL_BLE_MAX_CONN` if using BLE.
|
||||
:esp32: - Reduce :ref:`CONFIG_BTDM_CTRL_BR_EDR_MAX_ACL_CONN` if using Bluetooth Classic.
|
||||
|
||||
.. note::
|
||||
|
||||
There are other configuration options that will increase heap usage at runtime if changed from the defaults. These are not listed here, but the help text for the configuration item will mention if there is some memory impact.
|
||||
|
||||
.. _optimize-iram-usage:
|
||||
|
||||
Optimizing IRAM Usage
|
||||
---------------------
|
||||
|
||||
.. only:: not esp32
|
||||
|
||||
The available DRAM at runtime (for heap usage) is also reduced by the static IRAM usage. Therefore, one way to increase available DRAM is to reduce IRAM usage.
|
||||
|
||||
If the app allocates more static IRAM than is available then the app will fail to build and linker errors such as ``section `.iram0.text' will not fit in region `iram0_0_seg'``, ``IRAM0 segment data does not fit`` and ``region `iram0_0_seg' overflowed by 84 bytes`` will be seen. If this happens, it is necessary to find ways to reduce static IRAM usage in order to link the application.
|
||||
|
||||
To analyze the IRAM usage in the firmware binary, use :ref:`idf.py-size`. If the firmware failed to link, steps to analyze are shown at :ref:`idf-size-linker-failed`.
|
||||
|
||||
The following options will reduce IRAM usage of some ESP-IDF features:
|
||||
|
||||
.. list::
|
||||
|
||||
- Enable :ref:`CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH`. Provided these functions are not (incorrectly) used from ISRs, this option is safe to enable in all configurations.
|
||||
- Disable Wi-Fi options :ref:`CONFIG_ESP32_WIFI_IRAM_OPT` and/or :ref:`CONFIG_ESP32_WIFI_RX_IRAM_OPT`. Disabling these options will free available IRAM at the cost of Wi-Fi performance.
|
||||
:esp32c3 or esp32s3: - Enabling :ref:`CONFIG_SPI_FLASH_ROM_IMPL` will free some IRAM, but esp_flash bugfixes and new flash chip support will not be available.
|
||||
:esp32: - Disabling :ref:`CONFIG_SPI_FLASH_ROM_DRIVER_PATCH` will free some IRAM, but this is only possible in some flash configurations (see the configuration item help text).
|
||||
- Disabling :ref:`CONFIG_ESP_EVENT_POST_FROM_IRAM_ISR` prevents posting ``esp_event`` events from :ref:`iram-safe-interrupt-handlers` but will save some IRAM.
|
||||
- Disabling :ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` prevents spi_master interrupts from being serviced while writing to flash, and may otherwise reduce spi_master performance, but will save some IRAM.
|
||||
|
||||
.. note::
|
||||
|
||||
Moving frequently-called functions from IRAM to flash may increase their execution time.
|
||||
|
||||
.. note::
|
||||
|
||||
Other configuration options exist that will increase IRAM usage by moving some functionality into IRAM, usually for performance, but the default option is not to do this. These are not listed here. The IRAM size impact of enabling these options is usually noted in the configuration item help text.
|
419
docs/en/api-guides/performance/size.rst
Normal file
@ -0,0 +1,419 @@
|
||||
Minimizing Binary Size
|
||||
======================
|
||||
|
||||
{IDF_TARGET_REDUCED_BY_IRAM: default="DRAM", esp32="IRAM and/or DRAM (depending on sizes)"}
|
||||
|
||||
The ESP-IDF build system compiles all source files in the project and ESP-IDF, but only functions and variables that are actually referenced by the program are linked into the final binary. In some cases, it is necessary to reduce the total size of the firmware binary (for example, in order to fit it into the available flash partition size).
|
||||
|
||||
The first step to reducing the total firmware binary size is measuring what is causing the size to increase.
|
||||
|
||||
.. _idf.py-size:
|
||||
|
||||
Measuring Static Sizes
|
||||
----------------------
|
||||
|
||||
To optimize both firmware binary size and memory usage it's necessary to measure statically allocated RAM ("data", "bss"), code ("text") and read-only data ("rodata") in your project.
|
||||
|
||||
Using the :ref:`idf.py` sub-commands ``size``, ``size-components`` and ``size-files`` provides a summary of memory used by the project:
|
||||
|
||||
Size Summary (idf.py size)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. only:: esp32
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ idf.py size
|
||||
[...]
|
||||
Total sizes:
|
||||
DRAM .data size: 14956 bytes
|
||||
DRAM .bss size: 15808 bytes
|
||||
Used static DRAM: 30764 bytes ( 149972 available, 17.0% used)
|
||||
Used static IRAM: 83918 bytes ( 47154 available, 64.0% used)
|
||||
Flash code: 559943 bytes
|
||||
Flash rodata: 176736 bytes
|
||||
Total image size:~ 835553 bytes (.bin may be padded larger)
|
||||
|
||||
|
||||
.. only:: not esp32
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ idf.py size
|
||||
[...]
|
||||
Total sizes:
|
||||
DRAM .data size: 11584 bytes
|
||||
DRAM .bss size: 19624 bytes
|
||||
Used static DRAM: 0 bytes ( 0 available, nan% used)
|
||||
Used static IRAM: 0 bytes ( 0 available, nan% used)
|
||||
Used stat D/IRAM: 136276 bytes ( 519084 available, 20.8% used)
|
||||
Flash code: 630508 bytes
|
||||
Flash rodata: 177048 bytes
|
||||
Total image size:~ 924208 bytes (.bin may be padded larger)
|
||||
|
||||
This output breaks down the size of all static memory regions in the firmware binary:
|
||||
|
||||
.. list::
|
||||
|
||||
- ``DRAM .data size`` is statically allocated RAM that is assigned to non-zero values at startup. This uses RAM (DRAM) at runtime and also uses space in the binary file.
|
||||
- ``DRAM .bss size`` is statically allocated RAM that is assigned zero at startup. This uses RAM (DRAM) at runtime but doesn't use any space in the binary file.
|
||||
:esp32: - ``Used static DRAM`` is the total DRAM used by .data + .bss. The ``available`` size is the estimated amount of DRAM which will be available as heap memory at runtime (due to metadata overhead and implementation constraints, and heap allocations done by ESP-IDF during startup, the actual free heap at startup will be lower than this).
|
||||
:esp32: - ``Used static IRAM`` is the total size of executable code :ref:`executed from IRAM <iram>`. This uses space in the binary file and also reduces {IDF_TARGET_REDUCED_BY_IRAM} available as heap memory at runtime. See :ref:`optimize-iram-usage`.
|
||||
:not esp32: - ``Used static DRAM``, ``Used static IRAM`` - these fields are kept for compatibility with the ESP32 target, and currently read 0.
|
||||
:not esp32: - ``Used stat D/IRAM`` - This is total internal RAM usage, the sum of static DRAM .data + .bss, and also static :ref:`iram` used by the application for executable code. The ``available`` size is the estimated amount of DRAM which will be available as heap memory at runtime (due to metadata overhead and implementation constraints, and heap allocations done by ESP-IDF during startup, the actual free heap at startup will be lower than this).
|
||||
- ``Flash code`` is the total size of executable code executed from flash cache (:ref:`IROM <irom>`). This uses space in the binary file.
|
||||
- ``Flash rodata`` is the total size of read-only data loaded from flash cache (:ref:`DROM <drom>`). This uses space in the binary file.
|
||||
- ``Total image size`` is the estimated total binary file size, which is the total of all the used memory types except for .bss.
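The relationships between these fields can be checked with a little arithmetic. The sketch below is illustrative only: the struct and function names are hypothetical, and the figures are copied from the ESP32 sample output shown earlier. It assumes, as described in the list above, that the image size is the sum of every region except .bss, and that the percentage is ``used / (used + available)``.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the "Total sizes" summary, in bytes. Hypothetical type. */
struct size_summary {
    uint32_t dram_data;
    uint32_t dram_bss;
    uint32_t iram;
    uint32_t flash_code;
    uint32_t flash_rodata;
};

/* Figures taken from the ESP32 sample output above. */
const struct size_summary example_esp32_summary = {
    .dram_data = 14956,
    .dram_bss = 15808,
    .iram = 83918,
    .flash_code = 559943,
    .flash_rodata = 176736,
};

/* .data and .bss both occupy DRAM at runtime. */
uint32_t used_static_dram(const struct size_summary *s)
{
    return s->dram_data + s->dram_bss;
}

/* Everything except .bss takes up space in the binary image. */
uint32_t total_image_size(const struct size_summary *s)
{
    return s->dram_data + s->iram + s->flash_code + s->flash_rodata;
}

/* Usage in tenths of a percent (170 means 17.0%), rounded down. */
uint32_t percent_used_tenths(uint32_t used, uint32_t available)
{
    uint64_t total = (uint64_t)used + available;
    if (total == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)used * 1000u) / total);
}
```

With the sample figures, static DRAM is 14956 + 15808 = 30764 bytes, and the image size is 14956 + 83918 + 559943 + 176736 = 835553 bytes, matching the output above.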
|
||||
|
||||
Component Usage Summary (idf.py size-components)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The summary output provided by ``idf.py size`` does not give enough detail to find the main contributor to excessive binary size. To analyze in more detail, use ``idf.py size-components``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ idf.py size-components
|
||||
[...]
|
||||
Total sizes:
|
||||
DRAM .data size: 14956 bytes
|
||||
DRAM .bss size: 15808 bytes
|
||||
Used static DRAM: 30764 bytes ( 149972 available, 17.0% used)
|
||||
Used static IRAM: 83918 bytes ( 47154 available, 64.0% used)
|
||||
Flash code: 559943 bytes
|
||||
Flash rodata: 176736 bytes
|
||||
Total image size:~ 835553 bytes (.bin may be padded larger)
|
||||
Per-archive contributions to ELF file:
Archive File         DRAM .data & .bss & other   IRAM   D/IRAM Flash code & rodata   Total
libnet80211.a              1267   6044       0   5490        0     107445    18484  138730
liblwip.a                    21   3838       0      0        0      97465    16116  117440
libmbedtls.a                 60    524       0      0        0      27655    69907   98146
libmbedcrypto.a              64     81       0     30        0      76645    11661   88481
libpp.a                    2427   1292       0  20851        0      37208     4708   66486
libc.a                        4      0       0      0        0      57056     6455   63515
libphy.a                   1439    715       0   7798        0      33074        0   43026
libwpa_supplicant.a          12    848       0      0        0      35505     1446   37811
libfreertos.a              3104    740       0  15711        0        367     4228   24150
libnvs_flash.a                0     24       0      0        0      14347     2924   17295
libspi_flash.a             1562    294       0   8851        0       1840     1913   14460
libesp_system.a             245    206       0   3078        0       5990     3817   13336
libesp-tls.a                  0      4       0      0        0       5637     3524    9165
[... removed some lines here ...]
libtcpip_adapter.a            0     17       0      0        0        216        0     233
libesp_rom.a                  0      0       0    112        0          0        0     112
libcxx.a                      0      0       0      0        0         47        0      47
(exe)                         0      0       0      3        0          3       12      18
libesp_pm.a                   0      0       0      0        0          8        0       8
libesp_eth.a                  0      0       0      0        0          0        0       0
libmesh.a                     0      0       0      0        0          0        0       0
|
||||
|
||||
The first lines of output from ``idf.py size-components`` are the same as ``idf.py size``. After this, a table of "per-archive contributions to ELF file" is printed. This table shows how much each static library archive has contributed to the final binary size.
|
||||
|
||||
Generally, one static library archive is built per component, although some are binary libraries included by a particular component (for example, ``libnet80211.a`` is included by ``esp_wifi`` component). Toolchain libraries such as ``libc.a`` and ``libgcc.a`` are also listed here. These provide Standard C/C++ Library and toolchain built-in functionality.
|
||||
|
||||
If your project is simple and only has a "main" component, then all of the project's code will be shown under ``libmain.a``. If your project includes its own components (see :doc:`/api-guides/build-system`), then they will each be shown on a separate line.
|
||||
|
||||
The table is sorted in descending order of the total contribution to the binary size.
|
||||
|
||||
The columns are as follows:
|
||||
|
||||
.. list::
|
||||
|
||||
- ``DRAM .data & .bss & other`` - .data and .bss are the same as for the totals shown above (static variables, these both reduce total available RAM at runtime but .bss doesn't contribute to the binary file size). "other" is a column for any custom section types that also contribute to RAM size (usually this value is 0).
|
||||
:esp32: - ``IRAM`` - is the same as for the totals shown above (code linked to execute from IRAM, which uses space in the binary file and also reduces the IRAM that can be dynamically allocated at runtime using ``MALLOC_CAP_32BIT``).
|
||||
:esp32: - ``D/IRAM`` - shows IRAM space which, because it occupies D/IRAM space, also reduces the DRAM available as heap at runtime.
|
||||
:not esp32: - ``IRAM`` - is the same as for the totals shown above (code linked to execute from IRAM, which uses space in the binary file and also reduces the DRAM available as heap at runtime).
|
||||
- ``Flash code & rodata`` - these are the same as the totals above, IROM and DROM space accessed from flash cache that contribute to the binary size.
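The per-archive columns can be combined in a few useful ways, sketched below. The type and function names are hypothetical, and the figures are the ``libnet80211.a`` row from the sample table. In that sample the ``Total`` column equals the sum of all the other columns. Treating "total minus .bss" as the archive's contribution to the binary file is an approximation based on the column descriptions above.

```c
#include <assert.h>
#include <stdint.h>

/* One row of the per-archive table, in bytes. Hypothetical type. */
struct archive_row {
    uint32_t dram_data;
    uint32_t dram_bss;
    uint32_t dram_other;
    uint32_t iram;
    uint32_t diram;
    uint32_t flash_code;
    uint32_t flash_rodata;
};

/* The libnet80211.a row from the sample output above. */
const struct archive_row example_libnet80211 = {
    .dram_data = 1267, .dram_bss = 6044, .dram_other = 0,
    .iram = 5490, .diram = 0,
    .flash_code = 107445, .flash_rodata = 18484,
};

/* Internal RAM used statically by this archive (data, bss, other, IRAM). */
uint32_t archive_static_ram(const struct archive_row *r)
{
    return r->dram_data + r->dram_bss + r->dram_other + r->iram + r->diram;
}

/* Value of the "Total" column: the sum of every other column. */
uint32_t archive_total(const struct archive_row *r)
{
    return archive_static_ram(r) + r->flash_code + r->flash_rodata;
}

/* Approximate contribution to the binary file: .bss occupies no file space. */
uint32_t archive_binary_bytes(const struct archive_row *r)
{
    return archive_total(r) - r->dram_bss;
}
```

Sorting by ``archive_static_ram`` instead of the total can be more useful when the goal is to free internal RAM rather than to shrink the binary.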
|
||||
|
||||
Source File Usage Summary (idf.py size-files)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
For even more detail, run ``idf.py size-files`` to get a summary of the contribution each object file has made to the final binary size. Each object file corresponds to a single source file.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ idf.py size-files
|
||||
[...]
|
||||
Total sizes:
|
||||
DRAM .data size: 14956 bytes
|
||||
DRAM .bss size: 15808 bytes
|
||||
Used static DRAM: 30764 bytes ( 149972 available, 17.0% used)
|
||||
Used static IRAM: 83918 bytes ( 47154 available, 64.0% used)
|
||||
Flash code: 559943 bytes
|
||||
Flash rodata: 176736 bytes
|
||||
Total image size:~ 835553 bytes (.bin may be padded larger)
|
||||
Per-file contributions to ELF file:
|
||||
Object File         DRAM .data  & .bss  & other    IRAM  D/IRAM  Flash code  & rodata   Total
x509_crt_bundle.S.o          0       0        0       0       0           0     64212   64212
wl_cnx.o                     2    3183        0     221       0       13119      3286   19811
phy_chip_v7.o              721     614        0    1642       0       16820         0   19797
ieee80211_ioctl.o          740      96        0     437       0       15325      2627   19225
pp.o                      1142      45        0    8871       0        5030       537   15625
ieee80211_output.o           2      20        0    2118       0       11617       914   14671
ieee80211_sta.o              1      41        0    1498       0       10858      2218   14616
lib_a-vfprintf.o             0       0        0       0       0       13829       752   14581
lib_a-svfprintf.o            0       0        0       0       0       13251       752   14003
ssl_tls.c.o                 60       0        0       0       0       12769       463   13292
sockets.c.o                  0     648        0       0       0       11096      1030   12774
nd6.c.o                      8     932        0       0       0       11515       314   12769
phy_chip_v7_cal.o          477      53        0    3499       0        8561         0   12590
pm.o                        32     364        0    2673       0        7788       782   11639
ieee80211_scan.o            18     288        0       0       0        8889      1921   11116
lib_a-svfiprintf.o           0       0        0       0       0        9654      1206   10860
lib_a-vfiprintf.o            0       0        0       0       0       10069       734   10803
ieee80211_ht.o               0       4        0    1186       0        8628       898   10716
phy_chip_v7_ana.o          241      48        0    2657       0        7677         0   10623
bignum.c.o                   0       4        0       0       0        9652       752   10408
tcp_in.c.o                   0      52        0       0       0        8750      1282   10084
trc.o                      664      88        0    1726       0        6245      1108    9831
tasks.c.o                    8     704        0    7594       0           0      1475    9781
ecp_curves.c.o              28       0        0       0       0        7384      2325    9737
ecp.c.o                      0      64        0       0       0        8864       286    9214
ieee80211_hostap.o           1      41        0       0       0        8578       585    9205
wdev.o                     121     125        0    4499       0        3684       580    9009
tcp_out.c.o                  0       0        0       0       0        5686      2161    7847
tcp.c.o                      2      26        0       0       0        6161      1617    7806
ieee80211_input.o            0       0        0       0       0        6797       973    7770
wpa.c.o                      0     656        0       0       0        6828        55    7539
[... additional lines removed ...]
|
||||
|
||||
After the summary of total sizes, a table of "Per-file contributions to ELF file" is printed.
|
||||
|
||||
The columns are the same as shown above for ``idf.py size-components``, but this time the granularity is the contribution of each individual object file to the binary size.
|
||||
|
||||
For example, we can see that the file ``x509_crt_bundle.S.o`` contributed 64212 bytes to the total firmware size, all as ``.rodata`` in flash. Therefore we can guess that this application is using the :doc:`/api-reference/protocols/esp_crt_bundle` feature, and that not using this feature would save at least this many bytes from the firmware size.
|
||||
|
||||
Some of the object files are linked from binary libraries and therefore you won't find a corresponding source file. To locate which component a source file belongs to, it's generally possible to search in the ESP-IDF source tree or look in the :ref:`linker-map-file` for the full path.
|
||||
|
||||
Comparing Two Binaries
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If making some changes that affect binary size, it's possible to use an ESP-IDF tool to break down the exact differences in size.
|
||||
|
||||
This operation isn't part of ``idf.py``; it's necessary to run the ``idf_size.py`` Python tool directly.
|
||||
|
||||
To do so, first locate the linker map file in the build directory. It will have the name ``PROJECTNAME.map``. The ``idf_size.py`` tool performs its analysis based on the output of the linker map file.
|
||||
|
||||
To compare with another binary, you will also need its corresponding ``.map`` file saved from the build directory.
|
||||
|
||||
For example, to compare two builds: one with the default :ref:`CONFIG_COMPILER_OPTIMIZATION` setting "Debug (-Og)" configuration and one with "Optimize for size (-Os)":
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ $IDF_PATH/tools/idf_size.py --diff build_Og/https_request.map build_Os/https_request.map
|
||||
<CURRENT> MAP file: build_Os/https_request.map
|
||||
<REFERENCE> MAP file: build_Og/https_request.map
|
||||
Difference is counted as <CURRENT> - <REFERENCE>, i.e. a positive number means that <CURRENT> is larger.
|
||||
Total sizes of <CURRENT>: <REFERENCE> Difference
|
||||
DRAM .data size: 14516 bytes 14956 -440
|
||||
DRAM .bss size: 15792 bytes 15808 -16
|
||||
Used static DRAM: 30308 bytes ( 150428 available, 16.8% used) 30764 -456 ( +456 available, +0 total)
|
||||
Used static IRAM: 78498 bytes ( 52574 available, 59.9% used) 83918 -5420 ( +5420 available, +0 total)
|
||||
Flash code: 509183 bytes 559943 -50760
|
||||
Flash rodata: 170592 bytes 176736 -6144
|
||||
Total image size:~ 772789 bytes (.bin may be padded larger) 835553 -62764
|
||||
|
||||
We can see from the "Difference" column that changing this one setting made the whole binary over 60 KB smaller and freed over 5 KB of RAM.
|
||||
|
||||
It's also possible to use the "diff" mode to output a table of component-level (static library archive) differences:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$IDF_PATH/tools/idf_size.py --archives --diff build_Og/https_request.map build_Os/https_request.map
|
||||
|
||||
Also at the individual source file level:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$IDF_PATH/tools/idf_size.py --files --diff build_Og/https_request.map build_Os/https_request.map
|
||||
|
||||
Other options (like writing the output to a file) are available; pass ``--help`` to see the full list.
|
||||
|
||||
.. _idf-size-linker-failed:
|
||||
|
||||
Showing Size When Linker Fails
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If too much static memory is used, then the linker will fail with an error such as ``DRAM segment data does not fit``, ``region `iram0_0_seg' overflowed by 44 bytes``, or similar.
|
||||
|
||||
In these cases, ``idf.py size`` will not succeed either. However, it is possible to run ``idf_size.py`` manually in order to view the *partial static memory usage*. (The reported usage will omit the variables which could not be linked, so some free space will still appear to be available.)
|
||||
|
||||
The map file argument is ``<projectname>.map`` in the build directory:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$IDF_PATH/tools/idf_size.py build/project_name.map
|
||||
|
||||
It is also possible to view the equivalent of ``size-components`` or ``size-files`` output:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$IDF_PATH/tools/idf_size.py --archives build/project_name.map
|
||||
$IDF_PATH/tools/idf_size.py --files build/project_name.map
|
||||
|
||||
.. _linker-map-file:
|
||||
|
||||
Linker Map File
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
*This is an advanced analysis method, but it can be very useful. Feel free to skip ahead to* :ref:`reducing-overall-size` *and possibly come back to this later.*
|
||||
|
||||
The ``idf.py size`` analysis tools all work by parsing the GNU binutils "linker map file", which is a summary of everything the linker did when it created ("linked") the final firmware binary file.
|
||||
|
||||
Linker map files themselves are plain text files, so it's possible to read them and find out exactly what the linker did. However, they are also very complex and long - often 100,000 or more lines!
|
||||
|
||||
The map file itself is broken into parts and each part has a heading. The parts are:
|
||||
|
||||
- ``Archive member included to satisfy reference by file (symbol)``. For each object file included in the link, this shows which symbol (function or variable) the linker was searching for when it included that object file. If you're wondering why some object file in particular was included in the binary, this part may give a clue. This part can be used in conjunction with the ``Cross Reference Table`` at the end of the file. Note that not every object file shown in this list ends up included in the final binary; some end up in the ``Discarded input sections`` list instead.
|
||||
- ``Allocating common symbols`` - This is a list of (some) global variables along with their sizes. Common symbols have a particular meaning in ELF binary files, but ESP-IDF doesn't make much use of them.
|
||||
- ``Discarded input sections`` - These sections were read by the linker as part of an object file to be linked into the final binary, but then nothing else referred to them so they were discarded from the final binary. For ESP-IDF this list can be very long, as we compile each function and static variable to a unique section in order to minimize the final binary size (specifically ESP-IDF uses compiler options ``-ffunction-sections -fdata-sections`` and linker option ``--gc-sections``). Items mentioned in this list *do not* contribute to the final binary.
|
||||
- ``Memory Configuration``, ``Linker script and memory map`` - These two parts go together. Some of the output comes directly from the linker command line and the Linker Script, both provided by the :doc:`/api-guides/build-system`. The linker script is partially generated from the ESP-IDF project using the :doc:`/api-guides/linker-script-generation` feature.
|
||||
|
||||
As the output of the ``Linker script and memory map`` part of the map unfolds, you can see each symbol (function or static variable) linked into the final binary along with its address (as a 16 digit hex number), its length (also in hex), and the library and object file it was linked from (which can be used to determine the component and the source file).
|
||||
|
||||
Following all of the output sections that take up space in the final ``.bin`` file, the ``memory map`` also includes some sections in the ELF file that are only used for debugging (ELF sections ``.debug_*``, etc.). These don't contribute to the final binary size. You'll notice the address of these symbols is a very low number (starting from 0x0000000000000000 and counting up).
|
||||
- ``Cross Reference Table``. This table shows for each symbol (function or static variable), the list of object file(s) that referred to it. If you're wondering why a particular thing is included in the binary, this will help determine what included it.
|
||||
|
||||
.. note:: Unfortunately, the ``Cross Reference Table`` doesn't only include symbols that made it into the final binary. It also includes symbols in discarded sections. Therefore, just because something is shown here doesn't mean that it was included in the final binary - this needs to be checked separately.
|
||||
|
||||
.. note::
|
||||
|
||||
Linker map files are generated by the GNU binutils linker "ld", not ESP-IDF. You can find additional information online about the linker map file format. This quick summary is written from the perspective of ESP-IDF build system in particular.
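
To make the ``Linker script and memory map`` description more concrete, here is a minimal sketch of pulling the section name, address, length and origin out of a single map line. It assumes the common one-line layout ``<section> <address> <size> <archive(object)>``; real map files also wrap long section names onto a second line and contain many other line shapes, which this sketch simply rejects. The struct and function names are hypothetical and are not part of ESP-IDF.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char section[128];   /* input section name, e.g. ".text.app_main" */
        uint64_t address;    /* address the section was linked at */
        uint64_t size;       /* length in bytes */
        char origin[256];    /* "path/libfoo.a(bar.c.o)" */
    } map_entry_t;

    /* Returns 1 if the line matched the assumed four-field layout, else 0.
     * On failure the contents of *out are not meaningful. */
    int parse_map_line(const char *line, map_entry_t *out)
    {
        assert(line != NULL && out != NULL);
        memset(out, 0, sizeof(*out));

        unsigned long long address = 0;
        unsigned long long size = 0;
        /* %llx accepts an optional "0x" prefix, as printed by the linker. */
        int fields = sscanf(line, " %127s %llx %llx %255s",
                            out->section, &address, &size, out->origin);
        if (fields != 4) {
            return 0;
        }
        out->address = address;
        out->size = size;
        return 1;
    }

For a made-up input line such as ``.text.app_main 0x00000000400d1234 0x56 esp-idf/main/libmain.a(main.c.o)``, the origin field identifies both the library (and therefore the component) and the object file.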
|
||||
|
||||
.. _reducing-overall-size:
|
||||
|
||||
Reducing Overall Size
|
||||
---------------------
|
||||
|
||||
The following configuration options will reduce the final binary size of almost any ESP-IDF project:
|
||||
|
||||
.. list::
|
||||
|
||||
- Set :ref:`CONFIG_COMPILER_OPTIMIZATION` to "Optimize for size (-Os)". In some cases, "Optimize for performance (-O2)" will also reduce the binary size compared to the default. Note that if your code contains C or C++ Undefined Behaviour then increasing the compiler optimization level may expose bugs that otherwise don't happen.
|
||||
- Reduce the compiled-in log output by lowering the app :ref:`CONFIG_LOG_DEFAULT_LEVEL`. If the :ref:`CONFIG_LOG_MAXIMUM_LEVEL` is changed from the default then this setting controls the binary size instead. Reducing compiled-in logging reduces the number of strings in the binary, and also the code size of the calls to logging functions.
|
||||
- Set the :ref:`CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL` to "Silent". This avoids compiling in a dedicated assertion string and source file name for each assert that may fail. It's still possible to find the failed assert in the code by looking at the memory address where the assertion failed.
|
||||
- Set :ref:`CONFIG_COMPILER_OPTIMIZATION_CHECKS_SILENT`. This removes specific error messages for particular internal ESP-IDF error check macros. This may make it harder to debug some error conditions by reading the log output.
|
||||
:esp32: - If the binary needs to run on only certain revision(s) of ESP32, increasing :ref:`CONFIG_ESP32_REV_MIN` to match can result in a reduced binary size. This will make a large difference if setting ESP32 minimum revision 3, and PSRAM is enabled.
|
||||
:esp32c3: - If the binary needs to run on only certain revision(s) of ESP32-C3, increasing :ref:`CONFIG_ESP32C3_REV_MIN` to match can result in a reduced binary size. This is particularly true if setting ESP32-C3 minimum revision 3 and using Wi-Fi, as some functionality was moved to ROM code.
|
||||
- Don't enable :ref:`CONFIG_COMPILER_CXX_EXCEPTIONS`, :ref:`CONFIG_COMPILER_CXX_RTTI`, or set the :ref:`CONFIG_COMPILER_STACK_CHECK_MODE` to Overall. All of these options are already disabled by default, but they have a large impact on binary size.
|
||||
- Disabling :ref:`CONFIG_ESP_ERR_TO_NAME_LOOKUP` will remove the lookup table to translate user-friendly names for error values (see :doc:`/api-guides/error-handling`) in error logs, etc. This saves some binary size, but error values will be printed as integers only.
|
||||
- Setting :ref:`CONFIG_ESP_SYSTEM_PANIC` to "Silent reboot" will save a small amount of binary size, however this is *only* recommended if no one will use UART output to debug the device.
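
The warning about Undefined Behaviour in the first item can be illustrated with a classic case. The sketch below is a generic C example with hypothetical function names, not code from ESP-IDF: the first check relies on signed overflow, which an optimizing compiler is allowed to assume never happens, while the second is well-defined at every optimization level.

.. code-block:: c

    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>

    /* Undefined behaviour: when x == INT_MAX, "x + 1" overflows a signed int.
     * An optimizing compiler may assume this never happens and reduce the
     * whole function to "return false". Shown for illustration only. */
    bool increment_overflows_ub(int x)
    {
        return x + 1 < x;
    }

    /* Well-defined alternative: compare against the limit before doing the
     * arithmetic, so the result does not depend on the optimization level. */
    bool increment_overflows(int x)
    {
        return x == INT_MAX;
    }

    int checked_increment(int x)
    {
        assert(!increment_overflows(x));
        return x + 1;
    }

Code written like the first function may appear to work with one compiler optimization setting and then misbehave after switching to another.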
|
||||
|
||||
.. note::
|
||||
|
||||
In addition to the many configuration items shown here, there are a number of configuration options where changing the option from the default will increase binary size. These are not noted here. Where the increase is significant, this is usually noted in the configuration item help text.
|
||||
|
||||
.. _size-targeted-optimizations:
|
||||
|
||||
Targeted Optimizations
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The following binary size optimizations apply to a particular component or a function:
|
||||
|
||||
Wi-Fi
|
||||
@@@@@
|
||||
|
||||
- Disabling :ref:`CONFIG_ESP32_WIFI_ENABLE_WPA3_SAE` will save some Wi-Fi binary size if WPA3 support is not needed. (Note that WPA3 is mandatory for new Wi-Fi device certifications.)
|
||||
|
||||
.. only:: esp32
|
||||
|
||||
ADC
|
||||
@@@
|
||||
|
||||
- Disabling ADC calibration features :ref:`CONFIG_ADC_CAL_EFUSE_TP_ENABLE`, :ref:`CONFIG_ADC_CAL_EFUSE_VREF_ENABLE`, :ref:`CONFIG_ADC_CAL_LUT_ENABLE` will save a small amount of binary size if the ADC driver is used, at the expense of accuracy.
|
||||
|
||||
.. only:: SOC_BT_SUPPORTED
|
||||
|
||||
Bluetooth NimBLE
|
||||
@@@@@@@@@@@@@@@@
|
||||
|
||||
If using :doc:`NimBLE Bluetooth Host </api-reference/bluetooth/nimble/index>` then the following modifications can reduce binary size:
|
||||
|
||||
.. list::
|
||||
|
||||
:esp32: - Set :ref:`CONFIG_BTDM_CTRL_BLE_MAX_CONN` to 1 if only one BLE connection is needed.
|
||||
- Set :ref:`CONFIG_BT_NIMBLE_MAX_CONNECTIONS` to 1 if only one BLE connection is needed.
|
||||
- Disable either :ref:`CONFIG_BT_NIMBLE_ROLE_CENTRAL` or :ref:`CONFIG_BT_NIMBLE_ROLE_OBSERVER` if these roles are not needed.
|
||||
- Reducing :ref:`CONFIG_BT_NIMBLE_LOG_LEVEL` can reduce binary size. Note that if the overall log level has been reduced as described above in :ref:`reducing-overall-size` then this also reduces the NimBLE log level.
|
||||
|
||||
lwIP IPv6
|
||||
@@@@@@@@@
|
||||
|
||||
- Setting :ref:`CONFIG_LWIP_IPV6` to false will reduce the size of the lwIP TCP/IP stack, at the cost of only supporting IPv4.
|
||||
|
||||
.. note::
|
||||
|
||||
IPv6 is required by some components such as ``coap`` and :doc:`/api-reference/protocols/asio`. These components will not be available if IPv6 is disabled.
|
||||
|
||||
.. _newlib-nano-formatting:
|
||||
|
||||
Newlib nano formatting
|
||||
@@@@@@@@@@@@@@@@@@@@@@
|
||||
|
||||
By default, ESP-IDF uses newlib "full" formatting for I/O (printf, scanf, etc.).
|
||||
|
||||
Enabling the config option :ref:`CONFIG_NEWLIB_NANO_FORMAT` will switch newlib to the "nano" formatting mode. This is smaller in code size, and a large part of the implementation is compiled into the {IDF_TARGET_NAME} ROM, so it doesn't need to be included in the binary at all.
|
||||
|
||||
The exact difference in binary size depends on which features the firmware uses, but 25 KB to 50 KB is typical.
|
||||
|
||||
Enabling Nano formatting also reduces the stack usage of each function that calls printf() or another string formatting function, see :ref:`optimize-stack-sizes`.
|
||||
|
||||
"Nano" formatting doesn't support 64-bit integers, or C99 formatting features. For a full list of restrictions, search for ``--enable-newlib-nano-formatted-io`` in the `Newlib README file`_.
|
||||
|
||||
.. _Newlib README file: https://sourceware.org/newlib/README
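
If "nano" formatting is enabled and the application still needs to log an occasional 64-bit value, one possible workaround is to convert the value to text manually and print it with ``%s``. The helper below is a hypothetical sketch in plain C, not an ESP-IDF or newlib API.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Room for the 20 digits of UINT64_MAX plus the terminating NUL. */
    #define U64_DEC_BUF_SIZE 21

    /* Writes the decimal representation of value into buf and returns buf. */
    char *u64_to_dec(uint64_t value, char *buf, size_t buf_size)
    {
        assert(buf != NULL && buf_size >= U64_DEC_BUF_SIZE);

        char tmp[U64_DEC_BUF_SIZE];
        size_t pos = sizeof(tmp) - 1;
        tmp[pos] = '\0';

        /* Produce digits from least to most significant, filling backwards. */
        do {
            tmp[--pos] = (char)('0' + (value % 10));
            value /= 10;
        } while (value != 0);

        /* Copy the digits together with the terminating NUL. */
        memcpy(buf, &tmp[pos], sizeof(tmp) - pos);
        return buf;
    }

A caller would then write something like ``printf("%s\n", u64_to_dec(value, buf, sizeof(buf)))`` instead of using a 64-bit format specifier.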
|
||||
|
||||
mbedTLS features
|
||||
@@@@@@@@@@@@@@@@
|
||||
|
||||
Under *Component Config* -> *mbedTLS* there are multiple mbedTLS features which are enabled by default but can be disabled if not needed to save code size.
|
||||
|
||||
These include:
|
||||
|
||||
- :ref:`CONFIG_MBEDTLS_HAVE_TIME`
|
||||
- :ref:`CONFIG_MBEDTLS_ECDSA_DETERMINISTIC`
|
||||
- :ref:`CONFIG_MBEDTLS_SHA512_C`
|
||||
- :ref:`CONFIG_MBEDTLS_SSL_PROTO_TLS1`
|
||||
- :ref:`CONFIG_MBEDTLS_SSL_PROTO_TLS1_1`
|
||||
- :ref:`CONFIG_MBEDTLS_CLIENT_SSL_SESSION_TICKETS`
|
||||
- :ref:`CONFIG_MBEDTLS_SERVER_SSL_SESSION_TICKETS`
|
||||
- :ref:`CONFIG_MBEDTLS_SSL_ALPN`
|
||||
- :ref:`CONFIG_MBEDTLS_CCM_C`
|
||||
- :ref:`CONFIG_MBEDTLS_GCM_C`
|
||||
- :ref:`CONFIG_MBEDTLS_ECP_C` (Alternatively: Leave this option enabled but disable some of the elliptic curves listed in the sub-menu.)
|
||||
- :ref:`CONFIG_MBEDTLS_SSL_RENEGOTIATION`
|
||||
- Change :ref:`CONFIG_MBEDTLS_TLS_MODE` if both Server & Client are not needed
|
||||
- Consider disabling some ciphersuites listed in the "TLS Key Exchange Methods" sub-menu (e.g. :ref:`CONFIG_MBEDTLS_KEY_EXCHANGE_RSA`)
|
||||
|
||||
The help text for each option has some more information.
|
||||
|
||||
.. important::
|
||||
|
||||
It is **strongly recommended not to disable all of these mbedTLS options**. Only disable options where you understand the functionality and are certain that it is not needed in the application. In particular:
|
||||
|
||||
- Ensure that any TLS server(s) the device connects to can still be used. If the server is controlled by a third party or a cloud service, it is recommended to ensure that the firmware supports at least two of the supported cipher suites in case one is disabled in a future update.
|
||||
- Ensure that any TLS client(s) that connect to the device can still connect with supported/recommended cipher suites. Note that future versions of client operating systems may remove support for some features, so it is recommended to enable multiple supported cipher suites or algorithms for redundancy.
|
||||
|
||||
If depending on third party clients or servers, always pay attention to announcements about future changes to supported TLS features. Otherwise, the {IDF_TARGET_NAME} device may become inaccessible when support changes.
|
||||
|
||||
.. note::
|
||||
|
||||
Not every combination of mbedTLS compile-time config is tested in ESP-IDF. If you find a combination that fails to compile or function as expected, please report the details on GitHub.
|
||||
|
||||
FreeModBus
|
||||
@@@@@@@@@@
|
||||
|
||||
If using Modbus, enable or disable :ref:`CONFIG_FMB_COMM_MODE_TCP_EN`, :ref:`CONFIG_FMB_COMM_MODE_RTU_EN`, :ref:`CONFIG_FMB_COMM_MODE_ASCII_EN` as applicable for the necessary functionality.
|
||||
|
||||
Bootloader Size
|
||||
---------------
|
||||
|
||||
This document deals with the size of an ESP-IDF app binary only, and not the ESP-IDF :ref:`second-stage-bootloader`.
|
||||
|
||||
For a discussion of ESP-IDF bootloader binary size, see :doc:`/security/secure-boot-bootloader-size`.
|
||||
|
||||
IRAM Binary Size
|
||||
----------------
|
||||
|
||||
If the IRAM section of a binary is too large, this issue can be resolved by reducing IRAM memory usage. See :ref:`optimize-iram-usage`.
|
||||
|
||||
|
||||
|
237
docs/en/api-guides/performance/speed.rst
Normal file
@ -0,0 +1,237 @@
|
||||
Maximizing Execution Speed
|
||||
==========================
|
||||
|
||||
{IDF_TARGET_CONTROLLER_CORE_CONFIG:default="CONFIG_BT_CTRL_PINNED_TO_CORE", esp32="CONFIG_BTDM_CTRL_PINNED_TO_CORE_CHOICE"}
|
||||
{IDF_TARGET_RF_TYPE:default="Wi-Fi/BT", esp32s2="Wi-Fi"}
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
Optimizing execution speed is a key element of software performance. Code that executes faster can also have other positive effects, like reducing overall power consumption. However, improving execution speed may have trade-offs with other aspects of performance such as :doc:`size`.
|
||||
|
||||
Choose What To Optimize
|
||||
-----------------------
|
||||
|
||||
If a function in the application firmware is executed once per week in the background, it may not matter if that function takes 10 ms or 100 ms to execute. If a function is executed constantly at 10 Hz, it matters greatly if it takes 10 ms or 100 ms to execute.
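
As a rough worked example of this reasoning (assuming a single core and that the function runs to completion on each call), the fraction of CPU time consumed is the call rate multiplied by the execution time:

.. math::

    \text{CPU load} = \text{call rate} \times \text{execution time}

    10\ \text{Hz} \times 10\ \text{ms} = 10\%, \qquad 10\ \text{Hz} \times 100\ \text{ms} = 100\%

    \frac{100\ \text{ms}}{1\ \text{week}} = \frac{0.1\ \text{s}}{604800\ \text{s}} \approx 1.7 \times 10^{-7}

At 10 Hz, the 100 ms version leaves no CPU time for anything else, while the weekly function is negligible in either case.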
|
||||
|
||||
Most application firmwares will only have a small set of functions which require optimal performance. Perhaps those functions are executed very often, or have to meet some application requirements for latency or throughput. Optimization efforts should be targeted at these particular functions.
|
||||
|
||||
Measuring Performance
|
||||
---------------------
|
||||
|
||||
The first step to improving something is to measure it.
|
||||
|
||||
Basic Performance Measurements
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If measuring performance relative to an external interaction with the world, you may be able to measure this directly (for example see the examples :example:`wifi/iperf` and :example:`ethernet/iperf` for measuring general network performance, or you can use an oscilloscope or logic analyzer to measure timing of an interaction with a device peripheral.)
|
||||
|
||||
Otherwise, one way to measure performance is to augment the code to take timing measurements:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdint.h>
#include <stdio.h>
#include "esp_timer.h"
|
||||
|
||||
void measure_important_function(void) {
|
||||
const unsigned MEASUREMENTS = 5000;
|
||||
uint64_t start = esp_timer_get_time();
|
||||
|
||||
for (unsigned retries = 0; retries < MEASUREMENTS; retries++) {
|
||||
important_function(); // This is the thing you need to measure
|
||||
}
|
||||
|
||||
uint64_t end = esp_timer_get_time();
|
||||
|
||||
printf("%u iterations took %llu milliseconds (%llu microseconds per invocation)\n",
|
||||
MEASUREMENTS, (end - start)/1000, (end - start)/MEASUREMENTS);
|
||||
}
|
||||
|
||||
Executing the target multiple times can help average out factors like RTOS context switches, overhead of measurements, etc.
|
||||
|
||||
- Using :cpp:func:`esp_timer_get_time` generates "wall clock" timestamps with microsecond precision, but has moderate overhead each time the timing functions are called.
|
||||
- It's also possible to use the standard Unix ``gettimeofday()`` and ``utime()`` functions, although the overhead is slightly higher.
|
||||
- Otherwise, including ``hal/cpu_hal.h`` and calling the HAL function ``cpu_hal_get_cycle_count()`` will return the number of CPU cycles executed. This function has lower overhead than the others. It is good for measuring very short execution times with high precision.
|
||||
|
||||
.. only:: not CONFIG_FREERTOS_UNICORE
|
||||
|
||||
The CPU cycles are counted per-core, so only use this method from an interrupt handler, or a task that is pinned to a single core.
|
||||
|
||||
- If making "microbenchmarks" (i.e. benchmarking only a very small routine of code that runs in less than 1-2 milliseconds) then flash cache performance can sometimes cause big variations in timing measurements depending on the binary. This happens because binary layout can cause different patterns of cache misses in a particular sequence of execution. If the test code is larger then this effect usually averages out. Executing a small function multiple times when benchmarking can help reduce the impact of flash cache misses. Alternatively, move this code to IRAM (see :ref:`speed-targeted-optimizations`).
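
The ``gettimeofday()`` option mentioned in the list can be wrapped in a small reusable helper. The following is a minimal sketch using only standard POSIX and C calls; ``sample_workload()`` is a hypothetical stand-in for the function being measured, and the helper names are not ESP-IDF APIs.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/time.h>

    /* Current wall-clock time in microseconds, built on gettimeofday(). */
    static int64_t now_us(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    }

    /* Calls fn() the given number of times and returns the total elapsed
     * microseconds. Divide by iterations for the per-call average. */
    int64_t time_function_us(void (*fn)(void), unsigned iterations)
    {
        assert(fn != NULL && iterations > 0);
        int64_t start = now_us();
        for (unsigned i = 0; i < iterations; i++) {
            fn();
        }
        return now_us() - start;
    }

    /* Hypothetical workload; volatile stops the loop being optimized away. */
    static volatile uint32_t sink;

    void sample_workload(void)
    {
        for (uint32_t i = 0; i < 1000; i++) {
            sink += i;
        }
    }

Passing a function pointer keeps the timing code separate from the code under test, at the cost of one indirect call per iteration, which matters only for very short functions.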
|
||||
|
||||
External Tracing
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
The :doc:`/api-guides/app_trace` allows measuring code execution with minimal impact on the code itself.
|
||||
|
||||
Tasks
|
||||
^^^^^
|
||||
|
||||
If the option :ref:`CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS` is enabled then the FreeRTOS API :cpp:func:`vTaskGetRunTimeStats` can be used to retrieve runtime information about the processor time used by each FreeRTOS task.
|
||||
|
||||
:ref:`SEGGER SystemView <app_trace-system-behaviour-analysis-with-segger-systemview>` is an excellent tool for visualizing task execution and looking for performance issues or improvements in the system as a whole.
|
||||
|
||||
Improving Overall Speed
|
||||
-----------------------
|
||||
|
||||
The following optimizations will improve the execution of nearly all code - including boot times, throughput, latency, etc:
|
||||
|
||||
.. list::
|
||||
|
||||
:esp32: - Set :ref:`CONFIG_ESPTOOLPY_FLASHFREQ` to 80 MHz. This is double the 40 MHz default value and will double the speed at which code is loaded or executed from flash. You should verify that the board or module that connects the {IDF_TARGET_NAME} to the flash chip is rated for 80 MHz operation at the relevant temperature ranges, before changing this setting. The hardware datasheet(s) will have this information.
|
||||
- Set :ref:`CONFIG_ESPTOOLPY_FLASHMODE` to QIO or QOUT mode (Quad I/O). Both will almost double the speed at which code is loaded or executed from flash compared to the default DIO mode. QIO is slightly faster than QOUT if both are supported. Note that both the flash chip model and the electrical connections between the {IDF_TARGET_NAME} and the flash chip must support quad I/O modes or the SoC will not work correctly.
|
||||
- Set :ref:`CONFIG_COMPILER_OPTIMIZATION` to "Optimize for performance (-O2)". This may slightly increase binary size compared to the default setting, but will almost certainly increase performance of some code. Note that if your code contains C or C++ Undefined Behaviour then increasing the compiler optimization level may expose bugs that otherwise are not seen.
|
||||
:SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
|
||||
:not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
|
||||
- Avoid using double precision floating point arithmetic (``double``). These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point.
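
As an illustration of the fixed point representation suggested above, here is a minimal sketch of Q16.16 arithmetic (16 integer bits, 16 fractional bits) in plain C. The type and function names are hypothetical, and whether this is actually faster than ``float`` for a given routine should be confirmed by measurement.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Q16.16: value = raw / 65536 */
    typedef int32_t q16_16_t;
    #define Q16_ONE 65536

    static inline q16_16_t q16_from_int(int32_t v)
    {
        assert(v >= -32768 && v <= 32767);   /* must fit in 16 integer bits */
        return (q16_16_t)(v * Q16_ONE);
    }

    static inline int32_t q16_to_int(q16_16_t v)
    {
        return v / Q16_ONE;                  /* truncates toward zero */
    }

    /* Multiply in 64 bits so the intermediate product cannot overflow. */
    static inline q16_16_t q16_mul(q16_16_t a, q16_16_t b)
    {
        return (q16_16_t)(((int64_t)a * b) / Q16_ONE);
    }

    /* Pre-scale the dividend so the quotient keeps its fractional bits. */
    static inline q16_16_t q16_div(q16_16_t a, q16_16_t b)
    {
        assert(b != 0);
        return (q16_16_t)(((int64_t)a * Q16_ONE) / b);
    }

Addition and subtraction of two Q16.16 values are ordinary integer operations, so only multiplication and division need helpers.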
|
||||
|
||||
Reduce Logging Overhead
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Although standard output is buffered, it's possible for an application to be limited by the rate at which it can print data to log output once buffers are full. This is particularly relevant for startup time if a lot of output is logged, but can happen at other times as well. There are multiple ways to solve this problem:
|
||||
|
||||
.. list::
|
||||
|
||||
- Reduce the volume of log output by lowering the app :ref:`CONFIG_LOG_DEFAULT_LEVEL` (the equivalent bootloader setting is :ref:`CONFIG_BOOTLOADER_LOG_LEVEL`). This also reduces the binary size, and saves some CPU time spent on string formatting.
|
||||
:not SOC_USB_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE`.
|
||||
:SOC_USB_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE`. (Unless using internal USB-CDC for serial console, in which case the serial throughput doesn't depend on the configured baud rate.)
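
To get a feel for the numbers, consider a console running at 115200 baud with 8N1 framing (10 bits on the wire per byte). The 80-character line length is an arbitrary assumption for illustration:

.. math::

    \frac{115200\ \text{bit/s}}{10\ \text{bit/byte}} = 11520\ \text{byte/s}
    \quad\Rightarrow\quad
    \frac{80\ \text{byte}}{11520\ \text{byte/s}} \approx 6.9\ \text{ms per line}

Under these assumptions, a few hundred log lines during startup can add seconds of delay once the output buffers are full.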
|
||||
|
||||
Not Recommended
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The following options will also increase execution speed, but are not recommended as they also reduce the debuggability of the firmware application and may increase the severity of any bugs.
|
||||
|
||||
.. list::
|
||||
|
||||
- Set :ref:`CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL` to disabled. This also reduces firmware binary size by a small amount. However, it may increase the severity of bugs in the firmware, including security-related bugs. If it is necessary to do this to optimize a particular function, consider adding ``#define NDEBUG`` at the top of that single source file instead.
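
The per-file ``NDEBUG`` approach relies on standard C behaviour: when ``NDEBUG`` is defined before ``<assert.h>`` is included, every ``assert()`` in that file expands to nothing. A minimal sketch, with a hypothetical hot function:

.. code-block:: c

    /* NDEBUG must be defined before <assert.h> is included, so keep this at
     * the very top of the one source file that needs the speed-up. */
    #define NDEBUG
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* With NDEBUG defined, both assert() calls below are compiled out, so no
     * check is executed inside the loop. */
    uint32_t sum_samples(const uint16_t *samples, size_t count)
    {
        assert(samples != NULL);
        uint32_t total = 0;
        for (size_t i = 0; i < count; i++) {
            assert(samples[i] <= 4095);   /* e.g. a 12-bit range check */
            total += samples[i];
        }
        return total;
    }

Assertions in every other source file remain active, which limits the loss of error checking to the code that was deliberately optimized.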
|
||||
|
||||
.. _speed-targeted-optimizations:
|
||||
|
||||
Targeted Optimizations
|
||||
----------------------
|
||||
|
||||
The following changes will increase the speed of a chosen part of the firmware application:
|
||||
|
||||
.. list::
|
||||
|
||||
- Move frequently executed code to IRAM. By default, all code in the app is executed from flash cache. This means that it's possible for the CPU to have to wait on a "cache miss" while the next instructions are loaded from flash. Functions which are copied into IRAM are loaded once at boot time, and then will always execute at full speed.
|
||||
|
||||
IRAM is a limited resource, and using more IRAM may reduce available DRAM, so a strategic approach is needed when moving code to IRAM. See :ref:`iram` for more information.
|
||||
|
||||
Improving Startup Time
|
||||
----------------------
|
||||
|
||||
In addition to the overall performance improvements shown above, the following options can be tweaked to specifically reduce startup time:
|
||||
|
||||
.. list::
|
||||
|
||||
- Minimizing the :ref:`CONFIG_LOG_DEFAULT_LEVEL` and :ref:`CONFIG_BOOTLOADER_LOG_LEVEL` has a large impact on startup time. To enable more logging after the app starts up, set the :ref:`CONFIG_LOG_MAXIMUM_LEVEL` as well and then call :cpp:func:`esp_log_level_set` to restore higher level logs. The :example:`system/startup_time` main function shows how to do this.
|
||||
- If using deep sleep, setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_IN_DEEP_SLEEP` allows a faster wake from sleep. Note that if using Secure Boot this represents a security compromise, as Secure Boot validation will not be performed on wake.
|
||||
- Setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_ON_POWER_ON` will skip verifying the binary on every boot from power-on reset. How much time this saves depends on the binary size and the flash settings. Note that this setting carries some risk if the flash becomes corrupt unexpectedly. Read the help text of the :ref:`config item <CONFIG_BOOTLOADER_SKIP_VALIDATE_ON_POWER_ON>` for an explanation and recommendations if using this option.
|
||||
- It's possible to save a small amount of time during boot by disabling RTC slow clock calibration. To do so, set :ref:`CONFIG_{IDF_TARGET_CFG_PREFIX}_RTC_CLK_CAL_CYCLES` to 0. Any part of the firmware that uses RTC slow clock as a timing source will be less accurate as a result.
|
||||
|
||||
The example project :example:`system/startup_time` is pre-configured to optimize startup time. The files :example_file:`system/startup_time/sdkconfig.defaults` and :example_file:`system/startup_time/sdkconfig.defaults.{IDF_TARGET_PATH_NAME}` contain all of these settings. You can append these to the end of your project's own ``sdkconfig`` file to merge the settings, but please read the documentation for each setting first.
|
||||
|
||||
Task Priorities
|
||||
---------------
|
||||
|
||||
As ESP-IDF FreeRTOS is a real-time operating system, it's necessary to ensure that high throughput or low latency tasks are granted a high priority in order to run immediately. Priority is set when calling :cpp:func:`xTaskCreate` or :cpp:func:`xTaskCreatePinnedToCore` and can be changed at runtime by calling :cpp:func:`vTaskPrioritySet`.
|
||||
|
||||
It's also necessary to ensure that tasks yield the CPU (by calling :cpp:func:`vTaskDelay`, ``sleep()``, or by blocking on semaphores, queues, task notifications, etc.) in order to not starve lower priority tasks and cause problems for the overall system. The :ref:`task-watchdog-timer` provides a mechanism to automatically detect task starvation. Note, however, that a Task WDT timeout does not always indicate a problem, as sometimes the correct operation of the firmware requires a long-running computation. In these cases, tweaking the Task WDT timeout or even disabling the Task WDT may be necessary.
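The following is a minimal sketch of the calls mentioned above, intended to be built inside an ESP-IDF project. The task name, stack size and priority values (``worker_task``, ``WORKER_TASK_STACK``, ``WORKER_TASK_PRIO``) are hypothetical and chosen only for illustration:

.. code-block:: c

    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"

    /* Hypothetical values for this sketch */
    #define WORKER_TASK_PRIO   10
    #define WORKER_TASK_STACK  4096

    static void worker_task(void *arg)
    {
        while (1) {
            /* ... do a short burst of work here ... */

            /* Block for 10 ms so that lower priority tasks get CPU time */
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }

    void app_main(void)
    {
        TaskHandle_t handle = NULL;

        /* Create the task at its initial priority, not pinned to any core */
        BaseType_t ok = xTaskCreatePinnedToCore(worker_task, "worker",
                                                WORKER_TASK_STACK, NULL,
                                                WORKER_TASK_PRIO, &handle,
                                                tskNO_AFFINITY);

        /* The priority can be changed later at runtime */
        if (ok == pdPASS) {
            vTaskPrioritySet(handle, WORKER_TASK_PRIO + 1);
        }
    }

The return value is checked before calling :cpp:func:`vTaskPrioritySet` because passing a ``NULL`` handle would change the priority of the calling task instead.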
|
||||
|
||||
.. _built-in-task-priorities:
|
||||
|
||||
Built-In Task Priorities
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
ESP-IDF starts a number of system tasks at fixed priority levels. Some are automatically started during the boot process, while others are started only if the application firmware initializes a particular feature. To optimize performance, structure application task priorities so that they are not delayed by system tasks, while also not starving system tasks and impacting other functions of the system.
|
||||
|
||||
This may require splitting up a particular task. For example, perform a time-critical operation in a high priority task or an interrupt handler and do the non-time-critical part in a lower priority task.
|
||||
|
||||
The header :idf_file:`components/esp_system/include/esp_task.h` contains macros for the priority levels used by built-in ESP-IDF system tasks.
|
||||
|
||||
Common priorities are:
|
||||
|
||||
.. Note: the following two lists should be kept the same, but the second list also shows CPU affinities
|
||||
|
||||
.. only:: CONFIG_FREERTOS_UNICORE
|
||||
|
||||
.. list::
|
||||
|
||||
- :ref:`Main task that executes app_main function <app-main-task>` has minimum priority (1).
|
||||
- :doc:`/api-reference/system/esp_timer` system task to manage timer events and execute callbacks has high priority (22, ``ESP_TASK_TIMER_PRIO``).
|
||||
- FreeRTOS Timer Task to handle FreeRTOS timer callbacks is created when the scheduler initializes and has minimum task priority (1, :ref:`configurable <CONFIG_FREERTOS_TIMER_TASK_PRIORITY>`).
|
||||
- :doc:`/api-guides/event-handling` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``). This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`; it's possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead.
|
||||
- :doc:`/api-guides/lwip` TCP/IP task has high priority (18, ``ESP_TASK_TCPIP_PRIO``).
|
||||
- :doc:`Wi-Fi Driver </api-guides/wifi>` task has high priority (23).
|
||||
- Wi-Fi wpa_supplicant component may create dedicated tasks while the Wi-Fi Protected Setup (WPS), WPA2 EAP-TLS, Device Provisioning Protocol (DPP) or BSS Transition Management (BTM) features are in use. These tasks all have low priority (2).
|
||||
:SOC_BT_SUPPORTED: - :doc:`Bluetooth Controller </api-reference/bluetooth/index>` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``). The Bluetooth Controller needs to respond to requests with low latency, so it should always be close to the highest priority task in the system.
|
||||
:SOC_BT_SUPPORTED: - :doc:`NimBLE Bluetooth Host </api-reference/bluetooth/nimble/index>` task has high priority (21).
|
||||
- The Ethernet driver creates a task for the MAC to receive Ethernet frames. If using the default config ``ETH_MAC_DEFAULT_CONFIG`` then the priority is medium-high (15). This setting can be changed by passing a custom :cpp:class:`eth_mac_config_t` struct when initializing the Ethernet MAC.
|
||||
- If using the :doc:`mDNS </api-reference/protocols/mdns>` component, it creates a task with default low priority 1 (:ref:`configurable <CONFIG_MDNS_TASK_PRIORITY>`).
|
||||
- If using the :doc:`MQTT </api-reference/protocols/mqtt>` component, it creates a task with default priority 5 (:ref:`configurable <CONFIG_MQTT_TASK_PRIORITY>`, depends on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG`). The priority is also configurable at runtime via the ``task_prio`` field in :cpp:class:`esp_mqtt_client_config_t`.
|
||||
|
||||
.. only :: not CONFIG_FREERTOS_UNICORE
|
||||
|
||||
.. list::
|
||||
|
||||
- :ref:`Main task that executes app_main function <app-main-task>` has minimum priority (1). This task is pinned to Core 0 by default (:ref:`configurable<CONFIG_ESP_MAIN_TASK_AFFINITY>`).
|
||||
- :doc:`/api-reference/system/esp_timer` system task to manage high precision timer events and execute callbacks has high priority (22, ``ESP_TASK_TIMER_PRIO``). This task is pinned to Core 0.
|
||||
- FreeRTOS Timer Task to handle FreeRTOS timer callbacks is created when the scheduler initializes and has minimum task priority (1, :ref:`configurable <CONFIG_FREERTOS_TIMER_TASK_PRIORITY>`). This task is pinned to Core 0.
|
||||
- :doc:`/api-guides/event-handling` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``) and is pinned to Core 0. This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`; it's possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead.
|
||||
- :doc:`/api-guides/lwip` TCP/IP task has high priority (18, ``ESP_TASK_TCPIP_PRIO``) and is not pinned to any core (:ref:`configurable<CONFIG_LWIP_TCPIP_TASK_AFFINITY>`).
|
||||
- :doc:`Wi-Fi Driver </api-guides/wifi>` task has high priority (23) and is pinned to Core 0 by default (:ref:`configurable<CONFIG_ESP32_WIFI_TASK_CORE_ID>`).
|
||||
- Wi-Fi wpa_supplicant component may create dedicated tasks while the Wi-Fi Protected Setup (WPS), WPA2 EAP-TLS, Device Provisioning Protocol (DPP) or BSS Transition Management (BTM) features are in use. These tasks all have low priority (2) and are not pinned to any core.
|
||||
:SOC_BT_SUPPORTED: - :doc:`Bluetooth Controller </api-reference/bluetooth/index>` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``) and is pinned to Core 0 by default (:ref:`configurable <{IDF_TARGET_CONTROLLER_CORE_CONFIG}>`). The Bluetooth Controller needs to respond to requests with low latency, so it should always be close to the highest priority task assigned to a single CPU.
|
||||
:SOC_BT_SUPPORTED: - :doc:`NimBLE Bluetooth Host </api-reference/bluetooth/nimble/index>` task has high priority (21) and is pinned to Core 0 by default (:ref:`configurable <CONFIG_BT_NIMBLE_PINNED_TO_CORE_CHOICE>`).
|
||||
:esp32: - :doc:`Bluedroid Bluetooth Host </api-reference/bluetooth/index>` creates multiple tasks when used:
|
||||
- Stack event callback task ("BTC") has high priority (19).
|
||||
- Stack BTU layer task has high priority (20).
|
||||
- Host HCI host task has high priority (22).
|
||||
|
||||
All Bluedroid Tasks are pinned to the same core, which is Core 0 by default (:ref:`configurable <CONFIG_BT_BLUEDROID_PINNED_TO_CORE_CHOICE>`).
|
||||
- The Ethernet driver creates a task for the MAC to receive Ethernet frames. If using the default config ``ETH_MAC_DEFAULT_CONFIG`` then the priority is medium-high (15) and the task is not pinned to any core. These settings can be changed by passing a custom :cpp:class:`eth_mac_config_t` struct when initializing the Ethernet MAC.
|
||||
- If using the :doc:`mDNS </api-reference/protocols/mdns>` component, it creates a task with default low priority 1 (:ref:`configurable <CONFIG_MDNS_TASK_PRIORITY>`) that is pinned to Core 0 (:ref:`configurable <CONFIG_MDNS_TASK_AFFINITY>`).
|
||||
- If using the :doc:`MQTT </api-reference/protocols/mqtt>` component, it creates a task with default priority 5 (:ref:`configurable <CONFIG_MQTT_TASK_PRIORITY>`, depends on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG`) that is not pinned to any core (:ref:`configurable <CONFIG_MQTT_TASK_CORE_SELECTION_ENABLED>`).
|
||||
|
||||
Choosing Application Task Priorities
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. only:: CONFIG_FREERTOS_UNICORE
|
||||
|
||||
In general, it's not recommended to set task priorities higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. For very short timing-critical operations that don't use the network, use an ISR or a very restricted task (very short bursts of runtime only) at highest priority (24). Choosing priority 19 will allow lower layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still preempts the lwIP TCP/IP stack and other less time-critical internal functionality - this is the best option for time-critical tasks that don't perform network operations. Any task that does TCP/IP network operations should run at lower priority than the lwIP TCP/IP task (18) to avoid priority inversion issues.
|
||||
|
||||
.. only:: not CONFIG_FREERTOS_UNICORE
|
||||
|
||||
With a few exceptions (most importantly the lwIP TCP/IP task), in the default configuration most built-in tasks are pinned to Core 0. This makes it quite easy for the application to place high priority tasks on Core 1. Using priority 19 or higher will guarantee an application task can run on Core 1 without being preempted by any built-in task. To further isolate the tasks running on each CPU, configure the :ref:`lwIP task <CONFIG_LWIP_TCPIP_TASK_AFFINITY>` to only run on Core 0 instead of either core (this may reduce total TCP/IP throughput depending on what other tasks are running).
|
||||
|
||||
In general, it's not recommended to set task priorities on Core 0 higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. Choosing priority 19 and Core 0 will allow lower layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still preempts the lwIP TCP/IP stack and other less time-critical internal functionality. This is an option for time-critical tasks that don't perform network operations. Any task that does TCP/IP network operations should run at lower priority than the lwIP TCP/IP task (18) to avoid priority inversion issues.
|
||||
|
||||
.. note::
|
||||
|
||||
Setting a task to always run in preference to built-in ESP-IDF tasks does not require pinning to Core 1. The task can be left unpinned - at priority 17 or lower - to optionally run on Core 0 as well, if no higher priority built-in task is running there. Using unpinned tasks can improve the overall CPU utilization, however it makes reasoning about task scheduling more complex.
|
||||
|
||||
.. note::
|
||||
|
||||
Task execution is always completely suspended when writing to the built-in SPI flash chip. Only :ref:`iram-safe-interrupt-handlers` will continue executing.
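The priority guidance in this section can be condensed into a small helper, sketched here in plain C. The function and macro names are hypothetical and are not part of ESP-IDF; the numeric values are the ones quoted in the text (24 as the highest priority, 19 for time-critical tasks that don't use the network, and 18 for the lwIP TCP/IP task):

.. code-block:: c

    #include <stdbool.h>

    /* Values quoted in this section; the names are hypothetical */
    #define APP_PRIO_HIGHEST        24  /* only for very short, restricted bursts */
    #define APP_PRIO_TIME_CRITICAL  19  /* above lwIP, below the RF-related tasks */
    #define LWIP_TCPIP_TASK_PRIO    18  /* priority of the lwIP TCP/IP task */

    /* Suggest an upper bound for an application task's priority. */
    int suggest_app_task_priority(bool uses_tcpip, bool very_short_bursts)
    {
        if (uses_tcpip) {
            /* Stay below the lwIP TCP/IP task to avoid priority inversion */
            return LWIP_TCPIP_TASK_PRIO - 1;
        }
        if (very_short_bursts) {
            /* Short timing-critical work that never touches the network */
            return APP_PRIO_HIGHEST;
        }
        /* Time-critical, no networking: preempts lwIP but not the RF tasks */
        return APP_PRIO_TIME_CRITICAL;
    }

The returned value is meant as a starting point only. The right priority still depends on which built-in tasks the application actually uses and on which core they run.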
|
||||
|
||||
Improving Interrupt Performance
|
||||
-------------------------------
|
||||
|
||||
ESP-IDF supports dynamic :doc:`/api-reference/system/intr_alloc` with interrupt preemption. Each interrupt in the system has a priority, and higher priority interrupts will preempt lower priority ones.
|
||||
|
||||
Interrupt handlers will execute in preference to any task (provided the task is not inside a critical section). For this reason, it's important to minimize the amount of time spent executing in an interrupt handler.
|
||||
|
||||
To obtain the best performance for a particular interrupt handler:
|
||||
|
||||
.. list::
|
||||
|
||||
- Assign more important interrupts a higher priority using a flag such as ``ESP_INTR_FLAG_LEVEL2`` or ``ESP_INTR_FLAG_LEVEL3`` when calling :cpp:func:`esp_intr_alloc`.
|
||||
:not CONFIG_FREERTOS_UNICORE: - Assign the interrupt on a CPU where built-in {IDF_TARGET_RF_TYPE} tasks are not configured to run (this means assigning on Core 1 by default, see :ref:`built-in-task-priorities`). Interrupts are assigned on the same CPU where the :cpp:func:`esp_intr_alloc` function call is made.
|
||||
- If you're sure the entire interrupt handler can run from IRAM (see :ref:`iram-safe-interrupt-handlers`) then set the ``ESP_INTR_FLAG_IRAM`` flag when calling :cpp:func:`esp_intr_alloc` to assign the interrupt. This prevents it from being temporarily disabled if the application firmware writes to the internal SPI flash.
|
||||
- Even if the interrupt handler is not IRAM safe, if it is going to be executed frequently then consider moving the handler function to IRAM anyhow. This minimizes the chance of a flash cache miss when the interrupt code is executed (see :ref:`speed-targeted-optimizations`). It's possible to do this without adding the ``ESP_INTR_FLAG_IRAM`` flag to mark the interrupt as IRAM-safe, if only part of the handler is guaranteed to be in IRAM.
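As a minimal sketch of the points above, the following allocates an IRAM-safe interrupt at priority level 3. It is meant to be built inside an ESP-IDF project. The handler and helper names (``my_isr``, ``install_my_isr``) are hypothetical, and the interrupt source number is passed in by the caller because it depends on the peripheral and the target chip:

.. code-block:: c

    #include "esp_attr.h"
    #include "esp_err.h"
    #include "esp_intr_alloc.h"

    /* Handler placed in IRAM; it must only use code and data that are also in internal RAM */
    static void IRAM_ATTR my_isr(void *arg)
    {
        /* Keep this short: clear the interrupt source and defer other work to a task */
    }

    /* Allocate the interrupt on the core that calls this function */
    esp_err_t install_my_isr(int intr_source, intr_handle_t *out_handle)
    {
        return esp_intr_alloc(intr_source,
                              ESP_INTR_FLAG_LEVEL3 | ESP_INTR_FLAG_IRAM,
                              my_isr, NULL, out_handle);
    }

Because the interrupt is assigned to the calling core, one way to place it on a particular core is to call the helper from a task pinned to that core.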
|
||||
|
||||
Improving Network Speed
|
||||
-----------------------
|
||||
|
||||
* For Wi-Fi, see :ref:`How-to-improve-Wi-Fi-performance` and :ref:`wifi-buffer-usage`
|
||||
* For lwIP TCP/IP (Wi-Fi and Ethernet), see :ref:`lwip-performance`
|
||||
* The :example:`wifi/iperf` example contains a configuration that is heavily optimized for Wi-Fi TCP/IP throughput. Append the contents of the files :example_file:`wifi/iperf/sdkconfig.defaults`, :example_file:`wifi/iperf/sdkconfig.defaults.{IDF_TARGET_PATH_NAME}` and :example_file:`wifi/iperf/sdkconfig.ci.99` to your project ``sdkconfig`` file in order to add all of these options. Note that some of these options may have trade-offs in terms of reduced debuggability, increased firmware size, increased memory usage, or reduced performance of other features. To get the best result, read the documentation pages linked above and use this information to determine exactly which options are best suited for your app.
|
@ -246,6 +246,8 @@ Similar to multi-device test cases, multi-stage test cases will also print sub-m
|
||||
The first time you execute this case, input ``1`` to run the first stage (trigger deep sleep). After the DUT has rebooted and is able to run test cases, select this case again and input ``2`` to run the second stage. The case only passes if the last stage passes and all previous stages trigger a reset.
|
||||
|
||||
|
||||
.. _cache-compensated-timer:
|
||||
|
||||
Timing Code with Cache Compensated Timer
|
||||
-----------------------------------------
|
||||
|
||||
|
@ -1906,7 +1906,7 @@ Dynamic vs. Static Buffer
|
||||
|
||||
The default type of buffer in Wi-Fi drivers is "dynamic". Most of the time the dynamic buffer can significantly save memory. However, it makes the application programming a little more difficult, because in this case the application needs to consider memory usage in Wi-Fi.
|
||||
|
||||
lwIP also allocates buffers at the TCP/IP layer, and this buffer allocation is also dynamic. See `lwIP documentation section about memory use and performance <https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#performance-optimization>`_.
|
||||
lwIP also allocates buffers at the TCP/IP layer, and this buffer allocation is also dynamic. See :ref:`lwIP documentation section about memory use and performance <lwip-performance>`.
|
||||
|
||||
Peak Wi-Fi Dynamic Buffer
|
||||
++++++++++++++++++++++++++++++
|
||||
|
@ -54,6 +54,8 @@ sections or interrupt handlers should ever block waiting for another event to oc
|
||||
If changing the code to reduce the processing time is not possible or desirable, it's possible to
|
||||
increase the :ref:`CONFIG_ESP_INT_WDT_TIMEOUT_MS` setting instead.
|
||||
|
||||
.. _task-watchdog-timer:
|
||||
|
||||
Task Watchdog Timer
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
@ -93,8 +93,6 @@ idf.py
|
||||
|
||||
.. 注解:: 环境变量 ``ESPPORT`` 和 ``ESPBAUD`` 可分别用来设置 ``-p`` 和 ``-b`` 选项的默认值。在命令行中,重新为这两个选项赋值,会覆盖其默认值。
|
||||
|
||||
.. _idf.py-size:
|
||||
|
||||
高级命令
|
||||
^^^^^^^^
|
||||
|
||||
|
@ -29,6 +29,7 @@ API 指南
|
||||
lwIP TCP/IP 协议栈 <lwip>
|
||||
Memory Types <memory-types>
|
||||
分区表 <partition-tables>
|
||||
Performance <performance/index>
|
||||
射频校准 <RF_calibration>
|
||||
:esp32: 安全启动 <../security/secure-boot-v1>
|
||||
安全启动 V2 <../security/secure-boot-v2>
|
||||
|
@ -8,6 +8,8 @@
|
||||
|
||||
ESP-IDF 应用程序的代码可以放在以下内存区域之一。
|
||||
|
||||
.. _iram:
|
||||
|
||||
IRAM(指令 RAM)
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
@ -47,6 +49,8 @@ RAM。除了开始的 64kB 用作 PRO CPU 和 APP CPU
|
||||
中读取代码和数据的,将函数放在 IRAM
|
||||
中运行可以减少由高速缓存未命中引起的时间延迟。
|
||||
|
||||
.. _irom:
|
||||
|
||||
IROM(代码从 Flash 中运行)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@ -101,6 +105,8 @@ DRAM(数据 RAM)
|
||||
|
||||
__NOINIT_ATTR uint32_t noinit_data;
|
||||
|
||||
.. _drom:
|
||||
|
||||
DROM(数据存储在 Flash 中)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
2
docs/zh_CN/api-guides/performance/index.rst
Normal file
@ -0,0 +1,2 @@
|
||||
.. include:: ../../../en/api-guides/performance/index.rst
|
||||
|
2
docs/zh_CN/api-guides/performance/ram-usage.rst
Normal file
@ -0,0 +1,2 @@
|
||||
.. include:: ../../../en/api-guides/performance/ram-usage.rst
|
||||
|
2
docs/zh_CN/api-guides/performance/size.rst
Normal file
@ -0,0 +1,2 @@
|
||||
.. include:: ../../../en/api-guides/performance/size.rst
|
||||
|
2
docs/zh_CN/api-guides/performance/speed.rst
Normal file
@ -0,0 +1,2 @@
|
||||
.. include:: ../../../en/api-guides/performance/speed.rst
|
||||
|