This commit fixes an issue with gdbstub, where it would list threads
with TIDs 1 to N in qfThreadInfo/qsThreadInfo responses, and then
would tell GDB that the current TID is 0 in the qC response. This
caused an assertion failure in GDB, because it couldn't find the
thread structure corresponding to TID 0:
src/gdb/gdb/thread.c:93: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
The issue was caused by the logic of qfThreadInfo/qsThreadInfo.
If the "paniced" task index was 1, the code would report it in the
response to qfThreadInfo, and then mistakenly skip task with index 0
in qsThreadInfo, due to the use of pre-increment instead of a
post-increment.
With that issue fixed, GDB assertion doesn't happen anymore. However
the code contained a deeper problem, which manifested itself in the
fact that GDB would incorrectly show task index 0 as the current task,
after the above fix.
Previous version of the code assumed that when GDB requests the thread
list, it uses the first thread returned by the target as the "default"
thread, and subsequently shows the user that the program is stopped
in that thread. This assumption was incorrect. In fact, after
connecting to a remote target, GDB obtains information about the
"default" or "current" thread from two sources:
1. the 'thread' special register indicated in the status response
($T00thread;00000001#ee)
2. if the target has only sent the plain stop response ($T00#ee), GDB
would ask for the current thread using a qC packet.
With that in mind, it is not necessary to report the paniced task as
the first task in qfThreadInfo response. We can simply returns the
tasks in their natural order, and then indicate the current task in
the qS packet response.
However even that change does not fully resolve the issues with task
list. The previous version of this code also incorrectly interpreted
the meaning of GDB TIDs -1 and 0. When GDB sends an "Hg0" command
early in the connection process, it doesn't expect the server to set
task 0 as the current task, as the code assumed. Rather, it tells the
server to "set any (arbitrary) task as the current one", and the most
logical thing to do for the server that is already in "stopped" state
is to keep the current task selection.
Since TID 0 has a special meaning in GDB remote protocol, gdbstub code
is now modified to map task indices (which start from 0) to GDB TIDs.
GDB TIDs are arbitrary, and for simplicity we keep the same order and
start counting them from 1.
The summary of all the above changes is:
1. Use "task index + 1" as the TID reported to GDB
2. Report the tasks in natural order; don't complicate the code to
make the paniced task first in the list.
3. Centralize modification of 'current_task_index' and 'regfile'
in the new 'set_active_task' function, to improve encapsulation.
At -O2 optimization level, GCC seems to optimize out the copying of the
first byte of the checksum, assuming it is zero. This "miscompilation"
happens because the esp_app_desc struct is declared const, but then modified
post-compilation. Casting to volatile disables the optimization.
Closes: https://github.com/espressif/esp-idf/pull/6389
Possible for a joined task to be deleted at the moment it is logging,
meaning it might hold the stdout lock. In that case the lock isn't
released and the next task to try and take it (i.e. call printf)
will block indefinitely.
Additionally, always enable the partition MD5 check if flash encryption is on in
Release mode. This ensures the partition table ciphertext has not been modified
(CVE-2021-27926).
The exception is pre-V3.1 ESP-IDF bootloaders and partition tables, which
don't have support for the MD5 entry.
* Patched longjmp to be context-switch safe
longjmp modifies the windowbase and windowstart
registers, which isn't safe if a context switch
occurs during the modification. After a context
switch, windowstart and windowbase will be
different, leading to a wrongly set windowstart
bit due to longjmp writing it based on the
windowbase before the context switch. This
corrupts the registers at the next window
overflow reaching that wrongly set bit.
The solution is to disable interrupts during
this code. It is only 6 instructions long,
the impact shouldn't be significant.
The fix is implemented as a wrapper which
replaces the original first instructions of
longjmp which are buggy. Then, it jumps back
to execute the rest of the original longjmp
function.
Added a comparably reliable test to the
test apps.