Since dd849ffc, _rodata_start label has been moved to a different
linker output section from where the TLS templates (.tdata, .tbss)
are located. Since link-time addresses of thread-local variables are
calculated relative to the section start address, this resulted in
incorrect calculation of THREADPTR/$tp registers.
Fix by introducing new linker label, _flash_rodata_start, which points
to the .flash.rodata output section where TLS variables are located,
and use it when calculating THREADPTR/$tp.
Also remove the hardcoded rodata section alignment for Xtensa targets.
Alignment of rodata can be affected by the user application, which is
the issue dd849ffc was fixing. To accommodate any possible alignment,
save it in a linker label (_flash_rodata_align) and then use when
calculating THREADPTR. Note that this is not required on RISC-V, since
this target doesn't use TPOFF.
Enable shared stack watchpoint for overflow detection
Enable unit tests:
* "test printf using shared buffer stack" for C3
* "Test vTaskDelayUntil" for S2
* "UART can do poll()" for C3