As my system is growing in many directions (HW, SW, features…), I am currently considering moving to PlatformIO to add more flexibility/stability in my system infrastructure without changing the code (C code for FW, C++ for libraries) which is building/working fine on my current environment.
Unfortunately, I am experiencing crashes (arm_fault) after the libraries have been running for a while (a few seconds to a few minutes). This seems to be related to my C++ libraries, possibly to dynamic allocation or memory alignment problems. I’m running out of ideas on what to investigate next.
More details below.
- Do you have any clue what could be causing the problem? I know it’s difficult without a reproducible example at hand, but even suggesting other directions to look into would help.
- I know that Zephyr C++ support is not fully comprehensive yet, so I could be hitting some limits here. Do you have any experience doing similar things in your own projects, and could you point me in the right direction?
- As far as I have investigated, I have the feeling it’s related to dynamic memory allocation/reallocation or memory alignment problems. Given that I use the default linker file and the default libc-hooks (for Newlib C), do you think I should provide my own versions of these as well? How did you handle this yourself?
My stable environment (on the left) → PlatformIO environment (not stable on the right)
Roughly, the port consists of the following changes:
- Custom Makefiles → PlatformIO (using platformio.ini)
- FreeRTOS → Zephyr RTOS (using CMSIS RTOS v2 to abstract the differences)
- Custom HW description/Init → Zephyr board abstraction + drivers
- C++ libraries built using CMake → Same libraries build triggered by PlatformIO
My current MCU is a very standard ARM Cortex-M7, and I am able to build, upload, debug and run unit tests.
The board I use is custom but very similar to Nucleo-H743ZI.
[env:NucleoH743ZI]
platform = ststm32
board = nucleo_h743zi
framework = zephyr
The current toolchain (from the PlatformIO packages) used to build both the FW and the libraries is email@example.com (8.2.1). I reproduced the same faulty behavior with 9.2.1 and even with an external arm-gcc 10.2 toolchain.
The C++ libraries are custom C++ archives (.a), statically linked to my FW.
I cannot disclose the real usage, but they are computer-vision libraries along the lines of Eigen and Zxing. They use the FPU, dynamic memory allocation and possibly C++ exceptions, but no HW access, threading or anything similar. Only scientific computation.
I am aware it’s not generally recommended to use such C++ features in an embedded context, but so far I have encountered no problems with them on the stable setup.
I reduced the problem to a minimal example running in a unit-test (using Unity framework).
A single main thread simply calls my library function foo in a while loop with the same known inputs and waits for a crash. The crash comes quickly, but usually only after hundreds of iterations.
The test uses a custom transport over USB_CDC_ACM so that traces can be shown.
The problem happens in both release and debug mode, with the console activated or not.
foo(inputs); // After some iterations, arm_fault is called → MCU spins endlessly
I actually discovered the problem when raising the main stack size (CONFIG_MAIN_STACK_SIZE in Zephyr prj.conf) from 30KB, where everything runs fine without any problems, to more than 100KB, where the crash occurs as described.
The stack is located at the end of my zephyr_prebuilt.map file, after all code/data symbols. Only the heap memory comes after it and fills up the rest of the current SRAM section (512KB).
This leads me to the conclusion that beyond a certain stack size some sort of corruption becomes fatal for my system, but I have no clue where it happens.
Basically my SRAM section is divided into:
- starts @0x24000000
- ~30KB of code
- 30KB (no problem) → >100KB of stack memory (faulty)
- rest up to 512KB of heap memory.
- __kernel__ram_end @0x24080000
The heap usage is basically around 20KB which is totally fine in my setup with ~350KB of available heap memory.
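As a sanity check on the figures above, here is a small host-side sketch (not part of the firmware) that redoes the layout arithmetic. The two addresses are the ones quoted in the list; the 30KB code size is the approximate value given there, and the helper names are hypothetical.

```c
#include <stdint.h>

#define SRAM_START  0x24000000u    /* start of the SRAM section (from the list above) */
#define SRAM_END    0x24080000u    /* end of kernel RAM (from the list above) */
#define CODE_BYTES  (30u * 1024u)  /* "~30KB of code": approximate, an assumption */

/* Total size of the SRAM section in bytes. */
uint32_t sram_total_bytes(void)
{
    return SRAM_END - SRAM_START;
}

/* Bytes left for the heap once code and the main stack have been placed. */
uint32_t heap_bytes_left(uint32_t main_stack_bytes)
{
    return sram_total_bytes() - CODE_BYTES - main_stack_bytes;
}
```

With CONFIG_MAIN_STACK_SIZE=130000 this gives heap_bytes_left(130000) / 1024 = 355, which is consistent with the ~350KB of available heap quoted above, so the heap size itself should not be the limiting factor.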
I have tried to investigate the arm fault by accessing the exception stack frames and the dedicated fault registers. This showed me that the fault was mainly categorized as a BUS_FAULT (imprecise error). When I inspect the stack pointer at the exception, it usually shows symbols related to memory allocation or possibly reentrancy, like malloc/malloc_r/free/free_r, but I have not been able to reproduce the problem in a separate test case that makes intensive use of such symbols. I also don’t use threads at all in this context (single main thread), while I have multithreading activated in the regular full FW.
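For reference, a minimal sketch of how the BusFault bits of the Cortex-M CFSR register can be decoded once the raw value has been read out in the fault handler. The bit positions follow the ARMv7-M layout (BFSR is CFSR[15:8]); the function names are hypothetical and this is not the decoding code I actually ran.

```c
#include <stdint.h>

/* BusFault status bits within SCB->CFSR (ARMv7-M: BFSR is CFSR[15:8]). */
#define CFSR_IBUSERR      (1u << 8)   /* instruction bus error */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error */
#define CFSR_UNSTKERR     (1u << 11)  /* fault while unstacking on exception return */
#define CFSR_STKERR       (1u << 12)  /* fault while stacking on exception entry */
#define CFSR_LSPERR       (1u << 13)  /* fault during lazy FP state preservation */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address */

/* Non-zero when the fault was reported as imprecise. */
int busfault_is_imprecise(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}

/* Non-zero when BFAR can be trusted as the faulting address. */
int busfault_address_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}

/* Short description of the first BusFault bit found in a raw CFSR value. */
const char *busfault_reason(uint32_t cfsr)
{
    if (cfsr & CFSR_IMPRECISERR) return "imprecise data bus error";
    if (cfsr & CFSR_PRECISERR)   return "precise data bus error";
    if (cfsr & CFSR_IBUSERR)     return "instruction bus error";
    if (cfsr & CFSR_STKERR)      return "stacking error on exception entry";
    if (cfsr & CFSR_UNSTKERR)    return "unstacking error on exception return";
    if (cfsr & CFSR_LSPERR)      return "lazy FP state preservation error";
    return "no BusFault bit set";
}
```

One caveat with the imprecise case: the stacked PC is generally not the instruction that triggered the fault and BFAR is not valid, so the malloc/free symbols seen in the stack frame may only be where the fault was reported, not necessarily where it was caused.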
I was not able to get a deeper stack trace that would point to the code causing the error, but I assume it’s related to memory allocation of some sort, like reallocating/resizing a matrix. I have not been able to reproduce it with any code smaller than what I showed (1 library call).
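To illustrate the kind of separate allocation test mentioned above, here is a minimal sketch of a malloc/realloc/free stress loop that checks fill patterns and pointer alignment. It is not my actual test code: the slot count, maximum block size and the 8-byte alignment check are assumptions, and alloc_stress is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS      16u
#define MAX_BYTES  2048u
#define ALIGNMENT  8u    /* assumption: 8-byte alignment, as AAPCS expects for double */

/* Small deterministic pseudo-random generator so that runs are repeatable. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;
}

/* Returns the number of problems detected (0 means every check passed). */
unsigned alloc_stress(unsigned iterations, uint32_t seed)
{
    uint8_t *ptr[SLOTS] = { 0 };
    size_t   len[SLOTS] = { 0 };
    unsigned errors = 0;

    for (unsigned i = 0; i < iterations; i++) {
        unsigned s = lcg_next(&seed) % SLOTS;
        size_t   n = 1u + lcg_next(&seed) % MAX_BYTES;
        uint8_t  fill = (uint8_t)(s + 1u);   /* per-slot fill pattern */

        if (ptr[s] == NULL) {
            /* Empty slot: allocate and fill with the slot's pattern. */
            ptr[s] = malloc(n);
            if (ptr[s] == NULL) { errors++; continue; }
        } else if (lcg_next(&seed) & 1u) {
            /* Occupied slot: resize and check that the old content survived. */
            uint8_t *p = realloc(ptr[s], n);
            if (p == NULL) { errors++; continue; }
            size_t keep = n < len[s] ? n : len[s];
            for (size_t k = 0; k < keep; k++) {
                if (p[k] != fill) { errors++; break; }
            }
            ptr[s] = p;
        } else {
            /* Occupied slot: verify the pattern, then release the block. */
            for (size_t k = 0; k < len[s]; k++) {
                if (ptr[s][k] != fill) { errors++; break; }
            }
            free(ptr[s]);
            ptr[s] = NULL;
            len[s] = 0;
            continue;
        }

        len[s] = n;
        memset(ptr[s], fill, n);
        if (((uintptr_t)ptr[s] % ALIGNMENT) != 0u) { errors++; }
    }

    for (unsigned s = 0; s < SLOTS; s++) { free(ptr[s]); }
    return errors;
}
```

Calling alloc_stress(100000, 1) from the main thread and reporting a non-zero result over the test transport would exercise the allocator without involving the libraries at all.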
Here are the most relevant additional configuration properties, compiler flags and options that I use (most are auto-generated by platformio/zephyr). I gathered them with a custom script accessing the Environment definition (using the SCons Python module).
I compile FW and libraries with the exact same set of options and flags.
I trigger the library compilation (using CMake) from a custom target for my board and forward every option to CMake, toolchain included.
-std=gnu99 -std=gnu++11 // Using this to make my libraries compile
CONFIG_FPU=y
CONFIG_ARM_MPU=n
CONFIG_HW_STACK_PROTECTION=n
CONFIG_NEWLIB_LIBC=y
CONFIG_NEWLIB_LIBC_NANO=n
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y
CONFIG_NEWLIB_LIBC_MIN_REQUIRED_HEAP_SIZE=65536
CONFIG_NEWLIB_LIBC_ALIGNED_HEAP_SIZE=65536
CONFIG_CPLUSPLUS=y
CONFIG_LIB_CPLUSPLUS=y
CONFIG_MAIN_STACK_SIZE=130000 // -> faulty
'-std=gnu99', '-Os', '-ffreestanding', '-fno-common', '-g', '-fdiagnostics-color=always', '-mcpu=cortex-m7', '-mthumb', '-mabi=aapcs', '-mfpu=fpv5-d16', '-mfloat-abi=hard', '-Wall', '-Wformat', '-Wformat-security', '-Wno-format-zero-length', '-Wno-main', '-Wno-pointer-sign', '-Wpointer-arith', '-Wexpansion-to-defined', '-Wno-unused-but-set-variable', '-Werror=implicit-int', '-fno-asynchronous-unwind-tables', '-fno-pie', '-fno-pic', '-fno-strict-overflow', '-fno-reorder-functions', '-fno-defer-pop', '-ffunction-sections', '-fdata-sections'
'-std=gnu++11', '-Os', '-fcheck-new', '-Wno-register', '-fno-exceptions', '-fno-rtti', '-ffreestanding', '-fno-common', '-g', '-fdiagnostics-color=always', '-mcpu=cortex-m7', '-mthumb', '-mabi=aapcs', '-mfpu=fpv5-d16', '-mfloat-abi=hard', '-Wall', '-Wformat', '-Wformat-security', '-Wno-format-zero-length', '-Wno-main', '-Wpointer-arith', '-Wexpansion-to-defined', '-Wno-unused-but-set-variable', '-fno-asynchronous-unwind-tables', '-fno-pie', '-fno-pic', '-fno-strict-overflow', '-fno-reorder-functions', '-fno-defer-pop', '-ffunction-sections', '-fdata-sections'
'-T', '$LDSCRIPT_PATH', '-Wl,-Map=../zephyr/zephyr_prebuilt.map', '-mcpu=cortex-m7', '-mthumb', '-mabi=aapcs', '-mfpu=fpv5-d16', '-Wl,--gc-sections', '-Wl,--build-id=none', '-Wl,--sort-common=descending', '-Wl,--sort-section=alignment', '-Wl,-u,_OffsetAbsSyms', '-Wl,-u,_ConfigAbsSyms', '-nostdlib', '-static', '-no-pie', '-Wl,-X', '-Wl,-N', '-Wl,--orphan-handling=warn', '-Wl,-lc', '-u_printf_float', '-Wl,-lgcc'