Hey there,
As my system is growing in many directions (HW, SW, features…), I am currently considering moving to PlatformIO to add more flexibility/stability to my system infrastructure without changing the code (C code for FW, C++ for libraries), which builds and works fine in my current environment.
Unfortunately I am experiencing crashes (arm_fault) when running the libraries for a while (a few seconds to a few minutes). This seems to be related to my C++ libraries, possibly to dynamic allocation, possibly to memory alignment problems, … I’m running out of ideas on directions to investigate.
More details below.
Questions
-Do you have any clue about what could be causing the problem? I know it’s difficult without a reproducible example at hand, but even suggesting other directions to investigate might help.
-I know that Zephyr C++ support is not fully comprehensive yet, so I could be hitting some limits here. Do you have any experience doing similar things on your own project, and could you point me in the right direction?
-From what I have investigated so far, I have the feeling it’s related to dynamic memory allocation/reallocation or memory alignment problems. Given that I use the default linker file and the default libc-hooks (for Newlib C), do you think I should provide my own versions of these as well? How did you handle this yourself?
Environment description
My stable environment (on the left) → PlatformIO environment (not stable, on the right)
Roughly, the port consists of the following changes:
- Custom Makefiles → PlatformIO (using platformio.ini)
- FreeRTOS → Zephyr RTOS (using CMSIS RTOS v2 to abstract differences)
- Custom HW description/Init → Zephyr board abstraction + drivers
- C++ libraries built using CMake → Same libraries build triggered by PlatformIO
My current MCU is a very standard ARM Cortex-M7, and I am able to build, upload, debug and unit-test.
The board I use is custom but very similar to Nucleo-H743ZI.
[env:NucleoH743ZI]
platform = ststm32
board = nucleo_h743zi
framework = zephyr
The current toolchain (from PlatformIO packages) used to build both FW and libraries is toolchain-gccarmnoneeabi@1.80201.0 (8.2.1). I reproduced the same faulty behavior with 9.2.1 and even with an external arm-gcc 10.2 toolchain.
The C++ libraries are custom C++ archives (.a), statically linked to my FW.
I cannot disclose the real usage, but they are computer-vision libraries similar to Eigen or Zxing, using the FPU, dynamic memory allocation and possibly C++ exceptions, but no HW access, no threading or anything like that. Only scientific computation.
I am aware it’s not generally recommended to use such C++ features in embedded context but so far on the stable setup I have encountered no problems with them.
Problem description
I reduced the problem to a minimal example running in a unit-test (using Unity framework).
A single main thread simply calls my library function foo in a while loop with the same known inputs and waits for a crash, which happens quickly but usually only after hundreds of iterations.
The test uses a custom transport over USB_CDC_ACM so that traces can be shown.
The problem happens both in release and debug mode. Console activated or not.
#include <libfoo.hpp>

Inputs_t inputs; // same known inputs on every iteration
while (1) {
    foo(inputs); // After some iterations, arm_fault is called → MCU spins endlessly
}
I actually discovered the problem when raising the main stack size (CONFIG_MAIN_STACK_SIZE in Zephyr prj.conf) from 30KB (everything runs fine without any problems) to > 100KB (crash as described).
The stack is located at the end of my zephyr_prebuilt.map file, after all code/data symbols. Only the heap memory comes after it and fills up the rest of the current SRAM section (512KB).
This leads me to conclude that beyond a certain stack size, some sort of corruption becomes fatal for my system, but I have no clue where it happens.
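One way to narrow down such a corruption is to paint a memory region with a known byte before the test and check afterwards how much of the pattern survived. The following is only a minimal host-side sketch of that idea; paint_region, untouched_bytes and PAINT_BYTE are hypothetical names of mine, and it assumes a stack that grows downward (as on Cortex-M), so the untouched part is counted from the lowest address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xAAu /* hypothetical fill pattern */

/* Fill a region (e.g. a stack buffer before the thread starts) with the pattern. */
void paint_region(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, PAINT_BYTE, size);
}

/* Count how many bytes, starting from the lowest address, still hold the
 * pattern. For a downward-growing stack this is the never-used part. */
size_t untouched_bytes(const uint8_t *base, size_t size)
{
    size_t n = 0;

    assert(base != NULL);
    while (n < size && base[n] == PAINT_BYTE) {
        n++;
    }
    return n;
}
```

If the untouched count drops to zero, or the pattern is broken in a place the stack should never reach, that would point to an overflow or a stray write rather than to the allocator itself.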
Basically my SRAM section is divided into:
- starts @0x24000000
- ~30KB of code
- 30KB (no problem) → >100KB of stack memory (faulty)
- rest up to 512KB of heap memory.
- __kernel__ram_end @0x24080000
The heap usage is around 20KB, which is totally fine in my setup with ~350KB of available heap memory.
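To double-check the numbers above, here is a small sketch of the arithmetic. It assumes the layout is exactly image, then main stack, then heap, with nothing else in the section; RAM_BASE, RAM_SIZE_BYTES and the function names are hypothetical helpers, not Zephyr symbols.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_BASE       0x24000000u     /* start of the SRAM section */
#define RAM_SIZE_BYTES (512u * 1024u)  /* 512KB section */

/* End address of the section (one past the last byte). */
uint32_t ram_end_addr(void)
{
    return RAM_BASE + RAM_SIZE_BYTES;
}

/* Bytes left for the heap once the image and the main stack are placed.
 * Returns 0 when they do not fit into the section. */
uint32_t heap_bytes_left(uint32_t image_bytes, uint32_t main_stack_bytes)
{
    assert(image_bytes <= RAM_SIZE_BYTES);
    assert(main_stack_bytes <= RAM_SIZE_BYTES);

    uint32_t used = image_bytes + main_stack_bytes;
    return (used >= RAM_SIZE_BYTES) ? 0u : RAM_SIZE_BYTES - used;
}
```

With a 30KB image and CONFIG_MAIN_STACK_SIZE=130000, this gives 363568 bytes (about 355KB) for the heap, which matches the ~350KB figure, so the section itself is not over-committed on paper.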
I have tried to investigate the arm fault by inspecting the exception stack frame and the dedicated fault registers. The fault is mostly categorized as BUS_FAULT (imprecise error). When I look at the stack at the time of the exception, it usually shows symbols related to memory allocation or possibly reentrancy, such as malloc/malloc_r/free/free_r, but I have not been able to reproduce the problem in a separate test case that uses these functions intensively. I also don’t use threads at all in this context (single main thread), although multithreading is enabled in the regular full FW.
I was not able to get a deeper stack trace that would point to the code causing the error, but I assume it’s related to memory allocation of some sort, like reallocating/resizing a matrix. I have not been able to reproduce it with code smaller than what I showed (1 library call).
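For reference, this is roughly how the BusFault bits can be decoded. The bit positions follow the ARMv7-M layout, where the BusFault status occupies bits 8..15 of CFSR (SCB->CFSR at 0xE000ED28); the enum and function names are hypothetical, and the sketch takes the register value as a parameter so it can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* BusFault status bits inside CFSR (ARMv7-M). */
#define CFSR_IBUSERR     (1u << 8)   /* instruction bus error */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_UNSTKERR    (1u << 11)  /* fault on exception return unstacking */
#define CFSR_STKERR      (1u << 12)  /* fault on exception entry stacking */
#define CFSR_LSPERR      (1u << 13)  /* fault during lazy FP state preservation */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */

typedef enum {
    BF_NONE, BF_PRECISE, BF_IMPRECISE, BF_INSTRUCTION,
    BF_STACKING, BF_UNSTACKING, BF_LAZY_FP
} bus_fault_kind_t;

/* Several bits can be set at once; this returns the first match only. */
bus_fault_kind_t classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_PRECISERR)   return BF_PRECISE;
    if (cfsr & CFSR_IMPRECISERR) return BF_IMPRECISE;
    if (cfsr & CFSR_IBUSERR)     return BF_INSTRUCTION;
    if (cfsr & CFSR_STKERR)      return BF_STACKING;
    if (cfsr & CFSR_UNSTKERR)    return BF_UNSTACKING;
    if (cfsr & CFSR_LSPERR)      return BF_LAZY_FP;
    return BF_NONE;
}

/* BFAR is only meaningful when BFARVALID is set. */
int bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0;
}

/* Host-side sanity check; on target, pass the value read from SCB->CFSR. */
void decoder_self_check(void)
{
    assert(classify_bus_fault(CFSR_IMPRECISERR) == BF_IMPRECISE);
}
```

With an imprecise error, BFAR is not valid and the stacked PC can be several instructions past the offending write, which is consistent with not being able to pin the fault to a specific line.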
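A possible way to test the allocation hypothesis is to wrap allocations with guard words and verify them regularly. This is only a sketch of that idea, not what the libraries currently do: guarded_malloc, guarded_check, guarded_free, GUARD_WORD and HDR_BYTES are hypothetical names, and the 16-byte header is an assumption chosen so that the payload keeps the alignment malloc provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu  /* hypothetical guard pattern */
#define HDR_BYTES  16u          /* 8 bytes size + 2 front guard words */

/* Layout: [size:8][guard:4][guard:4][payload:n][guard:4] */
void *guarded_malloc(size_t n)
{
    uint8_t *raw = (uint8_t *)malloc(HDR_BYTES + n + sizeof(uint32_t));
    uint64_t size = n;
    uint32_t guard = GUARD_WORD;

    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof size);
    memcpy(raw + 8, &guard, sizeof guard);
    memcpy(raw + 12, &guard, sizeof guard);
    memcpy(raw + HDR_BYTES + n, &guard, sizeof guard);
    return raw + HDR_BYTES;
}

/* Returns 1 when both guards are intact, 0 when something wrote past
 * either end of the payload. */
int guarded_check(const void *payload)
{
    const uint8_t *raw;
    uint64_t size;
    uint32_t front[2], rear;

    assert(payload != NULL);
    raw = (const uint8_t *)payload - HDR_BYTES;
    memcpy(&size, raw, sizeof size);
    memcpy(front, raw + 8, sizeof front);
    if (front[0] != GUARD_WORD || front[1] != GUARD_WORD) {
        return 0; /* underrun: the stored size cannot be trusted either */
    }
    memcpy(&rear, raw + HDR_BYTES + size, sizeof rear);
    return rear == GUARD_WORD;
}

void guarded_free(void *payload)
{
    if (payload != NULL) {
        free((uint8_t *)payload - HDR_BYTES);
    }
}
```

Calling guarded_check on the live buffers after each library call would, under these assumptions, show whether a buffer overrun precedes the fault by some iterations.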
Here are a bunch of meaningful additional configuration properties, compiler flags, options that I use (most are auto-generated by platformio/zephyr). I gathered them from a custom script accessing Environment definition (using Scons python module).
I compile FW and libraries with the exact same set of options and flags.
I trigger the library compilation (using CMake) from a custom target for my board and pass every option through to CMake, toolchain included.
platformio.ini
build_flags:
-std=gnu99 -std=gnu++11 // Needed for my libraries to compile
build_unflags:
-std=c99 -std=c++11
Zephyr prj.conf
CONFIG_FPU=y
CONFIG_ARM_MPU=n
CONFIG_HW_STACK_PROTECTION=n
CONFIG_NEWLIB_LIBC=y
CONFIG_NEWLIB_LIBC_NANO=n
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y
CONFIG_NEWLIB_LIBC_MIN_REQUIRED_HEAP_SIZE=65536
CONFIG_NEWLIB_LIBC_ALIGNED_HEAP_SIZE=65536
CONFIG_CPLUSPLUS=y
CONFIG_LIB_CPLUSPLUS=y
CONFIG_MAIN_STACK_SIZE=130000 // -> faulty
CFLAGS
'-std=gnu99',
'-Os',
'-ffreestanding',
'-fno-common',
'-g',
'-fdiagnostics-color=always',
'-mcpu=cortex-m7',
'-mthumb',
'-mabi=aapcs',
'-mfpu=fpv5-d16',
'-mfloat-abi=hard',
'-Wall',
'-Wformat',
'-Wformat-security',
'-Wno-format-zero-length',
'-Wno-main',
'-Wno-pointer-sign',
'-Wpointer-arith',
'-Wexpansion-to-defined',
'-Wno-unused-but-set-variable',
'-Werror=implicit-int',
'-fno-asynchronous-unwind-tables',
'-fno-pie',
'-fno-pic',
'-fno-strict-overflow',
'-fno-reorder-functions',
'-fno-defer-pop',
'-ffunction-sections',
'-fdata-sections'
CXXFLAGS
'-std=gnu++11',
'-Os',
'-fcheck-new',
'-Wno-register',
'-fno-exceptions',
'-fno-rtti',
'-ffreestanding',
'-fno-common',
'-g',
'-fdiagnostics-color=always',
'-mcpu=cortex-m7',
'-mthumb',
'-mabi=aapcs',
'-mfpu=fpv5-d16',
'-mfloat-abi=hard',
'-Wall',
'-Wformat',
'-Wformat-security',
'-Wno-format-zero-length',
'-Wno-main',
'-Wpointer-arith',
'-Wexpansion-to-defined',
'-Wno-unused-but-set-variable',
'-fno-asynchronous-unwind-tables',
'-fno-pie',
'-fno-pic',
'-fno-strict-overflow',
'-fno-reorder-functions',
'-fno-defer-pop',
'-ffunction-sections',
'-fdata-sections'
LINKFLAGS
'-T',
'$LDSCRIPT_PATH',
'-Wl,-Map=../zephyr/zephyr_prebuilt.map',
'-mcpu=cortex-m7',
'-mthumb',
'-mabi=aapcs',
'-mfpu=fpv5-d16',
'-Wl,--gc-sections',
'-Wl,--build-id=none',
'-Wl,--sort-common=descending',
'-Wl,--sort-section=alignment',
'-Wl,-u,_OffsetAbsSyms',
'-Wl,-u,_ConfigAbsSyms',
'-nostdlib',
'-static',
'-no-pie',
'-Wl,-X',
'-Wl,-N',
'-Wl,--orphan-handling=warn',
'-Wl,-lc',
'-u_printf_float',
'-Wl,-lgcc'