Why doesn't this small code fit in flash? (STM32)

I have an STM32H7 project that compiles and runs with plenty of spare RAM and flash. However, when I add a small test to main.cpp, the build fails, saying that it can’t fit in flash. Any idea what’s going on and how to fix it? The project is based on files generated by Cube IDE, and the .ld file (below) is an exact copy of the Cube IDE file.

Build without the code

Building in debug mode
Checking size .pio\build\my_env\firmware.elf
Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
RAM:   [==        ]  18.1% (used 23788 bytes from 131072 bytes)
Flash: [==        ]  16.9% (used 88732 bytes from 524288 bytes)

The test code that was added:

class A {
 public:
  A() {}
  virtual ~A() {}
  virtual void foo() = 0;
};

template <uint16_t N>
class B : public A {
 public:
  B() {}
  virtual ~B() {}
  virtual void foo() {}
};

B<10> b;

Building with the test code.

Compiling .pio\build\my_env\src\host_link.o
Compiling .pio\build\my_env\src\main.o
Compiling .pio\build\my_env\src\tasks.o
Linking .pio\build\my_env\firmware.elf
c:/users/user/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/bin/ld.exe: .pio\build\my_env\firmware.elf section `.rodata' will not fit in region `FLASH'
c:/users/user/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/bin/ld.exe: region `FLASH' overflowed by 4432 bytes
collect2.exe: error: ld returned 1 exit status
*** [.pio\build\my_env\firmware.elf] Error 1

platformio.ini

[env:my_env]
platform = ststm32
extra_scripts = extra_script.py
board = weact_mini_h750vbtx
build_type = debug
debug_tool = stlink
upload_protocol = stlink
debug_build_flags = -O0 -ggdb3 -g3
board_build.ldscript = STM32H750VBTX_FLASH.ld
monitor_port = COM6
lib_archive = no
lib_deps = 
  cube_ide
  serial_packets
build_flags =
  -fmax-errors=5
  -mfpu=fpv5-sp-d16 
  -mfloat-abi=hard 
  -Wl,-Map,${BUILD_DIR}/firmware.map
  -mthumb 
  -D DEBUG
  -D USE_HAL_DRIVER
  -DSTM32_THREAD_SAFE_STRATEGY=4
  -fstack-usage
  -std=gnu11
  -Ilib/cube_ide/Core/Inc
  -Ilib/cube_ide/Core/ThreadSafe
  -Ilib/cube_ide/Drivers/CMSIS/Device/ST/STM32H7xx/Include
  -Ilib/cube_ide/Drivers/CMSIS/Include
  -Ilib/cube_ide/Drivers/STM32H7xx_HAL_Driver/Inc
  -Ilib/cube_ide/Middlewares/ST/STM32_USB_Device_Library/Class/CDC/Inc
  -Ilib/cube_ide/Middlewares/ST/STM32_USB_Device_Library/Core/Inc
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS_V2/Include
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/include
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F
  -Ilib/cube_ide/USB_DEVICE/App
  -Ilib/cube_ide/USB_DEVICE/Target
  -D CONFIG_MAX_PACKET_DATA_LEN=100
  -D CONFIG_MAX_PENDING_COMMANDS=5

STM32H750VBTX_FLASH.ld

/*
******************************************************************************
**
**  File        : LinkerScript.ld
**
**  Author      : STM32CubeIDE
**
**  Abstract    : Linker script for STM32H7 series
**                128Kbytes FLASH and 1056Kbytes RAM
**
**                Set heap size, stack size and stack location according
**                to application requirements.
**
**                Set memory bank area and size if external memory is used.
**
**  Target      : STMicroelectronics STM32
**
**  Distribution: The file is distributed as is, without any warranty
**                of any kind.
**
*****************************************************************************
** @attention
**
** Copyright (c) 2023 STMicroelectronics.
** All rights reserved.
**
** This software is licensed under terms that can be found in the LICENSE file
** in the root directory of this software component.
** If no LICENSE file comes with this software, it is provided AS-IS.
**
****************************************************************************
*/

/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM_D1) + LENGTH(RAM_D1);    /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x1000;      /* required amount of heap  */
_Min_Stack_Size = 0x400; /* required amount of stack */

/* Specify the memory areas */
MEMORY
{
  FLASH (rx)     : ORIGIN = 0x08000000, LENGTH = 128K
  DTCMRAM (xrw)  : ORIGIN = 0x20000000, LENGTH = 128K
  RAM_D1 (xrw)   : ORIGIN = 0x24000000, LENGTH = 512K
  RAM_D2 (xrw)   : ORIGIN = 0x30000000, LENGTH = 288K
  RAM_D3 (xrw)   : ORIGIN = 0x38000000, LENGTH = 64K
  ITCMRAM (xrw)  : ORIGIN = 0x00000000, LENGTH = 64K
}

/* Define output sections */
SECTIONS
{
  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

  /* Constant data goes into FLASH */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    . = ALIGN(4);
  } >FLASH

  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH

  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH

  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH

  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT(.fini_array.*)))
    KEEP (*(.fini_array*))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH

  /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);

  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data :
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
    *(.RamFunc)        /* .RamFunc sections */
    *(.RamFunc*)       /* .RamFunc* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM_D1 AT> FLASH

  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM_D1

  /* User_heap_stack section, used to check that there is enough RAM left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM_D1

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }
}

Why is the linker script's FLASH length 128K when the size report shows half a megabyte of flash?

In any case, this code size explosion is really weird. Yes, it uses templates and virtual functions, but that shouldn’t cause such an explosion. Maybe there’s some missing flag along the lines of -fno-rtti.

Can you do the following: in the linker script, change the 128K to 256K. That should force linking to go through (the result won't be usable on the chip, of course), and it should generate the .elf and .map files. Can you upload either of those? They can be inspected to see exactly which modules contribute how much code size.

Thanks @maxgerhardt. I am using a bare chip configuration, without the external flash, which I plan not to include on my custom board. I added -fno-rtti but the problem persisted. Information below.

Also, why does PlatformIO say that the flash size is 512K regardless of whether the .ld file specifies 128K or 256K? Is that value pulled from the board definition?

FLASH (rx)     : ORIGIN = 0x08000000, LENGTH = 128K

platformio.ini:

[env:my_env]
platform = ststm32
extra_scripts = extra_script.py
board = weact_mini_h750vbtx
build_type = debug
debug_tool = stlink
upload_protocol = stlink
debug_build_flags = -O0 -ggdb3 -g3
board_build.ldscript = STM32H750VBTX_FLASH.ld
monitor_port = COM6
lib_archive = no
lib_deps = 
  cube_ide
  serial_packets
build_flags =
  -fno-rtti
  -fmax-errors=5
  -mfpu=fpv5-sp-d16 
  -mfloat-abi=hard 
  -Wl,-Map,${BUILD_DIR}/firmware.map
  -mthumb 
  -D DEBUG
  -D USE_HAL_DRIVER
  -DSTM32_THREAD_SAFE_STRATEGY=4
  -fstack-usage
  -std=gnu11
  -Ilib/cube_ide/Core/Inc
  -Ilib/cube_ide/Core/ThreadSafe
  -Ilib/cube_ide/Drivers/CMSIS/Device/ST/STM32H7xx/Include
  -Ilib/cube_ide/Drivers/CMSIS/Include
  -Ilib/cube_ide/Drivers/STM32H7xx_HAL_Driver/Inc
  -Ilib/cube_ide/Middlewares/ST/STM32_USB_Device_Library/Class/CDC/Inc
  -Ilib/cube_ide/Middlewares/ST/STM32_USB_Device_Library/Core/Inc
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS_V2/Include
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/include
  -Ilib/cube_ide/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F
  -Ilib/cube_ide/USB_DEVICE/App
  -Ilib/cube_ide/USB_DEVICE/Target
  -D CONFIG_MAX_PACKET_DATA_LEN=100
  -D CONFIG_MAX_PENDING_COMMANDS=5

Configuration 1: without the test code and flash = 128k

map file: temp_public/firmware1.map at main · zapta/temp_public · GitHub

log:

RAM:   [==        ]  18.1% (used 23788 bytes from 131072 bytes)
Flash: [==        ]  16.9% (used 88732 bytes from 524288 bytes)

Configuration 2: with the test code and flash = 128k

Map file: N/A.

log:

Linking .pio\build\my_env\firmware.elf
c:/users/user/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/bin/ld.exe: .pio\build\my_env\firmware.elf section `.rodata' will not fit in region `FLASH'
c:/users/user/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/bin/ld.exe: region `FLASH' overflowed by 4432 bytes
collect2.exe: error: ld returned 1 exit status
*** [.pio\build\my_env\firmware.elf] Error 1

Configuration 3: with the test code and flash = 256k

Map file: temp_public/firmware3.map at main · zapta/temp_public · GitHub

Log:

RAM:   [==        ]  18.2% (used 23836 bytes from 131072 bytes)
Flash: [===       ]  25.6% (used 134104 bytes from 524288 bytes)

Load the file into amap and you’ll see some big functions in .text, and the same for .rodata (read-only data) and .data.

Wonder where __gcclibcxx_demangle_callback is coming from.

There’s also “unwind backtrace” stuff in there.

Hmm two thoughts:

  1. Are you linking against newlib-nano by passing --specs=nano.specs as a linker flag in your extra_script.py? Otherwise you’re getting a really big C library.
  2. Do you override functions like __cxa_pure_virtual, so that their default implementations don’t pull in huge code and things like backtrace unwinding?
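For the second point, a minimal sketch of such an override is shown here. It assumes a GCC toolchain, where the vtable slot of a pure virtual function points at `__cxa_pure_virtual`. The halt-in-a-loop policy, the `demo_dispatch` helper, and the `foo()` return value are assumptions made for illustration; the classes are stand-ins for the ones in the question, not the original code.

```cpp
#include <cassert>
#include <cstdint>

// GCC fills the vtable slot of a pure virtual function with this handler.
// Defining it in the application means the linker does not need to pull
// the library's default version (and whatever that version references).
extern "C" void __cxa_pure_virtual() {
  // Hypothetical policy: park the CPU so a debugger can inspect the state.
  // The volatile read keeps the loop well-defined and not optimized away.
  volatile bool halted = true;
  while (halted) {
  }
}

// Stand-ins for the classes from the question. foo() returns N only so
// that the call is observable in this sketch.
class A {
 public:
  virtual ~A() {}
  virtual std::uint16_t foo() = 0;
};

template <std::uint16_t N>
class B : public A {
 public:
  std::uint16_t foo() override { return N; }
};

// Usage: ordinary virtual dispatch never reaches the handler above.
inline void demo_dispatch() {
  B<10> b;
  A& base = b;
  assert(base.foo() == 10);
}
```

Whether this saves anything depends on what else in the image already references the same library code, so it is worth comparing the .map file before and after.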

Thanks @maxgerhardt, the --specs=nano.specs did the trick. Overriding __cxa_pure_virtual and __cxa_deleted_virtual didn’t change the code size.

Adding --specs=nano.specs (without the test classes) reduced the code size to 73K.

RAM:   [==        ]  16.3% (used 21328 bytes from 131072 bytes)
Flash: [=         ]  13.9% (used 73080 bytes from 524288 bytes)

And adding the test classes added ‘only’ 600 bytes.

RAM:   [==        ]  16.3% (used 21340 bytes from 131072 bytes)
Flash: [=         ]  14.0% (used 73652 bytes from 524288 bytes)

I also realized that the total flash size PlatformIO shows me (the 100% mark) assumes that the external QSPI flash is used (which I thought I didn’t need), so I will need to either configure the external flash or use a custom board definition with only 128K flash.
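To make the numbers in this thread easier to compare, here is a small sketch that redoes the arithmetic against the 128K region from the .ld file instead of the 512K total in the size report. The byte counts are copied from the logs above; the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Region sizes: LENGTH of FLASH in the .ld file, and the total shown in
// the PlatformIO size report.
constexpr std::uint32_t kInternalFlash = 128u * 1024u;  // 131072 bytes
constexpr std::uint32_t kReportedFlash = 524288u;

// "Flash: used N bytes" values copied from the build logs in this thread.
constexpr std::uint32_t kDefaultLibNoTest = 88732u;
constexpr std::uint32_t kDefaultLibWithTest = 134104u;
constexpr std::uint32_t kNanoNoTest = 73080u;
constexpr std::uint32_t kNanoWithTest = 73652u;

// Usage in tenths of a percent, rounded to nearest (169 means 16.9%).
constexpr std::uint32_t usage_permille(std::uint32_t used, std::uint32_t total) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(used) * 1000u + total / 2u) / total);
}

constexpr bool fits(std::uint32_t used, std::uint32_t region) {
  return used <= region;
}

// Cost of the test classes with the default C library vs. with nano.specs.
static_assert(kDefaultLibWithTest - kDefaultLibNoTest == 45372u, "default libs");
static_assert(kNanoWithTest - kNanoNoTest == 572u, "nano.specs");

// The same image is 14.0% of 512K but 56.2% of the 128K region.
static_assert(usage_permille(kNanoWithTest, kReportedFlash) == 140u, "of 512K");
static_assert(usage_permille(kNanoWithTest, kInternalFlash) == 562u, "of 128K");

// Runtime version of the fit check, usable from a host-side test.
inline void check_thread_numbers() {
  assert(!fits(kDefaultLibWithTest, kInternalFlash));
  assert(fits(kNanoWithTest, kInternalFlash));
}
```

Measured against the internal 128K, the nano.specs build with the test classes uses a bit over half of the flash, while the build without nano.specs does not fit at all.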

Thanks for your help! I am good now.