Compiler optimization levels for STM32

What are the options required to get the various optimization levels when compiling and linking? My current platformio.ini file looks like this.

[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags = -O2 -DPIO_FRAMEWORK_MBED_RTOS_PRESENT

Adding -O2 (I assumed this was a higher optimization level than -O0) had no effect on the size of the image for the target.

Some optimization level may already be in effect. Add -v to the build flags to see the verbose output during build. You’ll see all flags your project is built with.

Execute pio settings set force_verbose Yes in a shell and recompile. You’ll see the full gcc/g++ commands.

-Os (which is -O2 minus the optimizations that tend to increase code size) is always on by default (just like in the Arduino IDE). You might want to use build_unflags = -Os followed by build_flags = -O3 or some other options. Size optimization is a really wide topic, though, and adding a compiler switch rarely shrinks the firmware significantly on its own. It has to be done on a per-case basis.
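
As a minimal sketch of that suggestion, assuming the same environment as in the original question, the platformio.ini could look like the following. The -O3 here is only one level to compare against; whether it helps or hurts the image size has to be measured.

```ini
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
; remove the -Os that the build adds by default
build_unflags = -Os
; then request the level to compare (here -O3) and keep the RTOS define
build_flags = -O3 -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
```

Rebuilding in verbose mode afterwards should show whether -Os is really gone from the compiler command line.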

For that you’d need to let the compiler output a linker map file (.map) which you can then analyze with other tools.
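
One way to request a map file, sketched under the assumption that the extra linker option in build_flags reaches the link step of this framework's build, is GCC's -Wl,-Map option. The file name firmware.map is only an illustrative choice, and where the file ends up depends on the directory the linker is run from.

```ini
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags =
    -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
    ; ask the GNU linker to write a map file (file name is an assumption)
    -Wl,-Map,firmware.map
```

The map file lists each section and object file with its size, which shows where the flash is actually going.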

Thanks @azarubkin and @maxgerhardt. Here is what the compile command looks like in verbose mode.

arm-none-eabi-g++ -o .pioenvs/nucleo_l432kc/src/main.o -c -std=gnu++98 -fno-rtti -Wvla -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -c -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -fmessage-length=0 -fno-exceptions -fno-builtin -ffunction-sections -fdata-sections -funsigned-char -MMD -fno-delete-null-pointer-checks -fomit-frame-pointer -Os -DNDEBUG -g1 -include mbed_config.h -v -D__MBED__=1 -DDEVICE_I2CSLAVE=1 -D__FPU_PRESENT=1 -DDEVICE_PORTOUT=1 -DDEVICE_PORTINOUT=1 -DTARGET_RTOS_M4_M7 -DDEVICE_LOWPOWERTIMER=1 -DDEVICE_RTC=1 -DTOOLCHAIN_object -DDEVICE_SERIAL_ASYNCH=1 -D__CMSIS_RTOS -DTOOLCHAIN_GCC -DDEVICE_CAN=1 -DTARGET_CORTEX_M -DDEVICE_I2C_ASYNCH=1 -DTARGET_LIKE_CORTEX_M4 -DDEVICE_ANALOGOUT=1 -DTARGET_M4 -DTARGET_UVISOR_UNSUPPORTED -DTARGET_STM32L4 -DDEVICE_SPI_ASYNCH=1 -DDEVICE_PWMOUT=1 -DTARGET_STM32L432xC -DTARGET_CORTEX -DDEVICE_I2C=1 -DTRANSACTION_QUEUE_SIZE_SPI=2 -D__CORTEX_M4 -DDEVICE_STDIO_MESSAGES=1 -DTARGET_FAMILY_STM32 -DMBED_BUILD_TIMESTAMP=1524075042.65 -DTARGET_FF_ARDUINO -DDEVICE_PORTIN=1 -DTARGET_RELEASE -DTARGET_STM -DTARGET_STM32L432KC -DDEVICE_SERIAL_FC=1 -DDEVICE_TRNG=1 -DTARGET_LIKE_MBED -D__MBED_CMSIS_RTOS_CM -DDEVICE_SLEEP=1 -DTOOLCHAIN_GCC_ARM -DDEVICE_SPI=1 -DDEVICE_INTERRUPTIN=1 -DDEVICE_SPISLAVE=1 -DDEVICE_ANALOGIN=1 -DDEVICE_SERIAL=1 -DDEVICE_FLASH=1 -DTARGET_NUCLEO_L432KC -DARM_MATH_CM4 -DSTM32L432xx -DPIO_FRAMEWORK_MBED_RTOS_PRESENT -DMBED_CONF_RTOS_PRESENT -DPLATFORMIO=30503

I see a -o right at the beginning but without any number following it.

-o .pioenvs/nucleo_l432kc/src/main.o means output file (main.o in this case). In the command you have -Os -fomit-frame-pointer as the only optimization flags that I can see.

Ok, making progress. I can add various numbers after the -O option and I am getting different sizes in this output for text, which I believe is what ends up in Flash as the basic “program size”, not counting data, etc.

text       data     bss     dec     hex filename
58756      2932    7836   69524   10f94 .pioenvs/nucleo_l432kc/firmware.elf
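
A hedged aside on reading that output: in the Berkeley format printed by arm-none-eabi-size, text is code plus read-only data, data is initialized variables (their initial values are stored in Flash and copied to RAM at startup), and bss is zero-initialized RAM. Assuming the usual Cortex-M layout where the image runs from Flash, the numbers above work out as:

```latex
\begin{align*}
\text{Flash} &= \text{text} + \text{data} = 58756 + 2932 = 61688 \text{ bytes} \\
\text{static RAM} &= \text{data} + \text{bss} = 2932 + 7836 = 10768 \text{ bytes}
\end{align*}
```

The dec column (69524) is the sum of all three fields, so it is not the Flash figure by itself.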

I also found this link when searching for optimization options for the arm-none-eabi-g++ compiler.

So I’ll try some of this out and see how my code size changes. What I am really trying to achieve is a very small footprint (roughly the minimum achievable with the mbed RTOS and a single thread just doing blink). I know, for example, that you can get to about 5K on an 8-bit Arduino using FreeRTOS.

If “use the least flash” is the absolute goal, you might consider dropping mbed entirely and going with something more lightweight like CMSIS, STM32Cube-LL, or STM32Cube-HAL. With optimization flags you can only do so much; the bulk of the usage comes from the framework and your app code.

Here’s a short comparison.

For example, the cmsis-blink example compiles down to 500 bytes of flash usage with 0 bytes of RAM used.

arm-none-eabi-size -B -d .pioenvs/bluepill_f103c8/firmware.elf
text	   data	    bss	    dec	    hex	filename
500	      0	      0	    500	    1f4	.pioenvs/bluepill_f103c8/firmware.elf

Arduino Blink needs

arm-none-eabi-size -B -d .pioenvs/bluepill_f103c8/firmware.elf
text	   data	    bss	    dec	    hex	filename
11932	   2152	    976	  15060	   3ad4	.pioenvs/bluepill_f103c8/firmware.elf

STM32Cube-LL needs

arm-none-eabi-size -B -d .pioenvs/nucleo_f401re/firmware.elf
text	   data	    bss	    dec	    hex	filename
1364	   1084	   1600	   4048	    fd0	.pioenvs/nucleo_f401re/firmware.elf

STM32Cube-HAL needs

text	   data	    bss	    dec	    hex	filename
1924	   1092	   1604	   4620	   120c	.pioenvs/nucleo_f401re/firmware.elf

mbed-blink needs by far the most:

arm-none-eabi-size -B -d .pioenvs/nucleo_f401re/firmware.elf
text	   data	    bss	    dec	    hex	filename
11696	   2400	    676	  14772	   39b4	.pioenvs/nucleo_f401re/firmware.elf

Thanks @maxgerhardt. My implementations are complex enough that a bare-metal implementation is not what I am looking for. The Blink was just to see the minimum footprint with mbed. I really want to write code using an RTOS. The mbed RTOS, I believe, is just a wrapper around CMSIS-RTOS. If FreeRTOS for Arduino can get down to just 5K, I was curious how low I can go with mbed. Since I am now on a 32-bit architecture rather than an 8-bit one, I figure I won’t get to 5K, but maybe 20 to 25K? My STM32 has a large amount of Flash, much more than I will need. This was more of an intellectual exercise, to see whether I could scope the MCU down to a smaller amount of Flash and RAM and save some cost on the parts in volume.

Your data above, however, is very valuable. Thank you so much for doing and sharing that.

This table shows optimization results using the various optimization flags. Note that the platformio.ini file looks like this:

[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags = -DPIO_FRAMEWORK_MBED_RTOS_PRESENT -Og

Note, I use the -DPIO_FRAMEWORK_MBED_RTOS_PRESENT to include the RTOS when building which is what I ultimately want for my project.

Optimization level    text      data     bss      Total
None                  25,184    2,568    6,972    34,724
O1                    28,196    2,568    7,036    37,800
O2                    27,580    2,568    6,972    37,120
O3                    29,364    2,568    6,972    38,904
Os                    25,184    2,568    6,972    34,724
Og                    28,352    2,568    7,100    38,020