What are the options required to get the various optimization levels when compiling and linking? My current platformio.ini file looks like this.
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags = -O2 -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
The -O2 flag (I assumed this was a higher optimization level than -O0) had no effect on the size of the image for the target.
Some optimization level may already be in effect. Add -v to the build flags to see the verbose output during the build. You’ll see all the flags your project is built with.
Execute pio settings set force_verbose Yes in a shell and recompile. You’ll see the full gcc/g++ commands.
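A minimal sketch of the first suggestion, assuming the platformio.ini from the question: -v is simply added to the existing build_flags line.

```ini
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
; -v makes gcc/g++ print the commands they execute, including every flag
build_flags = -v -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
```

Unlike the force_verbose setting, which is global, this only affects the one environment it is written in.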
-Os (which is -O2 minus the optimizations that tend to increase code size) is always on by default (just like in the Arduino IDE). You might want to set build_unflags = -Os followed by build_flags = -O3 or some other options. Size optimization is a really wide topic though, and adding a compiler switch doesn’t always make the firmware significantly smaller. It must be done on a per-case basis.
For that you’d need to let the linker output a map file (.map), which you can then analyze with other tools.
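A hedged sketch of both suggestions together, assuming the environment from the question. GCC honors the last -O option on the command line, so removing the default -Os with build_unflags avoids depending on flag order. -Wl,-Map hands the GNU linker's map-file option through the compiler driver; the file name firmware.map is an arbitrary choice for illustration.

```ini
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
; drop the default size optimization before choosing another level
build_unflags = -Os
; -O3 is just one candidate level to try
; -Wl,-Map,<file> asks the linker to write a map file (firmware.map is a hypothetical name)
build_flags = -O3 -Wl,-Map,firmware.map -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
```

The map file lists the sections contributed by each object file along with their sizes, which shows where the flash actually goes.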
Thanks @azarubkin and @maxgerhardt. Here is what the compile command looks like in verbose mode.
arm-none-eabi-g++ -o .pioenvs/nucleo_l432kc/src/main.o -c -std=gnu++98 -fno-rtti -Wvla -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -c -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -fmessage-length=0 -fno-exceptions -fno-builtin -ffunction-sections -fdata-sections -funsigned-char -MMD -fno-delete-null-pointer-checks -fomit-frame-pointer -Os -DNDEBUG -g1 -include mbed_config.h -v -D__MBED__=1 -DDEVICE_I2CSLAVE=1 -D__FPU_PRESENT=1 -DDEVICE_PORTOUT=1 -DDEVICE_PORTINOUT=1 -DTARGET_RTOS_M4_M7 -DDEVICE_LOWPOWERTIMER=1 -DDEVICE_RTC=1 -DTOOLCHAIN_object -DDEVICE_SERIAL_ASYNCH=1 -D__CMSIS_RTOS -DTOOLCHAIN_GCC -DDEVICE_CAN=1 -DTARGET_CORTEX_M -DDEVICE_I2C_ASYNCH=1 -DTARGET_LIKE_CORTEX_M4 -DDEVICE_ANALOGOUT=1 -DTARGET_M4 -DTARGET_UVISOR_UNSUPPORTED -DTARGET_STM32L4 -DDEVICE_SPI_ASYNCH=1 -DDEVICE_PWMOUT=1 -DTARGET_STM32L432xC -DTARGET_CORTEX -DDEVICE_I2C=1 -DTRANSACTION_QUEUE_SIZE_SPI=2 -D__CORTEX_M4 -DDEVICE_STDIO_MESSAGES=1 -DTARGET_FAMILY_STM32 -DMBED_BUILD_TIMESTAMP=1524075042.65 -DTARGET_FF_ARDUINO -DDEVICE_PORTIN=1 -DTARGET_RELEASE -DTARGET_STM -DTARGET_STM32L432KC -DDEVICE_SERIAL_FC=1 -DDEVICE_TRNG=1 -DTARGET_LIKE_MBED -D__MBED_CMSIS_RTOS_CM -DDEVICE_SLEEP=1 -DTOOLCHAIN_GCC_ARM -DDEVICE_SPI=1 -DDEVICE_INTERRUPTIN=1 -DDEVICE_SPISLAVE=1 -DDEVICE_ANALOGIN=1 -DDEVICE_SERIAL=1 -DDEVICE_FLASH=1 -DTARGET_NUCLEO_L432KC -DARM_MATH_CM4 -DSTM32L432xx -DPIO_FRAMEWORK_MBED_RTOS_PRESENT -DMBED_CONF_RTOS_PRESENT -DPLATFORMIO=30503
I see a -o right at the beginning but without any number following it.
-o .pioenvs/nucleo_l432kc/src/main.o means output file (main.o in this case). In the command you have -Os -fomit-frame-pointer as the only optimization flags that I can see.
Ok, making progress. I can add various numbers after the -O option and I am getting different sizes in this output for text, which I believe is what ends up in Flash as the basic “program size”, not counting data, etc.
text data bss dec hex filename
58756 2932 7836 69524 10f94 .pioenvs/nucleo_l432kc/firmware.elf
I also found this link when searching for optimization options for the arm-none-eabi-g++ compiler.
So I’ll try some of this out and see how my code size changes. What I am really interested in trying to achieve is a very small footprint (sort of the minimum optimized using the mbed rtos with a single thread just doing blink). I know, for example, that you can get to about 5K on an 8-bit arduino using freeRTOS.
If “use the least flash” is the absolute goal, you might consider dropping mbed entirely and going with something more lightweight such as CMSIS, STM32Cube-LL, or STM32Cube-HAL. With optimization flags you can only do so much; the bulk of the usage comes from the framework and your app code.
Here’s a short comparison.
For example, the cmsis-blink example compiles down to 500 bytes of flash usage with 0 bytes of RAM used.
arm-none-eabi-size -B -d .pioenvs/bluepill_f103c8/firmware.elf
text data bss dec hex filename
500 0 0 500 1f4 .pioenvs/bluepill_f103c8/firmware.elf
Arduino Blink needs
arm-none-eabi-size -B -d .pioenvs/bluepill_f103c8/firmware.elf
text data bss dec hex filename
11932 2152 976 15060 3ad4 .pioenvs/bluepill_f103c8/firmware.elf
STM32Cube-LL needs
arm-none-eabi-size -B -d .pioenvs/nucleo_f401re/firmware.elf
text data bss dec hex filename
1364 1084 1600 4048 fd0 .pioenvs/nucleo_f401re/firmware.elf
STM32Cube-HAL needs
text data bss dec hex filename
1924 1092 1604 4620 120c .pioenvs/nucleo_f401re/firmware.elf
mbed-blink needs far more than the CMSIS and STM32Cube builds, about on par with Arduino:
arm-none-eabi-size -B -d .pioenvs/nucleo_f401re/firmware.elf
text data bss dec hex filename
11696 2400 676 14772 39b4 .pioenvs/nucleo_f401re/firmware.elf
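Trying one of the lighter frameworks is mostly a one-line change in platformio.ini. A minimal sketch, assuming the nucleo_f401re board from the outputs above and PlatformIO's stm32cube framework identifier; whether a particular board supports that framework depends on the ststm32 platform version.

```ini
[env:nucleo_f401re]
platform = ststm32
board = nucleo_f401re
; stm32cube provides the STM32Cube HAL and LL drivers instead of mbed
framework = stm32cube
```

The application code has to be written against the new framework's API, so this is not a drop-in swap for an existing mbed project.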
Thanks @maxgerhardt. My actual implementations are complex enough that bare metal is not what I am looking for. The Blink was just to see the minimum footprint with mbed. I really want to write code using an RTOS. The mbed RTOS, I believe, is just a wrapper on CMSIS-RTOS. If FreeRTOS for Arduino can get down to just 5K, I was curious how low I can go with mbed. Since I am now on a 32-bit architecture rather than an 8-bit one, I figure I won’t get to 5K, but maybe 20 to 25K? My STM32 has a large amount of Flash, much more than I will need, so this was more of an intellectual exercise to see whether I could scope down to an MCU with less Flash and RAM and save some cost on parts in volume.
Your data above, however, is very valuable. Thank you so much for doing and sharing that.
This table shows optimization results using the various optimization flags. Note that the platformio.ini file looks like this:
[env:nucleo_l432kc]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags = -DPIO_FRAMEWORK_MBED_RTOS_PRESENT -Og
Note that I use -DPIO_FRAMEWORK_MBED_RTOS_PRESENT to include the RTOS when building, which is what I ultimately want for my project.
| Optimization Level | text | data | bss | Total |
|---|---|---|---|---|
| None | 25,184 | 2,568 | 6,972 | 34,724 |
| O1 | 28,196 | 2,568 | 7,036 | 37,800 |
| O2 | 27,580 | 2,568 | 6,972 | 37,120 |
| O3 | 29,364 | 2,568 | 6,972 | 38,904 |
| Os | 25,184 | 2,568 | 6,972 | 34,724 |
| Og | 28,352 | 2,568 | 7,100 | 38,020 |
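One way to reproduce a comparison like this without editing the file between builds is to define one environment per optimization level. This is only a sketch: the environment names (os, o2, o3) are made up for illustration, and it assumes that removing -Os through build_unflags works as suggested earlier in the thread.

```ini
; default build: the framework's own -Os stays in effect
[env:os]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_flags = -DPIO_FRAMEWORK_MBED_RTOS_PRESENT

; same project with -Os swapped for -O2
[env:o2]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_unflags = -Os
build_flags = -O2 -DPIO_FRAMEWORK_MBED_RTOS_PRESENT

; same project with -Os swapped for -O3
[env:o3]
platform = ststm32
board = nucleo_l432kc
framework = mbed
build_unflags = -Os
build_flags = -O3 -DPIO_FRAMEWORK_MBED_RTOS_PRESENT
```

Running pio run builds every environment in turn, and pio run -e o2 builds just one; each environment gets its own build directory and size report.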