STM32G030K6P6: simple sketch overflows the flash memory?

Dear Friends,

I am confused. I prototyped a project on an Arduino Nano in the past, and now I am developing the same application on an STM32 chip. How come such simple code overflows the flash memory? What is the chip good for if we cannot even read a pressure sensor on it? Am I doing anything wrong? Should I only build in release mode?

CODE:

#include <Arduino.h>
#include <Wire.h>
#include <Adafruit_MPRLS.h>

// You don't need a reset and EOC pin for most uses, so we set to -1 and don't connect
#define RESET_PIN -1 // set to any GPIO pin # to hard-reset on begin()
#define EOC_PIN -1 // set to any GPIO pin to read end-of-conversion by pin
Adafruit_MPRLS mpr = Adafruit_MPRLS(RESET_PIN, EOC_PIN);

void setup() {
  Serial.begin(115200);
  Serial.println("MPRLS Simple Test");
  if (! mpr.begin()) {
    Serial.println("Failed to communicate with MPRLS sensor, check wiring?");
    while (1) {
      delay(10);
    }
  }
  Serial.println("Found MPRLS sensor");
}

void loop() {
  float pressure_hPa = mpr.readPressure();
  Serial.print("Pressure (hPa): "); Serial.println(pressure_hPa);
  Serial.print("Pressure (PSI): "); Serial.println(pressure_hPa / 68.947572932);
  delay(1000);
}

Arduino and its libraries aren’t exactly known for low overhead. That doesn’t reflect badly on the capabilities of the chip; you can always use other frameworks to program it (like STM32Cube, libopencm3, …).

Thank you for your reply, but that means using the current packages (GENERICSTM32 for the Arduino framework) makes most 32k chips unusable… Some people say that building in release mode with no debugging can reduce the overhead significantly! Is that correct? The Cube framework is a bit complicated for me since it is closer to the bare metal. I am good with the Arduino framework, though.

Yes, but that’s already the default setting in PlatformIO (-Os compiler optimization level). You would have to explicitly add build_type = debug to override this.

You can fake / modify the linker script to force it to compile and then look into what’s taking up all that space, similarly to how I did it in here, just that your linker script would be here.

Let me try this quickly.

So first I set up PlatformIO with a boards/genericSTM32G030K6.json containing

{
    "build": {
      "core": "stm32",
      "cpu": "cortex-m0plus",
      "extra_flags": "-DSTM32G0xx -DSTM32G030xx",
      "f_cpu": "64000000L",
      "mcu": "stm32g030k6t6",
      "product_line": "STM32G030xx",
      "variant": "STM32G0xx/G030K(6-8)T"
    },
    "debug": {
      "default_tools": [
        "stlink"
      ],
      "jlink_device": "STM32G030K6",
      "openocd_target": "stm32g0x",
      "svd_path": "STM32G030.svd"
    },
    "frameworks": [
      "arduino",
      "cmsis",
      "libopencm3",
      "stm32cube"
    ],
    "name": "STM32G030K6P6 (8k RAM. 32k Flash)",
    "upload": {
      "maximum_ram_size": 8192,
      "maximum_size": 32768,
      "protocol": "serial",
      "protocols": [
        "blackmagic",
        "dfu",
        "jlink",
        "serial",
        "stlink"
      ]
    },
    "url": "https://www.st.com/en/microcontrollers-microprocessors/stm32g030k6.html",
    "vendor": "Generic"
  }

and then in the platformio.ini I configured it as

[env:genericSTM32G030K6]
platform = ststm32
board = genericSTM32G030K6
framework = arduino
lib_deps =
   adafruit/Adafruit MPRLS Library@^1.2.0
   SPI
; force successful compilation by pretending we have more flash
board_upload.maximum_size = 64000
; generate debug symbols (good for size report)
build_flags = -g3 -ggdb

With the above code I generated the firmware and then produced the ELF size report for it with mbed-os-linker-report (the linked report has since been deleted).

The basic breakdown of the 34332 bytes in .text (code) is:

  • code in STM32 Arduino core: 21306 bytes
    • code in system drivers: 12398 bytes
      • UART HAL: 4483 bytes (because sketch uses Serial)
      • I2C HAL: 4316 bytes (because I2C is used)
      • RCC HAL: 1600 bytes (clock control)
      • Timer HAL: 572 bytes (Arduino-internal timers)
      • RCC Ex: 460 bytes (extension of RCC)
      • some other smaller ones, sum ~ 1000 bytes
    • Arduino core libraries: 6162 bytes
      • Wire (for I2C): 3096 bytes
      • “SrcWrapper” (abstraction layer on top of system drivers): 3066 bytes
    • Arduino core objects: 2558 bytes
      • HardwareSerial: 1108 bytes
      • Print: 514 bytes
      • wiring_digital (digitalWrite, pinMode, …): 416 bytes
      • Print.h, Stream.cpp: 330 bytes…
  • used external libraries: 782 bytes
    • Adafruit MPRLS: 488 bytes
    • Adafruit BusIO: 294 bytes (abstraction on top of I2C/SPI)
  • sketch code: 200 bytes
  • LibC code (standard-C library functions or support functions from the compiler): 11950 bytes
    • __aeabi_dsub (double-precision floating-point subtraction): 1828 bytes
    • __aeabi_dadd (double-precision floating-point addition): 1748 bytes
    • __aeabi_ddiv (double-precision floating-point division): 1488 bytes
    • __aeabi_dmul (double-precision floating-point multiplication): 1240 bytes
    • __aeabi_fadd (single-precision floating-point addition): 824 bytes
    • __aeabi_fmul (single-precision floating-point multiplication): 564 bytes
    • __aeabi_fdiv (single-precision floating-point division): 536 bytes
    • __udivsi3 (unsigned 32-bit division): 266 bytes
    • lots of other smaller functions…

And for .rodata (constants or initial values of variables) the sketch has 1380 bytes; this also lands in flash.
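
To make the overflow concrete, here is a small sketch of the arithmetic using only the numbers quoted above. The constant and function names are made up for illustration, and it ignores any other sections (such as initial values in .data) that may also occupy flash, so the real shortfall could be slightly larger.

```cpp
#include <cstdint>

// Section sizes from the report above, in bytes (names are illustrative).
constexpr std::uint32_t kTextBytes   = 34332;  // .text (code)
constexpr std::uint32_t kRodataBytes = 1380;   // .rodata (constants)
constexpr std::uint32_t kFlashBytes  = 32768;  // real flash size of the 32k part

// Both sections are stored in flash, so they share one budget.
constexpr std::uint32_t flashUsed() { return kTextBytes + kRodataBytes; }

// Bytes by which the image exceeds the real flash size (0 if it fits).
constexpr std::uint32_t flashOverflow() {
    return flashUsed() > kFlashBytes ? flashUsed() - kFlashBytes : 0;
}

static_assert(flashUsed() == 35712, ".text + .rodata");
static_assert(flashOverflow() == 2944, "image is 2944 bytes too large");
```

So with these two sections alone, the sketch is roughly 3 KB over the 32 KB limit.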

So what stands out here is:

  • lots of abstraction layers on top of the hardware registers: first the system drivers, then SrcWrapper, on top of that the Arduino objects (HardwareSerial) and libraries (Wire), and on top of that again library-made abstractions like Adafruit’s BusIO library
  • The Cortex-M0+ core of the STM32G030K6 chip lacks a floating point unit (FPU) and even a hardware integer divide instruction. This means that when the code uses floating point numbers or wants to divide two general integers, the compiler has to supply a software routine that computes the result, instead of letting the core’s ALU or an FPU do the work with a specialized instruction. This is incredibly costly in your example because your sensor values are of float and double type, and working with them pulls in all of these routines. Even worse, the sketch or library seems to use both float and double, so routines for both types end up in the firmware. If the firmware used only float or only double, it would be less expensive.

Great idea!

By the way, when I use -flto it shrinks and fits, but I lose the Serial functions! I don’t know why, or how the two are related.

And just to prove my point up there about floating point numbers:

There are two places where double is used unnecessarily. The MPRLS library has, in .pio\libdeps\genericSTM32G030K6\Adafruit MPRLS Library\Adafruit_MPRLS.cpp,

  _OUTPUT_min = (uint32_t)((float)COUNTS_224 * (OUTPUT_min / 100.0) + 0.5);
  _OUTPUT_max = (uint32_t)((float)COUNTS_224 * (OUTPUT_max / 100.0) + 0.5);

so the 100.0 and 0.5 are double-type constants. 100.0f and 0.5f would be float-type constants.

  _OUTPUT_min = (uint32_t)((float)COUNTS_224 * (OUTPUT_min / 100.0f) + 0.5f);
  _OUTPUT_max = (uint32_t)((float)COUNTS_224 * (OUTPUT_max / 100.0f) + 0.5f);
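
The effect of the literal suffix can be checked at compile time. The following is a minimal sketch, not taken from the library: kOutputMinPercent and isDouble are hypothetical names used only to show which type the compiler picks for each expression.

```cpp
#include <type_traits>

// Hypothetical stand-in for the library's percentage parameter.
constexpr float kOutputMinPercent = 10.0f;

// A double literal promotes the whole expression to double ...
static_assert(std::is_same<decltype(kOutputMinPercent / 100.0), double>::value,
              "float / double literal is evaluated as double");

// ... while an f-suffixed literal keeps the expression in float.
static_assert(std::is_same<decltype(kOutputMinPercent / 100.0f), float>::value,
              "float / float literal stays float");

// The same check as a function, so it can also be used at run time.
template <typename T>
constexpr bool isDouble(T) { return std::is_same<T, double>::value; }
```

On a core without an FPU, every expression that ends up as double is a candidate for pulling in the double-precision helper routines listed earlier.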

Additionally, printing the results with pressure_hPa being a float

  Serial.print("Pressure (hPa): "); Serial.println(pressure_hPa);
  Serial.print("Pressure (PSI): "); Serial.println(pressure_hPa / 68.947572932);

has the problem that the .println() API only has a double overload for floating-point values, not a float one, so the float argument gets promoted to double:

size_t Print::println(double num, int digits)

So, as a test, forcing them to be int and again doing a float division

  Serial.print("Pressure (hPa): "); Serial.println((int)pressure_hPa);
  Serial.print("Pressure (PSI): "); Serial.println((int)(pressure_hPa / 68.947572932f));

Doing that brings the firmware down to a total of

RAM:   [==        ]  15.1% (used 1240 bytes from 8192 bytes)
Flash: [=====     ]  45.3% (used 28968 bytes from 64000 bytes)

with the .text section now only being 29608 bytes compared to the previous 34332 bytes. That’s a saving of roughly 4700 bytes from not including the double routines.

-flto is oftentimes too aggressive and optimizes away the interrupt handlers for serial etc. One can attempt to use __attribute__((used)) on the interrupt handlers to prevent that, but that would mean modifying the Arduino core.
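
For reference, this is roughly what such an annotation looks like with GCC or Clang. Example_IRQHandler and g_rxEvents are made-up names for illustration; the real handlers live in the STM32 core's own source files, and whether this alone restores Serial under -flto is untested here.

```cpp
#include <cstdint>

// Counter touched by the handler so its effect is observable.
volatile std::uint32_t g_rxEvents = 0;

// Hypothetical handler name; the real core defines its own handlers.
// `used` asks the compiler to emit the function even if nothing in the
// program appears to call it (hardware calls it via the vector table).
extern "C" __attribute__((used)) void Example_IRQHandler(void) {
    g_rxEvents = g_rxEvents + 1;
}
```

Note the double parentheses: the attribute syntax is `__attribute__((used))`, not `__attribute__(used)`.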

Man, I won’t take more of your time. My knowledge is moderate when it comes to compilers and linkers, so it starts to go over my head when you go deeper… I really appreciate the insights you gave me.

By the way, it is just a headache to learn the whole HAL and LL thing, and my code will need to grow: I am talking about two sensors plus a radio transceiver. So I guess I need to switch to something like an ESP32-S2 to get the headroom.

Time to redesign the whole thing. I should have estimated the flash size I need before choosing the chip for my project. This is my first independent project, so I will cut myself some slack for learning.