STM32F103 STM32Cube Framework - How to integrate latest CMSIS-DSP Library(v1.16.2)?

jaishankar · February 16, 2025, 7:11am

Reason

I want to do a real FFT using the CMSIS-DSP library (v1.16.2) via the arm_rfft_init_q15() and arm_rfft_q15() functions, but I get a memory overflow: compilation fails with a region `FLASH' overflowed error.
The idea here is to move to the latest CMSIS-DSP library (v1.16.2) and apply all the memory optimization options.

MCU Details: STM32F103C8 (the board is a BluePill), 64KB Flash, 20KB RAM.

Question

How to integrate the latest CMSIS-DSP Library(v1.16.2) into code?
What memory optimization options are available to make the real FFT arm_rfft_q15() work with this MCU?

Note: I am not sure whether the real FFT from the CMSIS-DSP library will work with this MCU at all.

Steps followed

Reference GitHub - ARM-software/CMSIS-DSP: CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

Created the folder cmsis-dsp under the lib folder.
Copied Include, PrivateInclude and Source from the CMSIS-DSP Library (v1.16.2) to lib\cmsis-dsp.
To solve a compilation issue, copied all the files in the folder lib\cmsis-dsp\PrivateInclude to the lib\cmsis-dsp\Include folder.

Complete Code details at GitHub - Jaishankar872/9-STM32F1-Bluepill-STM32Cube-Latest-CMSIS-DSP-library

Error

Below is the log from compilation. I think I missed many steps while integrating the library.

Archiving .pio\build\bluepill_f103c8_128k\lib558\libcmsis-dsp.a
Archiving .pio\build\bluepill_f103c8_128k\libFrameworkCMSISDevice.a
Linking .pio\build\bluepill_f103c8_128k\firmware.elf
c:/users/asus/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/lib/thumb/v7-m\libc_nano.a(lib_a-writer.o): In function `_write_r':
writer.c:(.text._write_r+0x10): undefined reference to `_write'
collect2.exe: error: ld returned 1 exit status
*** [.pio\build\bluepill_f103c8_128k\firmware.elf] Error 1
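
For reference, this linker error is separate from the FLASH overflow: newlib-nano's `_write_r` expects the application to supply a `_write` system call, which gets pulled in as soon as something like printf() is linked. A minimal sketch of a stub, assuming the output can simply be discarded (on real hardware the loop body would typically forward each byte to a UART instead, which is not shown here):

```c
/* Minimal newlib syscall stub: satisfies the `_write` reference from
 * libc_nano's _write_r. This sketch discards the data and only reports
 * it as written; forwarding bytes to a UART is left as an assumption. */
int _write(int file, char *ptr, int len)
{
    (void)file;   /* stdout/stderr are not distinguished here */
    (void)ptr;    /* data is intentionally dropped */
    return len;   /* claim every byte was written so callers do not retry */
}
```

With a stub like this in any source file of the project, the link step should no longer complain about `_write`, though printed text goes nowhere until it is wired to a real peripheral.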

cc: @maxgerhardt
Earlier discussion on a similar topic: Stm32cube framework - How to build with proper CMSIS DSP library - #2 by maxgerhardt

maxgerhardt · February 16, 2025, 12:16pm

First of all, you can build the latest CMSIS-DSP 1.16.2 from source rather easily. All we have to do is download the library and add a library.json file to tell PlatformIO how to build it.

Since most flags (CPU architecture, FPU settings etc.) are already given by PlatformIO, we just have to take care of compiling the right files and adding the right include folders. This is already documented in e.g. the Makefile and the README. This is done with this library.json file.

Given that the STM32F103 is a Cortex-M3 based chip with no floating point unit (FPU), the configuration settings for it are rather sparse. There is no FPU, no Helium or Neon instruction set, and no major vectorization possibilities. Only with regards to the q15_t (16-bit) type, it may process two 16-bit values simultaneously in a 32-bit register. So most of the optimization options come from the compiler (-Os, -Ofast, …). Those are stated in e.g. here, here.

Using the arm_rfft_init_q15() function is a bad idea and will easily lead to a FLASH overflow. The reason is that this function accepts the FFT length parameter (e.g., 256, 512, 1024 bin FFT) dynamically, and therefore has a reference to all constant tables (coefficients, twiddle table) for all sizes, which end up in the compiled binary, taking up space. The link-time optimization (LTO) does not seem smart enough to strip those away. However, this problem goes away when you choose to call only the init function for your wanted FFT size, e.g., arm_rfft_init_1024_q15().
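
To illustrate why the generic init function drags in every table, here is a small stand-alone sketch. It is not CMSIS-DSP code: the `demo_` names, the two table sizes and the config struct are all hypothetical, and the tables are zero-filled placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for per-size constant tables kept in flash. */
const int16_t demo_table_256[256]   = {0};
const int16_t demo_table_1024[1024] = {0};

typedef struct {
    const int16_t *table;   /* table selected for this FFT size */
    size_t         length;  /* FFT length in points */
} demo_fft_config;

/* Generic init: the length is only known at run time, so the function
 * must reference every table and the linker has to keep all of them. */
int demo_fft_init(demo_fft_config *cfg, size_t length)
{
    switch (length) {
    case 256:  cfg->table = demo_table_256;  break;
    case 1024: cfg->table = demo_table_1024; break;
    default:   return -1;   /* unsupported length */
    }
    cfg->length = length;
    return 0;
}

/* Size-specific init: references exactly one table, so a build that only
 * calls this lets the linker discard the unused generic path and tables. */
int demo_fft_init_1024(demo_fft_config *cfg)
{
    cfg->table  = demo_table_1024;
    cfg->length = 1024;
    return 0;
}
```

The same reasoning applies to the real library: a program that only ever calls the 1024-point init never mentions the tables for the other sizes, so they have no reason to end up in the binary.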

You can see this by choosing a bigger microcontroller so that compilation succeeds (e.g. a 128k chip) and then comparing PlatformIO Home → Inspect → Inspect Memory side by side: left, calling arm_rfft_init_q15(); right, calling arm_rfft_init_1024_q15().

A basic test program that computes the 1024-bin FFT over 2048 q15_t samples takes up

Checking size .pio\build\bluepill_f103c8_128k\firmware.elf
Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
RAM:   [=         ]  10.9% (used 2232 bytes from 20480 bytes)
Flash: [====      ]  36.4% (used 47656 bytes from 131072 bytes)

for me.

https://github.com/maxgerhardt/pio-cmsis-dsp-1-16-2-test

maxgerhardt · February 16, 2025, 12:35pm

As a note, no matter the settings, CMSIS-DSP with a Q15 format FFT will always use these coefficient tables

    extern const q15_t realCoefAQ15[8192];
    extern const q15_t realCoefBQ15[8192];

These are 32 kilobytes big (16K elements, each 2 bytes big). So, with only 64K of Flash memory, half of it is already used up by just these tables.
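
The arithmetic behind that figure, written out as a small sketch (the function and constant names are made up for illustration; only the table sizes come from the declarations above):

```c
#include <stddef.h>
#include <stdint.h>

enum {
    COEF_TABLE_COUNT = 2,     /* realCoefAQ15 and realCoefBQ15 */
    COEF_TABLE_LEN   = 8192   /* elements per table */
};

/* Total flash taken by both coefficient tables, in bytes.
 * q15_t is a 16-bit type, so int16_t stands in for it here. */
size_t coef_tables_bytes(void)
{
    return (size_t)COEF_TABLE_COUNT * COEF_TABLE_LEN * sizeof(int16_t);
}

/* Share of a given flash size used by the tables, in whole percent. */
unsigned coef_tables_flash_percent(size_t flash_bytes)
{
    return (unsigned)(coef_tables_bytes() * 100u / flash_bytes);
}
```

For the 64 KB part this works out to 2 × 8192 × 2 = 32768 bytes, i.e. 50% of flash before any code is counted.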

You might find some other libraries which take different approaches, reducing memory cost by increasing computation cost:

jaishankar · February 17, 2025, 5:02pm

You are correct, using the CMSIS-DSP library on the 64KB STM32F103 is a really bad idea. It consumes a lot of memory space.

Currently I am using the stm32cube framework, but I am unable to use the other libraries you suggested because they only work with the Arduino framework.

Is there any other option to do FFT with stm32cube framework?

maxgerhardt · February 17, 2025, 8:45pm

Fixed-point 16-bit FFT codes like https://gist.github.com/Tomwi/3842231 don’t depend on any framework, only standard compiler headers like stdint.h. This one also features a much, much smaller table for only a sine wave. Maybe it suits better?
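
As a rough idea of how a single sine table can be enough, here is a minimal sketch. It is not taken from the gist: the 16-entry table, the Q15 scaling and the function names are assumptions picked to keep the example short.

```c
#include <stdint.h>

#define SINE_TABLE_LEN 16u   /* one full period; a real FFT would use more entries */

/* sin(2*pi*k/16) scaled to Q15 (32767 represents just under 1.0). */
static const int16_t sine_q15[SINE_TABLE_LEN] = {
         0,  12539,  23170,  30273,  32767,  30273,  23170,  12539,
         0, -12539, -23170, -30273, -32767, -30273, -23170, -12539
};

/* Sine of the k-th step around the circle, wrapping at a full period. */
int16_t twiddle_sin(uint32_t k)
{
    return sine_q15[k % SINE_TABLE_LEN];
}

/* Cosine comes from the same table, read a quarter period ahead,
 * since cos(x) = sin(x + pi/2). No separate cosine table is needed. */
int16_t twiddle_cos(uint32_t k)
{
    return sine_q15[(k + SINE_TABLE_LEN / 4u) % SINE_TABLE_LEN];
}
```

The trade-off is a little extra index arithmetic per lookup in exchange for not storing separate coefficient tables for every FFT size.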

jaishankar · February 22, 2025, 3:46pm

I tried using the library and got some output, but I am not sure whether the way I am handling the library is correct. Can you please review this code?

Updated Code:

Serial Output