Edge Impulse DSP Block (Arduino Nano 33 BLE): Arduino IDE Build 6X Faster

I’m using Edge Impulse to build a Tiny ML test application to classify audio.

Using the Edge Impulse example for keyword detection, I see a very large performance difference between a PlatformIO build and the same code built using the Arduino IDE.

For the Arduino Nano 33 BLE, the DSP processing block is more than 6X slower (857 ms vs. 128 ms) when the application is built using PlatformIO versus the Arduino IDE. However, for an ESP32 DevKitC (env:esp-wrover-kit) target, the two builds are comparable in performance. Here is a summary of the DSP times:

PlatformIO
Arduino Nano 33 BLE: DSP 857 ms, inference 8 ms, anomaly 0 ms
ESP32 DevKitC: DSP 298 ms, inference 5 ms, anomaly 0 ms

Arduino IDE
Arduino Nano 33 BLE: DSP 128 ms, inference 6 ms, anomaly 0 ms
ESP32 DevKitC: DSP 285 ms, inference 3 ms, anomaly 0 ms

Here is my platformio.ini:

[env:nano33ble]
platform = nordicnrf52
board = nano33ble
framework = arduino
lib_deps =
    SNMArduino
    ArduinoBLE
    BluetoothManager
    PDM
    LED
; extra_scripts = extra_script.py
upload_port = /dev/ttyACM0

[env:esp-wrover-kit]
platform = espressif32
board = esp-wrover-kit
framework = arduino
lib_deps =
    SNMArduino
    ArduinoBLE
    BluetoothManager
    PDM
    LED

The application code is the same. It classifies a static frame of audio data. I chose the Arduino library deployment option in Edge Impulse. The resulting source is then added as a library to my build.

Any ideas why the performance is much faster with the Arduino IDE build? Again, it’s just for the Arduino Nano 33 BLE target. The ESP32 builds are roughly the same.

I have done verbose builds on each platform to compare the compiler options and defines. I don’t see anything obvious, but I am new to this domain. 🙂 I can provide the complete compiler command lines for a sample C++ file for each IDE if needed.

I run PlatformIO and the Arduino IDE in an Ubuntu WSL instance.

Thanks in advance for your help!

  1. Verbose build log for the Arduino IDE? (File → Preferences → Verbose Build)
  2. Verbose build log for PlatformIO? (CLI: pio run -t clean && pio run -e nano33ble > compile.txt)

The log files are large so here are links. If you prefer having them pasted into the response, I can do that.

Full Arduino IDE Build Log
Full PlatformIO Build Log

Here is an example of the command line for kiss_fft.cpp, one of the Edge Impulse DSP library files. In the Arduino log, it also includes the expansion of defines.txt and cxxflags.txt.
Example DSP File
Arduino Command Line
PlatformIO Command Line

Thanks.

The PlatformIO invocation is weird. It has both -mfloat-abi=softfp (which would use FPU instructions to accelerate computations) as well as -mfloat-abi=soft (which uses no hardware acceleration, just CPU computations).

arm-none-eabi-gcc […] -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -c -fdata-sections -ffunction-sections -fmessage-length=0 -fno-exceptions -fomit-frame-pointer -funsigned-char -mcpu=cortex-m4 -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -mthumb […] -nostdlib -mfloat-abi=soft -mfpu=fpv4-sp-d16

This is not good. Can you upload the exact PlatformIO project for this?
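
For anyone who wants to check their own verbose build log for this kind of conflict, here is a minimal sketch. It assumes you have copied one compiler command line out of the log; the function name find_conflicts and the list of options it watches are made up for illustration, and the sample command is a shortened version of the one above.

```python
import shlex
from collections import defaultdict

# Options that should have one value per compile; this list is an
# assumption for the sketch, extend it as needed.
SINGLE_VALUED = ("-mfloat-abi", "-mfpu", "-mcpu")


def find_conflicts(command_line):
    """Return {option: [values in order seen]} for options that appear
    with more than one distinct value on the command line."""
    seen = defaultdict(list)
    for token in shlex.split(command_line):
        name, sep, value = token.partition("=")
        if sep and name in SINGLE_VALUED:
            seen[name].append(value)
    # Repeating the same value is harmless; only differing values count.
    return {n: v for n, v in seen.items() if len(set(v)) > 1}


if __name__ == "__main__":
    # Shortened, illustrative command line (not the full log entry).
    cmd = ("arm-none-eabi-gcc -Os -mcpu=cortex-m4 -mfloat-abi=softfp "
           "-mfpu=fpv4-sp-d16 -mthumb -nostdlib -mfloat-abi=soft "
           "-mfpu=fpv4-sp-d16")
    for name, values in find_conflicts(cmd).items():
        print(f"{name} is set to conflicting values: {', '.join(values)}")
```

The repeated -mfpu=fpv4-sp-d16 is not reported because both occurrences carry the same value; only -mfloat-abi shows up.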

I noticed that, too. Earlier, I did some quick experiments with build_unflags to remove the -mfloat-abi=soft (leaving only -mfloat-abi=softfp) but there was no difference in my trial. Maybe not implemented right…

Would that impact only the nano33ble and not the ESP32?

What do you mean by the complete PlatformIO? The entire project? If so, should I ZIP it and send a link?

I really appreciate your help.

I think the xtensa-gcc does not use -mfloat-abi, only ARM targets do.

Yes, the entire PlatformIO project, zipped. The project should be cleaned before (removal of .pio folder).

SNM Project

Interesting, even though -mfloat-abi=soft is present next to softfp, it still compiles in FPU instructions. There does not seem to be a noticeable difference in the assembly code or number of FPU instructions if soft is removed (commenting out the call to configure_fpu_flags() in C:\Users\<user>\.platformio\platforms\ststm32\builder\frameworks\arduino\mbed-core\arduino-core-mbed.py).
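
A rough way to repeat this comparison is to disassemble the firmware (for example with arm-none-eabi-objdump -d) and count hardware float instructions against calls to the software float helpers. The sketch below is only an illustration: the function name count_float_ops is hypothetical, the sample disassembly is hand-written rather than taken from this project, and it only looks at single-precision (.f32) instructions and the four basic __aeabi_f* arithmetic helpers.

```python
import re

# Hardware FPU instructions such as vadd.f32, vmul.f32, vcvt.s32.f32.
FPU_RE = re.compile(r"\bv[a-z]+(?:\.[a-z0-9]+)*\.f32\b")
# Calls (bl/blx) to ARM EABI single-precision software helpers.
SOFT_RE = re.compile(r"\bblx?\s+[0-9a-f]+\s+<__aeabi_f(?:add|sub|mul|div)>")


def count_float_ops(disassembly):
    """Count lines with FPU instructions and lines that call soft-float helpers."""
    counts = {"fpu": 0, "soft_helper": 0}
    for line in disassembly.splitlines():
        if FPU_RE.search(line):
            counts["fpu"] += 1
        elif SOFT_RE.search(line):
            counts["soft_helper"] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample in objdump style; not output from the real build.
    sample = """
      10: ee77 7a87  vadd.f32 s15, s15, s14
      14: ee67 7a87  vmul.f32 s15, s15, s14
      18: f7ff fffe  bl 0 <__aeabi_fadd>
      1c: 4770       bx lr
    """
    print(count_float_ops(sample))
```

Running it on the disassembly of both builds and comparing the two counts gives a quick indication of whether float math is going through the FPU or through library calls.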

Can you also upload the Arduino project alongside the exact used libraries?

Here is the ZIP of the Arduino IDE project. It uses the same ArduinoSNM library as the PlatformIO ZIP sent earlier. The ArduinoSNM library is the inferencing engine built by Edge Impulse for my TinyML project.

In the Arduino IDE build, the DSP block runs in about 130ms compared to roughly 800ms in the PlatformIO build.

Arduino IDE SNM Project

Circling back now…apologies for the delay. Away from home today.

Thank you for your help.

The Edge Impulse site has a section on Slow DSP Operations for the Arduino library.

It looks like both the Arduino IDE and PlatformIO use the same version of mbed (6.17.0) per mbed_version.h.

Adding the defines to my source per the article did not make a difference in the DSP processing times. PlatformIO is still significantly slower (~850ms vs 130ms).

#define EIDSP_USE_CMSIS_DSP 1
#define EIDSP_LOAD_CMSIS_DSP_SOURCES 1

Opened the same issue on the Edge Impulse forum.
DSP Performance: PlatformIO Build 6X Slower vs. Arduino IDE Build - Report bugs - Edge Impulse

I followed Jan’s suggestion to add additional defines to enable CMSIS-DSP and CMSIS-NN but I did not see any difference.

As Max identified, the issue does seem to be that both

-mfloat-abi=soft
-mfloat-abi=softfp

are set in the build, which is a conflict. I revisited the build_unflags experiment and found it successful with only:

build_unflags = -mfloat-abi=soft

Now the DSP processing step is comparable between the PlatformIO and Arduino CLI builds.