HAL_UARTEx_ReceiveToIdle_DMA() vs HardwareSerial::read()

Hi :wave:

I’m porting a small FW from STM32Cube to Arduino (ststm32, gd32).

In STM32Cube I used HAL_UARTEx_ReceiveToIdle_DMA() configured with a ring buffer, together with idle-line detection via HAL_UARTEx_RxEventCallback().
In addition, I configured the preemption priorities so that UART DMA had a higher priority than my secondary interrupts, such as my hardware timers.
That was quite efficient: UART data went straight into DMA, the callback fired as soon as data was ready, and the other interrupts/callbacks ran at a lower priority.
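For reference, the bookkeeping such a ReceiveToIdle callback typically does in circular DMA mode can be modeled on a PC. This is a minimal sketch, not the original firmware: it assumes the callback's `Size` argument is the index up to which DMA has written into the circular buffer, and `consume_dma_event()` is a hypothetical helper that copies out only the newly arrived bytes, including the wrap-around case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of the position bookkeeping inside
// HAL_UARTEx_RxEventCallback() when ReceiveToIdle DMA runs in circular mode.
// Assumption: new_pos corresponds to the callback's Size argument, i.e. the
// index up to which DMA has written into the circular buffer (0..buf_len).
void consume_dma_event(const uint8_t *buf, size_t buf_len,
                       size_t &old_pos, size_t new_pos,
                       std::vector<uint8_t> &out) {
    assert(new_pos <= buf_len);
    if (new_pos != old_pos) {
        if (new_pos > old_pos) {
            // Linear case: the new bytes sit between old_pos and new_pos.
            out.insert(out.end(), buf + old_pos, buf + new_pos);
        } else {
            // Wrap-around: the end of the buffer first, then its start.
            out.insert(out.end(), buf + old_pos, buf + buf_len);
            out.insert(out.end(), buf, buf + new_pos);
        }
    }
    // new_pos == buf_len means DMA reached the end and restarts at index 0.
    old_pos = (new_pos == buf_len) ? 0 : new_pos;
}
```

On the target, this logic would run inside `HAL_UARTEx_RxEventCallback()` with `Size` passed as `new_pos`, so each idle, half-transfer or transfer-complete event hands over exactly the bytes received since the previous event.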

Now, in Arduino, I switched to HardwareSerial::read().
It works so far, but it’s much slower! So slow that the test suite, which sends a few hundred serial commands to my FW, outpaces my FW’s processing.
I already increased SERIAL_RX_BUFFER_SIZE, but this only delays the problem until the ring buffer “tail” eats the ring buffer “head” :yum:
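Why a bigger SERIAL_RX_BUFFER_SIZE only postpones the overflow is simple arithmetic: if bytes arrive at rate r_in and are drained at r_out < r_in, an N-byte buffer fills after N / (r_in - r_out) seconds, so doubling N merely doubles that time. A small hypothetical helper, assuming constant average rates:

```cpp
#include <cassert>
#include <limits>

// Rough estimate of how long a receive ring buffer survives when bytes
// arrive faster than the firmware drains them.
// in_rate and out_rate are average bytes per second (assumed constant).
double seconds_until_overflow(double buffer_bytes, double in_rate, double out_rate) {
    if (out_rate >= in_rate) {
        // Draining keeps up: the buffer never fills.
        return std::numeric_limits<double>::infinity();
    }
    return buffer_bytes / (in_rate - out_rate);
}
```

For example, at 115200 baud with 8N1 framing (10 bits per byte), up to 11520 bytes/s can arrive; as long as the processing rate stays below the arrival rate, any finite buffer eventually overflows.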

My questions are now:
1.) Could it be that framework-arduinoststm32 is that much “slower” than my previous implementation via the HAL API?
2.) Is it possible to get something like HAL_UARTEx_RxEventCallback() with HardwareSerial instead of polling via HardwareSerial::read()?
3.) Is it possible in Arduino to prioritize the USART and timer interrupts?
4.) Any other ideas?

As far as I remember, stm32duino/Arduino_Core_STM32 (the STM32 core support for Arduino) implements a classic interrupt-based “push to ring buffer” approach, with no DMA.
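The interrupt-based “push to ring buffer” approach can be illustrated with a simplified model. This is a sketch under assumptions, not the core's actual source: the RX interrupt would call `push()`, normal code calls `read()`, and a byte that arrives while the buffer is full is dropped. `RxRing` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model of an interrupt-fed receive ring buffer.
// One slot is always kept free to tell "full" apart from "empty",
// so an RxRing<N> holds at most N - 1 bytes.
template <size_t N>
class RxRing {
public:
    // Called from the RX interrupt: store the byte, or drop it when full.
    bool push(uint8_t b) {
        size_t next = (head_ + 1) % N;
        if (next == tail_) {
            return false;  // full: this byte is lost
        }
        buf_[head_] = b;
        head_ = next;
        return true;
    }

    // Number of bytes waiting to be read.
    size_t available() const { return (head_ + N - tail_) % N; }

    // Returns the next byte, or -1 if empty (same convention as Serial.read()).
    int read() {
        if (head_ == tail_) return -1;
        uint8_t b = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        return b;
    }

private:
    uint8_t buf_[N] = {};
    volatile size_t head_ = 0;  // written only by the ISR (single-core MCU)
    volatile size_t tail_ = 0;  // written only by read()
};
```

Since only the ISR moves `head_` and only `read()` moves `tail_`, no lock is needed on a single-core MCU; the cost is that nothing is processed until someone calls `read()`.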

However, in standard Arduino fashion, the core’s main() function calls serialEventRun() each time your loop() function has finished, which, as you can see in Arduino_Core_STM32/WSerial.cpp (stm32duino/Arduino_Core_STM32 at commit 76887a45b43c9e919db17df9e039d96bd01641d1), calls into the weak serialEvent functions (serialEvent1() and so on).

So in your src/main.cpp you can define the following (depending on which Serial you are using; check what Serial maps to on your board):

```cpp
void serialEvent1() {
   while (Serial1.available()) {
      char c = (char) Serial1.read(); // or block-wise instead of char-wise
      // ... handle c here, e.g. append it to a command buffer
   }
}
```

Note that this does not run directly in the interrupt; it only runs after your loop() function returns and data is available.
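Since the handler should never block, a common way to process “serial commands” is to feed each received character into a small line assembler and act only on complete lines. A minimal sketch; `LineAssembler` is a hypothetical helper, and newline-terminated commands are an assumption about the test suite's protocol:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collects characters into a command line and reports when a complete line
// (terminated by '\n') is ready. Intended to be fed from serialEvent1(),
// e.g. la.feed((char) Serial1.read()), so no call ever waits for data.
class LineAssembler {
public:
    explicit LineAssembler(size_t max_len = 64) : max_len_(max_len) {}

    // Returns true when c completed a line; the line is then in line().
    bool feed(char c) {
        if (c == '\r') return false;   // ignore the CR of CRLF line endings
        if (c == '\n') {
            ready_ = current_;
            current_.clear();
            return true;
        }
        if (current_.size() < max_len_) {
            current_ += c;             // overlong lines are truncated
        }
        return false;
    }

    const std::string &line() const { return ready_; }

private:
    size_t max_len_;
    std::string current_;
    std::string ready_;
};
```

On a small MCU a fixed `char` array would avoid heap use; `std::string` just keeps the sketch short.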

Okay, that’s even worse (if I understand correctly).
That means Arduino itself processes the ring buffer at the earliest at the end of loop().
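The resulting control flow can be reproduced with a simplified, host-runnable model of the Arduino main(). This is an assumption-laden sketch: the real core also runs setup(), board initialization and other hooks, and `rx_available` merely stands in for `Serial1.available()`.

```cpp
#include <cassert>
#include <string>

std::string trace;      // records the order in which things run
int rx_available = 0;   // stands in for Serial1.available()

void loop() {
    trace += "loop;";
    rx_available += 5;  // bytes keep arriving (via the RX ISR) while loop() runs
}

void serialEvent1() {
    trace += "serialEvent1(" + std::to_string(rx_available) + ");";
    rx_available = 0;   // the handler drains everything that queued up
}

void serialEventRun() {
    // The handler is only called when data is waiting.
    if (rx_available > 0) serialEvent1();
}

void run_main(int iterations) {
    for (int i = 0; i < iterations; ++i) {  // the real main() never exits
        loop();
        serialEventRun();                   // only reached after loop() returns
    }
}
```

Whatever arrives during one pass of loop() accumulates in the ring buffer, so the longer loop() takes, the larger the backlog serialEvent1() has to catch up on.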

So if I don’t take care of my timer callbacks, which previously had a low priority, they may now delay reaching the end of loop() and thus the call to serialEventRun().
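One way to rearrange this is to keep the timer callback tiny and move the actual work into loop(). The sketch below assumes a core where `std::atomic` is lock-free (e.g. Cortex-M3 and up); `on_timer_tick()` and `service_timer_work()` are hypothetical names, and on the target `on_timer_tick()` would be attached as the timer callback.

```cpp
#include <atomic>
#include <cassert>

// The timer callback only records that a tick happened; the slower work runs
// from loop(), so the callback never keeps loop() from finishing for long.
std::atomic<unsigned> pending_ticks{0};
unsigned work_done = 0;

// Timer callback: cheap, no blocking work.
void on_timer_tick() {
    pending_ticks.fetch_add(1, std::memory_order_relaxed);
}

// Called from loop(): handle every tick that arrived since the last pass.
void service_timer_work() {
    unsigned n = pending_ticks.exchange(0, std::memory_order_relaxed);
    for (unsigned i = 0; i < n; ++i) {
        ++work_done;  // placeholder for the real button/LED/MOSFET handling
    }
}
```

Because `exchange()` reads and clears the counter in one step, a tick that fires while `service_timer_work()` runs is not lost; it is simply handled on the next pass.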

I’ll rearrange that.

:+1: :pray:

There’s also no shame in tossing the Arduino framework away if it doesn’t fit your application’s performance requirements. Performing surgery on the Arduino core to add DMA support may be overkill and a pain to maintain.

Well, the PCB is only a bunch of buttons, LEDs and MOSFETs, not a moon mission :wink:

After your explanation that serialEventRun() gets called at the end of Arduino’s loop() (which I wasn’t aware of before), it became clear that my timer callbacks were delaying the point at which it is reached.

In addition, it turned out that printf() on the STM32 wasted another huge amount of CPU cycles, which I simply switched off with a -DDEBUG_UART=NP build flag.
Interestingly, the GD32 didn’t have that problem. I guess that is because a.) it has a floating-point unit and b.) I couldn’t find anything similar to the DEBUG_UART mechanism in the framework-arduinogd32 sources.
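The same idea of switching debug output off with a build flag can be applied to the application's own prints. The sketch below is a generic pattern, not how DEBUG_UART works inside the core; `ENABLE_DEBUG_PRINT`, `DBG_PRINTF` and `handle_command()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

// Build with -DENABLE_DEBUG_PRINT to get output. Without it, DBG_PRINTF
// expands to a no-op, so neither formatting nor UART transfer costs cycles
// (note: its arguments are then not evaluated at all).
#ifdef ENABLE_DEBUG_PRINT
#define DBG_PRINTF(...) std::printf(__VA_ARGS__)
#else
#define DBG_PRINTF(...) ((void)0)
#endif

int handle_command(int value) {
    DBG_PRINTF("handle_command(%d)\n", value);
    return value * 2;  // placeholder for the real work
}
```

Compiling the prints out, instead of checking a runtime flag, also lets the linker drop printf() and its formatting code when nothing else uses it.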

Performance and serial delay is all fine now!