How to profile the cycles and timing of an inference in tflite-micro

I am trying to measure the number of cycles and the time taken to run an inference on a Teensy board with a Cortex-M7 running at 600 MHz.

The result from the serial monitor:
Total cycles used: 410
Prediction Time: 411 us
Output Tensor:
0.08 0.25

Looks weird. Shouldn't it be 410 / 600,000,000 s ≈ 0.683 microseconds?

Both of these use micros(). Your “Prediction Time” uses it directly; the “Total cycles used” value uses it indirectly, as the difference between two calls to tflite::GetCurrentTimeTicks().

So your code is not measuring clock cycles at all.
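
To make the mismatch concrete, here is a small sketch of the conversion between cycles and microseconds. The helper names cycles_to_us and us_to_cycles are made up for illustration, and 600 MHz is simply the clock stated in the question. Under that assumption, 410 real cycles would take about 0.68 us, while a 411 us inference would correspond to roughly 246,600 cycles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a cycle count to microseconds
// for a core clocked at f_cpu_hz.
inline double cycles_to_us(uint64_t cycles, uint64_t f_cpu_hz) {
  assert(f_cpu_hz > 0);
  return static_cast<double>(cycles) * 1e6 / static_cast<double>(f_cpu_hz);
}

// Hypothetical helper: convert microseconds to the number of cycles
// a core clocked at f_cpu_hz executes in that time (rounded down).
inline uint64_t us_to_cycles(uint64_t us, uint64_t f_cpu_hz) {
  assert(f_cpu_hz > 0);
  return (us * f_cpu_hz) / 1000000ULL;
}
```

So a "cycle" value that is nearly identical to the microsecond value is a strong hint that both numbers come from the same microsecond timer.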

You can measure clock cycles on the Cortex-M7 with the ARM DWT (Data Watchpoint and Trace) unit. An example:

#define CPU_RESET_CYCLECOUNTER    do { ARM_DEMCR |= ARM_DEMCR_TRCENA;          \
                                       ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; \
                                       ARM_DWT_CYCCNT = 0; } while(0)
volatile uint32_t cycles; // DWT CYCCNT is a 32-bit unsigned counter

void some_function_that_does_work() {
  CPU_RESET_CYCLECOUNTER; // reset to 0
  do_heavy_work();
  cycles = ARM_DWT_CYCCNT; // capture cycle count
  // print number of cycles etc
}
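
One thing to keep in mind, as far as I know: DWT CYCCNT is a 32-bit counter, so at 600 MHz it wraps around after about 7.2 seconds (2^32 / 600,000,000). Instead of resetting it, you can read it before and after the work and subtract the two values as unsigned 32-bit integers, which stays correct across a single wrap. A minimal sketch; elapsed_cycles is a hypothetical helper name, not part of any SDK:

```cpp
#include <cstdint>

// Hypothetical helper: cycles elapsed between two reads of a free-running
// 32-bit counter. Unsigned subtraction is modulo 2^32, so the result is
// still correct if the counter wrapped once between the two reads.
inline uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
  return end - start;
}
```

This only holds if the measured section takes less than one full counter period; anything longer needs a slower time base such as micros().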

Ah, I see! Is there a universal cycle counter that can be used for all chips?

My original code works on any Arduino-platform chip, but the ARM-specific code might not work on ESP32 chips, right?

The CPU cycle count is not abstracted away by the Arduino API or the TFLite API. They only deal with microsecond or millisecond delays and timing. So you will have to create an abstraction layer or handle each architecture yourself.

#ifdef __arm__ 
/* some ARM CPU, assume we can use the ARM DWT */
#define CPU_RESET_CYCLECOUNTER    do { ARM_DEMCR |= ARM_DEMCR_TRCENA;          \
                                       ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; \
                                       ARM_DWT_CYCCNT = 0; } while(0)
#define CPU_GET_CYCLECOUNTER()  ARM_DWT_CYCCNT
#elif defined(ESP32) // or ARDUINO_ARCH_ESP32 or __xtensa__ I guess
/* some ESP32 stuff, use their SDK */
#define CPU_RESET_CYCLECOUNTER    do { esp_cpu_set_cycle_count(0); } while(0)
#define CPU_GET_CYCLECOUNTER()  ((uint32_t) esp_cpu_get_cycle_count())
#else
#error "tf is this architecture"
#endif

// use CPU_RESET_CYCLECOUNTER and CPU_GET_CYCLECOUNTER() in code

Similarly, the ESP32 SDK has esp_cpu_get_cycle_count(), providing the same functionality.
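
A possible way to use such a counter without sprinkling macros through the code is a small wrapper that takes the counter read as a callable. This is only a sketch: measure_cycles is a hypothetical name, and on the target you would pass something like a lambda returning CPU_GET_CYCLECOUNTER(). Passing the reader in also lets you try the logic on a PC with a fake counter.

```cpp
#include <cstdint>

// Hypothetical wrapper: runs `work` once and returns the number of counter
// ticks it took, using `read_counter` to sample a free-running 32-bit
// counter before and after. The unsigned subtraction tolerates one wrap.
template <typename ReadCounter, typename Work>
uint32_t measure_cycles(ReadCounter read_counter, Work work) {
  const uint32_t start = read_counter();
  work();
  const uint32_t end = read_counter();
  return end - start;
}
```

On the Teensy this might look like measure_cycles([] { return CPU_GET_CYCLECOUNTER(); }, [&] { interpreter.Invoke(); }), assuming the counter has been enabled once beforehand and your MicroInterpreter object is called interpreter. The result includes a few cycles of overhead from reading the counter itself.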
