Debugging state machine timing discontinuity

Cross-posting from the Arduino forum. I am using PlatformIO (PIO) with the Black Magic Probe for GDB debugging on the Arduino Due.

TLDR: UART communication causes slightly irregular state machine timing, and I am not sure how to approach debugging it.

I am updating my company’s Arduino stack software from the Arduino Uno to the Arduino Due. It uses a software state machine that gets data from sensors, stores the data, and sends it off the board through UART. The state machine must run at 45 Hz, or about 22.2 ms per cycle.

The problem is this: every 404 cycles (exactly 404 cycles, every time), the logic that maintains timing regularity fails, and for 13 cycles (exactly 13 cycles, every time), the state machine goes directly into the next cycle as soon as the last step is complete.

The program maintains an internal timer by triggering an RC compare interrupt, as follows:

void configureTimerInterrupt(){
  // Enable the clock to the TC0 peripheral
  pmc_enable_periph_clk(ID_TC0);

  /*Configures the timer:
    - First two parameters set it to RC compare waveform mode. This means the timer resets when it reaches the value in RC.
    - The third parameter sets the clock source to MCK/128. MCK is at 84 MHz, so this sets the clock to 656.25 kHz.
  */
  TC_Configure(TC0, 0, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC | TC_CMR_TCCLKS_TIMER_CLOCK4);
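  // RC sets the value that the counter reaches before triggering the interrupt.
  // The intended rate is 10 kHz. RC is an integer register, so 64.625 is truncated to 64,
  // which gives roughly 656.25 kHz / 64 = 10.25 kHz rather than exactly 10 kHz.
  TC_SetRC(TC0, 0, 64.625);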

  // Enable the RC compare interrupt
  TC0->TC_CHANNEL[0].TC_IER = TC_IER_CPCS;
  // Disable all other TC0 interrupts
  TC0->TC_CHANNEL[0].TC_IDR = ~TC_IER_CPCS;
  NVIC_EnableIRQ(TC0_IRQn);
  NVIC_SetPriority(TC0_IRQn, 0);

  TC_Start(TC0, 0);
}

void TC0_Handler(){
  // Clear the status register. This is necessary to prevent the interrupt from being called repeatedly.
  TC_GetStatus(TC0, 0);
  if (micros() - timer >= SAMPLE_PERIOD) {
    newCycle = true;
    timer = micros();
  }
}
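
As a sanity check on the timer arithmetic in the comments above (a 10 kHz target from a 656.25 kHz timer clock), here is a minimal host-side sketch. The helper names `rcForFrequency` and `actualFrequencyHz` are hypothetical, and the sketch assumes the counter period in RC compare mode is RC timer ticks. It only illustrates that RC is an integer, so a fractional value such as 64.625 cannot be represented.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the post: MCK = 84 MHz, TIMER_CLOCK4 = MCK/128.
constexpr std::uint32_t kMckHz = 84000000u;
constexpr std::uint32_t kPrescaler = 128u;
constexpr std::uint32_t kTimerClockHz = kMckHz / kPrescaler;  // 656250 Hz

// Hypothetical helper: nearest integer RC for a target interrupt rate.
// Assumes the counter period equals RC timer ticks.
std::uint32_t rcForFrequency(std::uint32_t targetHz) {
    assert(targetHz > 0);
    return (kTimerClockHz + targetHz / 2) / targetHz;  // round to nearest
}

// Hypothetical helper: interrupt rate actually produced by an integer RC,
// under the same assumption as above.
double actualFrequencyHz(std::uint32_t rc) {
    assert(rc > 0);
    return static_cast<double>(kTimerClockHz) / rc;
}
```

Under these assumptions, `rcForFrequency(10000)` gives 66 (about 9.94 kHz), while the truncated value 64 gives about 10.25 kHz. Neither is exactly 10 kHz because 656250 is not divisible by 10000.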

Then, at the end of each cycle, to maintain timing regularity, the state machine waits until the internal timer reaches 22.2 ms, as follows:

case idle: {
    volatile uint32_t irq_state = __get_PRIMASK();
    __disable_irq();
    if (micros() - timer > SWEEP_OFFSET) {
        currentState = startSweep;
    }
    __set_PRIMASK(irq_state);
    break;
}
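
Both the handler and the idle state rely on the `micros() - timer` pattern. For reference, here is a minimal host-side sketch of why that unsigned subtraction stays valid when the 32-bit microsecond counter rolls over (roughly every 71.6 minutes). The helper name `periodElapsed` and the 22222 µs period are assumptions for illustration; the real values of `SAMPLE_PERIOD` and `SWEEP_OFFSET` are not shown in this post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true once at least periodUs microseconds have passed
// since startUs. Unsigned subtraction wraps modulo 2^32, so the difference is
// still the true elapsed time if the counter rolled over once in between.
bool periodElapsed(std::uint32_t nowUs, std::uint32_t startUs, std::uint32_t periodUs) {
    return static_cast<std::uint32_t>(nowUs - startUs) >= periodUs;
}

// Illustrative check with an assumed period of 22222 us (about 1/45 s).
void wraparoundDemo() {
    const std::uint32_t period = 22222u;
    const std::uint32_t start = 0xFFFFFF00u;  // 256 us before the counter wraps
    assert(!periodElapsed(21000u, start, period));  // 21256 us elapsed
    assert(periodElapsed(22000u, start, period));   // 22256 us elapsed
}
```

Note that this sketch uses `>=`, as the handler does, whereas the idle state above compares with `>`.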

Interrupts are disabled during this stage due to a problem that arose before I joined the company, and removing this logic does not resolve the problem.

The glitch is tied to the function that manages UART communication: removing that function removes the glitch. Because communication has to happen simultaneously with another processor function (due to timing constraints), the program copies all data into a contiguous memory block and lets the UART’s peripheral DMA controller (PDC) manage the transfer:

void sendData(){
        p_memory_block = memory_block;
        memcpy(p_memory_block, sweepSentinel, sizeof(sweepSentinel));
        p_memory_block += sizeof(sweepSentinel);
        memcpy(p_memory_block, p_sweepTimeStamp, sizeof(sweepTimeStamp));
        p_memory_block += sizeof(sweepTimeStamp);
        memcpy(p_memory_block, sweep_buffer, sizeof(sweep_buffer));
        p_memory_block += sizeof(sweep_buffer);
        memcpy(p_memory_block, imuSentinel, sizeof(imuSentinel));
        p_memory_block += sizeof(imuSentinel);
        memcpy(p_memory_block, p_IMUTimeStamp, sizeof(IMUTimeStamp));
        p_memory_block += sizeof(IMUTimeStamp);
        memcpy(p_memory_block, IMUData, sizeof(IMUData));
        p_memory_block += sizeof(IMUData);
        memcpy(p_memory_block, imuSentinelBuf, sizeof(imuSentinelBuf));
        p_memory_block += sizeof(imuSentinelBuf);
        memcpy(p_memory_block, ramBuf + IMU_TIMESTAMP_OFFSET, sizeof(IMUTimeStamp));
        p_memory_block += sizeof(IMUTimeStamp);
        memcpy(p_memory_block, ramBuf + IMU_DATA_OFFSET, sizeof(IMUData));
        p_memory_block += sizeof(IMUData);
        memcpy(p_memory_block, sweepSentinelBuf, sizeof(sweepSentinelBuf));
        p_memory_block += sizeof(sweepSentinelBuf);
        memcpy(p_memory_block, ramBuf + SWEEP_TIMESTAMP_OFFSET, sizeof(sweepTimeStamp));
        p_memory_block += sizeof(sweepTimeStamp);
        memcpy(p_memory_block, ramBuf + SWEEP_DATA_OFFSET, sizeof(sweep_buffer));
        p_memory_block = memory_block;
        pdc.send(memory_block, totalSize);    
}
    template <typename T>
    void send(T* buffer, int size){
        // Check if the UART is ready to transmit
        if(*p_UART_SR & TXBUFE){
            // Set buffer address and size
            *(volatile uint32_t*)p_UART_TPR = (uint32_t)buffer;
            *p_UART_TCR = size;
        } else{
            // Wait until ready
            while(!(*p_UART_SR & TXBUFE)){
                ;
            }
            // Same as above
            *(volatile uint32_t*)p_UART_TPR = (uint32_t)buffer;
            *p_UART_TCR = size;
        }
    }
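
The copy-then-advance pattern in `sendData()` could also be written with a small helper that tracks the offset and checks bounds, which makes it easier to confirm that the byte count handed to the DMA transfer matches what was actually written. This is only a sketch: `appendBytes`, `packExample`, and the field values are hypothetical and not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies len bytes from src to dst + offset and returns
// the new offset. The assert catches a write past the end of the block.
std::size_t appendBytes(std::uint8_t* dst, std::size_t capacity, std::size_t offset,
                        const void* src, std::size_t len) {
    assert(offset <= capacity && len <= capacity - offset);
    std::memcpy(dst + offset, src, len);
    return offset + len;
}

// Illustrative packing of two made-up fields into one contiguous block.
std::size_t packExample(std::uint8_t* block, std::size_t capacity) {
    const std::uint8_t sentinel[2] = {0xAA, 0x55};
    const std::uint32_t timestamp = 123456u;
    std::size_t offset = 0;
    offset = appendBytes(block, capacity, offset, sentinel, sizeof(sentinel));
    offset = appendBytes(block, capacity, offset, &timestamp, sizeof(timestamp));
    return offset;  // total bytes written, i.e. the size to pass to the transfer
}
```

With this shape, the returned offset can be compared against `totalSize` before the transfer starts.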

I am not sure where to begin with resolving this. Both our oscilloscope readings and the data we receive suggest that absolutely nothing changes in the UART communication between the 403rd and 404th cycles. However, it is mission-critical that the cycles have regular timing.

I attempted to be thorough, but please let me know if I have left anything important out of this post.