Bit of a weird possible compiler issue with AVR-G++

normandunbar · January 6, 2021, 2:51pm

Well, this was (and still is) seriously weird. I’m suspecting a bug in the version of avr-g++ installed by the latest updates to PlatformIo – but that’s not a PlatformIO problem as such.

Has anyone seen anything as weird as the following?

TL;DR

One line of code in an ISR:

m += MILLIS_INC;

didn’t get compiled into the binary. After changing the line to something equivalent, but hard coded:

m++;

it did work. Changing back to the original now works!

Gory Details

Version of avr-g++:

norman@Hubble $ ./avr-g++ --version 

avr-g++ (AVR_8_bit_GNU_Toolchain_3.6.2_1759) 5.4.0

Version of PlatformIO:

norman@Hubble $ pio system info

PlatformIO Core             5.0.4
Python                      3.6.9-final.0
System Type                 linux_x86_64
Platform                    Linux-5.4.0-58-generic-x86_64-with-glibc2.25
File System Encoding        utf-8
Locale Encoding             UTF-8
PlatformIO Core Directory   /home/norman/.platformio
PlatformIO Core Executable  /home/norman/.local/bin/platformio
Python Executable           /usr/bin/python3
Global Libraries            0
Development Platforms       4
Tools & Toolchains          19

Background

I’m writing a small library to be used when I’m not using the Arduino framework, and writing plain AVR C++. I’ve got a need for the millis() (and micros()) function, so I’ve come up with a class (AVR_millis) which handles the Timer/counter 0 Overflow Interrupt in a very similar manner to how the Arduino does it. The difference is that mine is encapsulated in a class.

The interrupt handling code is as follows:

void AVR_millis::interrupt() {
    // Grab the current values.
    uint32_t m = timer0_millis;
    uint8_t f = timer0_fract;

    // Increment millis counter by the number of
    // milliseconds per overflow period.
    m += MILLIS_INC;

    // Accumulate the fractions of a millisecond
    // per overflow period.
    f += FRACT_INC;

    // Do we have a "leap" milliseconds yet? If so,
    // add it in and adjust the fraction.
    if (f >= FRACT_MAX) {
        f -= FRACT_MAX;
        m += 1;
    } 

    // Copy the new values back to the class variables.
    timer0_fract = f;
    timer0_millis = m;

    // How many times have we overflowed since "time began"?
    timer0_overflow_count++;
}

So, I wrote a small test to blink an LED (how original) using the new class and nothing happened. Yes, I did enable global interrupts – which is unusual for me I admit.

Grabbing millis(), delaying 20 milliseconds and grabbing millis() again should have resulted in a difference of around 20, they were the same value. WTH?

Maybe I’ve not set the rehisters up correctly to enable the interrupt. I added another LED and turned it on in the ISR. I now know that the ISR is definitely executing thanks to the LED. So why was nothing else working properly in the ISR?

I ran avr-objdump -S to get a dissassembly, the ISR generated the following code. Comments are mine. Can you see what’s missing?

Disassembly of section .text:

00000000 <__vectors>:
RESET:
   0:	0c 94 35 00 	jmp	0x6a	; 0x6a <__ctors_end>
...
TIMER0_OVF
  40:	0c 94 54 00 	jmp	0xa8	; 0xa8 <__vector_16>
...

000000a8 <__vector_16>:
ISR(TIMER0_OVF_vect)
  ISR preamble, saving lots of registers.
  a8:	1f 92       	push	r1
  ...

  m = timer0_millis

  c2:	40 91 04 01 	lds	r20, 0x0104	; 0x800104 <_edata+0x4>
  c6:	50 91 05 01 	lds	r21, 0x0105	; 0x800105 <_edata+0x5>
  ca:	60 91 06 01 	lds	r22, 0x0106	; 0x800106 <_edata+0x6>
  ce:	70 91 07 01 	lds	r23, 0x0107	; 0x800107 <_edata+0x7>

  f = timer0_fract

  d2:	80 91 08 01 	lds	r24, 0x0108	; 0x800108 <_edata+0x8>

  if (f >= FRACT_MAX)

  d6:	8d 37       	cpi	r24, 0x7D	; 125
  d8:	28 f0       	brcs	.+10     	; 0xe4 <__vector_16+0x3c>

  f -= FRACT_MAX

  da:	8d 57       	subi	r24, 0x7D	; 125 FRACT_MAX

  m++  (// Adding 1 by subtracting -1 weird!)

  dc:	4f 5f       	subi	r20, 0xFF	; 255
  de:	5f 4f       	sbci	r21, 0xFF	; 255
  e0:	6f 4f       	sbci	r22, 0xFF	; 255
  e2:	7f 4f       	sbci	r23, 0xFF	; 255

  timer0_fract = f

  e4:	80 93 08 01 	sts	0x0108, r24	; 0x800108 <_edata+0x8>

  timer0_millis = m

  e8:	40 93 04 01 	sts	0x0104, r20	; 0x800104 <_edata+0x4>
  ec:	50 93 05 01 	sts	0x0105, r21	; 0x800105 <_edata+0x5>
  f0:	60 93 06 01 	sts	0x0106, r22	; 0x800106 <_edata+0x6>
  f4:	70 93 07 01 	sts	0x0107, r23	; 0x800107 <_edata+0x7>

  timer0_overflow_count++

  f8:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <_edata>
  fc:	90 91 01 01 	lds	r25, 0x0101	; 0x800101 <_edata+0x1>
 100:	a0 91 02 01 	lds	r26, 0x0102	; 0x800102 <_edata+0x2>
 104:	b0 91 03 01 	lds	r27, 0x0103	; 0x800103 <_edata+0x3>
 108:	01 96       	adiw	r24, 0x01	; 1
 10a:	a1 1d       	adc	r26, r1
 10c:	b1 1d       	adc	r27, r1
 10e:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <_edata>
 112:	90 93 01 01 	sts	0x0101, r25	; 0x800101 <_edata+0x1>
 116:	a0 93 02 01 	sts	0x0102, r26	; 0x800102 <_edata+0x2>
 11a:	b0 93 03 01 	sts	0x0103, r27	; 0x800103 <_edata+0x3>


  ISR postamble, restoring lots of registers.
  ...
 134:	1f 90       	pop	r1
 136:	18 95       	reti

That’s correct, this line of source code is not compiled into the ISR:

m += MILLIS_INC;

I know, because I worked it out, that MILLIS_INC is the value 1. Much random messing about with volatile and register made absolutely no difference. That one line remained missing from the binary.

Given that the increment is 1, I changed the above line to:

m++;

A quick rebuild and upload, and it worked – yippee!

So, everything is working fine, however, I need to un-hardcode the increment because the increment is required to be calculated against F_CPU. Let’s try again.

I changed it back and recompiled. After an upload I was expecting to see a blank LED, but no, it was indeed flashing away happily. Huh?

So, in summary:

Compiling with m += MILLIS_INC didn’t work;
Compiling with m++ did work;
Compiling again with m += MILLIS_INC now works!

This is worrying as it has been a long time since I have seen this type of behaviour. That was COBOL and some code I wrote worked fine if I added a DISPLAY statement, and failed with the statement removed – it had to be left in as even the compiler writers couldn’t find a problem. (ICL mainframe.)

The other worry is that my code almost never works first time, I’m now in a situation of wondering if the problem is my code, or another gem like this one. Still, I’m getting lots of experience looking through what the compiler outputs for my source!

So, that’s my tale of woe and is how I spent yesterday afternoon and early evening, debugging this one.

May you never suffer from anything like this, ever!

Cheers,
Norm.