NAME¶
optimization - Compiler optimization
Problems with reordering code¶
Author:
Jan Waclawek
Programs contain sequences of statements, and a naive compiler would execute
  them exactly in the order as they are written. But an optimizing compiler is
  free to 
reorder the statements - or even parts of them - if the
  resulting 'net effect' is the same. The 'measure' of the 'net effect' is what
  the standard calls 'side effects', and is accomplished exclusively through
  accesses (reads and writes) to variables qualified as volatile. So, as long as
  all volatile reads and writes are to the same addresses and in the same order
  (and writes write the same values), the program is correct, regardless of
  other operations in it. (One important point to note here is, that time
  duration between consecutive volatile accesses is not considered at all.)
Unfortunately, there are also operations which are not covered by volatile
  accesses. An example of this in avr-gcc/avr-libc are the 
cli() and
  
sei() macros defined in < 
avr/interrupt.h>, which convert
  directly to the respective assembler mnemonics through the 
asm()
  statement. These don't constitute a variable access at all, not even volatile,
  so the compiler is free to move them around. Although there is a 'volatile'
  qualifier which can be attached to the 
asm() statement, its effect on
  (re)ordering is not clear from the documentation (and is more likely only to
  prevent complete removal by the optimiser), as it (among other) states:
Note that even a volatile asm instruction can be moved relative to other
  code, including across jump instructions. [...] Similarly, you can't expect a
  sequence of volatile asm instructions to remain perfectly consecutive.
See also:
There is another mechanism which can be used to achieve something similar:
  
memory barriers. This is accomplished through adding a special 'memory'
  clobber to the inline asm statement, and ensures that all variables are
  flushed from registers to memory before the statement, and then re-read after
  the statement. The purpose of memory barriers is slightly different than to
  enforce code ordering: it is supposed to ensure that there are no variables
  'cached' in registers, so that it is safe to change the content of registers
  e.g. when switching context in a multitasking OS (on 'big' processors with
  out-of-order execution they also imply usage of special instructions which
  force the processor into 'in-order' state (this is not the case of AVRs)).
However, memory barrier works well in ensuring that all volatile accesses before
  and after the barrier occur in the given order with respect to the barrier.
  However, it does not ensure the compiler moving non-volatile-related
  statements across the barrier. Peter Dannegger provided a nice example of this
  effect:
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
unsigned int ivar;
void test2( unsigned int val )
{
  val = 65535U / val;
  cli();
  ivar = val;
  sei();
}
compiles with optimisations switched on (-Os) to
00000112 <test2>:
 112:   bc 01           movw    r22, r24
 114:   f8 94           cli
 116:   8f ef           ldi     r24, 0xFF       ; 255
 118:   9f ef           ldi     r25, 0xFF       ; 255
 11a:   0e 94 96 00     call    0x12c   ; 0x12c <__udivmodhi4>
 11e:   70 93 01 02     sts     0x0201, r23
 122:   60 93 00 02     sts     0x0200, r22
 126:   78 94           sei
 128:   08 95           ret
where the potentially slow division is moved across 
cli(), resulting in
  interrupts to be disabled longer than intended. Note, that the volatile access
  occurs in order with respect to 
cli() or 
sei(); so the 'net
  effect' required by the standard is achieved as intended, it is 'only' the
  timing which is off. However, for most of embedded applications, timing is an
  important, sometimes critical factor.
See also:
Unfortunately, at the moment, in avr-gcc (nor in the C standard), there is no
  mechanism to enforce complete match of written and executed code ordering -
  except maybe of switching the optimization completely off (-O0), or writing
  all the critical code in assembly.
To sum it up:
  - •
 
  - memory barriers ensure proper ordering of volatile accesses
 
  - •
 
  - memory barriers don't ensure statements with no volatile accesses to be
      reordered across the barrier