Sometimes you have to use common sense:

Modern memory models don't respect common sense very much.

thread A:
finished = false;
while (!finished) {
  /* do work */
}

thread B:
/* do work */
finished = true;

If finished is not volatile and compiler optimizations are
enabled, thread A may loop forever.

And making finished volatile doesn't change anything in this
regard. At least not with Sun CC or g++ under Solaris, g++
under Linux on a PC, and VC++ 8.0 under Windows on a 64-bit PC.

The behaviour of optimizing compilers in the real world can
make volatile necessary to get correct behaviour in
multi-threaded designs.

As has been pointed out: volatile is never sufficient, and when
you use whatever is sufficient, volatile ceases to be necessary.
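
For illustration only, here is a minimal sketch of what "whatever is
sufficient" might look like, assuming a C++11 or later compiler (newer
than the compilers named above). The names work_done, thread_b and run
are hypothetical and not from the original example. The flag is a
std::atomic<bool> written with release ordering and read with acquire
ordering, and nothing is declared volatile.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; not part of the original example.
std::atomic<bool> finished{false};
int work_done = 0;  // ordinary (non-atomic, non-volatile) data written by thread B

// Thread B: do the work, then publish the flag.
void thread_b() {
    work_done = 42;                                    // "do work"
    finished.store(true, std::memory_order_release);   // publish the result
}

// Thread A: loop until the flag is seen, then read what B wrote.
int run() {
    std::thread b(thread_b);
    while (!finished.load(std::memory_order_acquire)) {
        std::this_thread::yield();                     // "do work" while waiting
    }
    // The acquire load that saw `true` synchronizes with the release store,
    // so the write to work_done is visible here.
    int seen = work_done;
    b.join();
    return seen;
}
```

The compiler may not hoist the atomic load out of the loop, and the
release/acquire pair is what makes the plain write to work_done visible
to thread A. A mutex around both accesses would be another option.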

You don't always have to use memory barriers or mutexes
when performing an atomic read of some state shared by more
than one thread.

Only if you want it to work.

