If volatile does the synchronization, why do we need the InterlockedXXX functions?

Volatile does not do any synchronization; you are mistaken.
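
To illustrate the point, here is a minimal sketch in portable C++11. It uses std::atomic as a stand-in for the InterlockedXXX family (that substitution is my assumption, chosen so the example is not Windows-only), and the helper names atomic_count_after and volatile_count_after are hypothetical. The volatile version contains a deliberate data race, so its result is not guaranteed.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each add 1 to a shared
// counter `iters` times using an atomic read-modify-write
// (the portable analogue of InterlockedIncrement).
long atomic_count_after(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Same loop with a volatile counter. The load, add, and store are
// separate steps, so concurrent increments can be lost. This is a
// deliberate data race (formally undefined behavior), shown only
// for contrast; do not rely on any particular result.
long volatile_count_after(int threads, int iters) {
    volatile long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i)
                counter = counter + 1;
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling atomic_count_after(4, 10000) always yields 40000, because each increment is a single indivisible operation. The volatile version may well return less on a multi-core machine, since volatile only forces the individual loads and stores to be emitted and does nothing to make the read-modify-write atomic.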

Actually, as of VC8, the compiler generates memory barrier instructions
(on those architectures that need them) when accessing volatile
variables. See

the part that talks about acquire and release semantics. This is
MS-specific and non-portable.
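
For comparison, a portable way to get the same acquire and release behavior is sketched below, assuming a C++11 compiler. It uses std::atomic with explicit memory orders instead of the VC8 volatile extension; the helper name publish_and_read is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one thread publishes `value` through a plain
// variable, guarded by an atomic flag; another thread waits for the
// flag and then reads the variable.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // plays the role of the volatile flag
    int seen = -1;

    std::thread consumer([&] {
        // Acquire load: once this observes true, every write made
        // before the matching release store is visible here.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });

    payload = value;
    // Release store: the write to payload cannot be reordered after it.
    ready.store(true, std::memory_order_release);

    consumer.join();
    return seen;
}
```

Here publish_and_read(42) returns 42. The release store plays the part of a write to a VC8 volatile variable, and the acquire load the part of a read from one, but the ordering is guaranteed by the language standard rather than by one compiler.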

So, volatile works in the following cases (I can't think of more):
1. hardware that has no CPU cache, where the code relies on some
peripheral equipment to change main memory contents
2. concurrent access on a multi-CPU system with no per-CPU cache
3. concurrent access on a single-CPU system

4. Multi-CPU system that features strong cache coherence - as is the
case with all x86 CPUs. Systems with weak cache coherence (the kind
where one CPU can write a value to memory but another can observe a
stale old value from the cache indefinitely, the kind that provides and
requires memory barrier instructions) are actually not all that
