Re: Any unusual C++ implementations?
On Jan 4, 5:53 am, "Tomás Ó hÉilidhe" <t...@lavabit.com> wrote:
> Juha Nieminen <nos...@thanks.invalid> wrote in comp.lang.c++:
> > > For instance I might assume that an unsigned long has no padding
> > > and then go on to use it to manipulate sizeof(unsigned long) bytes
> > > at a time.
> > That sounds like hacker optimization to me, which is in many cases
> > useless. Exactly how would you "manipulate sizeof(unsigned long)
> > bytes at a time"? Care to give an example?
> I'd love to... but... I can't help feeling I'd be responding to an
> accusation rather than simply conversing.
> Have you ever seen an optimised version of memcpy?
Not only seen, written.
> A fully-portable implementation might be something like:
>     void *memcpy(void *const pvdest, void const *const pvsrc, size_t len)
>     {
>         char unsigned *dest = static_cast<char unsigned*>(pvdest);
>         char unsigned const *src = static_cast<char unsigned const*>(pvsrc);
>         for ( ; len; --len) *dest++ = *src++;
>         return pvdest;
>     }
> whereas an optimised one for a particular platform might be:
>     void *VoidAddition(void *const in, size_t const x)
>     {
>         return reinterpret_cast<char*>(in) + x;
>     }
>     void *memcpy(void *const pvdest, void const *const pvsrc, size_t len)
>     {
>         size_t const quantity_ints = len / sizeof(int);
>         len %= sizeof(int);
>         unsigned *pidest = static_cast<unsigned*>(pvdest);
>         unsigned const *pisrc = static_cast<unsigned const*>(pvsrc);
>         for (size_t i = quantity_ints; i; --i) *pidest++ = *pisrc++;
>         size_t const offset = quantity_ints * sizeof(int);
>         char unsigned *dest =
>             static_cast<char unsigned*>(VoidAddition(pvdest, offset));
>         char unsigned const *src =
>             static_cast<char unsigned const*>(pvsrc) + offset;
>         for ( ; len; --len) *dest++ = *src++;
>         return pvdest;
>     }
That sort of implementation might have been used 20 years ago
(although you'd also have to add logic handling alignment). The
usual solution I've seen, however, is something like:
extern void* memcpy( void*, void const*, size_t ) ;
#define memcpy( d, s, l ) __builtin_memcpy( d, s, l )
with the library version using the straightforward
implementation. A lot of older processors had special
instructions which could be used to great advantage in memcpy.
Alternatively, inline assembler was used in the implementation
(or memcpy was written completely in assembler anyway---which
was what I did).
On modern machines, the straightforward implementation is likely
to be just as fast as anything more complicated, because of the
way pipelining and the various caches work at the hardware
level: the limiting factor will be memory bandwidth, and the
memory interface pipelines will ensure that all reads and writes
are a full cache line wide (e.g. 64 bytes).
--
James Kanze (GABI Software) mailto:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34