Re: SSO
On Jul 29, 11:03 am, Juha Nieminen <nos...@thanks.invalid> wrote:
tni wrote:
What advantages does a string class with SSO have over one
which always allocates the string dynamically and uses CoW?
Copying is much faster for small strings (at least on some
important platforms like x86).
I don't see how that would be the case. With a regular CoW
string a copy entails assigning one pointer and updating one
counter at the end of that pointer. Copying a string which
uses SSO entails a check whether the string is small or
dynamically allocated, and then (assuming it is small) a copy
of a member array (which is probably several times larger than
a pointer) and the size member variable.
I could even believe that with optimal cache behavior and such
you could even achieve similar speeds, but much faster? I
don't see that happening.
Fine grained locality plays a very large role. Most modern
processors access memory in more or less large chunks; when you
read the size (to test whether the local buffer is used or not),
the hardware also loads the local buffer at the same time, and
it's already in the pipeline when you start to copy it.
But IIRC, the real speed gain occurs in multi-threaded
environments. The fact that no part implementation is shared
means that no synchronization is required. And on modern
processors, synchronization is expensive. (On the other hand...
in the applications I've worked on, most of the strings are
large enough that the small string optimization wouldn't apply,
and dynamic allocation also requires synchronization, so
COW---even using pthread_mutex for synchronization---is still an
optimization if I'm copying strings a lot.)
(And of course if the string was not small, but dynamically
allocated, the copying will be basically identical except that
there will be the additional check before copying.)
For COW, you generally need some form of reference counting
with atomic counters.
The STL, at least the implementation in gcc, has never
bothered with atomicity...
The implementation of std::string in g++ is not thread safe.
That's going to cost a few dozen clock cycles for a counter
access on x86 and causes the bus to be locked (which is a
very bad thing if you have lots of CPUs/cores). So, in a
highly parallel environment, you probably don't want COW.
OTOH always deep-copying can be detrimental for speed in many
situations, such as when sorting an array of large strings.
I'd bet that even using locking would be faster.
I've wondered about that too. In most of my work, I'd guess
that the shortest strings are about 20 characters (more than the
SSO handles in most of the implementations I've seen), and the
average is probably double that. On the other hand, I tend to
deal with objects more complex than strings---in a lot of cases,
the strings themselves are in objects which use COW (or aren't
copiable). So I really don't know what the best solution is.
For some numbers (populating a vector with 8 char long
strings via push_back(), running on a Core i7 on Windows):
Small string optimization (as I posted, 8-char buffer): 780ms
Small string optimization (as I posted, 32-char buffer): 1550ms
Dynamic allocation: 15'000ms
Thread-safe COW String (Qt QByteArray): 3400ms
std::basic_string (Dinkumware, uses small string optimization): 3900ms
Naturally allocation will be significantly slower than simply
assigning to a member array. But only using 8 char long
strings for such a test feels a bit artificial.
It depends on what you're doing. In my case, where I'm working
now, it is artificial. But I've seen applications where most of
the strings were either numerical values (six to eight digits),
or country or currency codes (two or three characters).
This is definitely a case where one size doesn't fit all.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34