Re: Bitwise OR on large memory regions
oswald.kluge@web.de wrote:
Dear Reader,
what efficient ways are there to OR two large memory regions?
I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?
If speed is paramount you could consider using sse-like instructions
for your target platform.
Otherwise, you could use standard c++ algorithms. Something like this:
#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <boost/lambda/lambda.hpp>
int main(int ac, char**)
{
std::vector<unsigned long> v1(ac * 1000000);
std::vector<unsigned long> v2(ac * 1000000);
generate(v1.begin(), v1.end(), rand);
generate(v2.begin(), v2.end(), rand);
std::vector<unsigned long> r(v1.size());
transform(v1.begin(), v1.end(), v2.begin(), r.begin(),
boost::lambda::_1 | boost::lambda::_2);
unsigned long x = accumulate(r.begin(), r.end(), 0ul);
std::cout << x << '\n';
}
The machine code generated from the code above may be not that bad.
With gcc 4.1.1 the transform loop looks like this (-O3
-fomit-frame-pointer -march=pentium-m)
..L38:
mov %eax, DWORD PTR [%ebx]
add %ebx, 4
or %eax, DWORD PTR [%ecx]
add %ecx, 4
mov DWORD PTR [%edx], %eax
add %edx, 4
cmp %edi, %ecx
jne .L38
Some unix platforms (linux among them) have type long of the same size
as void*. For some popular x86-64 processors it may imply that the
processor is able to deal with 64-bit integers efficiently. On opteron
(-fomit-frame-pointer -march=opteron) the same loop is:
..L38:
mov %rax, QWORD PTR [%rsi]
or %rax, QWORD PTR [%rcx]
add %rcx, 8
add %rsi, 8
mov QWORD PTR [%rdx], %rax
add %rdx, 8
cmp %r9, %rcx
jne .L38
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]