Re: Concurrent code performance
On Oct 9, 2:42 pm, Saeed Amrollahi <amrollahi.sa...@gmail.com> wrote:
Hi all C++ developers,
I have learned (from theory and practice) that the most important benefit
of well-designed multi-threaded programming is better performance.
Here is a (not well-written) concurrent version (two worker threads)
and a sequential version of the summation over the interval [0, 2000000[.
Of course my concurrent code is not good; it's for exposition only.
// concurrent code
#include <functional> // std::ref
#include <iostream>
#include <thread>
using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum over the interval [first, last[
{
while (first < last)
res += first++;
}
int main()
{
long long r1 = 0, r2 = 0;
thread t1(Sum, 0, 1000000, std::ref(r1));
thread t2(Sum, 1000000, 2000000, std::ref(r2));
t1.join(); t2.join();
cout << r1 + r2 << '\n';
return 0;
}
// sequential code
#include <iostream>
using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum over the interval [first, last[
{
while (first < last)
res += first++;
}
int main()
{
long long r = 0;
Sum(0, 2000000, r);
cout << r<< '\n';
return 0;
}
I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o concurrent_sum
$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum
$ time ./concurrent_sum
1999999000000
real 0m0.014s
user 0m0.016s
sys 0m0.004s
$ time ./sequential_sum
1999999000000
real 0m0.021s
user 0m0.020s
sys 0m0.000s
Of course the output of time varies from run to run, but honestly, I
didn't see much difference between the two executions.
In addition, the executable for sequential_sum is about 6 KB, but the
executable for concurrent_sum is about 45 KB.
Is there any problem with my measurement? Why is the concurrent executable
7+ times bigger than the sequential version? How do you explain it?
I don't see a problem. On a single-core machine, I am surprised you
saw even that much of an improvement. I guess, like Pavel, that
hyper-threading is at work. Your code should give better performance
on multi-core machines. Given what it does, it should hardly touch
memory, if at all, so memory should not be a bottleneck.
Your executable is 7 times bigger because you linked (part of)
pthreads into it.
Goran.