.NET, Stopwatch, System.Diagnostics.Stopwatch, Speed Testing

From:
eliza sahoo
Newsgroups:
microsoft.public.vc.mfc
Date:
Fri, 07 May 2010 00:17:15 -0700
Message-ID:
<20105731714eliza.sahoo@gmail.com>
When we dig into the performance of our application, it is essential to track which code segments are taking the most time. To measure the time taken by your code or your query, start a System.Diagnostics.Stopwatch instance before the specific code begins executing, and after the code finishes, read the elapsed time. That shows you how long that piece of your application took.

http://www.mindfiresolutions.com/Measure-execution-time-of-a-block-of-code-in-NET-880.php
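
Since this thread lives in a C++ newsgroup, here is a minimal sketch of the same start/measure/report pattern using std::chrono::steady_clock (System.Diagnostics.Stopwatch plays the equivalent role in .NET). SomeQuery() and the work inside it are placeholders for whatever code you want to time.

#include <chrono>
#include <cstdio>

// Placeholder for the code segment being timed.
void SomeQuery()
{
    volatile long sum = 0;
    for (int i = 0; i < 1000000; ++i)
        sum += i;
}

int main()
{
    // "Start the stopwatch": take a time stamp before the code runs.
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    SomeQuery();

    // "Stop": take a second stamp and report the elapsed time.
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    printf("SomeQuery took %.3f ms\n", ms);
    return 0;
}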


Previous Posts In This Thread:

On Wednesday, February 13, 2008 8:40 PM
Scott McPhillips [MVP] wrote:

Re: Method execution time
<cs1975@gmail.com> wrote in message
news:40f825c8-7b14-42df-aa45-6744ff8dacae@b2g2000hsg.googlegroups.com...

There are no APIs for estimating execution speed. You can do measurements
by executing the code a large number of times (enough to run for several
seconds), with a time check before and after. For the time check you can
use GetTickCount(), or QueryPerformanceCounter(). This approach gives
realistic measurements, but does not truly measure your code. It measures
your code plus whatever other tasks Windows is performing concurrently.
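
A minimal sketch of that approach with QueryPerformanceCounter; CodeUnderTest() and the repetition count are placeholders, and, as Scott notes, the total includes whatever else Windows happens to run during the measurement.

#include <windows.h>
#include <stdio.h>

// Placeholder for the code whose speed we want to estimate.
void CodeUnderTest()
{
    volatile int sum = 0;
    for (int i = 0; i < 1000; ++i)
        sum += i;
}

int main()
{
    const int runs = 5000000;          // adjust so the loop runs for several seconds

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);  // counts per second
    QueryPerformanceCounter(&t0);      // time check before

    for (int i = 0; i < runs; ++i)
        CodeUnderTest();

    QueryPerformanceCounter(&t1);      // time check after

    double total = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    printf("total %.3f s, %.1f ns per call (includes whatever else Windows ran)\n",
           total, total / runs * 1e9);
    return 0;
}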

--
Scott McPhillips [VC++ MVP]

On Wednesday, February 13, 2008 10:01 PM
cs197 wrote:

Reg: Method execution time
Hi All,

I want to calculate the time taken to execute a method, based on the
CPU's instruction execution speed.

Is it possible to estimate the total time the following code will take,
before executing it?
Are there APIs provided for this purpose?

void Run()
{
    int i = 100;

    while(i)
    {
        printf(" cnt %d\n",i);
        i--;
    }
}

thanks in advance,
Chandrasekhar

On Thursday, February 14, 2008 12:07 PM
Joseph M. Newcomer wrote:

GetTickCount is rarely useful for performance measurement.

The GetTickCount resolution is either 10ms (uniprocessor) or 15ms (multiprocessor). On a 2.8GHz
machine, the clock cycle time is about 350ps, so 10ms is enough time to execute about 28,000,000
instructions (56,000,000 if the machine can do full superscalar dispatch, meaning two
instructions are dispatched every clock cycle, or three per clock cycle if one of them is
floating point).

The example below is truly a horrendous example, because the time for printf is going to
be dominated by both kernel time (to do the I/O) and blocking of the thread while the I/O
takes place.

To estimate the time taken by examining the instruction stream, you need to know the CPU
clock time, the fact that typically two instructions can be dispatched in a single CPU
clock cycle (3 instructions if one is floating point), model the behavior of the L1 cache,
the L2 cache, and the instruction cache; understand the instruction pipeline, and model
correctly the effects of multiprocessor memory and I/O interference on the bus. If you
have time to do all that, good luck. Note that most of this will apply to one model of
CPU and one model of the motherboard chipset. If a page fault occurs, don't worry about
trying to estimate the time; a page fault will slow your computation down by six orders of
magnitude, so anything you estimate will be completely meaningless.

Remember that if you are preempted by the scheduler, this will introduce delays on the order of
tens to hundreds of milliseconds.

To compute the actual time spent, you need to use QueryPerformanceCounter and
QueryPerformanceFrequency, do several thousand measurements of code significantly shorter
than one timeslice (to try to minimize scheduler effects), discard the outliers (the
scheduler effects), and compute the mean and standard deviation. Since the
outliers are usually quite far out, they skew the results, and you have to model the
bi-modal or multi-modal statistical distributions to get meaningful data.
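
A rough sketch of that procedure, under some assumptions of mine: ShortBlock() stands in for the code being measured, 10,000 samples are taken, and the slowest 1% are trimmed as a crude stand-in for the outlier handling and multi-modal modeling Joe describes.

#include <windows.h>
#include <algorithm>
#include <cmath>
#include <vector>
#include <stdio.h>

// Placeholder for a short code block, much shorter than one timeslice.
void ShortBlock()
{
    volatile int sum = 0;
    for (int i = 0; i < 500; ++i)
        sum += i;
}

int main()
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    const int samples = 10000;
    std::vector<double> us(samples);        // each sample in microseconds

    for (int i = 0; i < samples; ++i)
    {
        LARGE_INTEGER t0, t1;
        QueryPerformanceCounter(&t0);
        ShortBlock();
        QueryPerformanceCounter(&t1);
        us[i] = (double)(t1.QuadPart - t0.QuadPart) * 1e6 / (double)freq.QuadPart;
    }

    // Crude outlier rejection: sort and drop the slowest 1% (likely preempted runs).
    std::sort(us.begin(), us.end());
    size_t kept = samples - samples / 100;

    double mean = 0, var = 0;
    for (size_t i = 0; i < kept; ++i) mean += us[i];
    mean /= kept;
    for (size_t i = 0; i < kept; ++i) var += (us[i] - mean) * (us[i] - mean);
    double sd = std::sqrt(var / kept);

    printf("mean %.3f us, stddev %.3f us over %u kept samples\n", mean, sd, (unsigned)kept);
    return 0;
}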

I learned to do measurement by being trained by a physicist about what measurement means.
A good course in statistics doesn't hurt (I actually keep my statistics text on the
bookshelf behind the computer). The best you can hope for in doing actual measurements is
a statistical probability of the likelihood that a certain block of code can execute in a
certain amount of time, ± some value based on the standard deviation.
                        joe

On Wed, 13 Feb 2008 20:40:33 -0500, "Scott McPhillips [MVP]" <org-dot-mvps-at-scottmcp>
wrote:

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

On Friday, February 15, 2008 1:46 AM
Jerry Coffin wrote:

Re: Method execution time
In article <VMadnUwMmOWOAS7anZ2dnUVZ_oGjnZ2d@comcast.com>, "Scott
McPhillips [MVP]" <org-dot-mvps-at-scottmcp> says...

[ ... ]

GetProcessTimes and GetThreadTimes allow you to separate execution time
from wall time for a given process/thread respectively.
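
A minimal sketch of reading those counters for the current thread; the FILETIME values are in 100ns units, and the conversion helper plus the placeholder workload are my additions, not part of Jerry's post.

#include <windows.h>
#include <stdio.h>

// Convert a FILETIME (100ns units) to milliseconds.
static double FileTimeToMs(const FILETIME& ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000.0;
}

int main()
{
    DWORD wallStart = GetTickCount();

    // Placeholder workload whose CPU time, as opposed to wall time, we want.
    volatile long sum = 0;
    for (int i = 0; i < 50000000; ++i)
        sum += i;

    FILETIME create, exit, kernel, user;
    if (GetThreadTimes(GetCurrentThread(), &create, &exit, &kernel, &user))
    {
        printf("CPU time:  user %.1f ms, kernel %.1f ms\n",
               FileTimeToMs(user), FileTimeToMs(kernel));
        printf("Wall time: %lu ms\n", (unsigned long)(GetTickCount() - wallStart));
    }
    return 0;
}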

--
    Later,
    Jerry.

The universe is a figment of its own imagination.

On Friday, February 15, 2008 8:04 PM
Joseph M. Newcomer wrote:

There appears to be no documentation about the granularity of these times, just the
resolution. A resolution in 100ns units does not mean that is measuring time to 100ns
units.
                    joe

On Thu, 14 Feb 2008 23:46:56 -0700, Jerry Coffin <jcoffin@taeus.com> wrote:

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

On Friday, February 15, 2008 11:32 PM
Joseph M. Newcomer wrote:

GetThreadTimes appears to have no better resolution than the system clock, 15ms on a
multiprocessor.

Here's the raw data from my experiment (copy and paste to a fixed-pitch font if your
reader uses variable-pitch fonts, and you'll see it in nice columns)

             --GetThreadTimes--   --QueryPerformanceCounter--
          N      Total   Average          Total        Average
         1:          0         0  0.000,001,262  0.000,001,262
        10:          0         0  0.000,001,900  0.000,000,190
       100:          0         0  0.000,008,598  0.000,000,086
     1,000:          0         0  0.000,156,468  0.000,000,156
    10,000:          0         0  0.001,546,871  0.000,000,155
    20,000:          0         0  0.001,545,677  0.000,000,077
    30,000:          0         0  0.002,229,537  0.000,000,074
    40,000:    156,250   3.90625  0.003,130,517  0.000,000,078
    50,000:          0         0  0.003,714,743  0.000,000,074
    60,000:          0         0  0.004,474,028  0.000,000,075
    70,000:          0         0  0.005,199,027  0.000,000,074
    80,000:    156,250   1.95313  0.010,827,863  0.000,000,135
    90,000:    156,250   1.73611  0.007,250,312  0.000,000,081
   100,000:          0         0  0.007,441,591  0.000,000,074
   200,000:    156,250   0.78125  0.015,256,485  0.000,000,076
   300,000:    312,500   1.04167  0.022,836,569  0.000,000,076
   400,000:    312,500   0.78125  0.031,100,642  0.000,000,078
   500,000:    312,500     0.625  0.038,261,605  0.000,000,077
   600,000:    468,750   0.78125  0.046,740,833  0.000,000,078
   700,000:    625,000  0.892857  0.055,636,940  0.000,000,079
   800,000:    625,000   0.78125  0.066,111,900  0.000,000,083
   900,000:    781,250  0.868056  0.070,138,831  0.000,000,078
 1,000,000:    937,500    0.9375  0.091,186,943  0.000,000,091
10,000,000:  8,437,500   0.84375  0.851,144,656  0.000,000,085

Note that 156,250 100ns units is 15,625,000 ns or 15.625000 ms

Therefore, GetThreadTimes is useless for fine-grain measurements. It essentially has the
same resolution as GetTickCount.
                    joe

On Thu, 14 Feb 2008 23:46:56 -0700, Jerry Coffin <jcoffin@taeus.com> wrote:

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

On Saturday, February 16, 2008 1:15 AM
natha wrote:

Re: Method execution time
In article <stocr3l9qjohe0gf1mbf151kl1f3rcraqg@4ax.com>,
Joseph M. Newcomer <newcomer@flounder.com> wrote:

   Use a real profiler, e.g. AMD's CodeAnalyst, to determine things like this:
http://developer.amd.com/tools/codeanalystwindows/Pages/default.aspx
Free download; they just want an email address. Doesn't require an AMD
CPU.

   Of course, all profiling is based on what's running on your system,
CPU load, HD load, what webpage(s) you have open, how much your system
decides to swap out to the HD, etc. You're not going to get 100%
repeatable results in **any** circumstance. Windows is not a realtime
operating system. Variability is a fact of life. The question is:
are you prepared to live with that?

Nathan Mates
--
<*> Nathan Mates - personal webpage http://www.visi.com/~nathan/

On Saturday, February 16, 2008 3:52 PM
Jerry Coffin wrote:

Re: Method execution time
In article <tidcr3dd10an5g484v8kgbef8tjhj9bfvt@4ax.com>,
newcomer@flounder.com says...

Quite true. IIRC, these are updated by the thread scheduler, so they're
normally updated when thread scheduling happens.

Mostly I find that measurements separate fairly cleanly into two
categories: The first is small pieces of code that you normally expect
to execute during a single scheduling quantum. For that level of
measurement, QPC is usable, though I prefer to use rdtsc directly. For
this kind of measurement, GetTickCount and GetThreadTimes are utterly
worthless.

The other possibility is dealing with larger pieces of code that will
typically require several scheduling quanta to finish execution. In this
case, you're talking about code that takes at least tens of milliseconds
to execute. Measurements in the nanosecond (or picosecond) range rarely
have much meaning at all for code like this -- you just don't often need
a precision of .001%. At the same time, separating this code's execution
time from the time used by other code on the system becomes quite
important unless you can do better than most at getting your system
_really_ close to idle otherwise.

--
    Later,
    Jerry.

The universe is a figment of its own imagination.

On Saturday, February 16, 2008 4:03 PM
Joseph M. Newcomer wrote:

Thanks for the AMD pointer!
joe

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

On Saturday, February 16, 2008 4:47 PM
Joseph M. Newcomer wrote:

Note that rdtsc by itself does not guarantee any form of correctness, unless you place a
"serializing instruction" ahead of it. The only serializing instruction I know of that is
executable at user level is the CPUID instruction. Failure to serialize the query may
mean the rdtsc instruction will be done out-of-order with respect to the instruction
stream you think you might be measuring.
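
A minimal sketch of that serialized pattern using the MSVC intrinsics __cpuid and __rdtsc from <intrin.h>; the measured loop is a placeholder, and (per Jerry's caveat further down) CPUID itself should be executed a few times before its timing behavior settles.

#include <intrin.h>
#include <stdio.h>

// Read the time-stamp counter with a serializing CPUID in front of it,
// so earlier instructions cannot be reordered past the read.
static unsigned __int64 SerializedRdtsc()
{
    int regs[4];
    __cpuid(regs, 0);      // serializing instruction, executable at user level
    return __rdtsc();
}

int main()
{
    unsigned __int64 t0 = SerializedRdtsc();

    volatile int sum = 0;                 // placeholder code being measured
    for (int i = 0; i < 100000; ++i)
        sum += i;

    unsigned __int64 t1 = SerializedRdtsc();
    printf("elapsed: %I64u cycles\n", t1 - t0);
    return 0;
}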

GetThreadTimes will be meaningful only for intervals > 60ms; below that, you will be
subject to massive sampling "gate error" effects (see Nyquist Sampling Theorem, for
example), because the timeslices are nominally 30ms.

One problem with this kind of measurement is that it doesn't tell you very much. For
example, I recently did the PowerPoint Indexer, and no profiling would have told me that
the bulk of the time is in the out-of-process automation server, because when PowerPoint
is running, no time is charged to my process or thread. Blocking I/O calls have a similar
effect. Page faults will show up as wall clock time but not thread time.

I've never found a simple answer to performance measurement, and I did it for 15 years.
                    joe

On Sat, 16 Feb 2008 13:52:51 -0700, Jerry Coffin <jcoffin@taeus.com> wrote:

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

On Saturday, February 16, 2008 6:55 PM
Jerry Coffin wrote:

Re: Method execution time
In article <e0mer3lmaot8ov3oq0h6g5c88odvc7bodd@4ax.com>,
newcomer@flounder.com says...

Quite true, and you also have to be careful with CPUID -- though most
people don't realize it, at least for some CPUs, it's documented that
the first couple of times you execute CPUID, its execution speed can
vary. Intel's recommendation is to only use it for serialization on at
least your third consecutive execution.
 

.....or 40ms, depending on the kernel.

Blocking I/O partly shows up charged to your thread as kernel time.
Basically, the time actually spent transferring data is charged to the
thread (mostly as kernel time) but the time during which your thread is
blocked (of course) only shows up as wall time, since some other process
can be using the CPU during that time.

I don't think there's a simple answer to find. There are lots of
questions, some of them only minimally related to each other. Having a
single, simple answer to all those questions would require little short
of a miracle.

Profilers aren't a panacea either. They're useful (_really_ useful), but
have shortcomings of their own. For one thing, most are semi-passive --
they basically just interrupt the program being profiled at regular
intervals, and walk the stack to see what's executing at those times. A
small, fast function can often execute _many_ times between these
interruptions. The ones that are active (i.e. work by injecting
profiling calls into your code) can have other side-effects, such as
making your code larger so caching no longer works quite the same.

I also had one program using multimedia timers a few years ago that
seemed to run in sync with the profiler, so the profiler showed
essentially all the time spent in a couple of specific functions, even
though I knew those functions were fast and not even called very often.
Running most of the same code in a test rig without the multi-media
timers involved showed that the real profile was a LOT different.

Don't get me wrong: when you have a performance problem, a profiler is
almost always a reasonable first tool to grab -- but sometimes you need a few
more tools and/or _careful_ interpretation to get all the information
you need.

--
    Later,
    Jerry.

The universe is a figment of its own imagination.

On Sunday, February 17, 2008 1:09 PM
Joseph M. Newcomer wrote:

Some years ago, after I had spent over a month optimizing the storage allocator, one of
the product groups in the company came to me. "We've profiled our compiler", they said,
"and all of the time is being spent in your storage allocator!". I looked at the profile
and there was a MASSIVE peak at the address range of the allocator functions.

Having written what was for its time the world's fastest storage allocator (we were
contemplating how to have the compiler generate inline code to do allocation), I didn't
believe the numbers. So I started their compiler under the debugger, reached in and
tweaked a couple bits in the allocator that caused it to keep statistics, and let it run.

The reason the allocator was the "time hog" was that they had called it over 4,000,000
times, so the probability the code was in the allocator when the profiler took its
snapshot was incredibly high. This was due to a design error in their code, which had an
intense loop that did something of the form (expressed in C):

for(i = 0; i < HUGE_NUMBER; i++)
    {
     char * p = malloc(length_computation());
     DoSomething(p);
     Report(p);
     free(p);
    }

So I asked, "why are you calling the allocator each time?" and the answer was "We don't
know how big the string needs to be, so we have to allocate each time". I solved the
problem by writing

char buffer[100];
char * p = buffer;
DWORD len = sizeof(buffer);

for(i = 0; i < HUGE_NUMBER; i++)
    {
     int n = length_computation();
     if(n > len)
        { /* reallocate */
         if(p != buffer)
            free(p);
         p = malloc(n);
         len = n;
        } /* reallocate */
     DoSomething(p);
     Report(p);
    }

if(p != buffer)
    free(p);

The result was that they ended up calling the allocator a very small number of times in
this loop (under 10, if I recall) and only about 100K-200K calls for the whole compilation
process. The storage allocator essentially disappeared from the profile. The compiler
ran measurably faster (in wall clock time as well as CPU cycles).

The choice of 100 was made by having instrumented the code to compute the mean and
standard deviation of all the computed values of length_computation(); by choosing 100 we
got two sigmas of cases essentially free.

I tend to favor instruction-counting for getting fine-grain performance data, and I have
an instruction-counting profiler which, if Microsoft ever fixes the pdb library to be
usable, I will consider finishing and releasing. Instruction-counting is a pretty solid
(albeit slow) method of getting detailed performance reports, and is always surprising.
But instruction-counting won't tell you if you are using the L2 cache well or badly, and
that usage can be an order of magnitude performance change for some algorithms!
                    joe
On Sat, 16 Feb 2008 16:55:41 -0700, Jerry Coffin <jcoffin@taeus.com> wrote:

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
