Re: micro-benchmarking
On Sat, 2 May 2009, Arved Sandstrom wrote:
Lew wrote:
Giovanni Azua wrote:
[ SNIP ]
A good idea (I think brought up by Tom) would be to measure each iteration
separately and then discard outliers - e.g. discard those measurements whose
absolute difference from the mean exceeds the standard deviation.
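In code, that rule would amount to something like the following rough sketch,
assuming the per-iteration timings have been collected into a double[] (the
class and variable names here are only illustrative):

import java.util.ArrayList;
import java.util.List;

public class MeanStddevFilter {
    /** Keep only timings within one standard deviation of the mean. */
    static List<Double> filter(double[] timings) {
        double mean = 0;
        for (double t : timings) mean += t;
        mean /= timings.length;

        double var = 0;
        for (double t : timings) var += (t - mean) * (t - mean);
        double stddev = Math.sqrt(var / (timings.length - 1));

        List<Double> kept = new ArrayList<Double>();
        for (double t : timings) {
            // discard any measurement more than one stddev away from the mean
            if (Math.abs(t - mean) <= stddev) kept.add(t);
        }
        return kept;
    }
}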
That technique doesn't seem statistically valid.
In the first place, you'd have to use the outliers themselves to calculate
the mean and standard deviation.
I've seen techniques before that discard the endmost data points, but never
ones that require a statistical analysis to decide what to include in, or
reject from, the statistical analysis.
Doing this is acceptable if it's a step in identifying outliers for
examination, rather than an automatic elimination step. What Giovanni
suggested might not be the statistical procedure of choice, however;
something like Grubbs' test would be common enough if your clean data is
normally distributed.
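Roughly, Grubbs' test computes G = max|x[i] - mean| / s and flags the most
extreme point if G exceeds a critical value that depends on n and the chosen
significance level. A rough Java sketch of just the statistic - the critical
value would come from t-distribution tables, so it's left as a parameter
rather than hard-coded:

public class GrubbsTest {
    /**
     * Returns the index of the most extreme point if the Grubbs statistic
     * G = max|x[i] - mean| / s exceeds the supplied critical value, or -1
     * if no point is flagged.
     */
    static int mostExtremeOutlier(double[] x, double criticalValue) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;

        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double s = Math.sqrt(var / (n - 1));

        int worst = -1;
        double maxDev = 0;
        for (int i = 0; i < n; i++) {
            double dev = Math.abs(x[i] - mean);
            if (dev > maxDev) { maxDev = dev; worst = i; }
        }
        return (maxDev / s > criticalValue) ? worst : -1;
    }
}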
I would have said Chauvenet's criterion rather than Grubbs' test - but
only because I'm more familiar with the former! Grubbs' test looks more
rigorous to me.
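For reference, Chauvenet's criterion rejects a point if the expected number
of points at least that far from the mean, n * erfc(|x - mean| / (s * sqrt(2))),
comes out below one half. A quick sketch - the standard library has no erfc,
so this uses the Abramowitz & Stegun 7.1.26 approximation:

public class Chauvenet {
    /** Flag x[i] if n * erfc(|x[i] - mean| / (s * sqrt(2))) < 0.5. */
    static boolean[] flags(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;

        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double s = Math.sqrt(var / (n - 1));

        boolean[] reject = new boolean[n];
        for (int i = 0; i < n; i++) {
            double z = Math.abs(x[i] - mean) / (s * Math.sqrt(2));
            reject[i] = n * erfc(z) < 0.5;
        }
        return reject;
    }

    /** Abramowitz & Stegun 7.1.26 approximation to erfc, for z >= 0. */
    private static double erfc(double z) {
        double t = 1.0 / (1.0 + 0.3275911 * z);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-z * z);
    }
}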
A less aggressive alternative would just be to describe the data by the
median and interquartile range, thus effectively ignoring the very largest
and smallest values. You're not claiming they're 'wrong' in any sense, just
not focusing on them.
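Which is also cheap to compute - something along these lines, where the
linear-interpolation quantile is just one common convention among several:

import java.util.Arrays;

public class RobustSummary {
    /** Returns {median, interquartile range} of the timings. */
    static double[] medianAndIqr(double[] timings) {
        double[] sorted = timings.clone();
        Arrays.sort(sorted);
        double median = quantile(sorted, 0.50);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        return new double[] { median, iqr };
    }

    /** Linear-interpolation quantile on already-sorted data. */
    private static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }
}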
What we're really trying to do (data QA is a very well-established
discipline in geophysics & nuclear physics, for example) is _detect_ outliers
to see if those data points represent _contamination_. About 15 years back I
helped on the programming side with the production of climatological atlases
for bodies of water off the eastern coast of Canada. One of the first data
quality control steps was actually to apply a bandpass filter - something
along the lines of: water temperature in February in this region is simply
not going to be lower than T1 nor higher than T2 (*). There may actually be
several such ranges, applied iteratively.
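The check itself is trivial to code - the hard part is choosing defensible
ranges, not applying them. Purely as an illustration (this is not the actual
atlas code):

import java.util.ArrayList;
import java.util.List;

public class RangeCheck {
    /**
     * Flag readings outside the climatologically plausible range
     * [min, max] for later examination; several such ranges can be
     * applied in turn.
     */
    static List<Integer> suspectIndices(double[] readings, double min, double max) {
        List<Integer> suspects = new ArrayList<Integer>();
        for (int i = 0; i < readings.length; i++) {
            if (readings[i] < min || readings[i] > max) suspects.add(i);
        }
        return suspects;
    }
}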
Point being that data QA/QC attempts to determine why a data point should be
rejected. You don't just do it because it's 5 SDs out; you try to find out
if it's bad data. In the case we're examining here, I'd sure like to see a
reason why any outliers should be identified as contamination.
In this case, though, I can't see any way to do that. If a run took 150 ms
instead of 100 ms, all you know is that it took 50 ms longer. There's no way
to retrospectively ask 'did GC happen?', 'did the OS preempt us to do some
housekeeping?', etc.
tom
--
The sun just came out, I can't believe it