Re: Inexplicable segfault

From: "kanze" <kanze@gabi-soft.fr>
Newsgroups: comp.lang.c++.moderated
Date: 27 Sep 2006 09:03:01 -0400
Message-ID: <1159346716.475822.36690@b28g2000cwb.googlegroups.com>
Francis Glassborow wrote:

In article <pan.2006.09.25.10.03.09.998000@cui.unige.ch>, Jonas Latt
<Jonas.Latt@cui.unige.ch> writes


    [...]

class Individual {
public:
 Individual(int size) { build(size); }
 int size() const { return genome_.size(); }
 int& operator[](int i) { return genome_[i]; }
 double fitness() const { return compute_fitness(); }
 void randomize() { build(size()); }
 bool operator<( const Individual& rhs ) const {
   return this->fitness() < rhs.fitness();
 }
private:
 void build(int size) {
   genome_.resize(size);
   for (int i=0; i<size; ++i) {
     genome_[i] = rand() % 2;
   }
 }
 double compute_fitness() const {
   int sum = std::count( genome_.begin(), genome_.end(), 1 );
   double n = static_cast<double>( genome_.size() );
   return (n - sum) / n;
 }
private:
 std::vector<int> genome_;
};


<Snipped>

A fascinating example of problems with repeatedly recomputing
values. std::sort requires that the values be ordered, but
the repeated recomputation of 'fitness' is resulting in tiny
differences. This is a problem with using floating point,
particularly when there is no intermediate storage of the
results that forces trimming of excess bits.


It's more subtle than that: the repeated calculation always
gives the same results. The iterated part uses integral
arithmetic, which is exact, and the expression in the return
statement is fixed, so the same rounding errors occur each
time.

As Ian McCulloch pointed out, the problem is due to the fact
that the calculation is done and the return value returned in
registers with more precision than a double. And that one of
the return values is getting stored to memory, and being
truncated to the precision of a double, and the other isn't. So
he ends up with the operator< returning true for two equal
values in certain cases. (Most likely, he ends up with a case
where a < b and b < a both return true.)

Note that this problem depends on the hardware and the compiler,
and probably on the various optimization options passed to the
compiler; his code may work in some configurations, and not in
others.
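
To see concretely why this shows up as a segfault rather than
just as a wrong order: std::sort assumes a strict weak ordering,
and typical implementations use the comparison itself to decide
how far a partitioning scan may advance. Here is a minimal
sketch of the failure mode, using a deliberately broken integer
comparator rather than the poster's floating point one (whether
it actually crashes depends on the library implementation):

#include <algorithm>
#include <vector>

struct Bad {
    int v;
    bool operator<( Bad const& rhs ) const {
        return v <= rhs.v;   // wrong: reports true for equal
                             // values, so a < b and b < a can
                             // both hold
    }
};

int main()
{
    Bad b = { 42 };
    std::vector<Bad> data( 1000, b );      // many equivalent elements
    std::sort( data.begin(), data.end() ); // undefined behaviour: a
                                           // partitioning scan can run
                                           // past the end of the range
    return 0;
}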

Note too that as far as I know, Java suffers from the same
problem, so the original poster can tell his colleague that it
was just by chance that the code worked in Java as well.
Presumably, the problem will depend on optimization in Java as
well. So the code may work the first time it is executed, and
then fail with exactly the same data set later, after Hot Spot
has gotten its hands on it. (Of course, in Java, he'd get a
bounds check exception, rather than a core dump. Which means
that if he ran it in the handling of an AWT event, the error
wouldn't be apparent; he would just get wrong results. Core
dumps aren't always a bad thing.)

Replace the operator < overload with:

 bool operator<( const Individual& rhs ) const {
   double c1= compute_fitness();
   double c2= rhs.compute_fitness();
   return c1 < c2;
 }

And now all will work (I think)


I think that the standard requires it to work in this case. On
the other hand, I'm pretty sure that not all implementations
respect the standard; IIRC, I've encountered cases where g++
doesn't truncate precision even in such cases. (I think that
g++ has an option, however, to make it work.) It's a serious
problem, without an easy solution.
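
A workaround which sometimes helps on such implementations is to
force each comparand through a volatile double (or to compile
with something like g++'s -ffloat-store, which I believe is the
option alluded to above). A sketch only, in the spirit of the
version above:

 bool operator<( const Individual& rhs ) const {
   // The volatile forces each result to be stored to memory and
   // reloaded, rounding it from extended precision to a genuine
   // 64 bit double before the comparison.
   volatile double c1 = compute_fitness();
   volatile double c2 = rhs.compute_fitness();
   return c1 < c2;
 }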

even better change the class definition/implementation to:

class Individual {
public:
  Individual(int size) { build(size); fitness_ = compute_fitness(); }
  int size() const { return genome_.size(); }
  int& operator[](int i) { return genome_[i]; }
  double fitness() const { return fitness_; }
  void randomize() { build(size()); }
  bool operator<( const Individual& rhs ) const {
   return fitness_ < rhs.fitness_;
  }
private:
  <snipped>
  std::vector<int> genome_;
  double fitness_;
};

so that fitness is computed and stored just once per instance.


What this really changes is that it makes it a lot more
difficult---practically impossible, in fact---for a clever
optimizer to cache any of the values in a register.

Remember that using temporary floating point values is
dangerous as they will not have extra bits trimmed and what
those bits are may depend on what was in those registers when
they were last used.


No. The extra bits depend on the computation, and are quite
predictable. The problem occurs because some hardware (Intel,
in particular, but I don't think it is the only one) only has
one precision, long double, and the compiler does all
computations in this. It's not a question of random extra bits,
it's a question of comparing a value which has been correctly
rounded to double with the same value before rounding.

If he wants to be really, really sure that the problem will not
occur, he should compare the (integer) sum, before dividing by
the size. (This, of course, only works if the size is the same
in all cases.)
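
Something along these lines, as a sketch (note the reversed
comparison: with fitness defined as (n - sum) / n, a higher
count of 1s means a lower fitness):

 bool operator<( const Individual& rhs ) const {
   // The counts are exact integers, so this comparison cannot be
   // perturbed by excess precision. It assumes both genomes have
   // the same size, as noted above.
   int lhsOnes = std::count( genome_.begin(), genome_.end(), 1 );
   int rhsOnes = std::count( rhs.genome_.begin(), rhs.genome_.end(), 1 );
   return lhsOnes > rhsOnes;
 }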

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
