Re: bad alloc

From:
yatremblay@bel1lin202.(none) (Yannick Tremblay)
Newsgroups:
comp.lang.c++
Date:
Tue, 13 Sep 2011 17:06:09 +0000 (UTC)
Message-ID:
<j4o2i1$nhe$1@dont-email.me>
In article <56e11c1e-de9e-4e8e-8d71-a9a501e4ce17@b20g2000vbz.googlegroups.com>,
Adam Skutt <askutt@gmail.com> wrote:

On Sep 7, 6:20am, yatremblay@bel1lin202.(none) (Yannick Tremblay)
wrote:

In article <1cdb8d42-faa0-4001-91f3-fbb58309f...@d25g2000yqh.googlegroups.com>,
Adam Skutt <ask...@gmail.com> wrote:

On Sep 3, 7:20pm, James Kanze <james.ka...@gmail.com> wrote:

That's the correct response to a programming error or a system
wide problem. For a request which is part of a DOS attack, it
seems to be playing into the attacker's hand, and for a simply
unreasonable client request, it does seem unnecessary to abort
all other client connections.


One, it's not a given that it aborts other client connections, after
all. There could be a higher level mechanism that provides the
illusion of a persistent connection even after failover. Second, it
may be unnecessary, but just because it's unnecessary it doesn't
follow that:
1) The value of trying to handle OOM, instead of terminating, exceeds
its cost.
2) OOM can be handled in a robust fashion.


This argument can go both ways, which you seem to refuse to accept: it
doesn't follow that:

1) The cost of trying to handle *some* OOM error, instead of
terminating, exceeds the value.


Then provide a reliable mechanism to distinguish OOM errors. Thus
far, you have been unable to do so. Otherwise, a reliable mechanism
must treat them all as the worst-case situation.


Sorry for the long discussion below, but short, simplified, and not
fully explicit answers have previously been met with dismissal based
on generalities:

Thus far, I have never attempted to provide a mechanism to
automatically distinguish OOM errors with no other information
whatsoever. Thus far I have posted a small code sample that catches
following a new that was known to potentially be large. You simply
objected that you didn't know what "large" was, hence the example
was invalid. My answer is that I know it is potentially large because
it was designed that way.

Someone has posted experiences of observing an application recovering
from OOM errors. I have done the same, and repeatedly tested it
by purposefully triggering OOM errors (yes, feeding purposefully
designed inputs to a real application in such a way that the
application eventually uses all of the available memory on the system,
and making sure it still recovers).

The key is design. Design your application so that you *know*
where the safe points of failure are. You *know* how to cancel a job
safely. You *know* where the application will attempt to allocate a
large amount of memory, and design this area in such a way that
recovery is possible *if* the failure is due to the requested
allocation being too large *and* much larger than what is normal.

You are the designer. You should be able to design the code in such a
way that you can ensure that "large" *potentially* recoverable
allocation happens at a known location in the code.

You keep saying "how can you distinguish recoverable vs
non-recoverable allocation?" The answer is design. You design your
application in such a way that you *plan* where the recoverable
allocations will happen.

So here is a compilable example that recovers safely from bad_alloc:

----------------------------------------------
#include <iostream>
#include <stdexcept>

size_t const multiplier = 1000000;

bool doIt(size_t size)
{
  int * p = 0;
  try
  {
    p = new int[multiplier * size];
  }
  catch(std::bad_alloc &e)
  {
     // NOTE: I purposefully use iostream here.
     // I realise that this may allocate memory underneath
     // and is not a guaranteed nothrow operation, but here
     // it highlights that memory is available.
     // In practice, you would probably not do it and just
     // try to log in a *safe* way and
     // start unrolling the stack.
     std::cout << "Error: " << e.what() << std::endl;
     // allocate a bit anyway.
     int *z = new int[10];
     std::cout << "Wow, new[] still works" << std::endl;
     delete[] z;
     return false;
  }
  int * q = new int[5];

  delete[] q;
  delete[] p;
  std::cout << "Did it " << size << std::endl;
  return true;
}
int main()
{
  size_t size;
  
  for(;;)
  {
    std::cout << "Please enter allocation size: " << std::endl;
    std::cin >> size;
      
    if(!doIt(size))
    {
      std::cout << "Error happened for " << size << std::endl;
    }
    else
    {
      std::cout << "Job done for " << size << std::endl;
    }
  }
}
-------------------------------------------

This is a simplistic example, but the principles are usable in a larger
application, and even in a multithreaded application, and it would not
matter if the try/catch were five functions up the stack or directly
around the new.

Note that if the std::bad_alloc happens anywhere outside the
purposefully *designed* *potentially recoverable* area (i.e. when
allocating for q or z, or within iostream, even including the uses in
the catch), then the program will terminate. This is also as-designed.

So the results:

The program can recover from OOM errors if they occur at a
particular location in the code where the designer planned to do
large allocations.

The program behaves as you argued for when the OOM error occurs
elsewhere, and terminates.

The program can process all inputs while only being limited by the
actual current resource limits on the host (not some artificial
limits).

The program can *potentially* recover from OOM errors if they are due
to input complexity. If recovery is successful, it can continue
processing new, less complex inputs safely. If recovery is not
successful, it will simply terminate.

The program will terminate on OOM errors that happen elsewhere, are
*really* unexpected, and that the designer could not know how to handle.

IMO, the value of implementing this purposefully localised and
targeted OOM error recovery exceeds its cost.

2) That no OOM errors whatsoever can be handled in a robust fashion.


I have never at any point said that or anything close to that. All
I've said is that it's rarely, if ever, worth it, and that it's not
nearly as easy to do robustly as most people here seem to believe.


IMNSHO, this is not as difficult as you suggest if you carefully
design the system for this purpose.

Yannick
