Re: Preventing Denial of Service Attack In IPC Serialization

From: Le Chaud Lapin <jaibuduvin@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Sat, 16 Jun 2007 07:59:55 CST
Message-ID: <1181963766.301371.272210@u2g2000hsc.googlegroups.com>
On Jun 15, 8:40 pm, I V <ivle...@gmail.com> wrote:

But the _objects being serialized_ don't limit how much data is read; the
objects doing the reading impose those limits. So you'd do something like:

bool read_connection_data(socket s, std::vector<std::string>& data)
{
        reader r(s, max_size); // where the author chooses max_size
                                // to be appropriate for the context

        try
        {
                r >> data;
                return true;
        }
        catch (const reader::size_error& e)
        {
                std::cerr << "The client sent too much data";
        }

        return false;

}

And the basic functions of the reader class (those that read in the
built-in types) check that the size hasn't been exceeded, and throw
reader::size_error if it has.

What's the problem with this approach?


First note that the statement "r >> data;" is highly vague. Here you
have a "reader" object that is importing into a non-trivial data
structure. Where is the code that defines how this occurs?

But in any case, I think the spirit of what you are suggesting is the
same as what several others have suggested: that you can somehow supply
a limit to the socket from which data is extracted, and if an object
attempts to extract from that socket data whose size exceeds the
specified limit, an exception is thrown. If this is what you are
suggesting, the problem will still persist.

The key here is "std::vector<std::string>& data". Look closely at
it. It is a vector of strings. In general, there are two things you
do not know:

1. The number of strings in the vector (its count).
2. The length of each individual string in the vector.

Now surely, the serialization code that effects "r >> data" will probably
first ascertain how many strings are to be put into the vector (the
total count), and also the size of each string as each is read in.

Both of these operations, building the vector and building each
string, present an opportunity for DoS.

No matter what intermediate trickery is used to incrementally build
the vector, in the end, it is not inconceivable that the source of the
serialization at the other end of the pipe will want the vector to
hold 1,000,000 strings. Nor is it inconceivable that the source of the
serialization at the other end of the pipe will want the average
length of each string to be 1,000 characters. So this presents an
opportunity for 1 GB of memory to be consumed.
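To make the point concrete, here is a minimal sketch of the shape such
serialization code usually takes. This is purely my own illustration,
with a hypothetical Reader type pulling from a byte buffer; it is not
your reader and not any real library's API. Notice that both the element
count and each string length come straight off the wire:

#include <cstring>     // std::memcpy
#include <stdexcept>   // std::runtime_error
#include <string>
#include <vector>

struct Reader   // hypothetical stand-in for a socket or byte stream
{
   const char *data;
   unsigned size;
   unsigned pos;

   Reader (const char *d, unsigned n) : data(d), size(n), pos(0) {}

   void read_bytes (char *dst, unsigned n)
   {
      if (n > size - pos)
         throw std::runtime_error ("short read");
      std::memcpy (dst, data + pos, n);
      pos += n;
   }

   unsigned read_u32 ()   // reads a 4-byte count or length
   {
      unsigned char buf[4];
      read_bytes (reinterpret_cast<char *>(buf), 4);
      return (unsigned(buf[0]) << 24) | (unsigned(buf[1]) << 16)
           | (unsigned(buf[2]) << 8)  |  unsigned(buf[3]);
   }
};

Reader & operator >> (Reader &r, std::string &s)
{
   unsigned len = r.read_u32 ();   // the peer chooses this value
   s.resize (len);                 // memory is committed right here
   if (len)
      r.read_bytes (&s[0], len);
   return r;
}

Reader & operator >> (Reader &r, std::vector<std::string> &v)
{
   unsigned count = r.read_u32 (); // the peer chooses this value too
   v.resize (count);               // 1,000,000 elements is perfectly "legal"
   for (unsigned i = 0; i < count; ++i)
      r >> v[i];
   return r;
}

Whatever limit the caller imposes from the outside, it is these two
resize calls, driven entirely by numbers the peer supplies, that decide
how much memory gets committed.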

Now, to answer a question that probably popped into your head before
you finished reading the last paragraph:

"Le Chaud Lapin is not understanding what I am saying. Certainly he
must see that the 'control' of how much data being read in is not part
of the serialization code of the object, that it is in fact the
limiting operation is being provided externally."

The problem is that, in general, the writer of a class that is to be
placed in a library must write the serialization code for each class
of that library, then turn his head, long before the user of that
library employs the library in an actual application.

So the statement "std::vector<std::string>& data" could very well be
only one statement that helps to implement a larger serialization
sequence, perhaps as part of a big object.

Now you might say, "That's fine, the big object will know how big to
set the limit..."

While this is true, a remaining problem becomes evident when
one realizes that there are situations where 10,000,000 bytes would be
just as reasonable a limit as 1,000 bytes. This would occur when one
is about to set the serialization limit, for example of
std::list<string>, not as part of serializing a larger-scoped object,
but "in an outer naked scope" as you have presented above.

At that point, the only question that remains is:

"Is it possible for the malicious attacker to induce consumption of
10,000,000 bytes 'legitimately' in rapid-succession?" Without showing
proof, the answer is 'Yes', unfortunately.

However, I will say this: Your solution is the best so far. I had
thought about this before as a solution. Though I deplore
arbitrariness in software engineering, it is not so unreasonable that
a class might know in advance what a "reasonable maximum size of
itself" should be:

struct Employee
{
   unsigned short int age;
   string first_name;
   string last_name;
   float annual_salary;
   // etc.
} ;

The serialization code for this would know reasonable limits on the
size of an employee. Assuming that floats are 32 bits, short is 16
bits, and each of the strings can be up to 50 bytes on average, that
is roughly 4 + 2 + 50 + 50 = 106 bytes, so 128 bytes would be a
reasonable limit.

So we would tell the socket to throw an exception, as you suggested,
when 128 bytes is exceeded while serializing the employee. It might be
intuitively apparent that, because objects can be arbitrarily nested,
a stack-based model implemented inside the Socket is appropriate,
where the elements on the stack are limits that the outer scope of
serialization pushes as each nested object is about to be serialized.

Socket & operator >> (Socket &s, Employee &e)
{
   s.push_limit (128); // e had better be 128 bytes or less
   s >> e.first_name;
   s >> e.last_name;
   s >> e.age;
   s >> e.annual_salary;
   s.pop_limit();
   return s;
}

If 128 bytes is exceeded, as you pointed out, an exception will be
thrown. If it is not, then some number of bytes, N, will have been
read. Then, when pop_limit() is executed, N is subtracted from the
stack element that was top-most before the 128 was pushed. That way,
if an outer class had set a limit, and Employee is an inner class,
then the limit will be honored for the outer class too. If the stack
is empty, then there is no limit set.
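To pin down that bookkeeping, here is a minimal sketch of what I have
in mind; Socket is pared down to just the limit stack, and charge() is
simply my name for the hook that every primitive extraction would call
before consuming bytes:

#include <stdexcept>
#include <vector>

class Socket
{
   struct Level
   {
      unsigned remaining;   // bytes still allowed at this nesting level
      unsigned consumed;    // bytes actually read at this nesting level
   };
   std::vector<Level> levels;   // one entry per push_limit()

public:
   struct limit_exceeded : std::runtime_error
   {
      limit_exceeded () : std::runtime_error ("serialization limit exceeded") {}
   };

   void push_limit (unsigned n)
   {
      Level l = { n, 0 };
      levels.push_back (l);
   }

   // Every primitive extraction (int, float, raw bytes...) calls this
   // before consuming n bytes from the wire.
   void charge (unsigned n)
   {
      if (levels.empty ())      // empty stack: no limit in force
         return;
      Level &top = levels.back ();
      if (n > top.remaining)
         throw limit_exceeded ();
      top.remaining -= n;
      top.consumed  += n;
   }

   // N, the bytes consumed at this level, is charged to the level below,
   // so an enclosing object's limit is honored as well.
   void pop_limit ()
   {
      unsigned n = levels.back ().consumed;
      levels.pop_back ();
      charge (n);
   }

   // ... the real class would also have the raw read functions and the
   // operator >> overloads for the built-in types, each calling charge() ...
};

With that in place, the Employee operator above works unchanged, and an
Employee nested inside some larger object is automatically charged
against that object's limit as well.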

That last sentence, about the stack being empty, is the culprit with
this scheme. The moment one is faced with the challenge of trying to
set a reasonable limit, and "reasonable" is a significant fraction of
the available RAM on a typical computer, the scheme falls apart.

Nevertheless, it is good to see someone else thought of this model. If
it were not for this last issue, it would probably have been the
solution I would have chosen (my alternative is no solution).

But again, arbitrariness is typically a serious no-no in software
engineering, so it would make me very nervous to do this.

-Le Chaud Lapin-

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
