John Chamberlain
Developer Diary
 Developer Diary · You Heard It Here First · Thursday 26 February 2004
Principles of Programming: Input Validation
One of the simplest principles of programming is to validate input, yet neglecting this chore is the single biggest failing of computer programs. Most virus vulnerabilities, as well as most bugs, are caused by programs receiving unexpected input. I can understand why most programmers do not validate internally generated arguments to function calls, but it is incomprehensible to me that so many programmers do not validate input coming from outside the program. Yet it seems every program out there fails in this regard.

Some people have made a pastime out of crashing well-known programs this way. I remember that back around 1996, when NT was new, there was a guy who wrote a program that could systematically call every function in its API. He would just feed the API as many possible inputs as he could and then log all the crashes and blue screens. I looked at his log (which was being sent regularly to Microsoft). There were literally thousands of crash combinations listed. Nowadays MS has wised up a little, so there are fewer problems, but new buffer overruns and other serious input validation failures still come up every month, even in major operating systems like Windows.

The principle of validated input is actually composed of four separate principles:

Assume that input may be missing
    It may be null, or blank in the case of a string.

Assume that input may be out of range
    If it is a number it could be the maximum positive value or
    the maximum negative value or the smallest possible positive or
    negative value. If it is a double or float it could be a weird
    value like NaN or infinity or negative infinity (check the IEEE 754
    spec for floats/doubles to find out about these weird values).
    If it is a string it could be megabytes long.

Assume that input may be malformed
    Strings with formatting characters are a common problem here.
    I put a quotation mark (") in a string field in ChessBase (the
    leading chess database program) and sure enough it truncated
    the rest of the field. This is probably one of a hundred such
    mistakes in that program, which is supposedly the best product
    in the market for this category of software. Hah!

Provide for a way to reject input
    Obviously, if you validate input there must be a way to reject
    invalid input in a controlled manner. To you game programmers out there:
    a CTD (crash to desktop) is not a controlled manner.
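
The four principles above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the names InvalidInput, validate_name, validate_ratio and check, the 64-character limit, and the 0-to-1 range are all assumptions invented for the example. Rejecting quotation marks outright is just one possible policy; escaping them properly is another.

```python
import math

MAX_NAME_LENGTH = 64  # hypothetical limit chosen for this example


class InvalidInput(ValueError):
    """Signals that input was rejected in a controlled manner."""


def validate_name(raw):
    # Missing: None, a non-string, or a blank string.
    if not isinstance(raw, str) or raw.strip() == "":
        raise InvalidInput("name is missing")
    name = raw.strip()
    # Out of range: a string could be megabytes long.
    if len(name) > MAX_NAME_LENGTH:
        raise InvalidInput("name is too long")
    # Malformed: reject quotes, backslashes and control characters.
    if any(ch in "\"'\\" or not ch.isprintable() for ch in name):
        raise InvalidInput("name contains a disallowed character")
    return name


def validate_ratio(raw):
    # Missing: None, a non-string, or a blank string.
    if not isinstance(raw, str) or raw.strip() == "":
        raise InvalidInput("ratio is missing")
    # Malformed: text that does not parse as a number at all.
    try:
        value = float(raw)
    except ValueError:
        raise InvalidInput("ratio is not a number") from None
    # Out of range: float() happily accepts "nan", "inf" and "1e400".
    if math.isnan(value) or math.isinf(value):
        raise InvalidInput("ratio is NaN or infinite")
    if not 0.0 <= value <= 1.0:
        raise InvalidInput("ratio is outside 0..1")
    return value


def check(validator, raw):
    # Controlled rejection: the caller gets (ok, value_or_reason),
    # never an unhandled crash.
    try:
        return True, validator(raw)
    except InvalidInput as err:
        return False, str(err)


print(check(validate_name, 'Garry "Kasparov'))
# (False, 'name contains a disallowed character')
print(check(validate_ratio, "0.25"))
# (True, 0.25)
```

The point of the check wrapper is the fourth principle: every rejection comes back to the caller as an ordinary value with a reason attached, so the program can report the problem and carry on.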

These are the most basic of programming principles. Taking the time to follow them will improve the quality of your programs more than any other single measure you can take.

Developer Diary · about · info@johnchamberlain.com · bio · Revised 26 February 2004 · Pure Content