Developer Diary: POP Separate Data from State


Developer Diary · You Heard It Here First · Thursday 19 February 2004

Principles of Programming: Separate Data from State

Programming is such a new art that its principles are still largely unwritten. Naturally there is a body of academic theory, but the simple guides of everyday practice have often been overlooked. For my own use I have written down some of these principles. One of my favorites is this one: separate data from state.

It may seem obvious to use separate variables for the state of the program (i.e., whether an operation succeeded or failed) and for the program's data, but in practice it is all too easy to confound the two. The culprit is a limitation of languages like C and Java: a procedure or method can return only a single value. For example, imagine you want a procedure that returns the index of the first matching name in a list of names. There are really two kinds of information to return: whether the name was found at all (the state) and, if it was found, its index (the data). Ideally we would call the procedure like this:

     zFound, iIndex = searchListForIndex( "xyz" );

and have zFound be a boolean indicating whether "xyz" was found. If it was found, iIndex would contain the index of "xyz" in the list. Unfortunately, mainstream languages such as C and Java do not support multiple return values, so the typical response of the programmer is to put both the state and the data in the same variable. For example, the convention might be that if searchListForIndex returns -1, "xyz" was not found, and if it returns any other number n, "xyz" was found at index n.
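A minimal Java sketch of that -1 convention follows. It assumes the list is a `List<String>` passed in as a parameter, and the class name `SentinelSearch` is hypothetical. One `int` carries both the state and the data, so nothing stops a caller from skipping the check and passing -1 straight on as an index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical class; the list is passed in as a parameter (an assumption).
class SentinelSearch {
    // Returns the index of the first element equal to target,
    // or -1 if there is none: state and data share one int.
    static int searchListForIndex(List<String> names, String target) {
        for (int i = 0; i < names.size(); i++) {
            if (Objects.equals(names.get(i), target)) {
                return i;
            }
        }
        return -1; // sentinel meaning "not found"
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("abc", "xyz", "def");
        int iIndex = searchListForIndex(names, "xyz");
        if (iIndex != -1) { // every caller must remember this check
            System.out.println("found at " + iIndex);
        }
        // Forgetting the check still compiles, but
        // names.get(searchListForIndex(names, "qqq")) throws
        // IndexOutOfBoundsException because -1 is used as data.
    }
}
```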

In a similar way, many other kinds of double-duty variables find their way into programs, and they are a common source of bugs and misunderstandings. One approach to avoiding the problem, sometimes found in Java, is to throw an exception. For example, the primitive wrapper classes throw an exception when asked to parse a string that cannot be interpreted as a number (Integer.parseInt throws NumberFormatException). This solution is worse than the problem, however, because handling an exception in the normal flow of the program is onerous. In other words, you are turning a valid state into an error when it does not have to be one.
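Here is the parsing case in Java. `Integer.parseInt` is the standard-library method and throws `NumberFormatException` for input such as "abc"; the class `ParseWithExceptions` and its `sumNumericTokens` method are hypothetical, standing in for any code where a non-numeric token is a normal, expected case that nonetheless has to be routed through a catch block.

```java
// Hypothetical example; only Integer.parseInt and NumberFormatException
// are real standard-library names.
class ParseWithExceptions {
    // Sums the tokens that are valid integers and skips the rest.
    // Because Integer.parseInt signals "not a number" by throwing,
    // an ordinary, expected outcome has to be handled as an exception.
    static int sumNumericTokens(String[] tokens) {
        int sum = 0;
        for (String token : tokens) {
            try {
                sum += Integer.parseInt(token);
            } catch (NumberFormatException e) {
                // Not really an error here: just a non-numeric token.
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] tokens = {"12", "abc", "30"};
        System.out.println(sumNumericTokens(tokens)); // prints 42
    }
}
```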

Another method is to return an object instead of a primitive value. The object can then contain both a state variable and a value variable, as well as state constants if the possible states need to be named. You could also cheat a little by adopting the convention that a null return value means failure. For example, we could rewrite the earlier search example like this:

     integerIndex = searchListForIndex( "xyz" );

where integerIndex is an Integer object (not a primitive int). If the object is null, we know that "xyz" was not found; otherwise the primitive value is easy to get through the accessor integerIndex.intValue(). In my view this is the best solution given the limitations of current languages.
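Both object-based variants can be sketched in Java. Apart from `searchListForIndex`, `integerIndex`, and `intValue()`, the names here are assumptions for illustration: the `SearchResult` class, its `STATE_FOUND`/`STATE_NOT_FOUND` constants, the `searchForResult` method, and passing the list in as a parameter are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical names: SearchResult, STATE_*, searchForResult.
class ObjectSearch {
    // Variant 1: a small result object with a separate state field.
    static final class SearchResult {
        // State constants naming the possible outcomes.
        static final int STATE_FOUND = 0;
        static final int STATE_NOT_FOUND = 1;

        final int state;
        final int index; // meaningful only when state == STATE_FOUND

        private SearchResult(int state, int index) {
            this.state = state;
            this.index = index;
        }

        boolean isFound() {
            return state == STATE_FOUND;
        }
    }

    static SearchResult searchForResult(List<String> names, String target) {
        for (int i = 0; i < names.size(); i++) {
            if (Objects.equals(names.get(i), target)) {
                return new SearchResult(SearchResult.STATE_FOUND, i);
            }
        }
        return new SearchResult(SearchResult.STATE_NOT_FOUND, 0);
    }

    // Variant 2: the shortcut, an Integer that is null when nothing matched.
    static Integer searchListForIndex(List<String> names, String target) {
        for (int i = 0; i < names.size(); i++) {
            if (Objects.equals(names.get(i), target)) {
                return Integer.valueOf(i);
            }
        }
        return null; // null carries the "not found" state
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("abc", "xyz", "def");

        SearchResult result = searchForResult(names, "xyz");
        if (result.isFound()) {
            System.out.println("found at " + result.index);
        }

        Integer integerIndex = searchListForIndex(names, "xyz");
        if (integerIndex == null) {
            System.out.println("xyz not found");
        } else {
            System.out.println("xyz found at " + integerIndex.intValue());
        }
    }
}
```

The null shortcut is more compact; the explicit state field earns its extra lines when there are more than two outcomes to distinguish. With the shortcut, a caller who forgets the null check and unboxes integerIndex gets a NullPointerException at that point rather than silently using -1 as an index.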

Searching through my own code, I find that when it is difficult to separate data from state I generally do exactly this: return an object, with null indicating failure. Nevertheless, I did find a handful of places where I had used -1 as a conventional return value to indicate failure, which shows just how seductive this kind of mistake is. Now I will go and rewrite those instances.

This is the value of a principle: it provides a guide for you to re-examine your own practice and perfect it.


Developer Diary · about · info@johnchamberlain.com · bio · Revised 19 February 2004 · Pure Content