The problem
In the last few months, I have been struggling with this problem: should input parameters be checked? A static analysis tool threw hundreds of warnings on a large code base, most of them about pointers that could be null or variables that could index an array outside its bounds. To certify the software, those warnings had to be removed. Quickly, the whole team started adding checks on every problematic variable. We were falling into a trap, and code quality started going downhill. Nobody had a magic answer (because there isn’t one), and there was no time for a deep analysis.
Suddenly, a lot of questions arose: Should the input data be validated? Should the caller do it? Should the called function do it? Without our noticing, these questions led to another, closely related topic: how much redundancy and robustness should software have? The first question cannot be answered until the last one has been discussed; that last one is, indeed, the real question.
Fortunately, other people have faced this problem before. I strongly encourage you to read the references. Here are a few key points:
Searching for an answer
The problem and its solution have to be addressed in the first stages of software design, and every member of the team should be aware of them. What is more, it is not even a technical question; it is a business one. It makes a difference whether your software can fail and reboot or whether that is not an option. It makes a difference whether your software has hard timing constraints or whether it does not matter if it takes 5 ms more. It makes a difference whether you are writing an API or a safety-critical piece of code. In other words: it is a requirements thing.
If no clear decision is taken and the choice is left to the programmers as they go, the same parameter will be checked for errors three or four levels into the call stack, and the solution implemented across the application will not be coherent and will end up crappy.
So, how should input data be checked? So far, we can talk about two approaches.
Design by Contract (DBC)
A concept developed by Bertrand Meyer as a design technique “that focuses on documenting and agreeing to the rights and responsibilities of software modules to ensure program correctness”. It is a contract by which a routine may have some expectations:
- Preconditions: The routine’s requirements. According to [HT00], “a routine should never get called when its preconditions would be violated” and “it is the caller’s responsibility to pass good data”. If the caller cannot guarantee good data, it has to decide what to do and how to handle the situation. Again, it is a requirements thing, a design decision.
- Postconditions: The state of the world when the routine is done.
Another benefit of this technique is that problems are detected as early as possible, so the program can crash early, which is better than continuing with corrupted data. As [HT00] puts it: “When your code discovers that something that was supposed to be impossible just happened, your program is no longer viable.”
But not all languages support DBC in the code. In languages such as C, writing the contract as a comment still gives you a very real benefit. Moreover, it is possible to emulate it using assertions, as we will see later.
Many authors hold that run-time checks add unneeded complexity and that it is worth replacing them with asserts and thorough integration testing. One of the classic uses of assertions is to check assumptions, such as that an input parameter’s value falls within its expected range (or an output parameter’s value does) or that a pointer is non-null.
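As a minimal sketch of how a contract can be written down in C, the hypothetical routine below states its preconditions and postcondition in a comment and emulates them with assert. The names average and MAX_SAMPLES are assumptions made up for illustration; they do not come from the code base discussed in this post.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 64u  /* hypothetical upper bound, for illustration only */

/*
 * Contract of average() (hypothetical example routine):
 *   Preconditions:  samples != NULL and 0 < count <= MAX_SAMPLES
 *   Postcondition:  the result lies between the smallest and largest sample
 */
int average(const int *samples, size_t count)
{
    /* Preconditions: it is the caller's responsibility to pass good data. */
    assert(samples != NULL);
    assert(count > 0 && count <= MAX_SAMPLES);

    long long sum = 0;
    int min = samples[0];
    int max = samples[0];
    for (size_t i = 0; i < count; ++i) {
        sum += samples[i];
        if (samples[i] < min) min = samples[i];
        if (samples[i] > max) max = samples[i];
    }
    int result = (int)(sum / (long long)count);

    /* Postcondition: the state of the world when the routine is done. */
    assert(result >= min && result <= max);
    return result;
}
```

Called with the array {2, 4, 9} and a count of 3, the routine returns 5. Passing a null pointer or a count of 0 breaks the contract, and in a build with assertions enabled the program aborts at the first violated assert.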
Nevertheless, a function that tests a condition and crashes if it is not fulfilled may be fine for debugging, but in most cases it is not flexible enough for production code. Or maybe it is the only sensible option [1]. Again: it is a design-architecture-requirements thing. One workaround could be to use assertions in the development phase, while debugging, and turn them off in release code. The problem is that, when it comes to certifying the software, this workaround leaves us in the same place: although the code quality will be better, the checks are no longer present in the release code, so a static analysis tool would keep throwing the same warnings.
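The difference between a check that vanishes in release code and one that stays can be sketched as follows. The standard assert macro expands to nothing when NDEBUG is defined, which is how a release build usually switches it off. The REQUIRE macro, the contract_failed function, TABLE_SIZE and both table_get variants are hypothetical names, shown only as one possible way of keeping precondition checks alive in release code. Whether a particular static analysis tool accepts such a macro as a guard depends on the tool.

```c
#include <assert.h>  /* assert(): becomes ((void)0) when NDEBUG is defined */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 8u  /* hypothetical table length */

/* Hypothetical failure routine: report the broken contract and stop. */
static void contract_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s:%d: contract violated: %s\n", file, line, expr);
    abort();
}

/* Hypothetical always-on check: unlike assert(), not affected by NDEBUG. */
#define REQUIRE(cond) \
    ((cond) ? (void)0 : contract_failed(#cond, __FILE__, __LINE__))

/* Checked only while assertions are enabled (no NDEBUG). */
int table_get_debug_only(const int *table, size_t index)
{
    assert(table != NULL);
    assert(index < TABLE_SIZE);
    return table[index];
}

/* Checked in debug and release builds alike. */
int table_get_always(const int *table, size_t index)
{
    REQUIRE(table != NULL);
    REQUIRE(index < TABLE_SIZE);
    return table[index];
}
```

Both variants behave the same with valid input. They differ only when the contract is broken in a build compiled with NDEBUG: the first one goes on to read through the bad pointer or index, while the second one stops the program.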
Defensive Programming
Some may say that DBC and defensive programming are more or less opposites, but [McC04] includes assertions that check preconditions in its chapter on defensive programming, as a way to handle garbage in:
“The main idea is that if a routine is passed bad data, it won’t be hurt, even if the bad data is another routine’s fault”. You take responsibility even when the fault is someone else’s. @rsrajan1 gives us another good definition: “The idea is not to write code that never fails. That is a utopian dream. The idea is to make the code fail beautifully in case of any unexpected issue”.
The main way to handle garbage in is to check the values of all routine input parameters and, once you have detected an invalid parameter, to decide how to handle it.
Depending on the situation, you might choose any of a dozen different approaches. [McC04], in Section 8.3, Error-Handling Techniques, describes some of them in detail (return a neutral value, substitute the next piece of valid data, return the same answer as the previous time…). How to decide? Again, McConnell insists: “Deciding on a general approach to bad parameters is an architectural or high-level design decision and should be addressed at one of those levels.” The way in which errors are handled affects the software’s ability to meet requirements related to correctness, robustness, and other nonfunctional attributes [2].
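A minimal sketch of the defensive style in C could look like this. It combines two of the techniques listed above, returning an error code and returning a neutral value, which is only one of many possible policies. The names read_status, SENSOR_COUNT and sensor_read are hypothetical, and choosing 0 as the neutral value is an assumption made for the example.

```c
#include <stddef.h>

#define SENSOR_COUNT 4u  /* hypothetical number of readings */

/* Hypothetical status codes reported to the caller. */
typedef enum {
    READ_OK = 0,
    READ_NULL_POINTER,
    READ_OUT_OF_RANGE
} read_status;

/*
 * Defensive variant: every input parameter is checked, and bad input is
 * reported to the caller instead of being dereferenced or used as an index.
 */
read_status sensor_read(const int *readings, size_t index, int *out_value)
{
    if (readings == NULL || out_value == NULL) {
        return READ_NULL_POINTER;
    }
    if (index >= SENSOR_COUNT) {
        *out_value = 0;  /* neutral value (assumed policy) */
        return READ_OUT_OF_RANGE;
    }
    *out_value = readings[index];
    return READ_OK;
}
```

The routine itself is never hurt by bad data, but every caller now has to look at the returned status and decide what to do with it, which is exactly the decision that should be taken once, at the architectural level.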
The conclusion
To summarize, the most important thing to take into account is the need for high-level decisions about what to do with bad input. Once this is clear, take advantage of design techniques such as DBC, or of features of the language, to approach the solution. In the next post, a solution based on C and assertions will be proposed.
References
[HT00] Andrew Hunt and David Thomas. 2000. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Longman Publishing Co., Inc., USA.
[McC04] Steve McConnell. 2004. Code Complete, Second Edition. Microsoft Press, USA.
[Mey97] Bertrand Meyer. Object-Oriented Software Construction. Prentice Hall, Englewood Cliffs, NJ, second edition, 1997.
[1] Here we reach a controversial point: should assertions be left turned on in production? [HT00] says yes: “If it can’t happen, use assertions to ensure that it won’t”.
[2] Correctness means never returning an inaccurate result; robustness means always trying to do something that will allow the software to keep operating, even if that leads to results that are inaccurate.