Semipredicate problem

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value.[1] The problem is that the caller of the subroutine cannot tell what the result means in this case.

Example[edit]

The division operation yields a real number, but fails when the divisor is zero. If we were to write a function that performs division, we might choose to return 0 on this invalid input. However, if the dividend is 0, the result is 0 too. Also, even if a nonzero dividend is required, dividing a small number by a very large one can yield 0 as well, due to rounding errors. This means there is no number we can return to uniquely signal attempted division by zero, since all real numbers are in the range of division.

Practical implications[edit]

Early programmers dealt with potentially exceptional cases, as in the case of division, using a convention that required the calling routine to check the validity of the inputs before calling the division function. This was undesirable for two reasons. First, it greatly encumbers all code that performs division (a very common operation). Second, it violates the important principle of encapsulation in programming, whereby the handling of one issue should be contained in one place in the code. If we imagine a more complicated computation than division, the caller may not even know that invalid input is being handed to the target function; indeed, figuring out that the input is invalid may be as costly as performing the entire computation.

Solutions[edit]

The semipredicate problem is not universal among functions that can fail. If the function's range does not cover the entire data type defined for the function, a value known to be impossible under normal computation can be used. For example, consider the function index, which takes a string and a substring, and returns the integer index of the substring in the main string. If the search fails, the function may be programmed to return -32,768 (or any other negative value), since this can never signify a successful result.

This solution has its problems, though; it overloads the natural meaning of a function with an arbitrary convention. First, the programmer must remember specific failure values for many functions, which of course cannot be identical if the functions have different domains. Second, a different implementation of the same function may choose to use a different failure value, resulting in possible bugs when programmers move from environment to environment. Third, if the failing function wishes to communicate useful information about why it had failed, one failure value is insufficient. Fourth, a signed integer halves the possible index range to be able to store the sign bit.

Multivalued return[edit]

Many languages allow, through one mechanism or another, a function to return multiple values. If this is available, the function can be redesigned to return a boolean value signalling success or failure, in addition to its primary return value. If multiple error modes are possible, the function may instead of a boolean return an enumeration of error codes.

Various techniques for returning multiple values include:

  • Returning a tuple of values.
  • Languages with pointers, and those that allow pass by reference, or an equivalent mechanism, can allow for multivalued return by designating some parameters as "out" arguments. In this case, the function could just return the error value, with a mutable value being passed to the function intended to store the actual result upon success.
  • Some logic programming languages don't even have specific "return" values, returning all data through "out" arguments.

Global exit-status variable[edit]

Similar to an "out" argument, a global variable can store what error occurred (or simply whether an error occurred).

For instance, if an error occurs, and is signalled (generally as above, by an illegal value like -1) the Unix errno variable is set to indicate which value occurred. Using a global has its usual drawbacks: thread safety becomes a concern, and if only one error global is used, its type must be wide enough to contain all interesting information about all possible errors in the system.

Exceptions[edit]

Exceptions are one widely used scheme for solving this problem (as well as others). An error condition is not considered a return value of the function at all; normal control flow is disrupted and explicit handling of the error takes place automatically. Exceptions also clear the clutter associated with checking return values after each call. They are an example of out-of-band signalling.

Expanding the type[edit]

Manually created hybrid types[edit]

In C, a common approach, when possible, is to use a data type deliberately wider than strictly needed by the function. For example, the standard function getchar() is defined with return type int and returns an unsigned char on success or the value EOF (implementation defined but outside the range [0, 255]) on the end of the input or a read error.

Nullable reference types[edit]

In languages with pointers or references, one solution is to return a pointer to a value, rather than the value itself. This return pointer can then be set to null to indicate an error. This approach may cause some overhead, and it is typically suited to functions that return a pointer anyway.

Implicitly hybrid types[edit]

In scripting languages, such as PHP and Lisp, the usual approach is to return "false", "none" or "null" when the function call fails. This works by returning a different type to the normal return type (thus expanding the type). It is a dynamically-typed equivalent to returning a null pointer.

For example, a numeric function normally returns a number (int or float), and while zero might be a valid response; false is not. Similarly, a function that normally returns a string might sometimes return the empty string as a valid response, but return false on failure. This process of type-juggling necessitates care in testing the return value: e.g. in PHP, use === [i.e. equal and of same type] rather than just == [i.e. equal, after automatic type-conversion]. It works only when the original function is not meant to return a boolean value, and still requires that information about the error be conveyed via other means.

Explicitly hybrid types[edit]

In Haskell and other functional programming languages, it is common to use a data type that is just as big as it needs to be to express any possible result. For example, we could write a division function that returned the type Maybe Real, and a getchar function returning Either String Char. The first is an option type, which has only one failure value, Nothing. The second case is a tagged union: a result is either some string with a descriptive error message, or a successfully read character. Haskell's type inference system helps ensure that callers deal with possible errors. Since the error conditions become explicit in the function type, looking at its signature immediately tells the programmer how to treat errors. In addition, tagged unions and option types form monads when endowed with appropriate functions: this may be used to keep the code tidy by automatically propagating unhandled error conditions.

See also[edit]

References[edit]

  1. ^ Norvig, Peter (1992), "The General Problem Solver", Paradigms of artificial intelligence programming: case studies in common LISP (3 ed.), Morgan Kaufmann, p. 946, ISBN 1-55860-191-0