Secure input and output handling
Input handling is how an application, server or other computing system handles the input supplied from users, clients, or a computer network.
Validating (or sanitizing) user input means ensuring that the input is safe before it is used.
The most conservative way to do this is to terminate on suspicious input, using a whitelist strategy to decide whether execution should be terminated. This behavior is, however, not always desirable from a usability point of view.
Whitelist or Blacklist?
In computer security, there are often known good inputs: input the developer is completely certain is safe. There are also known bad characters: input the developer is certain is unsafe (for example, because it can cause code injection). Based on this, two different approaches to how input should be managed exist:
- Whitelist (known goods). A whitelist is a list of "known good inputs". In effect it says "A, B and C are good (and everything else is bad)".
- Blacklist (known bads). A blacklist is a list of "known bad inputs". In effect it says "A, B and C are bad (and everything else is good)".
Security professionals tend to prefer whitelists, because blacklists may accidentally treat bad input as safe. However, in some cases a whitelist solution may not be easily implemented.
Terminate/stop/abort on input problems
This is a very safe strategy. If unexpected characters occur in input, abort execution. But if implemented poorly, it can lead to a denial-of-service attack in which the attacker floods the system with unexpected input, forcing the system to expend scarce processing and communication resources on rejecting it.
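The terminate-on-problem strategy can be sketched as follows, combined with a whitelist check. This is a minimal illustration, not a definitive implementation: the names `USERNAME_PATTERN`, `InvalidInputError` and `require_valid_username` are hypothetical, and the whitelist (1 to 32 ASCII letters, digits or underscores) is an assumption chosen for the example.

```python
import re

# Hypothetical whitelist: 1 to 32 ASCII letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


class InvalidInputError(ValueError):
    """Raised when input does not match the whitelist."""


def require_valid_username(raw: str) -> str:
    # fullmatch requires the whole string to match, not just a prefix.
    if USERNAME_PATTERN.fullmatch(raw) is None:
        # Abort instead of trying to repair the input.
        raise InvalidInputError("username contains unexpected characters")
    return raw
```

A caller would catch `InvalidInputError` at the request boundary and reject the request. Keeping the check itself cheap, as here, limits how much work an attacker can force by flooding the system with invalid input.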
Filtering input is a less orthodox security principle than terminating on input problems: instead of rejecting the input, the application removes the offending characters and carries on.
- The benefit of the filter approach is that to end-users, the security mechanism often behaves in a less intrusive manner. For example, if "*" is illegal, then "I ***LOVE*** you" will just become "I LOVE you", which is experienced as a minor but acceptable oddity.
- The downside is that the filter approach is a bit difficult to get right — in practice many applications have the filter applied at one place in the code, but the programmer accidentally uses the unfiltered input at another place.
Filter input: Automatic taint checking
Some programming languages have built-in support for taint checking (Perl's taint mode is one example). These languages raise compile-time or run-time errors whenever a variable derived from user input is used in a risky way, e.g. to execute a shell command.
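Python has no built-in taint mode, but the idea can be emulated by hand at run time. The sketch below is only an illustration under that assumption: `Tainted`, `TaintError`, `sanitize_letters` and `build_command` are hypothetical names, and a real taint checker tracks taint automatically through every operation rather than relying on an explicit wrapper.

```python
class Tainted:
    """Wraps a value that came from an untrusted source."""

    def __init__(self, value: str) -> None:
        self.value = value


class TaintError(Exception):
    """Raised when tainted data reaches a sensitive operation."""


def sanitize_letters(data: Tainted) -> str:
    # Untainting step: keep ASCII letters only and return a plain str.
    return "".join(ch for ch in data.value if ch.isascii() and ch.isalpha())


def build_command(argument) -> list:
    # Sensitive sink: refuse anything still marked as tainted.
    if isinstance(argument, Tainted):
        raise TaintError("tainted value used to build a command")
    return ["echo", argument]
```

Passing `Tainted("; ls -l /")` straight to `build_command` raises `TaintError`, while passing it through `sanitize_letters` first yields a plain string that the sink accepts.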
Filter input: Whitelist filters (Filter in known goods)
- An input filter which expects all characters to be of charset A-Za-z is used to protect a UNIX application from shell injection.
- Attacker supplies input "; ls -l /" to attempt shell injection.
- Filter is applied to input. The characters ";", "-" and "/" are thrown away by the filter because they are not in the whitelist; "lsl" is kept by the filter because its characters are in the whitelist.
- Exploit attempt fails because only safe input remains.
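The steps above can be sketched in a few lines. The function name `whitelist_filter` is hypothetical; the whitelist is the A-Za-z character set from the example.

```python
import string

# The whitelist from the example: ASCII letters A-Z and a-z only.
ALLOWED = frozenset(string.ascii_letters)


def whitelist_filter(raw: str) -> str:
    # Keep a character only if it is explicitly allowed.
    return "".join(ch for ch in raw if ch in ALLOWED)


print(whitelist_filter("; ls -l /"))  # prints: lsl
```

Everything that is not a letter, including the spaces, is dropped, so no shell metacharacter can survive the filter.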
Filter input: Blacklist filters (Filter out known bads)
A strategy that is usually insufficient is to filter out known bads. If the characters in the set [:;.-/] are known to be bad and the input ; ls -l / is received, the original input is replaced with ls l (the characters ;, - and / are thrown away). This strategy has several problems:
- It does not protect against unknown threats. There may be other "bad" inputs that the developer did not consider.
- It does not protect against future threats. Inputs that are safe at present may obtain a dangerous interpretation if the underlying language changes. For example, a UNIX command line security filter designed to stop attacks against C shell will be insecure if the software is moved to an environment using bash.
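The weakness of a blacklist can be seen in a short sketch. The function name `blacklist_filter` is hypothetical, and the second input is an assumed example of a threat the blacklist author did not anticipate (shell command substitution).

```python
# The known-bad characters from the example above.
KNOWN_BAD = frozenset(":;.-/")


def blacklist_filter(raw: str) -> str:
    # Drop a character only if it is explicitly listed as bad.
    return "".join(ch for ch in raw if ch not in KNOWN_BAD)


print(repr(blacklist_filter("; ls -l /")))  # prints: ' ls l '
print(blacklist_filter("$(id)"))            # prints: $(id)
```

The second call returns its input unchanged: none of its characters are on the blacklist, so the command substitution passes through the filter intact.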
Encode (escape) input
To keep malicious inputs contained, any inputs written to the database need to be encoded.
SQL encoding: ' OR 1=1 --' is encoded to \ \'\ OR\ 1\=1\ \-\-'
There may be other solutions, depending on which programming language is used and what type of code injection is being prevented. E.g., the htmLawed PHP script can be used to remove cross-site scripting code.
In particular, to prevent SQL injection, parameterized queries (also known as prepared statements and bind variables) are excellent for improving security while also improving code clarity and performance.
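A parameterized query might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table, its contents and the function name `find_user` are assumptions made for the example.

```python
import sqlite3

# Throwaway in-memory database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))


def find_user(connection, name):
    # The ? placeholder binds name as data; it is never parsed as SQL.
    cursor = connection.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


print(find_user(conn, "alice"))        # prints: [('alice',)]
print(find_user(conn, "' OR 1=1 --"))  # prints: []
```

The injection attempt is compared against the name column as an ordinary string and matches no rows, without any manual escaping.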
Output handling is how an application, server or system handles the output (e.g. generating HTML, printing, logging, ...). It is important to keep in mind output often contains input supplied from users, clients, network, databases etc.
Secure output handling is primarily associated with preventing Cross-site Scripting (XSS) vulnerabilities, but could also prove to be important in similar areas (e.g. if generating Microsoft Office documents with some API, output management could potentially be required to prevent macro-injections).
Encode (escape) output
"Encoding" processes content that is about to be output so that any potentially dangerous characters are made safe. Characters from a typical known safe charset for the particular destination medium are often left as they are. A simple encoding might leave alone alphanumerics a–z, A–Z and 0–9. Any other characters could be possibly interpreted in an unexpected manner, and are therefore replaced with the appropriate "encoded" representation.
HTML encoding: <script> is encoded to &lt;script&gt;
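As an illustration, Python's standard library provides `html.escape` for this kind of encoding. The function name `render_comment` is hypothetical, and the sketch assumes the value is placed inside HTML element content; other output contexts, such as JavaScript or URLs, need their own encoders.

```python
import html


def render_comment(comment: str) -> str:
    # Encode the untrusted text at the point where it is written out.
    return "<p>" + html.escape(comment) + "</p>"


print(render_comment("<script>alert(1)</script>"))
# prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Encoding at output time, rather than when the data is stored, keeps the stored value intact and lets each destination medium apply its own rules.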