C character classification
C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test characters for membership in a particular class, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.[1]
History
Early C programmers working on the Unix operating system developed programming idioms for classifying characters into different types. For example, for the ASCII character set, the following expression is true when c is a letter:
('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')
Because such a test can be written in several equivalent ways, it became desirable to introduce short, standardized forms of these tests, which were placed in the system-wide header file ctype.h.
Implementation
Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as lookups in a static table, wrapped in macros or functions.
For example, an array of 256 eight-bit integers is created and each integer is treated as a set of bit flags, where each bit corresponds to a particular property of the character, such as isdigit or isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the test could be written as
#define isdigit(x) (TABLE[x] & 1)
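The table idea can be fleshed out a little further. The following is a minimal sketch, not the code of any particular C library: the names ct_table, ct_init, ct_isdigit, ct_isalpha and the CT_* bit assignments are hypothetical, and the table is filled with range tests that assume an ASCII-compatible character set.

```c
#include <limits.h>

/* Hypothetical property bits; real libraries choose their own layout. */
enum {
    CT_DIGIT = 1 << 0,  /* lowest-order bit: decimal digit */
    CT_UPPER = 1 << 1,  /* uppercase letter */
    CT_LOWER = 1 << 2   /* lowercase letter */
};

/* One entry per possible unsigned char value (256 on typical platforms). */
static unsigned char ct_table[UCHAR_MAX + 1];

/* Fill the table once. The range tests assume that digits and letters
   are contiguous, as they are in ASCII. */
static void ct_init(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        unsigned char bits = 0;
        if (c >= '0' && c <= '9') bits |= CT_DIGIT;
        if (c >= 'A' && c <= 'Z') bits |= CT_UPPER;
        if (c >= 'a' && c <= 'z') bits |= CT_LOWER;
        ct_table[c] = bits;
    }
}

/* Each test indexes the table exactly once, so the macro argument is
   evaluated exactly once. The cast keeps the index inside the table. */
#define ct_isdigit(c) (ct_table[(unsigned char)(c)] & CT_DIGIT)
#define ct_isalpha(c) (ct_table[(unsigned char)(c)] & (CT_UPPER | CT_LOWER))
```

After a single call to ct_init(), an expression such as ct_isdigit('7') yields a nonzero value and ct_isdigit('x') yields zero. Combining several bits in one mask, as ct_isalpha does, is one reason a bit-flag table is convenient.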
Early versions of Linux used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if the argument x has a side effect, for example if one calls isdigit(x++) or isdigit(run_some_program()). It is not immediately evident that the argument to isdigit is evaluated twice. For this reason, the table-based approach is generally used.
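The double evaluation can be made visible with a small experiment. In the sketch below, the macro is the range test from the text under the hypothetical name RANGE_ISDIGIT (renamed so it does not clash with ctype.h); fetch_char, fn_isdigit and the two counting helpers are likewise illustrative names, not part of any library.

```c
/* Range-test macro from the text, under a hypothetical name. */
#define RANGE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Hypothetical helper with a side effect: it counts its own calls. */
static int fetch_calls = 0;

static int fetch_char(void)
{
    fetch_calls++;
    return '7';
}

/* Function form: the argument is evaluated once, before the call. */
static int fn_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Number of fetch_char() calls caused by one use of the macro.
   Because '7' passes the first comparison, the second comparison
   runs as well and calls fetch_char() again. */
static int calls_via_macro(void)
{
    fetch_calls = 0;
    (void)RANGE_ISDIGIT(fetch_char());
    return fetch_calls;
}

/* Number of fetch_char() calls caused by one use of the function. */
static int calls_via_function(void)
{
    fetch_calls = 0;
    (void)fn_isdigit(fetch_char());
    return fetch_calls;
}
```

Here calls_via_macro() returns 2 and calls_via_function() returns 1. For a character below '0' the macro would short-circuit after the first comparison, so the number of evaluations even depends on the value being tested.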
Overview of functions
The functions that operate on single-byte characters are declared in the ctype.h header file (cctype in C++). The functions that operate on wide characters are declared in the wctype.h header file (cwctype in C++).
The classification is evaluated according to the current locale.
Byte character | Wide character | Description |
---|---|---|
isalnum | iswalnum | checks whether the operand is alphanumeric |
isalpha | iswalpha | checks whether the operand is alphabetic |
islower | iswlower | checks whether the operand is lowercase |
isupper | iswupper | checks whether the operand is uppercase |
isdigit | iswdigit | checks whether the operand is a digit |
isxdigit | iswxdigit | checks whether the operand is hexadecimal |
iscntrl | iswcntrl | checks whether the operand is a control character |
isgraph | iswgraph | checks whether the operand is a graphical character |
isspace | iswspace | checks whether the operand is space |
isblank | iswblank | checks whether the operand is a blank space character |
isprint | iswprint | checks whether the operand is a printable character |
ispunct | iswpunct | checks whether the operand is punctuation |
tolower | towlower | converts the operand to lowercase |
toupper | towupper | converts the operand to uppercase |
— | iswctype | checks whether the operand falls into a specific class |
— | towctrans | converts the operand using a specific mapping |
— | wctype | returns a wide character class to be used with iswctype |
— | wctrans | returns a transformation mapping to be used with towctrans |
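As an illustration of how the functions in the table are typically called, the following sketch shows one byte-character test and the two name-based wide-character interfaces. The helper names count_alnum, wide_in_class and wide_map are hypothetical; only the ctype.h and wctype.h calls inside them come from the standard library. Results depend on the current locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the bytes in s that isalnum() accepts. The cast to unsigned char
   keeps the argument within the range for which the ctype.h functions
   are defined, even where plain char is signed. */
static size_t count_alnum(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isalnum((unsigned char)*s))
            n++;
    }
    return n;
}

/* Test a wide character against a class selected by name at run time,
   for example "digit" or "alpha". wctype() returns zero for a name the
   current locale does not recognize; that case is reported as "no". */
static int wide_in_class(wint_t wc, const char *class_name)
{
    wctype_t desc = wctype(class_name);
    return desc != 0 && iswctype(wc, desc) != 0;
}

/* Convert a wide character using a mapping selected by name, for example
   "toupper" or "tolower". wctrans() returns zero for an unknown name; in
   that case the character is returned unchanged. */
static wint_t wide_map(wint_t wc, const char *mapping_name)
{
    wctrans_t desc = wctrans(mapping_name);
    return desc != 0 ? towctrans(wc, desc) : wc;
}
```

In the default "C" locale, count_alnum("a1 b2!") gives 4, wide_in_class(L'7', "digit") is true, and wide_map(L'a', "toupper") gives L'A'. The name-based pair wctype/iswctype is useful when the class to test is not known until run time.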
References
- [1] ISO/IEC 9899:1999 specification (PDF). p. 193, § 7.4.