Camel case

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Ännu mindre (talk | contribs) at 12:40, 23 September 2006 (→‎Spread to mainstream usage). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A road sign with CamelCase

CamelCase, camel case or medial capitals is the practice of writing compound words or phrases where the words are joined without spaces, and each word is capitalized within the compound. The name comes from the uppercase "bumps" in the middle of the compound word, suggestive of the humps of a camel.

This practice is known by a large variety of names, including camelBack, BiCapitalization, InterCaps, MixedCase, etc., and many of its users do not ascribe a name to it at all.

CamelCase is a standard identifier naming convention for several programming languages, and has become fashionable in marketing for names of products and companies. Outside these contexts, however, CamelCase is rarely used in formal written English, and most style guides recommend against it.

Variations and synonyms

There are two common varieties of CamelCase, distinguished by their handling of the initial letter of what would otherwise be the first of the separate words. The variety in which the first letter is capitalized is commonly called UpperCamelCase, PascalCase (references: WikiWikiWeb, Brad Abrams), or BiCapitalized. The variety in which the first letter is left in lowercase is commonly called lowerCamelCase; it has also occasionally been called dromedaryCase or camelCase. For clarity, this article will use the terms UpperCamelCase and lowerCamelCase, respectively.

   camelCaseLooksLikeThis
   lowerCamelCaseLooksTheSame
   UpperCamelCaseLooksLikeThis
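
As a rough illustration of the two varieties, the following Python sketch joins a list of words in each style. The function names to_lower_camel and to_upper_camel are hypothetical, chosen only for this example. Note that str.capitalize() lowercases the rest of each word, so an acronym such as "URL" would come out as "Url".

```python
def to_lower_camel(words):
    """Join words as lowerCamelCase: first word lowercase, the rest capitalized."""
    if not words:
        return ""
    head, *tail = words
    return head.lower() + "".join(w.capitalize() for w in tail)


def to_upper_camel(words):
    """Join words as UpperCamelCase: every word capitalized."""
    return "".join(w.capitalize() for w in words)


words = ["camel", "case", "looks", "like", "this"]
print(to_lower_camel(words))  # camelCaseLooksLikeThis
print(to_upper_camel(words))  # CamelCaseLooksLikeThis
```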

The term StudlyCaps is similar — but not necessarily identical — to CamelCase. It is sometimes used in reference to CamelCase but can also refer to random mixed capitalization (as in "MiXeD CaPitALiZaTioN") as popularly used in online culture.

Other synonyms include:

  • camelBack
  • BumpyCaps
  • BumpyCase
  • camelBase Case
  • CamelCaps
  • CamelHumpedWord
  • CapWords in Python
  • mixedCase (for lowerCamelCase) in Python
  • ClCl (Capital-lower Capital-lower) and sometimes ClC
  • HumpBackNotation
  • InterCaps
  • InternalCapitalization
  • NerdCaps
  • WordMixing
  • WordsStrungTogether or WordsRunTogether

The name CamelCase is not related to the "Camel book" (Programming Perl), which uses all-lowercase identifiers with underscores in its sample code.

Coding standards

Internal capitalization is recommended or enforced by many computer systems, and mandated by the coding standards of many programming languages, such as Mesa, the systems programming language of the Xerox Alto (late 1970s), or the modern language Java. It is also the official convention for file names in Java and on Amiga personal computers.

Coding standards often distinguish between UpperCamelCase and lowerCamelCase, typically specifying which variety should be used for specific kinds of entities: variables, record fields, methods, procedures, types, etc.

For instance, the Java coding style dictates that UpperCamelCase be used for classes, and lowerCamelCase be used for instances and members. The original Hungarian notation for programming specifies that a lowercase abbreviation for the "usage type" (not data type) should be prefixed on all variable names, with the remainder of the name in UpperCamelCase; as such it is a form of lowerCamelCase.

NIEM standards require that XML Data Elements use UpperCamelCase and XML Attributes use lowerCamelCase.
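
A style checker that enforces such rules first has to decide which variety a name belongs to. The Python sketch below shows one possible way to do that; it is an assumption made for illustration, not the rule of any particular standard. It accepts only ASCII letters and digits, and treats all-uppercase names as neither variety. The function name classify is hypothetical.

```python
import re

# Assumed shape of an identifier: a letter followed by letters or digits.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify(name):
    """Return 'UpperCamelCase', 'lowerCamelCase', or 'other' for a name."""
    if not _IDENT.fullmatch(name):
        return "other"  # contains underscores, hyphens, spaces, etc.
    if len(name) > 1 and name.isupper():
        return "other"  # all capitals, e.g. a constant such as MAX
    return "UpperCamelCase" if name[0].isupper() else "lowerCamelCase"


for name in ["EditSaveFile", "endOfFile", "end_of_file", "OPEN_FILE_LIMIT"]:
    print(name, "->", classify(name))
```

A checker could then compare the result against the variety the standard expects for each kind of entity, for example UpperCamelCase for a class name.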

Some wikis, especially the earlier ones, use CamelCase to mark words that should be automatically linked. In many modern wikis (such as Wikipedia and other MediaWiki-based wikis) this convention was abandoned in favor of explicit link markup, e.g. with [[…]].
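
The automatic linking described above can be sketched with a regular expression. The pattern below is an assumption made for illustration (two or more runs of one capital letter followed by lowercase letters); real wiki engines each define their own rule. The names WIKI_WORD, find_wiki_words and linkify are hypothetical.

```python
import re

# Assumed rule: a word made of two or more "Capital + lowercase letters" runs.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def find_wiki_words(text):
    """Return every CamelCase word that would be treated as a link."""
    return WIKI_WORD.findall(text)


def linkify(text):
    """Rewrite each CamelCase word using explicit [[...]] link markup."""
    return WIKI_WORD.sub(lambda m: "[[" + m.group(0) + "]]", text)


sample = "See the FrontPage and RecentChanges, not Wikipedia."
print(find_wiki_words(sample))
print(linkify(sample))
```

Under this assumed rule an ordinary capitalized word such as "Wikipedia" is left alone, because it contains only one capitalized run.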

History

Early uses

CamelCase has been sporadically used since ancient times, for example as a traditional spelling style for certain surnames, such as in Scottish names like MacLean ("son of Gilian") and Hiberno-Norman names like FitzGerald (originally, "son of Gerald"), or the French duPont or DuPont ("of/from the bridge"). In the mid-20th century, it was used occasionally for product trademarks, such as CinemaScope and VistaVision, rival widescreen movie formats introduced in the 1950s. CamelCase also occurred sometimes in acronyms like DoD, or technical codes and formulas like HeLa (1983). CamelCase has also been used to transliterate acronyms from alphabets such as Cyrillic where two letters may be required to represent a single character of the original alphabet. An example of this is the DShK (Cyrillic: ДШК).

CamelCase has been used in languages other than English for a variety of purposes, such as the transcription of Tibetan names like rLobsang, or names of Bantu languages like kiSwahili or isiZulu. In French, abbreviations such as OuLiPo (1960) were favored for a time as alternatives to acronyms.

However, the use of CamelCase became widespread only in the 1970s or 1980s, when it was adopted as a standard or alternative naming convention for multi-word identifiers in several programming languages. There are various possible origins, and it may have developed independently from multiple sources. Some of these theories are described below.

Background: multi-word identifiers

In programs of any significant size, there is a need for descriptive (hence multi-word) identifiers, like "previous balance" or "end of file". However, spaces are not typically permitted inside identifiers, as they are treated as delimiters between tokens. Writing the words together as in "endoffile" is not satisfactory because the names often become unreadable. Therefore, the programming language COBOL allowed a hyphen ("-") to be used between words of compound identifiers, as in "END-OF-FILE". The common punched card character sets of the time had no lower-case letters and no special character that would be adequate as a word separator in identifiers. However, by the late 1960s the ASCII character set standard had been established, allowing the designers of the C language to adopt the underscore character "_" as a word joiner. Underscore-separated compounds like "end_of_file" are still prevalent in C programs and libraries.
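
The alternatives discussed above can be compared side by side with a small Python sketch. The function names are hypothetical and the code only illustrates the spelling of each convention, not any compiler's actual rules.

```python
def run_together(phrase):
    """Words simply concatenated, which tends to be hard to read."""
    return "".join(phrase.lower().split())


def cobol_style(phrase):
    """Uppercase words joined by hyphens, as in COBOL identifiers."""
    return "-".join(phrase.upper().split())


def c_style(phrase):
    """Lowercase words joined by underscores, as is common in C."""
    return "_".join(phrase.lower().split())


phrase = "end of file"
print(run_together(phrase))  # endoffile
print(cobol_style(phrase))   # END-OF-FILE
print(c_style(phrase))       # end_of_file
```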

CamelCase is by no means universal. Users of several modern programming languages, notably those in the Lisp and Forth families, nearly always use hyphens. Among the reasons sometimes given are that doing so does not require shifting on most keyboards, and that the words are more readable when they are separated.

The "Lazy Programmer" origin

One explanation of the origins of CamelCase in computing claims that the style originated within the culture of C programmers and hackers, who found it more convenient than the standard underscore-based style.

On most keyboards, the underscore key is inconveniently placed. Additionally, in some fonts the underscore character can be confused with a minus sign; it can be overlooked because it falls below the string of characters, or it can be lost entirely when displayed or printed underlined, or when printed on a dot-matrix printer with a defective pin or misaligned ribbon. Moreover, compiler limits on identifier length and the small computer displays available in the 1970s worked together to encourage brevity. Many programmers thus chose to use CamelCase because it yielded legible compound names with fewer keystrokes and fewer characters. Hungarian Notation is an extension of this style.

The "Paranoid Programmer" origin

Programmers working in the tradition of linkage-oriented languages, especially the Unix C tradition (and later C++), had many concerns to address. Early Unix systems (and early personal computers in general) provided linkage models in which external identifiers were distinguished only up to a short length, often as few as the initial eight characters. Many clashes were possible within the external identifier linkage space, which potentially commingles code generated by various high-level compilers, runtime libraries required by each of these compilers, compiler-generated helper functions, and program startup code, of which some fraction was inevitably compiled from system assembly language. Within this collision domain the underscore character quickly became entrenched as the primary mechanism for differentiating the external linkage space. It was common practice for C compilers to prepend a leading underscore to all external-scope program identifiers to avert clashes with contributions from runtime language support. Furthermore, when the C/C++ compiler needed to introduce names into external linkage as part of the translation process, these names were often distinguished with some combination of multiple leading or trailing underscores.

This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
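
To see why a short significant length invites clashes, the Python sketch below groups names by their first eight characters, the limit mentioned above. It is a simplified model under that assumption, not a description of any real linker; the function name find_clashes and the sample names are hypothetical.

```python
from collections import defaultdict


def find_clashes(names, significant=8):
    """Group names that become identical when cut to `significant` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:significant]].append(name)
    # Keep only the prefixes shared by more than one name.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


names = ["open_file_limit", "open_file_count", "EditSaveFile", "EditOpenFile"]
print(find_clashes(names))
```

Here the two underscore-separated names share the prefix "open_fil" and would collide, while the two CamelCase names differ within their first eight characters.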

A second, independent collision domain was the C preprocessor. The C language preprocessor is unusual in that it does not respect any language-defined scoping model or reserved namespace, not even C language keywords. This problem was generally addressed by writing macros in macro case, which mostly mixes uppercase letters with dividing underscores:

#define OPEN_FILE_LIMIT  (15)  

Once again the implementation must often supply hidden macros, and once again dressing up these "hidden behind the scenes" identifiers with multiple leading or trailing underscores became accepted practice. As this practice became pervasive on both levels, the underscore gained a cognitive association with system level programming, hidden technicalities, and the messy entrails of language support.

The C language linkage model further complicated matters by not supporting a strong module-level linkage model. In the C language the concept of a module was initially rather loose. To simplify the implementation, the language drew no distinction by default between function names intended for linkage to other compilation units and function names intended only for use within a single compilation unit. The C language provides the static keyword, which makes it possible to hide names from external linkage, but this was rarely employed, as it also obscured these names from most runtime debugging tools.

A common early convention was to use names (often prosaic) consisting mostly of lowercase letters and underscores for names in external linkage not intended for use by other translation units, such as a local function named count_obscure_piddly_flags, and to use CamelCase or some variant for primary application calls, such as EditSaveFile.

The "Alto Keyboard" origin

Another explanation is that CamelCase started at Xerox PARC around 1978, with the Mesa programming language developed for the Xerox Alto computer. This machine lacked an underscore key, and the hyphen and space characters were not permitted in identifiers, leaving CamelCase as the only viable scheme for readable multiword names. The PARC Mesa Language Manual (1979) included a coding standard with specific rules for Upper- and lowerCamelCase which was strictly followed by the Mesa libraries and the Alto operating system.

The Smalltalk language, which was developed originally on the Alto and became quite popular in the early 1980s, may have been instrumental in spreading the style outside PARC. CamelCase was also used by convention for many names in the PostScript page description language (invented by Adobe Systems founder and ex-PARC scientist John Warnock). A further boost was provided by Niklaus Wirth, the inventor of Pascal, who acquired a taste for CamelCase during a sabbatical at PARC and used it in Modula, his next programming language.

Spread to mainstream usage

During the same period in which personal computers exposed hacker culture to a more mainstream audience in the 1980s and 1990s, CamelCase became fashionable for corporate trade names, first in computer-related fields but later expanding further into the mainstream. During the dot-com bubble of the late 1990s, in particular, the lowercase prefixes "e" (for "electronic") and "i" (for "Internet", "information", or perhaps "intelligent") became quite common. Here are some examples ranging from the 1960s to the 2000s, sorted by year:

This fashion has become so pervasive that it is often incorrectly applied to names that do not use it officially, as in TransAmerica (Transamerica), FireFox (Firefox), UseNet (Usenet), TimeWarner (Time Warner, whose new logo does appear in CamelCase fashion), GameBoy (Game Boy), MicroSoft (Microsoft, previously correct), MacWorld (Macworld), KarmelKorn (Karmelkorn), PhotoShop (Photoshop, previously correct) and BlackBox (Blackbox).

History of the name

The original name of the practice, used in media studies, grammars, and the Oxford English Dictionary, was "medial capitals". The fancier names such as "InterCaps", "CamelCase", and variations thereof are relatively recent, and seem more common in computer-related communities.

The earliest known occurrence of InterCaps on Usenet is in an April 1990 post to the group alt.folklore.computers by Avi Rappoport [1], with BiCapitalization appearing slightly later in a 1991 post by Eric S. Raymond to the same group [2]. The earliest use of the name "CamelCase" occurs in 1995, in a post by Newton Love [3]. "With the advent of programming languages having these sorts of constructs, the humpiness of the style made me call it HumpyCase at first, before I settled on CamelCase. I had been calling it CamelCase for years," said Newton [4]. "The citation above was just the first time I had used the name on USENET."
