
Primitive data type: Difference between revisions

From Wikipedia, the free encyclopedia
→‎Integer numbers: shorten signed to remove linebreak
tried to fix the lead, ended up rewriting the article. Java is front and center because I didn't want to lookup hardware specifics.
Line 1: Line 1:
{{Short description|An extremely basic/core data type provided by a programming language}}
{{Short description|An extremely basic/core data type provided by a programming language}}
{{More citations needed|date=March 2015}}
{{More citations needed|date=March 2015}}
In [[computer science]], '''primitive data types''' are a set of basic [[data type]]s from which all other data types are constructed.<ref>{{cite book |last1=Stone |first1=R. G. |last2=Cooke |first2=D. J. |title=Program Construction |date=5 February 1987 |publisher=Cambridge University Press |isbn=978-0-521-31883-9 |page=18 |url=https://www.google.com/books/edition/Program_Construction/0k_xz8O2SewC?hl=en&gbpv=1&pg=PA18 |language=en}}</ref> More specifically, the term often refers to the limited set of data representations in use by a particular [[central processing unit|processor]], which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary.<ref>{{cite book |last1=Wikander |first1=Jan |last2=Svensson |first2=Bertil |title=Real-Time Systems in Mechatronic Applications |date=31 May 1998 |publisher=Springer Science & Business Media |isbn=978-0-7923-8159-4 |page=101 |url=https://www.google.com/books/edition/Real_Time_Systems_in_Mechatronic_Applica/fDCNR7VwG-AC?hl=en&gbpv=1&pg=PA101 |language=en}}</ref> More generally, "primitive data types" may refer to the standard data types built into a [[programming language]].<ref>{{cite book |last1=Khurana |first1=Rohit |title=Data and File Structure (For GTU), 2nd Edition |publisher=Vikas Publishing House |isbn=978-93-259-6005-3 |page=2 |url=https://www.google.com/books/edition/Data_and_File_Structure_For_GTU_2nd_Edit/s0JDDAAAQBAJ?hl=en&gbpv=1&pg=PA2 |language=en}}</ref><ref>{{cite book |last1=Chun |first1=Wesley |title=Core Python Programming |date=2001 |publisher=Prentice Hall Professional |isbn=978-0-13-026036-9 |page=77 |url=https://www.google.com/books/edition/Core_Python_Programming/mh0bU6NXrBgC?hl=en&gbpv=1&pg=PA77 |language=en}}</ref> Data types that are not primitive are referred to as derived or [[composite data type]]s.
In [[computer science]], '''primitive data type''' is either of the following:{{Citation needed|date=May 2009}}
* a ''basic type'' is a [[data type]] provided by a [[programming language]] as a basic building block. Most languages allow more complicated ''[[composite type]]s'' to be recursively constructed starting from basic types.
* a ''built-in type'' is a data type for which the programming language provides built-in support.


Primitive types are almost always [[value type]]s, but composite types may also be value types.<ref>{{cite book |last1=Olsen |first1=Geir |last2=Allison |first2=Damon |last3=Speer |first3=James |title=Visual Basic .NET Class Design Handbook: Coding Effective Classes |date=1 January 2008 |publisher=Apress |isbn=978-1-4302-0780-1 |page=80 |url=https://www.google.com/books/edition/Visual_Basic_NET_Class_Design_Handbook/DUQnCgAAQBAJ?hl=en&gbpv=1&pg=PA80 |language=en}}</ref>
In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types.


==Common primitive data types==
Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are.{{Citation needed|date=May 2009}} Integer addition, for example, can be performed as a single machine instruction, and some [[central processing unit|processors]] offer specific instructions to process sequences of characters with a single instruction.<ref>{{Cite web|url=https://www.sciencedirect.com/topics/computer-science/single-instruction-single-data|title=Single Instruction Single Data - an overview &#124; ScienceDirect Topics}}</ref> In particular, the [[C (programming language)|C]] standard mentions that "a <nowiki>'plain'</nowiki> int object has the natural size suggested by the architecture of the execution environment."{{citation needed|date=October 2020}} This means that <code>int</code> is likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost always [[value type]]s.


The [[Java (programming language)|Java]] virtual machine's set of primitive data types is:<ref>{{cite book |last1=Lindholm |first1=Tim |last2=Yellin |first2=Frank |last3=Bracha |first3=Gilad |last4=Buckley |first4=Alex |title=The Java® Virtual Machine Specification |date=13 February 2015 |url=https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.3 |chapter=Chapter 2. The Structure of the Java Virtual Machine}}</ref>
Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be modified by programs. Exceptions include [[Smalltalk]], which permits all data types to be extended within a program, adding to the operations that can be performed on them or even redefining the built-in operations.{{Citation needed|date=May 2009}}
* [[integer (computer science)|Integer]] types with a variety of ranges and [[precision (computer science)|precisions]] (<code>byte</code>, <code>short</code>, <code>int</code>, <code>long</code>, <code>char</code>)
* [[Floating point|Floating-point number]]s with single or double [[precision (computer science)|precision]] (<code>float</code>, <code>double</code>)
* [[Boolean data type|Boolean]], logical values '''true''' and '''false'''. (<code>boolean</code>)
* A value referring to an executable memory address. (<code>returnAddress</code>) This is not accessible from the Java programming language and is usually left out.<ref>{{cite book |last1=Cowell |first1=John |title=Essential Java Fast: How to write object oriented software for the Internet |date=18 February 1997 |publisher=Springer Science & Business Media |isbn=978-3-540-76052-8 |page=27 |url=https://www.google.com/books/edition/Essential_Java_Fast/5M9_fBX4QicC?hl=en&gbpv=1&pg=PA27 |language=en}}</ref><ref>{{cite book |last1=Rakshit |first1=Sandip |last2=Panigrahi |first2=Goutam |title=A Hand Book of Objected Oriented Programming With Java |date=December 1995 |publisher=S. Chand Publishing |isbn=978-81-219-3001-7 |page=11 |url=https://www.google.com/books/edition/A_Hand_Book_of_Objected_Oriented_Program/aAsbEAAAQBAJ?hl=en&gbpv=1&pg=PA11 |language=en}}</ref>


These primitive types are in general precisely those supported by computer hardware, except possibly for varying integer sizes or hardware that is missing floating point. Operations on such primitives are usually quite efficient. Primitive data types which are native to the processor have a one-to-one correspondence with objects in the computer's memory, and operations on these types are usually the fastest available.<ref name="Agner">{{cite web |last1=Fog |first1=Agner |title=Optimizing software in C++ |url=https://www.agner.org/optimize/optimizing_cpp.pdf#page=29 |access-date=28 January 2022 |page=29 |quote=Integer operations are fast in most cases, [...]}}</ref> Integer addition, for example, can be performed as a single machine instruction, and some [[central processing unit|processors]] offer specific instructions to process sequences of characters with a single instruction.<ref>{{Cite web|url=https://www.sciencedirect.com/topics/computer-science/single-instruction-single-data|title=Single Instruction Single Data - an overview &#124; ScienceDirect Topics}}</ref> The choice of primitive data type may still affect performance: for example, operating on an array of floats is faster with [[SIMD]] operations and data types.{{r|Agner|p=113}}
==Overview==
The actual range of primitive data types that is available is dependent upon the specific programming language that is being used. For example, in [[C Sharp (programming language)|C#]], [[string (computer science)|strings]] are a composite but built-in data type, whereas in modern dialects of [[BASIC]] and in [[JavaScript]], they are assimilated to a primitive data type that is both basic and built-in.<ref>{{Cite web|title=Primitive Data Types (The Java™ Tutorials > Learning the Java Language > Language Basics)|url=https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html|website=docs.oracle.com|access-date=2020-05-01}}</ref><ref>{{Cite web|title=Data Types in C|url=https://www.geeksforgeeks.org/data-types-in-c/|date=2015-06-30|website=GeeksforGeeks|language=en-US|access-date=2020-05-01}}</ref>


The set of basic [[C_data_types#Basic_types|C data types]] is similar to Java's. Minimally, there are four types, <code>char</code>, <code>int</code>, <code>float</code>, and <code>double</code>, but the qualifiers <code>short</code>, <code>long</code>, <code>signed</code>, and <code>unsigned</code> mean that C contains numerous target-dependent integer and floating-point primitive types.<ref>{{cite book |last1=Kernighan |first1=Brian W.|last2=Ritchie|first2=Dennis M. |title=The C programming language|chapter=2.2 Data Types and Sizes |date=1988 |location=Englewood Cliffs, N.J. |isbn=0131103709 |page=36 |edition=Second}}</ref>
Classic basic primitive types may include:
* [[character (computing)|Character]] (<code>character</code>, <code>char</code>);
* [[integer (computer science)|Integer]] (<code>integer</code>, <code>int</code>, <code>short</code>, <code>long</code>, <code>byte</code>) with a variety of [[precision (computer science)|precisions]];
* [[Floating point|Floating-point number]] (<code>float</code>, <code>double</code>, <code>real</code>, <code>double precision</code>);
* [[fixed-point arithmetic|Fixed-point number]] (<code>fixed</code>) with a variety of [[precision (computer science)|precisions]] and a programmer-selected [[order of magnitude|scale]].
* [[Boolean data type|Boolean]], logical values '''true''' and '''false'''.
* [[reference (computer science)|Reference]] (also called a ''[[pointer (computer programming)|pointer]]'' or ''handle'' or ''descriptor''), a value referring to another object. The reference can be a memory address, or an index to a collection of values.

:The above primitives are generally supported more or less directly by computer hardware, except possibly for floating point, so operations on such primitives are usually fairly efficient. Some programming languages support text strings as a primitive (e.g. BASIC) while others treat a text string as an array of characters (e.g. C). Some computer hardware (e.g. x86) has instructions which help in dealing with text strings, but complete hardware support for text strings is rare.

A [[String (computer science)|string]] can be any series of characters in the [[Character encoding|encoding]] in use. To separate strings from code, most languages enclose them in single or double quotes, for example "Hello World" or 'Hello World'. Note that "200" could be mistaken for an integer but is actually a string, because it is enclosed in double quotes.
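
The distinction between a quoted string and a number can be illustrated with a minimal sketch. Python is used here purely for illustration; the variable names are assumptions of this example, and other languages may behave differently (for instance by converting implicitly).

```python
number = 200      # an integer literal
text = "200"      # a string literal that happens to contain digits

print(number + number)       # arithmetic addition
print(text + text)           # string concatenation, not addition
print(int(text) + number)    # explicit conversion is needed to do arithmetic

# Python accepts either quote style for the same string value.
same = 'Hello World' == "Hello World"
print(same)
```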

<!-- Please note: having constructs provided by the standard library does not make them "built-in" -->
More sophisticated types which can be built-in include:
* [[Tuple]] in [[Standard ML]], [[Python (programming language)|Python]], [[Scala (programming language)|Scala]], [[Swift (programming language)|Swift]], [[Elixir (programming language)|Elixir]]
* [[List (abstract data type)|List]] in [[Common Lisp]], [[Python (programming language)|Python]], [[Scheme (programming language)|Scheme]], [[Haskell (programming language)|Haskell]]
* [[Complex data type|Complex number]] in [[C99]], [[Fortran]], [[Common Lisp]], [[Python (programming language)|Python]], [[D (programming language)|D]], [[Go (programming language)|Go]]
* [[Rational data type|Rational number]] in [[Common Lisp]], [[Haskell (programming language)|Haskell]]
* [[Associative array]] in [[Perl]], [[PHP]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[JavaScript]], [[Lua (programming language)|Lua]], [[D (programming language)|D]], [[Go (programming language)|Go]]
* [[First-class function]], in all [[functional programming|functional]] languages, [[JavaScript]], [[Lua (programming language)|Lua]], [[D (programming language)|D]], [[Go (programming language)|Go]], and in newer standards of [[C++]], [[Java (programming language)|Java]], [[C Sharp (programming language)|C#]], [[Perl]]

==Specific primitive data types==


===Integer numbers===
===Integer numbers===
{{Main|Integer (computer science)}}
An [[integer (computer science)|integer]] data type represents some [[interval (mathematics)|range of mathematical integers]]. Integers may be either signed (allowing negative values) or unsigned ([[non-negative integer]]s only). Common ranges are:
An [[integer (computer science)|integer]] data type represents some [[interval (mathematics)|range of mathematical integers]]. Integers may be either signed (allowing negative values) or unsigned ([[non-negative integer]]s only). Common ranges are:


Line 71: Line 52:
| −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
| −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
| 0 to 18,446,744,073,709,551,615
| 0 to 18,446,744,073,709,551,615
|-
| unlimited/8<!--Add footnotes?: For real Bignum, that can grow, there needs to be a pointer overhead-->
| unlimited
| [[Bignum]]
| –2<sup>unlimited</sup>/2 to +(2<sup>unlimited</sup>/2 − 1)
| 0 to 2<sup>unlimited</sup> − 1
|}
|}

Literals for integers can be written as regular [[Arabic numerals]], consisting of a sequence of digits and with negation indicated by a [[hyphen-minus|minus sign]] before the value. However, most programming languages disallow use of commas or spaces for [[digit grouping]]. Examples of integer literals are:

* <tt>42</tt>
* <tt>10000</tt>
* <tt>-233000</tt> <!-- do not use the real minus sign, as that isn't part of a literal -->

There are several alternate methods for writing integer literals in many programming languages:

* Most programming languages, especially those influenced by [[C (programming language)|C]], prefix an integer literal with <tt>0X</tt> or <tt>0x</tt> to represent a [[hexadecimal]] value, e.g. <tt>0xDEADBEEF</tt>. Other languages may use a different notation, e.g. some [[assembly language]]s append an <tt>H</tt> or <tt>h</tt> to the end of a hexadecimal value.
* [[Perl]], [[Ruby (programming language)|Ruby]], [[Java (programming language)|Java]], [[Julia (programming language)|Julia]], [[D (programming language)|D]], [[Rust (programming language)|Rust]] and [[Python (programming language)|Python]] (starting from version 3.6) allow embedded [[underscore]]s for clarity, e.g. <tt>10_000_000</tt>, and fixed-form [[Fortran]] ignores embedded spaces in integer literals.
* In [[C (programming language)|C]] and [[C++]], a leading zero indicates an [[octal]] value, e.g. <tt>0755</tt>. This was primarily intended to be used with [[Modes (Unix)|Unix modes]]; however, it has been criticized because normal integers may also lead with zero.<ref>ECMAScript 6th Edition draft: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-literals-numeric-literals {{webarchive|url=https://web.archive.org/web/20131216202526/https://people.mozilla.org/~jorendorff/es6-draft.html |date=2013-12-16 }}</ref> As such, [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[Haskell (programming language)|Haskell]], and [[OCaml]] prefix octal values with <tt>0O</tt> or <tt>0o</tt>, following the layout used by hexadecimal values.
* Several languages, including [[Java (programming language)|Java]], [[C Sharp (programming language)|C#]], [[Scala (programming language)|Scala]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], and [[OCaml]], can represent binary values by prefixing a number with <tt>0B</tt> or <tt>0b</tt>.
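
As a rough illustration of the notations listed above, the following sketch uses Python (version 3.6 or later), which happens to accept most of them. The variable names are chosen for this example only.

```python
# Integer literal notations accepted by Python 3.6+ (other languages differ).
decimal = 10000
negative = -233000          # unary minus applied to the literal 233000
hexadecimal = 0xDEADBEEF    # 0x or 0X prefix
octal = 0o755               # 0o or 0O prefix; a bare leading zero (0755) is rejected by Python 3
binary = 0b101010           # 0b or 0B prefix
grouped = 10_000_000        # embedded underscores for digit grouping

print(decimal, negative, hexadecimal, octal, binary, grouped)
```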


===Floating-point numbers===
===Floating-point numbers===
{{Main|Floating-point arithmetic}}
A [[floating-point]] number represents a limited-precision [[rational number]] that may have a fractional part. These numbers are stored internally in a format equivalent to [[scientific notation]], typically in [[binary numeral system|binary]] but sometimes in [[decimal]]. Because floating-point numbers have limited precision, only a subset of [[real number|real]] or [[rational number|rational]] numbers are exactly representable; other numbers can be represented only approximately.
A [[floating-point]] number represents a limited-precision [[rational number]] that may have a fractional part. These numbers are stored internally in a format equivalent to [[scientific notation]], typically in [[binary numeral system|binary]] but sometimes in [[decimal]]. Because floating-point numbers have limited precision, only a subset of [[real number|real]] or [[rational number|rational]] numbers are exactly representable; other numbers can be represented only approximately. Many languages have both a [[single precision]] (often called "float") and a [[double precision]] type (often called "double").

Many languages have both a [[single precision]] (often called "float") and a [[double precision]] type.

Literals for floating point numbers include a decimal point, and typically use <tt>e</tt> or <tt>E</tt> to denote scientific notation. Examples of floating-point literals are:

* <tt>20.0005</tt>
* <tt>99.9</tt>
* <tt>-5000.12</tt><!-- do not use ndash, as that isn't part of a literal(?)-->
* <tt>6.02e23</tt>

Some languages (e.g., [[Fortran]], [[Python (programming language)|Python]], [[D (programming language)|D]]) also have a [[complex number]] type comprising two floating-point numbers: a real part and an imaginary part.
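
The effect of limited precision can be seen in a short sketch. Python is used for illustration, and the comparison result assumes that its <code>float</code> is an IEEE 754 double-precision value, as is the case on common platforms.

```python
import math

avogadro = 6.02e23        # scientific notation: 6.02 x 10**23
total = 0.1 + 0.2         # neither 0.1 nor 0.2 is exactly representable in binary

print(total == 0.3)               # exact comparison fails because of rounding
print(math.isclose(total, 0.3))   # comparison within a tolerance succeeds

z = complex(3.0, 4.0)     # a complex number: two floats, real and imaginary parts
print(abs(z))             # magnitude of 3 + 4i
```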

===Fixed-point numbers===
A [[fixed-point arithmetic|fixed-point]] number represents a limited-precision [[rational number]] that may have a fractional part. These numbers are stored internally in a scaled-integer form, typically in [[binary numeral system|binary]] but sometimes in [[decimal]]. Because fixed-point numbers have limited precision, only a subset of [[real number|real]] or [[rational number|rational]] numbers are exactly representable; other numbers can be represented only approximately. Fixed-point numbers also tend to have a more limited range of values than [[floating point]], and so the programmer must be careful to avoid overflow in intermediate calculations as well as the final result.
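
A minimal sketch of the scaled-integer idea follows, written in Python for illustration. The scale factor of 100, the 16-bit storage range, and the helper name <code>fixed_mul</code> are all assumptions of this example rather than features of any particular language.

```python
SCALE = 100                                  # assumed scale: values are stored as hundredths
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1     # assumed 16-bit signed storage

def fixed_mul(a: int, b: int) -> int:
    """Multiply two scaled integers, rescale, and check the assumed 16-bit range."""
    # The intermediate product a * b can be far larger than the final result.
    # Python integers do not overflow, but a fixed-width language would need
    # a wider intermediate type at this step.
    result = (a * b) // SCALE                # floor division rounds toward negative infinity
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("fixed-point result out of range")
    return result

price = 1999                                 # represents 19.99
quantity = 300                               # represents 3.00
total = fixed_mul(price, quantity)           # represents 59.97
print(total / SCALE)
```

In this sketch the intermediate product (599700) already exceeds the assumed 16-bit range even though the final result (5997) fits, which is the kind of intermediate overflow the programmer has to guard against.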


===Booleans===
===Booleans===
Line 114: Line 63:
Many languages (e.g. [[Java (programming language)|Java]], [[Pascal (programming language)|Pascal]] and [[Ada (programming language)|Ada]]) implement booleans adhering to the concept of ''boolean'' as a distinct logical type. Languages, though, may implicitly convert booleans to ''numeric types'' at times to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, early versions of the C programming language that followed [[ANSI C]] and its former standards did not have a dedicated boolean type. Instead, numeric values of zero are interpreted as "false", and any other value is interpreted as "true".<ref>{{cite book |first1= Brian W |last1= Kernighan |author-link1= Brian Kernighan |first2= Dennis M |last2= Ritchie |author-link2= Dennis Ritchie |page= [https://archive.org/details/cprogramminglang00kern/page/41 41] |title= [[The C Programming Language]] |edition= 1st |publisher= [[Prentice Hall]] |year= 1978 |location= [[Englewood Cliffs, NJ]] |isbn= 0-13-110163-3}}</ref> The newer [[C99]] added a distinct boolean type that can be included with [[stdbool.h]],<ref>{{cite web|url=https://devdocs.io/c/types/boolean|access-date=October 15, 2020|title=Boolean type support library|website=devdocs.io}}</ref> and [[C++]] supports <code>bool</code> as a built-in type and "true" and "false" as reserved words.<ref>{{cite web|url=https://www.geeksforgeeks.org/bool-data-type-in-c/|access-date=October 15, 2020|title=Bool data type in C++|website=GeeksforGeeks|date=5 June 2017}}</ref>
Many languages (e.g. [[Java (programming language)|Java]], [[Pascal (programming language)|Pascal]] and [[Ada (programming language)|Ada]]) implement booleans adhering to the concept of ''boolean'' as a distinct logical type. Languages, though, may implicitly convert booleans to ''numeric types'' at times to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, early versions of the C programming language that followed [[ANSI C]] and its former standards did not have a dedicated boolean type. Instead, numeric values of zero are interpreted as "false", and any other value is interpreted as "true".<ref>{{cite book |first1= Brian W |last1= Kernighan |author-link1= Brian Kernighan |first2= Dennis M |last2= Ritchie |author-link2= Dennis Ritchie |page= [https://archive.org/details/cprogramminglang00kern/page/41 41] |title= [[The C Programming Language]] |edition= 1st |publisher= [[Prentice Hall]] |year= 1978 |location= [[Englewood Cliffs, NJ]] |isbn= 0-13-110163-3}}</ref> The newer [[C99]] added a distinct boolean type that can be included with [[stdbool.h]],<ref>{{cite web|url=https://devdocs.io/c/types/boolean|access-date=October 15, 2020|title=Boolean type support library|website=devdocs.io}}</ref> and [[C++]] supports <code>bool</code> as a built-in type and "true" and "false" as reserved words.<ref>{{cite web|url=https://www.geeksforgeeks.org/bool-data-type-in-c/|access-date=October 15, 2020|title=Bool data type in C++|website=GeeksforGeeks|date=5 June 2017}}</ref>
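
The implicit conversion between booleans and numbers can be sketched as follows. Python is used for illustration only: its <code>bool</code> is a distinct type that is also a subclass of <code>int</code>. The helper <code>truthy</code> is a hypothetical name that mimics the C convention described above.

```python
# bool is its own type, yet True and False also behave as the integers 1 and 0.
print(isinstance(True, int))
print(True + True)

def truthy(value: int) -> bool:
    """C-style interpretation: zero is false, any other value is true."""
    return value != 0

for candidate in (0, 1, -3):
    print(candidate, truthy(candidate), bool(candidate))
```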


== Built-in types ==
===Characters and strings===
A [[character (computing)|character]] type (typically called "char") may contain a single [[letter (alphabet)|letter]], [[numerical digit|digit]], [[punctuation mark]], [[symbol]], formatting code, [[control code]], or some other specialized code (e.g., a [[byte order mark]]). In [[C (programming language)|C]], <code>char</code> is defined as the smallest addressable unit of memory. On most systems, this is 8 [[bit]]s; several standards, such as [[POSIX]], require it to be this size. Some languages have two or more character types, for example a single-byte type for [[ASCII]] characters and a multi-byte type for [[Unicode]] characters. The term "character type" is normally used even for types whose values more precisely represent [[code unit]]s, for example a [[UTF-16]] code unit as in [[Java (programming language)|Java]] (support limited to 16-bit characters only)<ref>{{cite web |last1=Mansoor |first1=Umer |title=The char Type in Java is Broken |url=https://codeahoy.com/2016/05/08/the-char-type-in-java-is-broken/ |website=CodeAhoy |date=8 May 2016 |access-date=10 February 2020 |ref=3}}</ref> and [[JavaScript]].


<!-- Please note: having constructs provided by the standard library does not make them "built-in" -->
Characters may be combined into [[string (computer science)|strings]]. The string data can include numbers and other numerical symbols but is treated as text. For example, the mathematical operations that can be performed on a numerical value (e.g. 200) generally cannot be performed on that same value written as a string (e.g. "200").
Types which can be built-in to sophisticated programming languages include:
* Characters and strings (see [[#Characters and strings|below]])
* Ranges (see [[#Ranges|below]])
* [[Tuple]] in [[Standard ML]], [[Python (programming language)|Python]], [[Scala (programming language)|Scala]], [[Swift (programming language)|Swift]], [[Elixir (programming language)|Elixir]]
* [[List (abstract data type)|List]] in [[Common Lisp]], [[Python (programming language)|Python]], [[Scheme (programming language)|Scheme]], [[Haskell (programming language)|Haskell]]
* [[fixed-point arithmetic|Fixed-point number]] with a variety of [[precision (computer science)|precisions]] and a programmer-selected [[order of magnitude|scale]].
* [[Complex data type|Complex number]] in [[C99]], [[Fortran]], [[Common Lisp]], [[Python (programming language)|Python]], [[D (programming language)|D]], [[Go (programming language)|Go]]. This is two floating-point numbers, a real part and an imaginary part.
* [[Rational data type|Rational number]] in [[Common Lisp]]
* [[Arbitrary-precision arithmetic|Arbitrary-precision]] <code>Integer</code> type in [[Common Lisp]], [[Erlang (programming language)|Erlang]], [[Haskell (programming language)|Haskell]]
* [[Associative array]] in [[Perl]], [[PHP]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[JavaScript]], [[Lua (programming language)|Lua]], [[D (programming language)|D]], [[Go (programming language)|Go]]
* [[reference (computer science)|Reference]] (also called a ''[[pointer (computer programming)|pointer]]'' or ''handle'' or ''descriptor''),
* [[First-class function]], in all [[functional programming|functional]] languages, [[JavaScript]], [[Lua (programming language)|Lua]], [[D (programming language)|D]], [[Go (programming language)|Go]], and in newer standards of [[C++]], [[Java (programming language)|Java]], [[C Sharp (programming language)|C#]], [[Perl]]


===Characters and strings===
Strings are implemented in various ways, depending on the programming language. The simplest way to implement strings is to create them as an array of characters, followed by a delimiting character used to signal the end of the string, usually [[Null character|NUL]]. These are referred to as [[null-terminated string]]s, and are usually found in languages with a low amount of [[hardware abstraction]], such as [[C (programming language)|C]] and [[Assembly language|Assembly]]. While easy to implement, null terminated strings have been criticized for causing [[buffer overflow]]s. Most high-level scripting languages, such as [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], and many dialects of [[BASIC]], have no separate character type; strings with a length of one are normally used to represent single characters. Some languages, such as [[C++]] and [[Java (programming language)|Java]], have the capability to use null-terminated strings (usually for backwards-compatibility measures), but additionally provide their own class for string handling (<code>[[String (C++)|std::string]]</code> and <code>java.lang.String</code>, respectively) in the standard library.
A [[character (computing)|character]] type is a type that can represent all [[Unicode characters]], and hence must be at least 21 bits wide. C includes a <code>char</code> type, but it is defined to be the smallest addressable unit of memory, which several standards, such as [[POSIX]], require to be 8 [[bit]]s. Hence it is too small to represent all Unicode characters, and standards instead commonly refer to it as an integer type. The term "char" is also used for a 16-bit integer type in [[Java (programming language)|Java]], but again this is not a character type.<ref>{{cite web |last1=Mansoor |first1=Umer |title=The char Type in Java is Broken |url=https://codeahoy.com/2016/05/08/the-char-type-in-java-is-broken/ |website=CodeAhoy |date=8 May 2016 |access-date=10 February 2020 |ref=3}}</ref> Some languages, such as Julia, include a true 32-bit Unicode character type as a primitive.<ref>{{cite web |title=Strings · The Julia Language |url=https://docs.julialang.org/en/v1/manual/strings/#man-characters |website=docs.julialang.org |access-date=29 January 2022}}</ref>
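
The gap between a character and an 8-bit or 16-bit code unit can be illustrated with a short sketch. Python is used for illustration, and the character U+1F600 is an arbitrary choice of a code point outside the 16-bit range.

```python
ch = "\U0001F600"                 # one character, code point U+1F600

code_point = ord(ch)
utf16_units = len(ch.encode("utf-16-le")) // 2   # number of 16-bit code units
utf8_bytes = len(ch.encode("utf-8"))             # number of 8-bit code units

print(len(ch), hex(code_point), utf16_units, utf8_bytes)

# The largest Unicode code point, U+10FFFF, needs 21 bits.
max_bits = (0x10FFFF).bit_length()
print(max_bits)
```

A single character here occupies two 16-bit units or four 8-bit units, which is why neither an 8-bit nor a 16-bit type can hold every Unicode character in one value.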

Languages also differ in whether strings are mutable or [[Immutable object|immutable]]. Mutable strings may be altered after their creation, whereas immutable strings maintain a constant size and content. In the latter case, the only way to alter a string is to create a new one. There are both advantages and disadvantages to each approach: although immutable strings are much less flexible, they are simpler and completely [[Thread safety|thread-safe]]. Some examples of languages that use mutable strings include [[C++]], [[Perl]] and [[Ruby (programming language)|Ruby]], whereas languages with immutable strings include [[JavaScript]], [[Lua (programming language)|Lua]], [[Python (programming language)|Python]] and [[Go (programming language)|Go]]. A few languages, such as [[Objective-C]], provide different types for mutable and immutable strings.
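
The difference can be sketched in Python, whose <code>str</code> type is immutable. The mutable counterpart shown here, <code>bytearray</code>, holds bytes rather than text and stands in only as an illustration of in-place modification.

```python
s = "cat"
try:
    s[0] = "b"                 # attempt to modify the string in place
except TypeError:
    changed_in_place = False   # immutable strings reject item assignment
else:
    changed_in_place = True

t = "b" + s[1:]                # "altering" means building a new string
print(s, t, changed_in_place)

buf = bytearray(b"cat")        # a mutable byte sequence
buf[0] = ord("b")              # modified in place, no new object created
print(buf)
```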

Literals for characters and strings are usually surrounded by [[quotation marks]]: sometimes, single quotes (<tt>'</tt>) are used for characters and double quotes (<tt>"</tt>) are used for strings. Python accepts either variant for its string notation.

Examples of character literals in C syntax are:

* <tt>'A'</tt>
* <tt>'4'</tt>
* <tt>'$'</tt>
* <tt>'\t'</tt> ([[tab key|tab character]])

Examples of string literals in C syntax are:


Other languages such as [[JavaScript]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], and many dialects of [[BASIC]] do not have a primitive character type but instead add [[String (computer science)|strings]] as a primitive data type, often using a Unicode encoding such as [[UTF-8]]. Strings with a length of one are normally used to represent single characters.
* <tt>"A"</tt>
* <tt>"Hello World"</tt>
* <tt>"There are 4 cats."</tt>


Some computer hardware (e.g. x86) has instructions which help in dealing with text strings, but complete hardware support for text strings is rare.
===Numeric data type ranges===
Each numeric data type has its maximum and minimum value known as the [[range (computer science)|range]]. Attempting to store a number outside the range may lead to compiler/runtime errors, or to incorrect calculations (due to [[truncation]]) depending on the language being used.


===Ranges===
The range of a variable is based on the number of bytes used to save the value, and an integer [[data type]] is usually able to store 2<sup>''n''</sup> values (where ''n'' is the number of [[bit]]s that contribute to the value). For other data types (e.g. [[floating-point arithmetic|floating-point]] values) the range is more complicated and will vary depending on the method used to store it. There are also some types that do not use entire bytes, e.g. a [[boolean data type|boolean]] that requires a single [[bit]], and represents a [[binary numeral system|binary]] value (although in practice a byte is often used, with the remaining 7 bits being redundant). Some programming languages (such as [[Ada (programming language)|Ada]] and [[Pascal (programming language)|Pascal]]) also allow the opposite direction, that is, the programmer defines the range and precision needed to solve a given problem and the compiler chooses the most appropriate integer or floating-point type automatically.
A [[range (computer science)|range]] numeric data type has its maximum and minimum values embedded in the type. Such types are included in some languages, such as [[Ada (programming language)|Ada]] and [[Pascal (programming language)|Pascal]]. Attempting to store a number outside the range may lead to compiler/runtime errors, or to incorrect calculations (due to [[truncation]]) depending on the language being used. In practice the compiler chooses the most appropriate primitive integer or floating-point type automatically.
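
The relationship between bit width and range, and the effect of truncation, can be sketched as follows. Python is used for illustration; the function names are assumptions of this example, and the signed range assumes a two's complement representation.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Minimum and maximum of a two's complement integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits: int) -> tuple[int, int]:
    """Minimum and maximum of an unsigned integer with the given width."""
    return 0, 2 ** bits - 1

def wrap_unsigned(value: int, bits: int) -> int:
    """Keep only the low-order bits, as truncation to a fixed width would."""
    return value & ((1 << bits) - 1)

for width in (8, 16, 32, 64):
    print(width, signed_range(width), unsigned_range(width))

print(wrap_unsigned(300, 8))   # 300 does not fit in 8 bits and is truncated
```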


==See also==
==See also==

Revision as of 03:58, 29 January 2022

In computer science, primitive data types are a set of basic data types from which all other data types are constructed.[1] More specifically, the term often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary.[2] More generally, "primitive data types" may refer to the standard data types built into a programming language.[3][4] Data types that are not primitive are referred to as derived or composite data types.

Primitive types are almost always value types, but composite types may also be value types.[5]

Common primitive data types

The Java virtual machine's set of primitive data types is:[6]

  • Integer types with a variety of ranges and precisions (byte, short, int, long, char)
  • Floating-point numbers with single or double precision (float, double)
  • Boolean, with the logical values true and false (boolean)
  • A value referring to an executable memory address (returnAddress). This is not accessible from the Java programming language and is usually left out.[7][8]
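As a minimal sketch of the list above, the following Java program reads the bit widths of the numeric primitive types from the SIZE constants of the standard wrapper classes. The class name PrimitiveWidths and the method bitWidths are illustrative assumptions and do not come from the Java specifications. The boolean type is omitted because its wrapper class defines no SIZE constant, and returnAddress is omitted because it cannot be named in Java source code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the class and method names are hypothetical.
class PrimitiveWidths {
    // Collects the bit width of each numeric primitive type, as reported
    // by the SIZE constant of the corresponding wrapper class.
    static Map<String, Integer> bitWidths() {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("byte", Byte.SIZE);
        widths.put("short", Short.SIZE);
        widths.put("char", Character.SIZE);
        widths.put("int", Integer.SIZE);
        widths.put("long", Long.SIZE);
        widths.put("float", Float.SIZE);
        widths.put("double", Double.SIZE);
        return widths;
    }

    public static void main(String[] args) {
        // Prints one line per type, in insertion order.
        bitWidths().forEach((name, bits) ->
                System.out.println(name + ": " + bits + " bits"));
    }
}
```

Unlike the C types discussed below, these widths are fixed by the language and do not depend on the target machine.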

These primitive types are in general precisely those supported by computer hardware, except possibly for varying integer sizes or hardware that lacks floating-point support. Operations on such primitives are usually quite efficient. Primitive data types which are native to the processor have a one-to-one correspondence with objects in the computer's memory, and operations on these types are often the fastest possible.[9] Integer addition, for example, can be performed as a single machine instruction, and some processors offer instructions that process a sequence of characters with a single instruction.[10] The choice of primitive data type may nevertheless affect performance: for example, operating on an array of floats is faster with SIMD operations and data types.[9]: 113 

The set of basic C data types is similar to Java's. Minimally, there are four types, char, int, float, and double, but the qualifiers short, long, signed, and unsigned mean that C contains numerous target-dependent integer and floating-point primitive types.[11]

Integer numbers

An integer data type represents some range of mathematical integers. Integers may be either signed (allowing negative values) or unsigned (non-negative integers only). Common ranges are:

| Size (bytes) | Size (bits) | Names | Signed range (two's complement representation) | Unsigned range |
|---|---|---|---|---|
| 1 byte | 8 bits | Byte, octet, minimum size of char in C99 (see CHAR_BIT in limits.h) | −128 to +127 | 0 to 255 |
| 2 bytes | 16 bits | x86 word, minimum size of short and int in C | −32,768 to +32,767 | 0 to 65,535 |
| 4 bytes | 32 bits | x86 double word, minimum size of long in C, actual size of int for most modern C compilers,[12] pointer for IA-32-compatible processors | −2,147,483,648 to +2,147,483,647 | 0 to 4,294,967,295 |
| 8 bytes | 64 bits | x86 quadruple word, minimum size of long long in C, actual size of long for most modern C compilers,[12] pointer for x86-64-compatible processors | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 |
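The ranges in the table follow directly from the bit width n: a two's complement signed integer spans −2^(n−1) to 2^(n−1) − 1, and an unsigned integer spans 0 to 2^n − 1. The sketch below computes these bounds with java.math.BigInteger so that the 64-bit unsigned maximum, which does not fit in a Java long, can still be represented. The class and method names are assumptions made for illustration.

```java
import java.math.BigInteger;

// Illustrative only: the class and method names are hypothetical.
class IntegerRanges {
    // Smallest value of an n-bit two's complement integer: -2^(n-1).
    static BigInteger signedMin(int bits) {
        return BigInteger.ONE.shiftLeft(bits - 1).negate();
    }

    // Largest value of an n-bit two's complement integer: 2^(n-1) - 1.
    static BigInteger signedMax(int bits) {
        return BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
    }

    // Largest value of an n-bit unsigned integer: 2^n - 1.
    static BigInteger unsignedMax(int bits) {
        return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16, 32, 64}) {
            System.out.println(bits + " bits: signed " + signedMin(bits)
                    + " to " + signedMax(bits)
                    + ", unsigned 0 to " + unsignedMax(bits));
        }
    }
}
```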

Floating-point numbers

A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a format equivalent to scientific notation, typically in binary but sometimes in decimal. Because floating-point numbers have limited precision, only a subset of the real or rational numbers is exactly representable; other numbers can be represented only approximately. Many languages have both a single-precision type (often called "float") and a double-precision type (often called "double").
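The effect of limited precision can be seen with decimal fractions that have no finite binary expansion. In the Java sketch below, which assumes the binary double type, the sum 0.1 + 0.2 does not compare equal to 0.3, whereas 0.5 + 0.25, whose operands are exact binary fractions, does compare equal to 0.75. The class name and the nearlyEqual helper are illustrative assumptions, and the tolerance of 1e-9 is an arbitrary choice rather than a recommended value.

```java
import java.math.BigDecimal;

// Illustrative only: the class and method names are hypothetical.
class FloatingPointDemo {
    // Compares two doubles within an absolute tolerance chosen by the caller.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;

        // Exact comparison fails because 0.1, 0.2 and 0.3 are all approximated.
        System.out.println("0.1 + 0.2 == 0.3 ? " + (sum == 0.3));

        // Comparison within a tolerance succeeds.
        System.out.println("nearly equal ? " + nearlyEqual(sum, 0.3, 1e-9));

        // 0.5 and 0.25 are powers of two, so this sum is exact.
        System.out.println("0.5 + 0.25 == 0.75 ? " + (0.5 + 0.25 == 0.75));

        // BigDecimal's double constructor shows the exact value stored for 0.1.
        System.out.println("stored value of 0.1: " + new BigDecimal(0.1));
    }
}
```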

Booleans

A boolean type, usually denoted "bool" or "boolean", is a logical type that can have either the value "true" or the value "false". Although only one bit is necessary to represent these two values, programming languages typically implement boolean types as one or more bytes.

Many languages (e.g. Java, Pascal and Ada) implement booleans as a distinct logical type. Some languages, though, implicitly convert booleans to numeric types at times, either to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, versions of the C programming language up to and including ANSI C did not have a dedicated boolean type. Instead, numeric values of zero are interpreted as "false", and any other value is interpreted as "true".[13] The newer C99 added a distinct boolean type, _Bool, which the header stdbool.h makes available under the name bool,[14] and C++ supports bool as a built-in type and "true" and "false" as reserved words.[15]
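Because Java treats boolean as a distinct type, an int cannot be used directly as a condition, and the C convention described above has to be written out as an explicit comparison. The following sketch shows both directions of the conversion; the class and method names are hypothetical.

```java
// Illustrative only: the class and method names are hypothetical.
class BooleanDemo {
    // The C convention: zero is false, any other value is true.
    static boolean isTruthy(int value) {
        return value != 0;
    }

    // The reverse mapping used by C's relational operators: true is 1, false is 0.
    static int toInt(boolean flag) {
        return flag ? 1 : 0;
    }

    public static void main(String[] args) {
        int count = 3;
        // "if (count)" would be rejected by the Java compiler;
        // the comparison must be explicit.
        if (isTruthy(count)) {
            System.out.println("count is non-zero");
        }
        System.out.println("toInt(false) = " + toInt(false));
    }
}
```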

Built-in types

Types which can be built into sophisticated programming languages include:

Characters and strings

A character type is a type that can represent all Unicode characters, and hence must be at least 21 bits wide. C includes a char type, but it is defined to be the smallest addressable unit of memory, which several standards, such as POSIX, require to be 8 bits. Hence it is too small to represent all Unicode characters, and standards commonly refer to it as an integer type instead. The term "char" is also used for a 16-bit integer type in Java, but again this is not a character type.[16] Some languages, such as Julia, include a true 32-bit Unicode character type as a primitive.[17]
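The limitation of Java's 16-bit char can be demonstrated with a character outside the Basic Multilingual Plane. In the sketch below, the code point U+1F600 occupies two char values (a surrogate pair) in a Java String, so the string's length in char units differs from its length in Unicode code points. The class and method names are illustrative assumptions.

```java
// Illustrative only: the class and method names are hypothetical.
class CodePointDemo {
    // Number of 16-bit char units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the 16-bit range, so it needs two chars.
        String face = new String(Character.toChars(0x1F600));
        System.out.println("char units:  " + codeUnits(face));
        System.out.println("code points: " + codePoints(face));
    }
}
```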

Other languages, such as JavaScript, Python, Ruby, and many dialects of BASIC, do not have a primitive character type but instead provide strings as a primitive data type, typically stored in a Unicode encoding such as UTF-8 or UTF-16 (JavaScript strings, for example, are sequences of UTF-16 code units). Strings with a length of one are normally used to represent single characters.

Some computer hardware (e.g. x86) has instructions which help in dealing with text strings, but complete hardware support for text strings is rare.

Ranges

A range numeric data type has its maximum and minimum value embedded in the type. It is included in some languages such as Ada and Pascal. Attempting to store a number outside the range may lead to compile-time or run-time errors, or to incorrect calculations (due to truncation), depending on the language being used. In practice the compiler chooses the most appropriate primitive integer or floating-point type automatically.
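Java has no built-in range types, but the run-time checking described above can be approximated with a small wrapper class. The following is a minimal sketch under that assumption: the class RangedInt and its methods are hypothetical, and the sketch does not reproduce the compile-time checks or the automatic choice of representation that a language such as Ada provides.

```java
// Illustrative only: RangedInt is a hypothetical class, not a library type.
final class RangedInt {
    private final int min;
    private final int max;
    private final int value;

    // Rejects any value outside the inclusive range [min, max].
    RangedInt(int min, int max, int value) {
        if (min > max) {
            throw new IllegalArgumentException("empty range");
        }
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                    value + " is outside " + min + ".." + max);
        }
        this.min = min;
        this.max = max;
        this.value = value;
    }

    int value() {
        return value;
    }

    // Returns a new value in the same range; Math.addExact guards against
    // silent int overflow before the range check runs.
    RangedInt plus(int delta) {
        return new RangedInt(min, max, Math.addExact(value, delta));
    }

    public static void main(String[] args) {
        RangedInt month = new RangedInt(1, 12, 5);
        System.out.println(month.plus(3).value());
        try {
            month.plus(10); // 15 is outside 1..12
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```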

See also

References

  1. ^ Stone, R. G.; Cooke, D. J. (5 February 1987). Program Construction. Cambridge University Press. p. 18. ISBN 978-0-521-31883-9.
  2. ^ Wikander, Jan; Svensson, Bertil (31 May 1998). Real-Time Systems in Mechatronic Applications. Springer Science & Business Media. p. 101. ISBN 978-0-7923-8159-4.
  3. ^ Khurana, Rohit. Data and File Structure (For GTU), 2nd Edition. Vikas Publishing House. p. 2. ISBN 978-93-259-6005-3.
  4. ^ Chun, Wesley (2001). Core Python Programming. Prentice Hall Professional. p. 77. ISBN 978-0-13-026036-9.
  5. ^ Olsen, Geir; Allison, Damon; Speer, James (1 January 2008). Visual Basic .NET Class Design Handbook: Coding Effective Classes. Apress. p. 80. ISBN 978-1-4302-0780-1.
  6. ^ Lindholm, Tim; Yellin, Frank; Bracha, Gilad; Buckley, Alex (13 February 2015). "Chapter 2. The Structure of the Java Virtual Machine". The Java® Virtual Machine Specification.
  7. ^ Cowell, John (18 February 1997). Essential Java Fast: How to write object oriented software for the Internet. Springer Science & Business Media. p. 27. ISBN 978-3-540-76052-8.
  8. ^ Rakshit, Sandip; Panigrahi, Goutam (December 1995). A Hand Book of Objected Oriented Programming With Java. S. Chand Publishing. p. 11. ISBN 978-81-219-3001-7.
  9. ^ a b Fog, Agner. "Optimizing software in C++" (PDF). p. 29. Retrieved 28 January 2022. Integer operations are fast in most cases, [...]
  10. ^ "Single Instruction Single Data - an overview | ScienceDirect Topics".
  11. ^ Kernighan, Brian W.; Ritchie, Dennis M. (1988). "2.2 Data Types and Sizes". The C programming language (Second ed.). Englewood Cliffs, N.J. p. 36. ISBN 0131103709.
  12. ^ a b Fog, Agner (2010-02-16). "Calling conventions for different C++ compilers and operating systems: Chapter 3, Data Representation" (PDF). Retrieved 2010-08-30.
  13. ^ Kernighan, Brian W; Ritchie, Dennis M (1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. p. 41. ISBN 0-13-110163-3.
  14. ^ "Boolean type support library". devdocs.io. Retrieved October 15, 2020.
  15. ^ "Bool data type in C++". GeeksforGeeks. 5 June 2017. Retrieved October 15, 2020.
  16. ^ Mansoor, Umer (8 May 2016). "The char Type in Java is Broken". CodeAhoy. Retrieved 10 February 2020.
  17. ^ "Strings · The Julia Language". docs.julialang.org. Retrieved 29 January 2022.

External links