sizeof

From Wikipedia, the free encyclopedia

In the programming languages C and C++, the unary operator sizeof yields the size of a datatype or expression, measured in the number of bytes required to represent the type. A byte in this context is the size of a char (equivalently, an unsigned char) and may be larger than 8 bits, although that is uncommon. The result of sizeof is the size of the type of the expression or parenthesized type name that it precedes, and has an unsigned integral type denoted by size_t. sizeof can be applied to any complete object type, including primitive types such as integer and floating-point types, pointer types, and compound datatypes (unions, structs, or C++ classes).

Need for sizeof

In many programs, there are situations where it is useful to know the size of a particular datatype (one of the most common examples is dynamic memory allocation using the library function malloc). Though for any given implementation of C or C++ the size of a particular datatype is constant, the sizes of even primitive types in C and C++ are implementation-defined (that is, not precisely defined by the standard). This can cause problems when trying to allocate a block of memory of the appropriate size. For example, say a programmer wants to allocate a block of memory big enough to hold ten variables of type int. Because our hypothetical programmer doesn't know the exact size of type int, the programmer doesn't know how many bytes to ask malloc for. Therefore, it is necessary to use sizeof:

int *pointer = malloc(10 * sizeof (int));

In the preceding code, the programmer instructs malloc to allocate and return a pointer to memory. The size of the block allocated is equal to the number of bytes for a single object of type int, multiplied by 10, ensuring enough space for all 10 ints.

It is generally not safe for a programmer to presume to know the size of any datatype. For example, even though most implementations of C and C++ on 32-bit systems define type int to be 4 bytes, the size of an int could change when code is ported to a different system, breaking the code. The exception to this is the char type, whose size is always 1 in any standards-compliant C implementation. In addition, it is frequently very difficult to predict the sizes of compound datatypes such as a struct or union, due to structure "padding" (see Implementation below). Another reason for using sizeof is readability, since it avoids magic numbers.

Use

The sizeof operator is used to determine the amount of space a designated datatype would occupy in memory. To use sizeof, the keyword "sizeof" is followed by a type name or an expression (which may be merely a variable name). If a type name is used, it must always be enclosed in parentheses, whereas expressions can be specified with or without parentheses. A sizeof expression results in a value equal to the size in bytes of the datatype or expression (with datatypes, sizeof evaluates to the size of the memory representation for an object of the specified datatype; for expressions it evaluates to the representation size for the type that would result from evaluation of the expression, which however is not evaluated). For example, since sizeof(char) is defined to be 1[1] and assuming ints are 4 bytes long, the following code will print 1,4:

/* the following code illustrates the use of sizeof
 * with variables and expressions (no parentheses needed),
 * and with type names (parentheses needed)
 */
#include <stdio.h>

int main(void)
{
  char c;

  printf("%zu,%zu\n", sizeof c, sizeof (int));
  return 0;
}

Because types are not known to the C preprocessor, sizeof cannot be used in #if expressions.

Certain standard headers such as stddef.h define size_t to denote the unsigned integral type of the result of a sizeof expression; being unsigned, the result is never negative. The printf length modifier z (as in %zu, available since C99) should be used to format values of that type.

Using sizeof with arrays

When sizeof is applied to the name of a declared array (as opposed to a pointer to memory obtained from malloc), the result is the size in bytes of the whole array. This is one of the few exceptions to the rule that the name of an array is converted to a pointer to the first element of the array. It is possible because the size of such an array is part of its type and, except for variable length arrays, is known at compile time, when the sizeof operator is evaluated. The following program uses sizeof to determine the size of a declared array, avoiding a buffer overflow when copying characters:

#include <stdio.h>
#include <string.h>
 
int main(int argc, char **argv)
{
  char buffer[10]; /* Array of 10 chars */

  /* Require an argument to copy. */
  if (argc < 2)
    return 1;

  /* Copy at most 9 characters from argv[1] into buffer.
   * sizeof buffer[0] is sizeof (char), which is defined to always be 1.
   */
  strncpy(buffer, argv[1], sizeof buffer - sizeof buffer[0]);
 
  /* Ensure that the buffer is null-terminated: */
  buffer[sizeof buffer - 1] = '\0';
 
  return 0;
}

Here, sizeof buffer is equivalent to 10*sizeof buffer[0], or 10.

C99 adds support for flexible array members to structures. This form of array declaration is allowed only as the last member of a structure, and differs from normal arrays in that no length is specified to the compiler. For a structure type s containing a flexible array member named a, sizeof (s) does not include any storage for a; it is at least offsetof(s, a) and may be larger if the implementation adds trailing padding:

#include <stdio.h>
 
struct flexarray
{
    char val;
    int array[];  /* Flexible array member; must be last element of struct */
};
 
int main(int argc, char **argv)
{
    printf("sizeof (struct flexarray) = %zu\n", sizeof (struct flexarray));
    return 0;
}

Thus, in this case the sizeof operator returns the size of the structure, including any padding, but without any storage allocated for the array. In the above example, the following output will be produced on most platforms:

sizeof (struct flexarray) = 4

C99 also allows variable length arrays where the length is specified at runtime.[2] In such cases, the sizeof operator is evaluated in part at runtime to determine the storage occupied by the array.

#include <stddef.h>
 
size_t flexsize(int n)
{
   char b[n+3];      /* Variable length array */
   return sizeof b;  /* Execution time sizeof */
}
 
int main( void )
{
  size_t size = flexsize(10); /* flexsize returns 13 */
  return 0;
}

sizeof can be used to determine the number of elements in an array, by taking the size of the entire array and dividing it by the size of a single element.

#include <iostream>

#define Elements_in(arrayname) (sizeof (arrayname) / sizeof *(arrayname))

int main( void )
{
   int tab[10];
   std::cout << "Number of elements in the array: " << Elements_in(tab) << std::endl; // yields 10
   return 0;
}

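Because this works only for the name of a declared array object, the computation silently yields a wrong result when it is applied to a pointer, for example to an array that has been passed as a function argument. Code that is changed to use a pointer instead of an array name therefore needs non-trivial revision.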

sizeof and incomplete types

sizeof can only be applied to "completely" defined types. With arrays, this means that the dimensions of the array must be present in its declaration, and that the type of the elements must be completely defined. For structs and unions, this means that there must be a member list of completely defined types. For example, consider the following two source files:

/* file1.c */
int arr[10];
struct x {int one; int two;};
/* more code */
 
/* file2.c */
extern int arr[];
struct x;
/* more code */

Both files are perfectly legal C, and code in file1.c can apply sizeof to arr and struct x. However, it is illegal for code in file2.c to do this, because the definitions in file2.c are not complete. In the case of arr, the code does not specify the dimension of the array; without this information, the compiler has no way of knowing how many elements are in the array, and cannot calculate the array's overall size. Likewise, the compiler cannot calculate the size of struct x because it does not know what members it is made up of, and therefore cannot calculate the sum of the sizes of the structure's members (and padding). If the programmer provided the size of the array in its declaration in file2.c, or completed the definition of struct x by supplying a member list, this would allow the application of sizeof to arr or struct x in that source file.

sizeof and object members

C++11 made it possible to apply sizeof to a specific member of a class without having to instantiate an object of that class.[3] For example:

#include <iostream>
 
using namespace std;
 
struct foo
{
  int a;
  int b;
};
 
int main()
{
  cout << sizeof foo::a << endl << sizeof(foo) << endl;
  return 0;
}

On most platforms, this prints:

4
8

sizeof... and variadic template packs

C++11 introduced variadic templates; the keyword sizeof followed by an ellipsis yields the number of elements in a parameter pack.

#include <iostream>

template <typename... Args>
void print_size(Args... args)
{
  std::cout << sizeof...(args) << std::endl;
}

int main( void )
{
  print_size(); // outputs 0
  print_size("Is the answer", 42, true); // outputs 3
  return 0;
}

Implementation

It is the responsibility of compilers to implement the sizeof operator correctly for each target platform. In many cases, there will be an official Application Binary Interface (ABI) document for the platform, specifying formats, padding, and alignment for the data types, to which the compiler must conform. In most cases, sizeof is a compile-time operator, which means that during compilation sizeof expressions get replaced by constant result-values. However, sizeof applied to a variable length array, introduced in C99, requires computation during program execution.

Structure padding

To calculate the size of any object type, the compiler must take into account any address alignment that may be needed to meet efficiency or architectural constraints. Many computer architectures do not support multiple-byte access starting at any byte address that is not a multiple of the word size, and even when the architecture allows it, usually the processor can fetch a word-aligned object faster than it can fetch an object that straddles multiple words in memory.[4] Therefore, compilers usually align data structures to at least a word alignment boundary, and also align individual members to their respective alignment boundaries. In the following example, the structure student is likely to be aligned on a word boundary, which is also where the member grade begins, and the member age is likely to start at the next word address. The compiler accomplishes the latter by inserting unused "padding" bytes between members as needed to satisfy the alignment requirements. There may also be padding at the end of a structure to ensure proper alignment in case the structure is ever used as an element of an array.

Thus, the aggregate size of a structure in C can be greater than the sum of the sizes of its individual members. For example, on many systems the following code will print 8:

#include <stdio.h>

struct student{
  char grade; /* char is 1 byte long */
  int age; /* int is typically 4 bytes long */
};

int main(void)
{
  printf("%zu\n", sizeof (struct student));
  return 0;
}

References

  1. ^ "C99 standard (ISO/IEC 9899)". ISO/IEC. 7 September 2007. 6.5.3.4.3, p. 80. Retrieved 31 October 2010.
  2. ^ "WG14/N1124 Committee Draft ISO/IEC 9899". 6 May 2005. 6.5.3.4 The sizeof operator.
  3. ^ http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2253.html
  4. ^ Rentzsch, Jonathan (8 February 2005). "Data alignment: Straighten up and fly right". www.ibm.com. Retrieved 29 September 2014.