= Universally unique identifier =

Universally unique identifier
- Acronym: UUID
- Organisation: Open Software Foundation (OSF), ISO/IEC, Internet Engineering Task Force (IETF)
- Digits: 32

A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used, typically in software created by Microsoft.

When generated according to the standards, UUIDs are, for practical purposes, unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible. Thus, anyone can create large numbers of UUIDs and use them as identifiers with near certainty that they do not duplicate UUIDs that have been, or will be, created by others, with the only coordination required being conformance with the UUID standards. Information labeled with UUIDs by independent parties can therefore coexist in the same databases or channels, with a negligible probability of duplication.

Adoption of UUIDs is widespread, with many computing platforms providing support for generating them and for parsing their textual representation.

== History ==
Apollo Computer originally used UUIDs in the 1980s in its Network Computing System (NCS). Later, the Open Software Foundation (OSF) used UUIDs for its Distributed Computing Environment (DCE). The design of the DCE UUIDs was partly based on the NCS UUIDs, whose design was in turn inspired by the (64-bit) unique identifiers defined and used pervasively in Domain/OS, an operating system designed by Apollo Computer. Later, in the early 1990s, the Microsoft Windows platforms adopted the DCE design as "Globally Unique IDentifiers" (GUIDs).

== Standards ==
UUIDs were standardized by various bodies, starting with the Open Software Foundation in 1996–1997, as part of the Distributed Computing Environment (DCE). The definition was also documented in 1996 as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call".

In July 2005, the Internet Engineering Task Force (IETF) published the Standards-Track RFC 4122, which also registered a URN namespace for UUIDs. The ITU had also standardized UUIDs, based on the previous standards and early versions of RFC 4122, in ITU-T Rec. X.667 | ISO/IEC 9834-8, which is technically equivalent to RFC 4122.

In May 2024, RFC 9562 was published, introducing three new versions, clarifying some ambiguities, and superseding RFC 4122. RFC 9562 applies only to variant 1 of the RFC 4122 UUID definition; the other variants are out of its scope.

== Format ==
A UUID is a 128-bit number. The meaning of the bits is determined by the variant, of which four are defined. The two most common variants further define eight versions.

=== Variants ===
The variant field is in a variable number of the most-significant bits of the ninth byte. It indicates the format of the UUID. The following variants are defined:

- The Apollo NCS variant 0 (indicated by the one-bit pattern 0xxx₂) provides backward compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1988. The variant field of current UUIDs overlaps the address family octet of NCS UUIDs in such a way that any NCS UUIDs still in use have a 0 in the first bit of the variant field.
- The OSF DCE variant 1 (10₂) UUIDs are referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft.
- The Microsoft COM/DCOM variant 2 (110₂) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility" and was used for early GUIDs on the Microsoft Windows platform. Aside from the extra variant bit, the main difference between this variant and variant 1 is the byte order within the UUID. Current Microsoft tools do not generate this variant. RFC 9562, which added versions 6, 7, and 8, states that variants other than variant 1 are out of its scope, though this is unlikely to cause interoperability problems in practice. The versions applicable to the legacy Microsoft variant 2 are therefore somewhat unclear, but likely include only versions 1, 3, and 4.
- Variant 3 (111₂) is reserved.
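As a rough illustration of how the variant is read, the following Python sketch classifies a UUID by testing the leading bits of its ninth byte (index 8 in big-endian order). The function name `variant_of` and its string labels are made up for this example; Python's standard `uuid.UUID.variant` attribute performs an equivalent check and returns constants such as `uuid.RFC_4122`.

```python
import uuid


def variant_of(u: uuid.UUID) -> str:
    """Classify a UUID by the leading bits of its ninth byte (index 8)."""
    b = u.bytes[8]  # u.bytes is big-endian, so index 8 is the ninth octet
    if (b & 0x80) == 0x00:  # 0xxx: Apollo NCS (variant 0)
        return "NCS"
    if (b & 0xC0) == 0x80:  # 10xx: OSF DCE / RFC 9562 (variant 1)
        return "DCE"
    if (b & 0xE0) == 0xC0:  # 110x: Microsoft COM/DCOM (variant 2)
        return "Microsoft"
    return "reserved"  # 111x: reserved (variant 3)


print(variant_of(uuid.UUID("550e8400-e29b-41d4-a716-446655440000")))  # DCE
```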

=== Versions ===
The OSF DCE and Microsoft COM/DCOM variants (1 and 2) have versions, indicated by the value of the high 4 bits of the 7th byte of the UUID. In textual representations of the UUID, this is the first character after the second hyphen.
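A minimal Python sketch of locating the version field, assuming the standard big-endian byte order: the version is the high nibble of byte index 6, which is also the first hex digit of the third hyphen-separated group. `version_nibble` is a hypothetical helper; Python's built-in `UUID.version` attribute returns the same value, but only for variant 1 UUIDs (it returns `None` otherwise).

```python
import uuid


def version_nibble(u: uuid.UUID) -> int:
    """Return the high 4 bits of the seventh byte (index 6)."""
    return u.bytes[6] >> 4


u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(version_nibble(u))        # 4
print(str(u).split("-")[2][0])  # 4 (first character after the second hyphen)
print(u.version)                # 4 (the standard library's equivalent)
```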

==== Versions 1 and 6 (date-time and MAC address) ====
Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID) with a 60-bit timestamp. On systems with 64-bit EUI-64 "MAC addresses", the least significant 48 bits are used. A random 48-bit number may also be used in place of the MAC address.

The timestamp is the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted. RFC 4122 states that the time value rolls over around 3400 AD, depending on the algorithm used, which implies that the 60-bit timestamp is a signed quantity. However, some software, such as the libuuid library, treats the timestamp as unsigned, putting the rollover time in 5236 AD.
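The timestamp arithmetic can be sketched in Python. The sketch assumes a proleptic Gregorian `datetime` and truncates to microseconds, since `timedelta` cannot hold 100-nanosecond units; the node ID passed to `uuid.uuid1` is a made-up value with the multicast bit set, used so that no hardware address is looked up. The last two lines show where the signed (2^{59} intervals) and unsigned (2^{60} intervals) readings of the 60-bit field roll over.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version 1 timestamp: midnight, 15 October 1582, UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp_to_datetime(ticks: int) -> datetime:
    """Convert a count of 100-nanosecond intervals since the Gregorian epoch."""
    # timedelta has microsecond resolution, so the last decimal digit is dropped.
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


# Hypothetical node ID with the multicast bit set, so no hardware address is looked up.
u = uuid.uuid1(node=0x010203040506)
print(v1_timestamp_to_datetime(u.time))  # close to the current UTC time

# Rollover if the 60-bit field is read as signed (2**59) or unsigned (2**60).
print(v1_timestamp_to_datetime(2**59).year)  # 3409
print(v1_timestamp_to_datetime(2**60).year)  # 5236
```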

A 13-bit or 14-bit "uniquifying" clock sequence extends the timestamp to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node. When UUIDs are generated faster than the system clock can advance, the lower bits of the timestamp fields can be generated by incrementing them every time a UUID is generated, to simulate a higher-resolution timestamp.

Because each version 1 UUID corresponds to a single point in space (the node) and time (intervals and clock sequence), the chance of two properly generated version 1 UUIDs being unintentionally the same is practically nil. Since the time and clock sequence total 74 bits, 2^{74} (about 1.9 × 10^{22}, or roughly 19 sextillion) version 1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID (2^{14} clock sequence values for each of the 10^{7} timestamp intervals per second).

The layout of a version 1 UUID is:
UUID version 1 record layout

| Name | Length (bytes) | Length (hex digits) | Contents |
|---|---|---|---|
| time_low | 4 | 8 | integer giving the low 32 bits of the time |
| time_mid | 2 | 4 | integer giving the middle 16 bits of the time |
| time_hi_and_version | 2 | 4 | 4-bit "version" in the most significant bits, followed by the high 12 bits of the time |
| clock_seq_hi_and_reserved, clock_seq_low | 2 | 4 | 1- to 3-bit "variant" in the most significant bits, followed by the 13- to 15-bit clock sequence |
| node | 6 | 12 | the 48-bit node ID |
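Python's standard `uuid.UUID` class exposes these fields under similar names (`time_low`, `time_mid`, `time_hi_version`, `clock_seq_hi_variant`, `clock_seq_low`, `node`). The following sketch, using a made-up node ID and clock sequence, prints the fields of a freshly generated version 1 UUID and reassembles the 60-bit timestamp and 14-bit clock sequence from them.

```python
import uuid

# Made-up node ID (multicast bit set) and clock sequence, for illustration only.
u = uuid.uuid1(node=0x010203040506, clock_seq=0x1234)

print(f"time_low                  {u.time_low:08x}")
print(f"time_mid                  {u.time_mid:04x}")
print(f"time_hi_and_version       {u.time_hi_version:04x}")
print(f"clock_seq_hi_and_reserved {u.clock_seq_hi_variant:02x}")
print(f"clock_seq_low             {u.clock_seq_low:02x}")
print(f"node                      {u.node:012x}")

# Reassemble the 60-bit timestamp, dropping the 4 version bits.
ts = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
# The clock sequence is the 14 bits left after the 2-bit variant.
clock_seq = ((u.clock_seq_hi_variant & 0x3F) << 8) | u.clock_seq_low
print(ts == u.time, clock_seq == u.clock_seq)  # True True
```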

Version 6 is the same as version 1 except for the order of the timestamp bits. In version 6, timestamp bits are ordered from most significant to least significant, so systems can sort version 6 UUIDs in order of creation simply by sorting them lexically.
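To make the difference concrete, the sketch below rewrites a version 1 UUID into the version 6 bit order, leaving the variant, clock sequence, and node untouched. `v1_to_v6` and `make_v1` are hypothetical helpers, not library functions, and the two timestamps are chosen so that their version 1 text forms sort in the wrong order while the version 6 forms sort by time.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Rewrite a version 1 UUID with its timestamp in most-to-least significant order."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
    time_high = ts >> 28  # most significant 32 bits of the 60-bit timestamp
    time_mid = (ts >> 12) & 0xFFFF  # next 16 bits
    time_low = ts & 0x0FFF  # least significant 12 bits
    tail = u.int & ((1 << 64) - 1)  # variant, clock sequence and node are unchanged
    value = (time_high << 96) | (time_mid << 80) | (6 << 76) | (time_low << 64) | tail
    return uuid.UUID(int=value)


def make_v1(ts: int) -> uuid.UUID:
    """Hypothetical helper: a version 1 UUID with a chosen timestamp."""
    fields = (ts & 0xFFFFFFFF, (ts >> 32) & 0xFFFF, (ts >> 48) & 0x0FFF,
              0x12, 0x34, 0x010203040506)  # made-up clock sequence and node
    return uuid.UUID(fields=fields, version=1)


earlier, later = make_v1(0xFFFFFFFF), make_v1(0x100000000)
print(str(earlier) < str(later))  # False: version 1 text order disagrees with time
print(str(v1_to_v6(earlier)) < str(v1_to_v6(later)))  # True
```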

==== Version 2 (date-time and MAC address, DCE security version) ====
RFC 9562 reserves version 2 for "DCE security" UUIDs but does not provide any details. For this reason, many UUID implementations omit version 2. The specification of version 2 UUIDs is instead provided by the DCE 1.1 Authentication and Security Services specification.

Version 2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence are replaced by a "local domain" number, and the least significant 32 bits of the timestamp are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local-domain numbers 0 and 1 are for user IDs (UIDs) and group IDs (GIDs) respectively, and other local-domain numbers are site-defined. On non-POSIX systems, all local-domain numbers are site-defined.

The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values per node ID. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version 2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUIDs per node/domain/identifier can be generated per 7-minute clock tick, compared to 16,384 clock sequence values for version 1.
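A rough Python sketch of the substitution described above, starting from a version 1 UUID: `time_low` (the least significant 32 bits of the timestamp) is overwritten with the local identifier, and `clock_seq_low` (the least significant 8 bits of the clock sequence) with the local-domain number. `v2_from_v1`, the user ID 1000, and the node ID are illustrative assumptions, not the DCE reference algorithm. The trailing lines reproduce the trade-off arithmetic.

```python
import uuid

LOCAL_DOMAIN_UID, LOCAL_DOMAIN_GID = 0, 1  # POSIX local-domain numbers


def v2_from_v1(v1: uuid.UUID, domain: int, local_id: int) -> uuid.UUID:
    """Sketch: replace time_low with a local ID and clock_seq_low with a domain."""
    if not (0 <= domain < 2**8 and 0 <= local_id < 2**32):
        raise ValueError("domain must fit in 8 bits and local_id in 32 bits")
    fields = (local_id, v1.time_mid, v1.time_hi_version,
              v1.clock_seq_hi_variant, domain, v1.node)
    return uuid.UUID(fields=fields, version=2)


# Hypothetical user ID 1000 and made-up node ID.
u2 = v2_from_v1(uuid.uuid1(node=0x010203040506), LOCAL_DOMAIN_UID, 1000)
print(u2.version, u2.time_low, u2.clock_seq_low)  # 2 1000 0

# The trade-off as arithmetic.
print(2**40)  # domain/identifier values per node ID: 1099511627776
print(2**32 * 100e-9)  # seconds per version 2 clock tick, about 429.5
print(2**6, 2**14)  # clock sequence values: 64 versus 16384
```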

==== Versions 3 and 5 (namespace name-based) ====
Version 3 and version 5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1. This is useful when systems need to generate the same UUID based on a set of other names or identifiers, without coordination.

The namespace identifier is itself a UUID. The specification provides constant UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.

To determine the version 3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Then 6 or 7 bits are replaced by fixed values: the 4-bit version (e.g. 0011₂ for version 3) and the 2- or 3-bit UUID variant (e.g. 10₂ indicating an RFC 9562 UUID, or 110₂ indicating a legacy Microsoft GUID). Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.

Version 5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are replaced.
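The hashing steps can be checked against Python's standard library. The sketch below builds version 3 and version 5 UUIDs by hand (hash, truncate to 16 bytes, overwrite the version and variant bits) and compares the result with `uuid.uuid3` and `uuid.uuid5`. It encodes the name as UTF-8, an assumption that matches Python's behaviour; `name_based_uuid` is a hypothetical helper name.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID by hand."""
    hash_fn = {3: hashlib.md5, 5: hashlib.sha1}[version]
    digest = hash_fn(namespace.bytes + name.encode("utf-8")).digest()
    b = bytearray(digest[:16])  # SHA-1's 20-byte digest is truncated to 16 bytes
    b[6] = (b[6] & 0x0F) | (version << 4)  # version in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # variant 10 in the top two bits of byte 8
    return uuid.UUID(bytes=bytes(b))


ns = uuid.NAMESPACE_DNS
for version, builtin in ((3, uuid.uuid3), (5, uuid.uuid5)):
    mine = name_based_uuid(ns, "example.com", version)
    print(version, mine, mine == builtin(ns, "example.com"))  # last column: True
```

Because nothing random is involved, calling the function again with the same namespace and name returns the same UUID.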

Version 3 and version 5 UUIDs have the property that the same namespace and name always map to the same UUID. However, neither the namespace nor the name can be determined from the UUID, even if one of them is known, except by brute-force search. RFC 4122 recommends version 5 (SHA-1) over version 3 (MD5), because MD5 is more prone to collisions than SHA-1, even though MD5 is somewhat faster. The RFC warns against using UUIDs of any version as security capabilities.

==== Version 4 (random) ====
A version 4 UUID is randomly generated. As in other UUIDs, 4 bits are used to indicate version 4, and 2 or 3 bits to indicate the variant (10₂ or 110₂ for variants 1 and 2 respectively). Thus, for variant 1 (that is, most UUIDs), a random version 4 UUID has 6 predetermined variant and version bits, leaving 122 bits for the randomly generated part, for a total of 2^{122}, or about 5.3 × 10^{36} (5.3 undecillion), possible version 4, variant 1 UUIDs. There are half as many possible version 4, variant 2 UUIDs (legacy GUIDs), because one fewer random bit is available, 3 bits being consumed by the variant.
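A minimal Python sketch of version 4 generation: draw 16 bytes from a cryptographically secure source, then force the 4 version bits and the 2 variant bits (variant 1 only). `random_v4` is a hypothetical name; in practice Python's `uuid.uuid4()` performs the equivalent.

```python
import secrets
import uuid


def random_v4() -> uuid.UUID:
    """Fill 128 bits from a CSPRNG, then overwrite the 6 version and variant bits."""
    b = bytearray(secrets.token_bytes(16))
    b[6] = (b[6] & 0x0F) | 0x40  # version 4 (0100) in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # variant 1 (10) in the top two bits of byte 8
    return uuid.UUID(bytes=bytes(b))


u = random_v4()
print(u, u.version, u.variant == uuid.RFC_4122)  # <random UUID> 4 True
print(f"{2**122:.2e}")  # 5.32e+36 possible version 4, variant 1 values
```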

==== Version 7 (timestamp and random) ====
Version 7 UUIDs are intended as monotonically ascending creation-time-ordered keys in large databases and distributed systems, contributing to locality and performance. Unlike some other UUID versions, they do not incorporate MAC addresses, and can steer clear of the privacy issues associated with them. They are constructed as follows:
- a 48-bit big-endian unsigned Unix Epoch timestamp in milliseconds.
- the 4-bit version, set to 7.
- 12 bits of a construct to provide increased precision or monotonicity, or pseudorandom data to provide uniqueness.
- the 2-bit variant, set to 10.
- 62 bits of a construct to provide increased precision or monotonicity, or pseudorandom data to provide uniqueness.

The optional monotonicity constructs include such items as an increased precision sub-millisecond timestamp fraction, or a seeded counter.
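The version 7 layout can be sketched in Python as shifts and ORs over a 128-bit integer. This sketch fills both the 12-bit and 62-bit fields with random bits from `secrets.randbits` and implements none of the optional monotonicity constructs, so two UUIDs created in the same millisecond are not ordered relative to each other. `uuid7_sketch` is a hypothetical name, and the fixed timestamps in the usage lines are arbitrary.

```python
import secrets
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """48-bit ms timestamp, version 7, 12 random bits, variant 10, 62 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000  # current Unix time in milliseconds
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (((unix_ms & 0xFFFF_FFFF_FFFF) << 80) | (7 << 76) | (rand_a << 64)
             | (0b10 << 62) | rand_b)
    return uuid.UUID(int=value)


a = uuid7_sketch(1_700_000_000_000)
b = uuid7_sketch(1_700_000_000_001)
print(str(a) < str(b))  # True: a later millisecond sorts after an earlier one
print(a.int >> 80)  # 1700000000000, the embedded timestamp
```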

==== Version 8 (custom) ====
In a custom (version 8) UUID, the 4-bit version field is set to 8 and the 2-bit variant field to 10₂, fixing 6 bits in total. The remaining 122 bits are not specified. Uniqueness is therefore implementation-specific and, according to RFC 9562, must not be assumed.
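As an illustration, the following Python sketch packs three caller-chosen fields around the fixed version and variant bits. The field widths (48, 12, and 62 bits) and the names `custom_a`, `custom_b`, and `custom_c` follow the RFC 9562 layout; what goes into them is entirely up to the application, and the values used here are arbitrary.

```python
import uuid


def uuid8_sketch(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Pack caller-chosen bits around the fixed version (8) and variant (10) fields."""
    if not (0 <= custom_a < 2**48 and 0 <= custom_b < 2**12 and 0 <= custom_c < 2**62):
        raise ValueError("custom fields must fit in 48, 12 and 62 bits")
    value = (custom_a << 80) | (8 << 76) | (custom_b << 64) | (0b10 << 62) | custom_c
    return uuid.UUID(int=value)


u = uuid8_sketch(0x123456789ABC, 0xDEF, 0x1)
print(u)  # 12345678-9abc-8def-8000-000000000001
```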

==== Use of MAC Addresses ====
In contrast to the other UUID versions, versions 1, 2, and 6 are based on MAC addresses from network cards, relying for their uniqueness in part on an identifier issued by a central registration authority, namely the Organizationally Unique Identifier (OUI) part of the MAC address, which is issued by the IEEE to manufacturers of networking equipment. The uniqueness of UUIDs based on network-card MAC addresses also depends on manufacturers properly assigning unique MAC addresses to their cards, which, like other manufacturing processes, is subject to error. Moreover, MAC addresses do not always come from network cards. For example, virtual machines receive a MAC address from a range that is configurable in the hypervisor, and some operating systems, notably OpenWrt, permit the end user to customise the MAC address. When a device has a 64-bit EUI-64 "MAC address", using its least significant 48 bits, as recommended by the RFC, may result in the node ID part of the UUID being duplicated. Thus, node IDs based on MAC addresses may not be globally unique.

Using the node's network card MAC address as the node ID often means that version 1, 2, and 6 UUIDs can be traced back to the computer that created them. Documents can sometimes be traced to the computers where they were created or edited through UUIDs embedded into them by word processing software. This privacy hole was used when locating the creator of the Melissa virus.

RFC 9562 allows the MAC address in a version 1, 2, or 6 UUID to be replaced by a random 48-bit node ID, either because the node does not have a MAC address or because it is not desirable to include it. In that case, the RFC requires that the least significant bit of the first octet of the node ID be set to 1. This corresponds to the multicast bit in MAC addresses, and setting it distinguishes UUIDs with randomly generated node IDs from UUIDs based on network-card MAC addresses, which are typically unicast.
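A minimal sketch, in Python, of generating a random node ID with the multicast bit set and passing it to the standard `uuid.uuid1` function. `random_node_id` is a hypothetical helper; the bit being set, `1 << 40`, is the least significant bit of the most significant (first) of the six octets.

```python
import secrets
import uuid


def random_node_id() -> int:
    """48 random bits with the multicast bit (LSB of the first octet) forced to 1."""
    return secrets.randbits(48) | (1 << 40)


node = random_node_id()
u = uuid.uuid1(node=node)
print(f"{node:012x}", (node >> 40) & 1)  # <random hex> 1
print(u.node == node)  # True
```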

== Special values ==
The "nil" UUID is 00000000-0000-0000-0000-000000000000 (that is, all bits clear), which can be used to express the concept of "no such value". The "max" UUID, sometimes also called the "omni" UUID, is FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF (that is, all bits set), which can be used as a sentinel value, for example to express "end of UUID list".

== Encoding ==
=== Binary representation ===
Initially, Apollo Computer designed the UUID with the following wire format, very similar to version 1:

Original Apollo Computer NCS UUID format

| Name | Offset (octets) | Length (octets) | Description |
|---|---|---|---|
| time_high | 0 | 4 | The first 6 octets (time_high and time_low) are the number of four-microsecond (μs) units of time that have passed since 1980-01-01 00:00 UTC. The time 2^{48} × 4 μs after the start of 1980 was 2015-09-05 05:58:26.842624 UTC. Thus, the last time at which UUIDs could be generated in this original format was in 2015. |
| time_low | 4 | 2 | |
| reserved | 6 | 2 | These octets are reserved for future use. |
| family | 8 | 1 | This octet is an address family. |
| node | 9 | 7 | These octets are a host ID in the form allowed by the specified address family. |
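The 2015 cutoff follows from simple date arithmetic, sketched here in Python under the assumption that the NCS time field counts 4 μs units from 1980-01-01 00:00 UTC, as the table states.

```python
from datetime import datetime, timedelta, timezone

NCS_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc)  # start of the NCS time field
TICK_US = 4  # one unit of the 48-bit time field is 4 microseconds

# 2**48 units is the first value a 48-bit field cannot hold.
end = NCS_EPOCH + timedelta(microseconds=(2**48) * TICK_US)
print(end.isoformat())  # 2015-09-05T05:58:26.842624+00:00
```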

Later, the UUID was extended by combining the legacy family field with the new variant field. Because the family field had only ever used values from 0 to 13, it was decided that a UUID with the most significant bit set to 0 was a legacy UUID. This gives the following table for the family group:

Family / variant field

| MSB 0 | MSB 1 | MSB 2 | Legacy family field value range | In hex | Description |
|---|---|---|---|---|---|
| 0 | x | x | 0–127 (only 0–13 used) | 0x00–0x7f | The legacy Apollo NCS UUID |
| 1 | 0 | x | 128–191 | 0x80–0xbf | OSF DCE UUID |
| 1 | 1 | 0 | 192–223 | 0xc0–0xdf | Microsoft COM / DCOM UUID |
| 1 | 1 | 1 | 224–255 | 0xe0–0xff | Reserved for future definition |

The legacy Apollo NCS UUID has the format described in the previous table. The OSF DCE UUID variant is described in RFC 9562. The Microsoft COM / DCOM UUID has its variant described in the Microsoft documentation.

==== Endianness ====
When UUIDs are saved in binary format, they are encoded sequentially in big-endian byte order. For example, 00112233-4455-6677-8899-aabbccddeeff, a variant 1 UUID, is encoded as the bytes 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff.

An exception is Microsoft's variant 2 UUIDs ("GUIDs"), historically used in COM/OLE libraries. They use a little-endian format but appear mixed-endian: the first three components of the UUID are little-endian and the last two are big-endian. This is because Microsoft's GUID structure defines the last eight bytes as an 8-byte array, which is serialized in ascending order. For example, the variant 2 UUID 00112233-4455-6677-c899-aabbccddeeff is encoded as the bytes 33 22 11 00 55 44 77 66 c8 99 aa bb cc dd ee ff.
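Python's standard `uuid` module exposes both byte orders: `UUID.bytes` is big-endian, while `UUID.bytes_le` reverses only the first three fields, producing the mixed-endian GUID layout. The sketch below uses 00112233-4455-6677-c899-aabbccddeeff (its ninth byte, c8, marks variant 2) and also rebuilds the mixed-endian form by slicing, to show which bytes move.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-c899-aabbccddeeff")

print(u.bytes.hex(" "))     # 00 11 22 33 44 55 66 77 c8 99 aa bb cc dd ee ff
print(u.bytes_le.hex(" "))  # 33 22 11 00 55 44 77 66 c8 99 aa bb cc dd ee ff

# Reverse the 4-, 2- and 2-byte fields; keep the last 8 bytes in order.
b = u.bytes
manual_le = b[3::-1] + b[5:3:-1] + b[7:5:-1] + b[8:]
print(manual_le == u.bytes_le)  # True
print(uuid.UUID(bytes_le=u.bytes_le) == u)  # True (round trip)
```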

=== Textual representation ===
In most cases, UUIDs are represented as hexadecimal values separated by hyphens. The most common is the 8-4-4-4-12 format: 32 hexadecimal digits in five groups separated by four hyphens, xxxxxxxx-xxxx-vxxx-wxxx-xxxxxxxxxxxx. The hyphens separate the version 1 fields, but the same format is commonly used for all versions. Each hexadecimal digit represents 4 bits; v is the version digit, and the high-order one to three bits of w are the variant. The Windows registry format is the same but wraps the UUID in {} braces.

The hyphenated format was introduced with the newer variant system, although hyphens are still occasionally omitted. Before that, the legacy Apollo format used a slightly different form, such as 34dc23469000.0d.00.00.7c.5f.00.00.00. The first part is the time (time_high and time_low combined). The reserved field is skipped. The family field comes directly after the first dot, so in this case 0d (13 in decimal) for DDS (Data Distribution Service). The remaining parts, each separated by a dot, are the node bytes.
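A rough Python sketch of decoding the legacy dotted format, assuming 12 hex digits of time, one family byte, and seven node bytes as in the example above. `parse_ncs_text` is a hypothetical helper, and it converts the time using the 4 μs units since 1980 described for the original NCS format.

```python
from datetime import datetime, timedelta, timezone


def parse_ncs_text(text: str) -> dict:
    """Split the legacy 'time.family.node...' form into its parts (sketch)."""
    time_hex, family_hex, *node_hex = text.split(".")
    if len(time_hex) != 12 or len(node_hex) != 7:
        raise ValueError("expected 12 time digits, a family byte and 7 node bytes")
    units = int(time_hex, 16)  # 4-microsecond units since 1980-01-01 UTC
    return {
        "time": datetime(1980, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=4 * units),
        "family": int(family_hex, 16),
        "node": bytes(int(h, 16) for h in node_hex),
    }


parts = parse_ncs_text("34dc23469000.0d.00.00.7c.5f.00.00.00")
print(parts["family"])      # 13
print(parts["node"].hex())  # 00007c5f000000
print(parts["time"])        # decoded creation time (UTC)
```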

Lowercase hexadecimal digits are preferred. ITU-T Rec. X.667 requires lowercase on generation, but also requires the uppercase version to be accepted on input. Since UUIDs are 128-bit numbers, other formats are possible, and occasionally seen, such as decimal digits or binary.

RFC 9562 registers the "uuid" namespace. This makes it possible to make URNs out of UUIDs, like urn:uuid:550e8400-e29b-41d4-a716-446655440000. The normal 8-4-4-4-12 format is used for this. It is also possible to make an OID URN out of UUIDs, like urn:oid:2.25.113059749145936325402354257176981405696. In that case, the unsigned decimal format is used. The "uuid" URN is recommended over the "oid" URN.
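These textual forms map directly onto Python's standard `uuid` module, as the sketch below shows: `str()` yields the lowercase 8-4-4-4-12 form, `.hex` the 32 digits without hyphens, and `.urn` the `urn:uuid:` form, while the OID URN is assembled by hand from the unsigned decimal value `UUID.int`. The parser accepts uppercase digits, braces, and the `urn:uuid:` prefix; other inputs, such as decimal strings, would need explicit conversion.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

print(str(u))              # 550e8400-e29b-41d4-a716-446655440000
print("{" + str(u) + "}")  # Windows registry style, wrapped in braces
print(u.hex)               # 550e8400e29b41d4a716446655440000
print(u.urn)               # urn:uuid:550e8400-e29b-41d4-a716-446655440000
oid_urn = "urn:oid:2.25." + str(u.int)  # OID URN from the unsigned decimal value
print(oid_urn)

# Uppercase digits, braces and the urn:uuid: prefix all parse to the same value.
for text in ("550E8400-E29B-41D4-A716-446655440000",
             "{550e8400-e29b-41d4-a716-446655440000}",
             "urn:uuid:550e8400-e29b-41d4-a716-446655440000"):
    assert uuid.UUID(text) == u
```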

== Collisions ==
A collision occurs when the same UUID is generated more than once and is assigned to different referents. In the case of standard version 1, 2, or 6 and some version 7 UUIDs using unique MAC addresses and/or timestamps, collisions can occur only as a result of error, such as manufacturing problems, skewed clocks, or software bugs.

In contrast, with UUID versions generated using processes such as random number generation or hashing, collisions can occur without error, due to chance. The probability of this is normally so small that it can be ignored, and can be computed precisely based on analysis of the birthday problem. For example, the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:

<math>n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \times \ln(2) \times 2^{122}} \approx 2.71 \times 10^{18}.</math>

This is equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43.4 exabytes (37.7 EiB). The smallest number of version 4 UUIDs that must be generated for the probability of at least one collision to reach p is approximated by the formula

<math>n(p) \approx \sqrt{2 \times 2^{122} \times \ln\frac{1}{1 - p}}.</math>

Thus, the probability of finding a duplicate among 103 trillion properly generated version 4 UUIDs is about one in a billion.
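The two estimates above can be reproduced with a few lines of Python. `N` is the number of possible random version 4, variant 1 UUIDs, and `uuids_for_probability` is a hypothetical helper implementing the approximation n(p) ≈ √(2N ln(1/(1 − p))); `math.log1p` is used to keep the small-p case accurate.

```python
import math

N = 2**122  # possible random version 4, variant 1 UUIDs


def uuids_for_probability(p: float) -> float:
    """Approximate n with P(at least one collision) = p: sqrt(2 N ln(1 / (1 - p)))."""
    return math.sqrt(2 * N * -math.log1p(-p))


n_half = 0.5 + math.sqrt(0.25 + 2 * math.log(2) * N)  # the 50% case
print(f"{n_half:.3e}")  # 2.715e+18
print(f"{n_half / 1e9 / (365.25 * 86400):.0f}")  # 86 (years at 1 billion per second)
print(f"{uuids_for_probability(1e-9):.3e}")  # 1.031e+14
```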

== Uses ==
=== Filesystems ===
Several filesystem types (for example, ext4 and Btrfs) use a UUID to uniquely identify each filesystem to the operating system. (NTFS and FAT32 do not; they use a shorter UID (unique identifier) instead.)

Filesystem userspace tools, most of which are derived from the original implementation by Theodore Ts'o, therefore make use of UUIDs.

An /etc/fstab file might assign mount points based on these UUIDs (or a UID for a FAT32 EFI system partition (ESP)):

<syntaxhighlight lang="sh">
# device-uuid mount-point fs-type options dump pass
UUID=b18e3b6c-ccb7-4308-b527-35e5e6ee2145 / btrfs defaults 0 0
UUID=103C-86D6 /efi vfat utf8 0 2
UUID=64f3cb6a-e70e-45e5-8b90-d86cddbab7bb swap swap defaults 0 0
UUID=eda746c6-1f1b-4cf1-9225-d8b0b46511cc /mnt/Stuff btrfs defaults 0 0
</syntaxhighlight>

=== Partition tables ===
The GUID Partition Table (GPT) uses UUIDs (called there "GUID"s) to identify partitions and partition types. Unique partition IDs are assigned locally by the operating system. Partition type IDs are well-known numbers, usually assigned by operating-system or hardware vendors.

=== Microsoft COM ===
There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):

- Interface identifiers (those registered on a system are stored in the Windows Registry).
- Class identifiers (also stored in the Windows Registry). In practice, this space is not entirely separate from the interface identifier space, because remoting an interface can require a proxy/stub object, which some toolsets used to create with a class identifier equal to the interface's identifier.
- Type library identifiers (also stored in the Windows Registry).
- Category identifiers (their presence on a class identifies it as belonging to certain class categories, which are listed in the Windows Registry).

=== Databases ===
UUIDs are commonly used as a unique key in database tables. In Microsoft SQL Server version 4 Transact-SQL, one built-in function returns standard random version 4 UUIDs, while another returns 128-bit identifiers similar to UUIDs that are guaranteed to ascend in sequence until the next system reboot. The Oracle Database function for generating GUIDs does not return a standard GUID, despite its name. Instead, it returns a 16-byte (128-bit) RAW value based on a host identifier and a process or thread identifier, somewhat similar to a GUID. PostgreSQL provides a datatype for UUIDs and can generate most versions of UUIDs through functions from its modules. MySQL provides a function that generates standard version 1 UUIDs.

==== Combined Time-GUID ====
The random nature of standard UUIDs of versions 3, 4, and 5, and the ordering of the fields within standard versions 1 and 2 may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version 4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" (combined time-GUID) approach made the UUIDs significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application. By reordering and encoding version 1 and 2 UUIDs so that the timestamp comes first, insertion performance loss can be averted.

COMB-like arrangements of UUID payloads were eventually standardized in RFC 9562 as versions 6 and 7.

=== Other examples ===
UEFI and ACPI are other examples of specifications that use GUIDs.

== See also ==
- Birthday attack
- Object identifier (OID)
- Uniform Resource Identifier (URI)
- Snowflake ID
