Network Abstraction Layer

The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards.^[1]

Introduction[edit]

An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.

The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media.

The NAL is designed in order to provide "network friendliness" to enable simple and effective customization of the use of VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as:^[2]

RTP/IP for any kind of real-time wire-line and wireless Internet services.^[2]
File formats, e.g., ISO MP4 for storage and MMS.^[2]
H.32X for wireline and wireless conversational services.^[2]
MPEG-2 systems for broadcasting services, etc.^[2]

The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet formats uses of NAL units, parameter sets, and access units. A short description of these concepts is given below.

NAL units[edit]

The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.

NAL Units in Byte-Stream Format Use[edit]

Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired.

NAL Units in Packet-Transport System Use[edit]

In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.

VCL and Non-VCL NAL Units[edit]

NAL units are classified into VCL and non-VCL NAL units. The VCL NAL units contain the data that represents the values of the samples in the video pictures, and the non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).

Parameter Sets[edit]

A parameter set is supposed to contain information that is expected to rarely change and offers the decoding of a large number of VCL NAL units. There are two types of parameter sets:

sequence parameter sets (SPS), which apply to a series of consecutive coded video pictures called a coded video sequence
picture parameter sets (PPS), which apply to the decoding of one or more individual pictures within a coded video sequence

The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.

Parameter Set use with reliable "out-of-band" parameter set exchange

Access Units[edit]

A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to indicate that the stream is ending.

Coded Video Sequences[edit]

A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set. Each coded video sequence can be decoded independently of any other coded video sequence, given the necessary parameter set information, which may be conveyed "in-band" or "out-of-band". At the beginning of a coded video sequence is an instantaneous decoding refresh (IDR) access unit. An IDR access unit contains an intra picture which is a coded picture that can be decoded without decoding any previous pictures in the NAL unit stream, and the presence of an IDR access unit indicates that no subsequent picture in the stream will require reference to pictures prior to the intra picture it contains in order to be decoded. A NAL unit stream may contain one or more coded video sequence.

References[edit]

^ "Network Abstraction Layer (NAL) Architecture". intervalzero.com. Interval Zero. 1996-01-01. Retrieved 2022-08-17.
^ ^a ^b ^c ^d ^e Jack, Keith (2007). "Chapter 14: MPEG-4 and H.264". Video Demystified (5 ed.). Newnes. p. 762. ISBN 978-0-7506-8395-1.

Sources

[1] "Network Abstraction Layer (NAL) Architecture". intervalzero.com. Interval Zero. 1996-01-01. Retrieved 2022-08-17.

[Jack_762-2] Jack, Keith (2007). "Chapter 14: MPEG-4 and H.264". Video Demystified (5 ed.). Newnes. p. 762. ISBN 978-0-7506-8395-1.

[1]

[2]