Information theory, developed by Claude E. Shannon in 1948, defines the notion of channel capacity and provides a mathematical model by which one can compute it. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution. 
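In symbols (with notation introduced here for concreteness: channel input $X$ with distribution $p_X$, channel output $Y$, and mutual information $I(X;Y)$), this key result reads

$$C = \sup_{p_X} I(X;Y),$$

where the supremum is taken over all possible input distributions $p_X$.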
The notion of channel capacity has been central to the development of modern wireline and wireless communication systems, where modern error-correction coding schemes have achieved performance very close to the limits promised by channel capacity.
Channel capacity is additive over independent channels. This means that using two independent channels in combination provides the same theoretical capacity as using them independently.
More formally, let $p_1$ and $p_2$ be two independent channels modelled as above; $p_1$ having an input alphabet $\mathcal{X}_1$ and an output alphabet $\mathcal{Y}_1$. Idem for $p_2$ with $\mathcal{X}_2$ and $\mathcal{Y}_2$.
We define the product channel $p_1 \times p_2$ as
$$\forall (x_1, x_2) \in (\mathcal{X}_1, \mathcal{X}_2),\ (y_1, y_2) \in (\mathcal{Y}_1, \mathcal{Y}_2),\quad (p_1 \times p_2)\bigl((y_1, y_2) \mid (x_1, x_2)\bigr) = p_1(y_1 \mid x_1)\, p_2(y_2 \mid x_2).$$
This theorem states:
$$C(p_1 \times p_2) = C(p_1) + C(p_2).$$
We first show that $C(p_1 \times p_2) \geq C(p_1) + C(p_2)$.
Let $X_1$ and $X_2$ be two independent random variables. Let $Y_1$ be a random variable corresponding to the output of $X_1$ through the channel $p_1$, and $Y_2$ for $X_2$ through $p_2$.
By definition $C(p_1 \times p_2) = \sup_{p_{X_1, X_2}} I(X_1, X_2 : Y_1, Y_2)$.
Since $X_1$ and $X_2$ are independent, as well as $p_1$ and $p_2$, $(X_1, Y_1)$ is independent of $(X_2, Y_2)$. We can apply the following property of mutual information:
$$I(X_1, X_2 : Y_1, Y_2) = I(X_1 : Y_1) + I(X_2 : Y_2).$$
For now we only need to find a distribution $p_{X_1, X_2}$ such that $I(X_1, X_2 : Y_1, Y_2) \geq I(X_1 : Y_1) + I(X_2 : Y_2)$. In fact, $\pi_1$ and $\pi_2$, two probability distributions for $X_1$ and $X_2$ achieving $C(p_1)$ and $C(p_2)$, suffice:
$$C(p_1 \times p_2) \geq I(X_1, X_2 : Y_1, Y_2) = I(X_1 : Y_1) + I(X_2 : Y_2) = C(p_1) + C(p_2),$$
that is, $C(p_1 \times p_2) \geq C(p_1) + C(p_2)$.
Now let us show that $C(p_1 \times p_2) \leq C(p_1) + C(p_2)$.
Let $\pi_{12}$ be some distribution for the channel $p_1 \times p_2$ defining $(X_1, X_2)$ and the corresponding output $(Y_1, Y_2)$. Let $\mathcal{X}_1$ be the alphabet of $X_1$, $\mathcal{Y}_1$ for $Y_1$, and analogously $\mathcal{X}_2$ and $\mathcal{Y}_2$. Because the product channel factorizes, $Y_1$ and $Y_2$ are conditionally independent given $(X_1, X_2)$ and each output depends only on its own input, which yields $I(X_1, X_2 : Y_1, Y_2) \leq I(X_1 : Y_1) + I(X_2 : Y_2) \leq C(p_1) + C(p_2)$; taking the supremum over $\pi_{12}$ gives $C(p_1 \times p_2) \leq C(p_1) + C(p_2)$, which completes the proof.
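The theorem can also be checked numerically. The following is a minimal sketch (the function names, the use of the Blahut–Arimoto algorithm, and the choice of two binary symmetric channels are illustrative choices made here, not part of the text): it estimates the capacities of two small channels and of their product channel, whose transition matrix is the Kronecker product of the individual ones.

```python
import numpy as np

def mutual_information(r, P):
    """I(X;Y) in bits for input distribution r and channel matrix P[x, y] = p(y|x)."""
    q_y = r @ P                                   # output distribution p(y)
    terms = r[:, None] * P                        # joint distribution p(x, y)
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.log2(P / q_y)                   # log p(y|x) / p(y)
    mask = terms > 0
    return float(np.sum(terms[mask] * logs[mask]))

def capacity(P, iters=1000):
    """Blahut-Arimoto estimate of the capacity of channel P[x, y] = p(y|x), in bits."""
    nx = P.shape[0]
    r = np.full(nx, 1.0 / nx)                     # start from the uniform input distribution
    for _ in range(iters):
        q_y = r @ P                               # current output distribution
        # KL divergence D( p(.|x) || q_y ) in nats, one value per input symbol x
        with np.errstate(divide="ignore", invalid="ignore"):
            kl = np.sum(np.where(P > 0, P * np.log(P / q_y), 0.0), axis=1)
        r = r * np.exp(kl)                        # multiplicative Blahut-Arimoto update
        r /= r.sum()
    return mutual_information(r, P)

# Two independent binary symmetric channels with crossover probabilities 0.1 and 0.2.
P1 = np.array([[0.9, 0.1], [0.1, 0.9]])
P2 = np.array([[0.8, 0.2], [0.2, 0.8]])
P12 = np.kron(P1, P2)                             # product channel on 4 inputs / 4 outputs

C1, C2, C12 = capacity(P1), capacity(P2), capacity(P12)
print(round(C1, 3), round(C2, 3), round(C12, 3), round(C1 + C2, 3))
```

The printed capacity of the product channel should agree with the sum of the two individual capacities (roughly 0.531 + 0.278 bits per channel use), in line with the additivity theorem.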
If G is an undirected graph, it can be used to define a communications channel in which the symbols are the graph vertices, and two codewords may be confused with each other if their symbols in each position are equal or adjacent. The computational complexity of finding the Shannon capacity of such a channel remains open, but it can be upper bounded by another important graph invariant, the Lovász number.
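The classic example is the pentagon channel. The sketch below (the helper name and brute-force approach are choices made here for illustration) builds the confusability graph of a 5-symbol channel in which each symbol can be confused with its two cyclic neighbours, and finds the largest set of single symbols that can never be confused, i.e. an independent set of the graph.

```python
from itertools import combinations

def independence_number(n, edges):
    """Size of the largest vertex set with no edge inside it (brute force)."""
    adj = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in adj for pair in combinations(subset, 2)):
                return size
    return 0

# Pentagon (5-cycle) confusability graph: symbol i can be confused with i +/- 1 (mod 5).
edges = [(i, (i + 1) % 5) for i in range(5)]
print(independence_number(5, edges))   # 2 distinguishable single symbols
```

Single symbols give only 2 unambiguous codewords, but with length-2 codewords 5 pairwise non-confusable codewords exist (Shannon, 1956), so the Shannon capacity of the 5-cycle is at least $\sqrt{5}$; Lovász (1979) showed via the Lovász number that it equals $\sqrt{5}$ exactly.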
The noisy-channel coding theorem states that for any error probability ε > 0 and for any transmission rate R less than the channel capacity C, there is an encoding and decoding scheme transmitting data at rate R whose error probability is less than ε, for a sufficiently large block length. Conversely, for any rate greater than the channel capacity, the probability of error at the receiver goes to one as the block length goes to infinity.
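Stated more formally (the notation for the encoder $f$, decoder $g$, message $m$, and block length $n$ is introduced here for concreteness): for every $\epsilon > 0$ and every rate $R < C$ there exist, for all sufficiently large $n$, an encoder $f : \{1, \dots, 2^{\lceil nR \rceil}\} \to \mathcal{X}^n$ and a decoder $g : \mathcal{Y}^n \to \{1, \dots, 2^{\lceil nR \rceil}\}$ such that

$$\max_{m} \ \Pr\bigl[\, g(Y^n) \neq m \mid X^n = f(m) \,\bigr] < \epsilon .$$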
C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are expressed in a linear power unit (like watts or volts²). Since S/N figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a linear power ratio of $10^{30/10} = 10^3 = 1000$.
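As a worked example (the bandwidth value is an assumption chosen here), the Shannon–Hartley formula $C = B \log_2(1 + S/N)$ that this paragraph refers to can be evaluated directly after converting the SNR from dB:

```python
import math

B = 1_000_000                      # bandwidth in hertz (assumed value)
snr_db = 30                        # signal-to-noise ratio in dB
snr_linear = 10 ** (snr_db / 10)   # dB -> linear power ratio: 10^(30/10) = 1000
C = B * math.log2(1 + snr_linear)  # Shannon-Hartley capacity in bits per second
print(snr_linear, C)               # 1000.0 and roughly 9.97e6 bit/s
```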
In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity as the maximum rate of reliable communication supported by the channel, $\log_2(1 + |h|^2\,\mathrm{SNR})$ [bits/s/Hz], depends on the random channel gain $|h|^2$, which is unknown to the transmitter. If the transmitter encodes data at rate $R$ [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,
$$p_{out} = \mathbb{P}\bigl\{\log_2(1 + |h|^2\,\mathrm{SNR}) < R\bigr\},$$
in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in strict sense is zero. However, it is possible to determine the largest value of $R$ such that the outage probability $p_{out}$ is less than $\epsilon$. This value is known as the $\epsilon$-outage capacity.
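As a sketch (assuming Rayleigh fading, i.e. $|h|^2$ exponentially distributed with unit mean, and the SNR and rate values below are assumptions made here, not taken from the text), the outage probability at a target rate can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
snr = 100.0                                    # average SNR (20 dB), assumed value
R = 2.0                                        # target rate in bits/s/Hz, assumed value

h2 = rng.exponential(1.0, size=1_000_000)      # Rayleigh fading: |h|^2 ~ Exp(1)
p_out = np.mean(np.log2(1 + h2 * snr) < R)     # fraction of fades that cannot support R
print(p_out)                                   # Monte Carlo estimate of the outage probability
```

The $\epsilon$-outage capacity is then the largest $R$ for which this estimated probability stays below $\epsilon$.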
In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of $\mathbb{E}\bigl[\log_2(1 + |h|^2\,\mathrm{SNR})\bigr]$ [bits/s/Hz] and it is meaningful to speak of this value as the capacity of the fast-fading channel.
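Continuing the same Rayleigh-fading sketch (the distribution of $|h|^2$ and the SNR value remain assumptions made here), the fast-fading (ergodic) capacity is simply the average of the instantaneous rate over the fading distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
snr = 100.0                                    # average SNR, assumed value
h2 = rng.exponential(1.0, size=1_000_000)      # |h|^2 ~ Exp(1) for Rayleigh fading
ergodic_capacity = np.mean(np.log2(1 + h2 * snr))
print(ergodic_capacity)                        # estimate of E[log2(1 + |h|^2 SNR)] in bits/s/Hz
```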