# Species diversity

Species diversity is the number of different species that are represented in a given community (a dataset). The effective number of species refers to the number of equally abundant species needed to obtain the same mean proportional species abundance as that observed in the dataset of interest (where all species may not be equally abundant). Species diversity consists of two components: species richness and species evenness. Species richness is a simple count of species, whereas species evenness quantifies how equal the abundances of the species are.[1][2][3]

## Calculation of diversity

Species diversity in a dataset can be calculated by first taking the weighted average of species proportional abundances in the dataset, and then taking the inverse of this. The equation is:[1][2][3]

${}^q\!D={1 \over \sqrt[q-1]{{\sum_{i=1}^S p_i p_i^{q-1}}}}$

The denominator equals mean proportional species abundance in the dataset as calculated with the weighted generalized mean with exponent q - 1. In the equation, S is the total number of species (species richness) in the dataset, and the proportional abundance of the ith species is $p_{i}$. The proportional abundances themselves are used as weights. The equation is often written in the equivalent form:

${}^q\!D=\left ( {\sum_{i=1}^S p_i^q} \right )^{1/(1-q)}$

The value of q defines which kind of mean is used. q = 0 corresponds to the weighted harmonic mean, which is 1/S because the $p_{i}$ values cancel out. q = 1 is undefined, except that the limit as q approaches 1 is well defined:

$\lim_{q \rightarrow 1} {}^q\!D = \exp\left(-\sum_{i=1}^S p_i \ln p_i\right)$

q = 2 corresponds to the arithmetic mean. As q approaches infinity, the generalized mean approaches the maximum $p_{i}$ value. In practice, q modifies species weighting, such that increasing q increases the weight given to the most abundant species, and fewer equally abundant species are hence needed to reach mean proportional abundance. Consequently, large values of q lead to smaller species diversity than small values of q for the same dataset. If all species are equally abundant in the dataset, changing the value of q has no effect, but species diversity at any value of q equals species richness.

Negative values of q are not used, because then the effective number of species (diversity) would exceed the actual number of species (richness). As q approaches negative infinity, the generalized mean approaches the minimum $p_{i}$ value. In many real datasets, the least abundant species is represented by a single individual, and then the effective number of species would equal the number of individuals in the dataset.[2][3]

The same equation can be used to calculate the diversity in relation to any classification, not only species. If the individuals are classified into genera or functional types, $p_{i}$ represents the proportional abundance of the ith genus or functional type, and qD equals genus diversity or functional type diversity, respectively.

## Diversity indices

Often researchers have used the values given by one or more diversity indices to quantify species diversity. Such indices include species richness, the Shannon index, the Simpson index, and the complement of the Simpson index (also known as the Gini-Simpson index).[4][5][6]

When interpreted in ecological terms, each one of these indices corresponds to a different thing, and their values are therefore not directly comparable. Species richness quantifies the actual rather than effective number of species. The Shannon index equals log(qD), and in practice quantifies the uncertainty in the species identity of an individual that is taken at random from the dataset. The Simpson index equals 1/qD and quantifies the probability that two individuals taken at random from the dataset (with replacement of the first individual before taking the second) represent the same species. The Gini-Simpson index equals 1 - 1/qD and quantifies the probability that the two randomly taken individuals represent different species.[1][2][3][6][7]

## Sampling considerations

Depending on the purposes of quantifying species diversity, the dataset used for the calculations can be obtained in different ways. Although species diversity can be calculated for any dataset where individuals have been identified to species, meaningful ecological interpretations require that the dataset is appropriate for the questions at hand. In practice, the interest is usually in the species diversity of areas so large that not all individuals in them can be observed and identified to species, but a sample of the relevant individuals has to be obtained. Extrapolation from the sample to the underlying population of interest is not straightforward, because the species diversity of the available sample generally gives an underestimation of the species diversity in the entire population. Applying different sampling methods will lead to different sets of individuals being observed for the same area of interest, and the species diversity of each set may be different. When a new individual is added to a dataset, it may introduce a species that was not yet represented. How much this increases species diversity depends on the value of q: when q = 0, each new actual species causes species diversity to increase by one effective species, but when q is large, adding a rare species to a dataset has little effect on its species diversity.[8]

In general, sets with many individuals can be expected to have higher species diversity than sets with fewer individuals. When species diversity values are compared among sets, sampling efforts need to be standardised in an appropriate way for the comparisons to yield ecologically meaningful results. Resampling methods can be used to bring samples of different sizes to a common footing.[9] Species accumulation curves and the number of species only represented by one or a few individuals can be used to help in estimating how representative the available sample is of the population from which it was drawn.[10][11]

## Trends in species diversity

The observed species diversity is affected not only by the number of individuals but also by the heterogeneity of the sample. If individuals are drawn from different environmental conditions (or different habitats), the species diversity of the resulting set can be expected to be higher than if all individuals are drawn from a similar environment. Increasing the area sampled increases observed species diversity both because more individuals get included in the sample and because large areas are environmentally more heterogeneous than small areas.