In ecology, the species discovery curve or species accumulation curve is a graph of the cumulative number of species of living things recorded in a particular environment as a function of the cumulative effort expended searching for them (usually measured in person-hours). It is related to, but not identical with, the species-area curve.
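The curve itself is simple to compute from field records. The sketch below assumes, for illustration only, that the records are grouped into one list of species names per unit of effort (for example, one list per person-hour) in the order the effort was spent. The function name `accumulation_curve` and the sample data are hypothetical.

```python
from typing import Hashable, Iterable, List


def accumulation_curve(samples: Iterable[Iterable[Hashable]]) -> List[int]:
    """Cumulative number of distinct species after each unit of effort.

    Assumption: each element of `samples` holds the species recorded during
    one unit of effort (e.g. one person-hour), in chronological order.
    """
    seen = set()
    curve = []
    for sample in samples:
        seen.update(sample)       # add any species not recorded before
        curve.append(len(seen))   # running total of distinct species
    return curve


hours = [["robin", "wren"], ["robin"], ["tit", "wren", "jay"], ["jay"]]
print(accumulation_curve(hours))  # [2, 2, 4, 4]
```

Because the shape of a single curve depends on the order in which the effort was spent, in practice it is often averaged over many random reorderings of the samples.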
The species discovery curve can never decrease, and will normally be negatively accelerated (that is, its rate of increase slows down as effort accumulates). Extrapolating the curve provides a way of estimating the number of additional species that would be discovered with further effort. This is usually done by fitting some functional form to the curve, either by eye or by non-linear regression. Commonly used functional forms include the logarithmic function and the negative exponential function. The advantage of the negative exponential function is that it tends to an asymptote, which equals the number of species that would be discovered if infinite effort were expended. However, some theoretical approaches suggest that the logarithmic curve may be more appropriate: species discovery slows down with increasing effort but never entirely ceases, so there is no asymptote, and infinite effort would yield an infinite number of species. One setting in which the curve would not be expected to reach an asymptote is the study of genetic sequences, where new mutations and sequencing errors can keep generating new variants without limit.
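The two functional forms can be compared with a short least-squares sketch. It assumes the negative exponential form S(E) = A(1 - e^(-kE)), whose asymptote A estimates the total number of species, and the logarithmic form S(E) = a + b ln E, which has no asymptote; other parameterizations are also used. The function names are hypothetical. For a fixed k the negative exponential model is linear in A, so A has a closed form and k is found by a plain grid search, a simple stand-in for a general non-linear optimizer such as `scipy.optimize.curve_fit`.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_log(effort: Sequence[float], species: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of S = a + b*ln(E); all effort values must be > 0."""
    xs = [math.log(e) for e in effort]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(species) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, species))
    b = sxy / sxx
    return my - b * mx, b


def fit_neg_exp(effort: Sequence[float], species: Sequence[float],
                k_grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Least-squares fit of S = A*(1 - exp(-k*E)); returns (A, k).

    For each candidate k the optimal A is sum(g*y)/sum(g*g) with
    g = 1 - exp(-k*E); the k giving the smallest squared error is kept.
    """
    if k_grid is None:
        k_grid = [10 ** (i / 100) for i in range(-400, 201)]  # 1e-4 .. 1e2, log-spaced
    best = None
    for k in k_grid:
        g = [1 - math.exp(-k * e) for e in effort]
        denom = sum(gi * gi for gi in g)
        if denom == 0:
            continue
        A = sum(gi * y for gi, y in zip(g, species)) / denom
        sse = sum((y - A * gi) ** 2 for gi, y in zip(g, species))
        if best is None or sse < best[0]:
            best = (sse, A, k)
    return best[1], best[2]


effort = list(range(1, 41))
observed = [50 * (1 - math.exp(-0.1 * e)) for e in effort]  # synthetic data
A, k = fit_neg_exp(effort, observed)
print(f"estimated asymptote A = {A:.1f}, rate k = {k:.3f}")
```

Fitting both forms to the same data and comparing residuals is one way to judge whether an asymptote is plausible; a good logarithmic fit is a warning that the asymptote A may be an underestimate.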
The first theoretical investigation of the species-discovery process was a classic paper by Fisher, Corbet and Williams (1943), based on a large collection of butterflies made in Malaya. Theoretical statistical work on the problem continues; see, for example, Chao and Shen (2004). The theory is linked to that of Zipf's law.
The same approach is used in many other fields. For example, in ethology it can be applied to the number of distinct fixed action patterns discovered as a function of the cumulative effort spent studying the behaviour of an animal species; in molecular genetics it has been applied to the number of distinct genes discovered; and in literary studies it can be used to estimate the total vocabulary of a writer from the surviving sample of his or her works (see Efron & Thisted, 1976).
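For the vocabulary problem, one classical nonparametric tool is the Good–Toulmin estimator, shown here as an illustration rather than as the exact procedure of Efron & Thisted. It predicts how many new distinct words would appear if the text were extended by a fraction t of its current length, using only n_k, the number of distinct words seen exactly k times: U(t) = Σ (-1)^(k+1) t^k n_k. The function name `good_toulmin` and the whitespace tokenization are assumptions for the sketch.

```python
from collections import Counter
from typing import Iterable


def good_toulmin(tokens: Iterable[str], t: float) -> float:
    """Good-Toulmin estimate of the number of new distinct items expected
    if the sample were extended by a fraction t of its current size.

    U(t) = sum over k >= 1 of (-1)**(k+1) * t**k * n_k, where n_k is the
    number of distinct items observed exactly k times. The alternating
    series becomes unstable for t > 1, so use it only for t <= 1.
    """
    counts = Counter(tokens)                  # word -> occurrences
    freq_of_freq = Counter(counts.values())   # k -> n_k
    return sum((-1) ** (k + 1) * t ** k * n_k for k, n_k in freq_of_freq.items())


words = "a a b c c c d".split()
print(good_toulmin(words, 1))    # 2
print(good_toulmin(words, 0.5))  # 0.875
```

The estimator depends only on the frequency-of-frequency counts n_k, which is why rare words (those seen once or twice) dominate the prediction of how much vocabulary remains unseen.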