## Abstract

North Greenland Polar Eskimos are the only hunter–gatherer population, to our knowledge, who can offer precise genealogical records spanning several generations. This is the first report from Eskimos on two key parameters in population genetics, namely, generation time (*T*) and effective population size (*N*_{e}). The average mother–daughter and father–son intervals were 27 and 32 years, respectively, roughly similar to the previously published generation times obtained from recent agricultural societies across the world. To gain insight into the generation time of our distant ancestors, we calculated the maternal generation time for two wild chimpanzee populations. We also provide the first comparison among three distinct approaches (genealogy, variance and life table methods) for calculating *N*_{e}, which resulted in slightly differing values for the Eskimos. The ratio of the effective to the census population size is estimated as 0.6–0.7 for autosomal and X-chromosomal DNA, 0.7–0.9 for mitochondrial DNA and 0.5 for Y-chromosomal DNA. A simulation of alleles along the genealogy suggested that Y-chromosomal DNA may drift a little faster than mitochondrial DNA in this population, in contrast to agricultural Icelanders. Our values will be useful not only in prehistoric population inference but also in understanding the shaping of our genome today.

## 1. Introduction

The human species has existed for at least 150 000 years (White *et al*. 2003; McDougall *et al*. 2005), and for most of this time has led a Palaeolithic hunter–gatherer lifestyle until the start of agriculture in the Neolithic Age within the last 10 000 years. Many questions that prehistorians ask refer to events at Palaeolithic time depths, such as the initial settling of the continents by hunter–gatherers or the demographic effects of the Last Ice Age. When using genetic variation in living humans to reconstruct these prehistoric events, it is therefore crucial to apply demographic parameters (such as generation time and differences between the sexes in reproductive success) which are realistic for ancient hunter–gatherers. Our best proxies for ancient hunter–gatherers are the few remaining hunter–gatherer groups who have retained their lifestyle into modern times. Here, however, a difficulty arises because hunter–gatherers do not usually keep accessible genealogical records. It is only in very rare cases that outside observers have written down genealogical information for such groups, but fortunately this does apply to the Polar Eskimo population in Greenland, where a Danish medical family compiled extensive genealogical records covering births between *ca* 1805 and 1974 (Gilberg *et al*. 1978).

To our knowledge, this is the first report from Eskimos on two key parameters in population research and in genome analyses, namely, generation time (*T*) and effective population size (*N*_{e}). These parameter values from hunter–gatherers are increasingly valuable because during the last 20 years, the study of human prehistory has been accelerated by the accumulation of genetic data in living humans and by the introduction of new statistical methods (Harpending & Rogers 2000; Beaumont 2004). The age of the most recent common ancestor for non-recombining loci (e.g. Vigilant *et al*. 1991) or that of the peopling of a particular region (e.g. Forster *et al*. 1996) is estimated from the number of mutations found in a DNA sample. The past population size is inferred from the variation found in a sample (e.g. Murray-McIntosh *et al*. 1998). Such inference requires a population genetic model, and its accuracy depends on the values of the two parameters used in that model.

Generation time (*T*), or the intergeneration interval, is one of the most important parameters in human population genetics. Since genetic dating methods give us answers primarily in the form of the number of generations, we need the generation time to translate generations to years. In many applications, it has been assumed to be 20–25 years for prehistoric humans (Fenner 2005). However, recent studies on pre-industrial Europeans suggested that the generation time is longer than previously considered (Denmark/Germany: Forster 1996; Canada: Tremblay & Vézina 2000; Iceland: Helgason *et al*. 2003). Moreover, we have to pay attention to the variation among peoples living in different locations. Some demographic factors such as high mortality at young ages shorten the interval.

The effective population size *N*_{e} is another important parameter in population genetics. Genetic variation in a population is lost more quickly by stochastic processes when the size of the population is smaller. However, the speed of the loss is not determined by its census size *N* (including individuals of both sexes and all ages) but by its effective size *N*_{e} (where factors such as unequal family sizes are involved). Genetic variation is lost quickly (and hence its *N*_{e} is small) when there is a large variance in the offspring number (family size) and/or a biased operational sex ratio. The relationship between the census size *N* and the effective size *N*_{e} is critical when we apply population genetic models to real populations. For example, when we infer past population size from present genetic data, most models provide us with answers expressed as effective sizes. We need to translate them into actual sizes. In a review of the ratio of effective to actual population size (*N*_{e}/*N*) in wildlife (Frankham 1995), most estimates of *N*_{e}/*N* in humans fall in a range between 0.3 and 0.9 for autosomal loci. The variation may result from differences in demographic and social conditions.

The objective of this study is to estimate the values of these two important parameters, generation time and the ratio of the effective population size to census size, in one of the few extensively documented hunter–gatherer populations, the Polar Eskimos in North Greenland. Eskimos and previously studied Europeans differ considerably in their environment, physical characteristics and culture. If any of these factors made a great impact on generation time or the effective population size then we should see it in our Eskimo samples. On the other hand, if *T* and *N*_{e}/*N* were similar between Eskimos and Europeans, we are more justified in using modern *T* and *N*_{e}/*N* estimates for prehistoric peoples. Polar Eskimo genealogies have been recorded remarkably well since the mid-nineteenth century (Gilberg *et al*. 1978). They provide us with an extremely rare opportunity to measure these two parameters in a hunter–gatherer population.

## 2. Material and methods

### (a) Polar Eskimo database

The subjects of this study are the Polar Eskimos in the Thule District of North Greenland. A good summary of their demography based on censuses became available in 1976 (Gilberg 1976). Europeans first came into contact with the Thule Eskimos in 1818, and contact has been continuous since 1909. The population size in the early nineteenth century was estimated to be approximately 150 or fewer. It was approximately 200–300 in the late nineteenth and the early twentieth century, and increased to 400 in 1959.

Genealogical information on the Polar Eskimos was collected and published by a Danish resident physician and his family (Gilberg *et al*. 1978). The database includes 1614 individuals who were born between *ca* 1805 and 1974. Edwards (1992*a*) studied the basic structure of their genealogy from a population genetic viewpoint. Following his definition, we regard 225 individuals whose parents do not appear in the database as founders. For generation time, we restrict the analysis to those who finished their reproduction (females: 49 years and above and males: 68 years and above), that is, parents born in or after 1926 (female) or 1907 (male) are disregarded. For lifetime offspring number, we analyse the data from the cohorts born between 1820 and 1906 (inclusive). We include all the individuals whose birth year is known in the compilation of life tables that summarize the mortality and reproduction rates at each age class.

### (b) Comparison with ape generation times

Although it is impossible to know the generation time in our distant ancestors, it may be helpful to compare the generation time of wild chimpanzees with ours. Therefore, we carried out a rough calculation of generation time using the mortality and fertility data published from long-term research sites (Boesch & Boesch-Achermann 2000; Nishida *et al*. 2003).
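The rough calculation described above can be sketched as follows. This is a minimal illustration, assuming that generation time is taken as the mean age of mothers at the birth of their daughters, weighted by survival and fertility; the function name and the three-class life table are hypothetical and are not the Mahale or Taï data.

```python
def generation_time(ages, survival, fertility):
    """Mean age of mothers at the birth of daughters.

    ages      -- representative age x of each age class (years)
    survival  -- l_x, probability of surviving from birth to age x
    fertility -- m_x, mean number of daughters born per female at age x
    """
    weights = [l * m for l, m in zip(survival, fertility)]
    total = sum(weights)
    if total == 0:
        raise ValueError("life table contains no reproduction")
    return sum(x * w for x, w in zip(ages, weights)) / total


# Hypothetical three-class life table, for illustration only.
toy_ages = [15.0, 25.0, 35.0]
toy_lx = [0.6, 0.4, 0.2]
toy_mx = [0.5, 1.0, 1.0]
toy_T = generation_time(toy_ages, toy_lx, toy_mx)
```

Because the weights are products of survival and fertility, higher adult mortality shifts weight towards younger ages and shortens the resulting generation time. In a growing or shrinking population the weights would also need a growth-rate term, which this sketch omits.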

### (c) Estimation of effective population size

Effective population size has been traditionally defined in terms of either inbreeding or loss of genetic variation (Crow & Kimura 1970). We calculated the effective sizes for autosomal loci (*N*_{eA}), for mitochondrial loci (*N*_{eMit}), for Y-linked loci (*N*_{eY}) and for X-linked loci (*N*_{eX}). If we assume an ideal population in which the offspring number follows a Poisson distribution and the sex ratio *N*_{m}/*N*_{f} is 1, the expected value of *N*_{eA} is equal to the census size *N*. *N*_{eMit} and *N*_{eY} are expected to be one-quarter of *N*_{eA}, while *N*_{eX} is expected to be three-quarters of *N*_{eA} (Storz *et al*. 2001).
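The expectations for an ideal population can be checked with the textbook sex-ratio formulae. The sketch below assumes those standard formulae (they are not quoted in the text) and expresses the mitochondrial and Y-linked sizes on the diploid scale, so that they equal one-quarter of the autosomal size when the sexes are equally numerous. The function name is hypothetical.

```python
def ideal_effective_sizes(n_males, n_females):
    """Expected effective sizes with Poisson offspring numbers.

    Mitochondrial and Y values are on the diploid scale, i.e. half the
    number of transmitting individuals of the relevant sex.
    """
    nm, nf = float(n_males), float(n_females)
    return {
        "autosomal": 4 * nm * nf / (nm + nf),
        "x": 9 * nm * nf / (4 * nm + 2 * nf),
        "mitochondrial": nf / 2,
        "y": nm / 2,
    }


# Equal sex ratio with a census size of 200 breeding individuals.
sizes = ideal_effective_sizes(100, 100)
```

With 100 males and 100 females the autosomal value equals the census size of 200, the X-linked value is three-quarters of it and the mitochondrial and Y-linked values are one-quarter of it.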

The strength of this study lies in the fact that we know the actual genealogy of the population in addition to the birth records. This enables us to calculate *N*_{e} directly by simulating changes in the allele frequency through the genealogy (gene-flow simulation: Edwards 1968, 1992*b*; allele-dropping simulation: Heyer 1999). Two common alleles were assigned to the founders (50 : 50) and the frequency of the alleles was traced over time. This simulation was repeated 100 000 times, and the variance of the allele frequency among iterations was calculated. *N*_{e} was estimated from the increase of the variance between 1850 and 1940. We restricted the analysis to this period when the genealogical data were considered reliable enough. The increase during the period was then adjusted to the increase per generation time.
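A minimal sketch of the two ingredients of such a simulation is given below. It makes several assumptions that the text does not spell out: each founder gene copy is drawn independently with probability 0.5, the locus is autosomal, and *N*_{e} is recovered from the standard drift recursion in which *p*(1−*p*) minus the among-replicate variance shrinks by a factor 1−1/(2*N*_{e}) per generation. The function names and the toy pedigree are hypothetical.

```python
import random


def ne_from_variance_increase(var_start, var_end, generations, p=0.5):
    """Diploid N_e implied by the growth of the among-replicate variance.

    Assumes pure drift: p(1-p) - Var shrinks by (1 - 1/(2 N_e)) per generation.
    """
    pq = p * (1.0 - p)
    ratio = (pq - var_end) / (pq - var_start)
    per_generation = ratio ** (1.0 / generations)
    return 1.0 / (2.0 * (1.0 - per_generation))


def drop_alleles(founders, births, rng):
    """One gene-dropping replicate; returns the two gene copies of everyone.

    founders -- ids of individuals without recorded parents
    births   -- (child, father, mother) tuples, parents listed before children
    """
    # Assumption: each founder gene copy is allele 0 or 1 with equal chance.
    genes = {f: (rng.randint(0, 1), rng.randint(0, 1)) for f in founders}
    for child, father, mother in births:
        # Mendelian transmission: one random copy from each parent.
        genes[child] = (rng.choice(genes[father]), rng.choice(genes[mother]))
    return genes


# Hypothetical seven-person pedigree, for illustration only.
rng = random.Random(1)
toy_founders = ["a", "b", "c", "d"]
toy_births = [("e", "a", "b"), ("f", "c", "d"), ("g", "e", "f")]
toy_genes = drop_alleles(toy_founders, toy_births, rng)
```

Repeating `drop_alleles` many times and recording the allele frequency among individuals alive at each date gives the variances that `ne_from_variance_increase` expects. The number of generations would be the length of the period divided by the relevant intergeneration interval. For mitochondrial or Y-linked loci only the maternal or paternal line would be followed.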

For comparison, we calculated *N*_{e} using two other methods that are commonly applied when only limited demographic information is available. First, we estimated *N*_{e} from the actual mean and variance of the number of offspring (family size). We counted the number of offspring for every individual born between 1820 and 1906. We then calculated the effective number for autosomal loci *N*_{eA} and X-chromosomal DNA *N*_{eX} from the variance of their offspring number using the formulae of Hill (1972, 1979) and Pollak (1990), in which *M* and *F* are the male and female cohort sizes, respectively. The mean intergeneration interval *L* is calculated as *L*=(*L*_{mm}+*L*_{mf}+*L*_{fm}+*L*_{ff})/4 for autosomal loci or *L*=(*L*_{mf}+*L*_{fm}+*L*_{ff})/3 for X-chromosomal loci, where *L*_{mm}, *L*_{mf}, *L*_{fm} and *L*_{ff} are the father–son, father–daughter, mother–son and mother–daughter intervals, respectively. *σ*²_{mm} and *σ*²_{mf} represent the variance of the number of male and female offspring from each male parent, respectively, and cov(*mm*, *mf*) the covariance between the number of male and female offspring from each male parent. For haploids, Hill (1979) gave a formula in which *B* is the cohort size, *L* is the intergeneration interval and *σ*² is the variance of the number of offspring. As the haploid effective size is twice the diploid effective size, *N*_{eMit} and *N*_{eY} are half of the corresponding haploid values.
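For orientation, the form in which Hill's autosomal result is usually quoted is sketched below, together with the usual haploid approximation for a population of constant size. These are written from the standard literature using the symbols defined above and should not be read as a transcription of the source equations. The female-parent terms *σ*²_{fm}, *σ*²_{ff} and cov(*fm*, *ff*), and the sex-specific cohort sizes *B*_{f} and *B*_{m}, are defined by analogy with the male-parent terms. The X-chromosomal formula of Pollak (1990) is not reproduced here.

```latex
\frac{1}{N_{eA}} =
  \frac{1}{16ML}\left[2 + \sigma^2_{mm}
    + 2\,\frac{M}{F}\,\mathrm{cov}(mm, mf)
    + \left(\frac{M}{F}\right)^{2}\sigma^2_{mf}\right]
  + \frac{1}{16FL}\left[2 + \left(\frac{F}{M}\right)^{2}\sigma^2_{fm}
    + 2\,\frac{F}{M}\,\mathrm{cov}(fm, ff)
    + \sigma^2_{ff}\right]

N_{e(h)} = \frac{BL}{\sigma^2}, \qquad
N_{eMit} = \frac{B_f L_{ff}}{2\,\sigma^2_{ff}}, \qquad
N_{eY} = \frac{B_m L_{mm}}{2\,\sigma^2_{mm}}
```

As a consistency check, with equal cohort sizes, Poisson offspring numbers (all four variances equal to one) and zero covariances, the first expression reduces to 1/*N*_{eA} = 1/(4*ML*) + 1/(4*FL*), the familiar sex-ratio formula applied to the numbers entering the population per generation.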

Although Hill's formulae for the estimation of *N*_{e} are convenient, they rely on the critical assumption of a constant population size. We conducted a discrete-generation forward simulation to quantify the deviations from the true values when the assumption is violated, and to suggest an adjustment method. In essence, we used an adjusted variance, as suggested by Crow & Morton (1955), which represents the variance when the average number of offspring is two (or one for haploid loci). The details of the results are given in appendix A.
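The adjustment can be illustrated with a short calculation. The sketch below assumes the random-survival (binomial thinning) form usually attributed to Crow & Morton (1955); the function name is hypothetical. Applied to the means and variances reported in the Results, it gives 1.45 for daughters, matching the reported value, and about 1.88 for sons against the reported 1.89, a difference that is plausibly due to rounding of the published inputs.

```python
def adjusted_variance(mean, variance, target_mean=1.0):
    """Offspring-number variance rescaled to a stationary mean.

    Assumes offspring are removed at random (binomial thinning) until the
    mean equals target_mean: 1 for haploid loci, 2 for diploid loci.
    """
    keep = target_mean / mean  # probability that an offspring is retained
    return keep ** 2 * variance + keep * (1.0 - keep) * mean


# Means and variances of daughters per female and sons per male,
# taken from the Results section of this paper.
daughters = adjusted_variance(1.51, 2.54)
sons = adjusted_variance(1.26, 2.66)
```

The first term rescales the observed variance, and the second adds the sampling variance introduced by the random thinning itself.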

Next, we estimated *N*_{e} of sex-linked loci from the life table which summarizes the age-specific mortality and fertility in the population. Age-specific mortality rate (*p*_{x}), cumulative survival rate (*l*_{x}) and fertility rate (*m*_{x}) are calculated for each sex at age *x*. In the present study, female *m*_{x} represents the average number of daughters produced at that age and male *m*_{x} represents the average number of sons. We applied Felsenstein's (1971) formula that calculates *N*_{e} of populations with overlapping generations from life tables.

## 3. Results

### (a) Intergeneration intervals

In the genealogies, the oldest female who gave birth to a child was 49 years old, while the oldest male at the birth of his child was 68. The average intergeneration intervals could be underestimated if those who had not yet finished their reproduction were included in the analysis. Therefore, we considered only the cohorts born before 1926 (female) or 1907 (male) as parental generations. The intergeneration intervals were defined as the parent's age when each child was born. A total of 1549 intervals were recorded. The mean intergeneration interval was 29.3 years. The simple mean mother–daughter interval was 27.0 years (*N*=379, s.e.=0.38), and the mean father–son interval was 32.1 years (*N*=352, s.e.=0.46; figure 1). We also calculated, for every individual alive at the end of the study period, the mean maternal/paternal intervals of her/his ancestors, and then averaged them. This alternative is designed to double-count those parts of the genealogy shared by two or more descendants and yields an interval length that gives more weight to successful parents. We obtained similar values by this genealogy-based method (maternal, 27.2 years and paternal, 32.0 years).
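The simple mean interval and its standard error can be computed as in the sketch below. It assumes the records are available as pairs of birth years; the function name and the four toy records are hypothetical. The cut-off argument mirrors the restriction to mothers born before 1926 and fathers born before 1907.

```python
import math


def interval_summary(pairs, last_parent_birth_year):
    """Mean and standard error of parental age at each child's birth.

    pairs -- (parent_birth_year, child_birth_year) tuples
    last_parent_birth_year -- parents born after this year are excluded
                              because they may still be reproducing
    """
    ages = [child - parent for parent, child in pairs
            if parent <= last_parent_birth_year]
    n = len(ages)
    if n < 2:
        raise ValueError("need at least two intervals")
    mean = sum(ages) / n
    sample_var = sum((a - mean) ** 2 for a in ages) / (n - 1)
    return mean, math.sqrt(sample_var / n), n


# Hypothetical mother-daughter records; the last mother is born too late.
toy_pairs = [(1850, 1874), (1850, 1880), (1872, 1900), (1930, 1955)]
mean_age, std_error, count = interval_summary(toy_pairs, 1925)
```

Each child contributes one interval, so a parent with many children is counted many times. The genealogy-based alternative described above goes further and weights lineages by the number of surviving descendants.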

Next, we constructed the life tables for two wild chimpanzee populations and calculated the generation time. The maternal generation time is estimated to be 24 years in the Mahale population and 19 years in the Taï population. The lower value for Taï results from the higher mortality rate among adults. We were unable to calculate the paternal generation time because paternity has rarely been confirmed in these populations.

### (b) Effective population size

The effective size of the Polar Eskimos was estimated by the genealogy (gene-flow or allele-dropping) method. The estimated *N*_{eA}, *N*_{eX}, *N*_{eMit} and *N*_{eY} values during the period between 1850 and 1940 were 179.2, 139.3, 53.8 and 39.7, respectively. If we use the actual population size at the end of the study period (*N*=299), the estimate for *N*_{eA}/*N* was 0.60. Similarly, we obtained *N*_{eX}/*N*=0.47, *N*_{eMit}/*N*=0.18 and *N*_{eY}/*N*=0.13. If we use the harmonic mean of the population size during the period as a representative of the actual population size (*N*=271.9), the estimates for *N*_{eA}/*N*, *N*_{eX}/*N*, *N*_{eMit}/*N* and *N*_{eY}/*N* were 0.66, 0.51, 0.20 and 0.15, respectively. The increase in the variance of the allele frequency during the whole period (90 years) was greater in Y-chromosomal DNA than in mtDNA. This means that the rate of evolution by genetic drift is higher in Y-chromosomal DNA than in mtDNA.
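The ratios quoted above follow directly from the estimated effective sizes and the two choices of census size, as the short check below shows. The dictionary keys and function names are hypothetical, and the `harmonic_mean` helper is included only to show how a representative census size would be obtained from a series of counts, which are not listed in the text.

```python
def harmonic_mean(values):
    """Harmonic mean of a series of census sizes."""
    return len(values) / sum(1.0 / v for v in values)


def ratios(effective_sizes, census_size):
    """N_e/N for each locus type, rounded to two decimals."""
    return {locus: round(ne / census_size, 2)
            for locus, ne in effective_sizes.items()}


# Genealogy-method estimates for 1850-1940, as reported in the text.
genealogy_ne = {"autosomal": 179.2, "x": 139.3,
                "mitochondrial": 53.8, "y": 39.7}

final_census = ratios(genealogy_ne, 299)       # census size at end of period
harmonic_census = ratios(genealogy_ne, 271.9)  # harmonic mean census size
```

The harmonic mean is dominated by the smallest values in a series, which is why it is lower than the final census size in a population that grew over the period.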

Next, the effective population size was calculated from the variance in the offspring number (table 1). For females, the mean and variance of the number of daughters were 1.51 and 2.54, respectively. The adjusted variance, which represents the variance when the average number of daughters is one, was 1.45. The equivalent procedure was repeated for males, giving an adjusted variance of 1.89 (mean number of sons, 1.26; variance, 2.66). Using Hill's formula and the adjustment procedure of variances in a growing population (see appendix A), we obtained estimates of *N*_{eMit} and *N*_{eY}. These values are overestimates if there is a positive correlation in the number of offspring between parents and offspring. However, we found no statistically significant correlations (between mothers and daughters: Kendall's *τ*=0.009, *n*=150; between fathers and sons: *τ*=0.025, *n*=202). We also calculated the effective sizes for autosomal and X-chromosomal DNA, *N*_{eA} and *N*_{eX}.

Finally, the effective population sizes for sex-linked loci were calculated using the life table method. A life table showing age-specific mortality (*p*_{x}) and fertility (*m*_{x}) for *x*=0–100 years was constructed for each sex. From the life table data, we obtained *N*_{eMit} and *N*_{eY} in terms of *B*_{f} and *B*_{m}, the cohort sizes of females and males, respectively. The expected length of life of a newborn female was estimated to be 42.5 years, while that of a male was 39.9 years. Taking into account the annual intrinsic rate of increase, which was calculated here as *λ*=1.01, we then expressed the same quantities in terms of *N*_{f} and *N*_{m}, the census numbers of females and males.

It must be noted that the ratio of the effective size to the census size considered in the two approaches is a short-term value. In a population varying in size, the long-term effective size is close to a harmonic mean of the sizes during the period. Therefore, if a population experienced a bottleneck or a rapid expansion from a small population, the effective size could be quite small in relation to the current census size.

## 4. Discussion

### (a) Generation time

The generation time in the Polar Eskimos is similar to those from recent studies that investigated long-term genealogical records (table 2). All these studies suggested that the human generation time is longer than that previously assumed in population genetic models. Several other studies have reported less precise estimates of the generation time on the basis of demographic data or short-term genealogical data. Felsenstein (1971) obtained 26.3 years as the mean generation time (mother–offspring) when he calculated the effective population size on the basis of the US white female population for 1967. Based on relatively short-term but maternity-tested records, Forster *et al*. (2002) reported that the mother–daughter generation interval in two sampled areas of South India is 29.7 and 31.6 years, respectively (based on dead mothers and postmenopausal mothers of age more than 55 years). Storz *et al*. (2001) referred to a longer generation time for the Gainj population (Papua New Guinea). More recently, Fenner (2005) suggested 25–28 years as the maternal generation interval on the basis of his analysis of a wide range of demographic data including 40 less-developed nations and eight societies of hunter–gatherers. The generation time thus appears rather consistent regardless of the cultural and environmental differences among human societies.

Dating from DNA data is strongly influenced by the value of the generation time applied in the models. Time estimates are obtained usually in generations, and then translated into years. It is easy to see that a longer generation time yields a longer age estimate in years. This effect is serious when the mutation rate of the locus of interest is estimated from genealogical studies (e.g. Howell *et al*. 2003). The generation time may not have an effect if the mutation rate is calibrated at some past event (e.g. chimpanzee–human divergence: Horai *et al*. 1995; demographic expansion after a climate change: Forster *et al*. 1996), because the mutation rate per generation changes accordingly. The generation time may have a different kind of effect on demographic inference; a shorter generation time means more generations in a particular time period.
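The effect of the assumed generation time on a date can be shown with a small worked example. The depth of 2000 generations and the per-generation rate below are hypothetical numbers chosen only for illustration, as are the function names.

```python
def generations_to_years(generations, generation_time):
    """Convert an age in generations into an age in years."""
    return generations * generation_time


def per_year_rate(rate_per_generation, generation_time):
    """Convert a pedigree-based rate per generation into a rate per year."""
    return rate_per_generation / generation_time


# Hypothetical lineage age of 2000 generations.
age_assuming_20 = generations_to_years(2000, 20)  # traditional assumption
age_assuming_27 = generations_to_years(2000, 27)  # maternal interval found here

# Hypothetical pedigree rate of 0.002 events per generation.
rate_20 = per_year_rate(0.002, 20)
rate_27 = per_year_rate(0.002, 27)
```

Moving from 20 to 27 years per generation lengthens the date by 35% and lowers the per-year rate derived from a pedigree study by the same factor, which is why the choice matters most when rates come from genealogies rather than from a dated calibration event.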

The majority of the previous studies assumed that the generation time for mitochondrial DNA and Y-chromosomal DNA is 20 and 25 years, respectively (e.g. Harpending & Rogers 2000). We suggest that higher values, 25–30 years for mtDNA and 30–35 years for Y-chromosomal DNA, should be used in genetic inference.

It may be argued that our distant ancestors might have had a shorter generation time because they must have suffered from a higher mortality rate or had smaller body sizes. In several studies estimating the age of the most recent common ancestors of humans and great apes, the generation time of great apes was assumed to be 7–15 years (Takahata & Satta 1997; Excoffier & Yang 1999) with the notable exception of Ruvolo (1997) who obtained information from a primatologist and used a generation time of 15–25 years. Unfortunately, there have been no published values for the generation time among apes. Our estimates of the maternal generation time for two wild chimpanzee populations are 19 and 24 years. Although we need more data on great apes, values of less than 15 years are clearly underestimates. This suggests that the hominin generation times in the past few million years may also have been longer than generally assumed.

### (b) Effective population size

Our estimated ratios of the effective to the census population size in the present study agree with those reported in other human populations (Frankham 1995; Storz *et al*. 2001). Humans generally exhibit higher values (0.3–0.9) than other animals (Frankham 1995).

Interestingly, there are slight differences between the values of the effective population size calculated by the different methods (table 3). This is partly because they measure different quantities by definition and are based on different assumptions that may be unfulfilled in reality. The estimates obtained from the genealogy method would be the most realistic if we had a perfect genealogical record. Unfortunately, the Eskimo genealogy contains some uncertain records even in the period between 1850 and 1940. The life table method might provide overestimates, in particular for males, because the variance in the offspring number reflects merely a stochastic variation resulting from the average age–sex mortality/fertility of each age–sex class. In other words, the method does not take into account skewed offspring numbers, for example in situations where some higher status males have many more children than others.

When we infer the population size of past human populations from the present genetic data, we need a ratio of the effective to the census population size (*N*_{e}/*N*) as we usually obtain results as effective sizes. A ratio of *N*_{eMit(h)}/*N*_{f} or *N*_{eY(h)}/*N*_{m} should be used instead of *N*_{eA}/*N* if such an inference is carried out using a haploid model on the basis of mitochondrial or Y-chromosomal data. The haploid effective population sizes *N*_{eMit(h)} and *N*_{eY(h)} are twice as large as the diploid ones *N*_{eMit} and *N*_{eY}, respectively. Considering the fact that table 3 shows the diploid effective sizes, we suggest using a ratio of 0.7–0.9 for females or mitochondrial DNA, 0.5 for males or Y-chromosomal DNA, and 0.6–0.7 for autosomal and X-chromosomal DNA. The difference between sexes reflects the variance of the offspring number. It is important to remember that reported paternity, and possibly maternity (Forster *et al*. 2002), may be incorrect. The actual effective sizes are smaller if the variances of the number of offspring are larger than those reported.
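The conversion from a diploid-scale ratio to a haploid, sex-specific ratio can be sketched as follows. The calculation assumes that each sex makes up half of the census population, which is not stated in the text, and uses the genealogy-method ratios from the Results as inputs. The function name is hypothetical.

```python
def haploid_ratio(diploid_ne_over_n, sex_fraction=0.5):
    """N_e(h)/N_sex from a diploid-scale N_e/N.

    sex_fraction is the assumed share of the census population made up
    by the transmitting sex (N_f/N or N_m/N).
    """
    return 2.0 * diploid_ne_over_n / sex_fraction


# Genealogy-method ratios for the two choices of census size.
mt_low, mt_high = haploid_ratio(0.18), haploid_ratio(0.20)
y_low, y_high = haploid_ratio(0.13), haploid_ratio(0.15)
```

Under these assumptions the mitochondrial ratio comes out at roughly 0.7–0.8 and the Y-chromosomal ratio at roughly 0.5–0.6, in line with the suggested ranges. The factor of two reflects the haploid scale and the division by the sex fraction reflects the smaller reference population.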

Helgason *et al*. (2003) suggested that mtDNA drifts faster than Y-chromosomal DNA in Iceland owing to a shorter generation time as well as to stronger intergenerational correlations in maternal than paternal lineages concerning the offspring number and generation time. In contrast, our gene-flow simulation suggests that Y-chromosomal DNA may evolve a little faster than mtDNA in the Polar Eskimo population. The shorter maternal generation time (84%) does not cancel out the smaller effective population size of males (71%), and a clear sign of intergenerational correlations is not detected. Whether the slightly faster drift of Y-chromosomal DNA observed in the Polar Eskimos is typical of other hunter–gatherer populations is a promising subject for further research.
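The trade-off between generation time and effective size can be made concrete with a rough per-year comparison. The sketch below assumes that variation at a haploid locus is lost at a rate of about 1/*N*_{e(h)} per generation, with *N*_{e(h)} equal to twice the diploid-scale value, and uses the genealogy-method effective sizes and the simple mean intervals from the Results. The function name is hypothetical.

```python
def drift_per_year(diploid_scale_ne, generation_time):
    """Approximate per-year rate of drift for a haploid locus.

    Uses 1 / (N_e(h) * T) with N_e(h) = 2 * diploid-scale N_e.
    """
    return 1.0 / (2.0 * diploid_scale_ne * generation_time)


mt_rate = drift_per_year(53.8, 27.0)  # mtDNA: N_eMit and mother-daughter interval
y_rate = drift_per_year(39.7, 32.1)   # Y: N_eY and father-son interval
relative_rate = y_rate / mt_rate
```

With these inputs the Y-chromosomal rate is about 14% higher than the mitochondrial rate per year, so the longer paternal interval only partly offsets the smaller male effective size.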

Overall, our hunter–gatherer genealogies show that the basic population genetic parameters are similar among humans across a wide range of cultures and geographical locations. It is therefore a reasonable working hypothesis that these demographic values were also similar through time in human prehistory. Our results thus give some justification for the widespread use of modern human demographic values for prehistoric population inference in the human species.

## Acknowledgments

The authors are very grateful to Prof. Anthony W. F. Edwards for providing them with valuable information, including the genealogies in digital format as well as his unpublished papers, and for his comments on an earlier version of the paper. The authors are also grateful to Prof. Colin Renfrew for his help and advice, and Prof. William Hill and two anonymous reviewers for their helpful comments. The authors were financially supported by the Alfred P. Sloan Foundation and the McDonald Institute for Archaeological Research, University of Cambridge.

## Footnotes

- Received February 26, 2008.
- Accepted March 3, 2008.

- © 2008 The Royal Society