Elementary Human Genetics
The Central Asian Gene Pool
The Karakalpak Gene Pool
Discussion and Conclusions
Elementary Human Genetics
Every human is defined by his or her library of genetic material, copies of which are stored in every cell of the
body apart from the red blood cells. Cells are classified as somatic, meaning body cells, or gametic, the cells
involved in reproduction, namely the sperm and the egg or ovum. The overwhelming majority of human genetic material
is located within the small nucleus at the heart of each somatic cell. It is commonly referred to as the human genome.
Within the nucleus it is distributed between 46 separate chromosomes, two of which are known as the sex chromosomes.
The latter occur in two forms, designated X and Y. Chromosomes are generally arranged in pairs - a female has 22 pairs
of autosome chromosomes plus one pair of X chromosomes, while a male has a similar arrangement apart from
having a mixed pair of X and Y sex chromosomes.
A neutron crystallography cross-sectional image of a chromosome, showing the double
strand of DNA wound around a protein core.
Image courtesy of the US Department of Energy Genomics Program
A single chromosome consists of just one DNA macromolecule composed of two separate DNA strands, each of which contains
a different but complementary sequence of four different nucleotide bases - adenine (A), thymine (T), cytosine (C), and
guanine (G). The two strands are aligned in the form of a double helix held together by hydrogen bonds, adenine always
linking with thymine and cytosine always linking with guanine. Each such linkage between strands is known as a base pair.
The total human genome contains about 3 billion such base pairs. As such it is an incredibly long molecule that could be
from 3 cm to 6 cm long were it possible to straighten it. In reality the double helix is coiled around a core of structural
proteins and this is then supercoiled to create the chromosome, 23 pairs of which reside within a cell nucleus with a
diameter of just 0.0005 cm.
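Because adenine only pairs with thymine and cytosine only pairs with guanine, the sequence of one strand fully determines the sequence of the other. This can be illustrated with a minimal Python sketch. The function name is invented here purely for illustration, and the sketch ignores strand direction.

```python
# Base-pairing rule: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (illustrative helper)."""
    return "".join(PAIRING[base] for base in strand.upper())


print(complementary_strand("GATTACA"))  # CTAATGT
```

Applying the function twice returns the original sequence, which is simply another way of saying that the two strands carry the same information.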
A gene is a segment of the DNA nucleotide sequence within the chromosome that can be chemically read to make one specific
protein. Each gene is located at a certain point along the DNA strand, known as its locus. The 22 autosome chromosome pairs
vary in size from 263 million base pairs in chromosome 1 (the longest) down to about 47 million base pairs in chromosome 21
(the shortest - chromosome 22 is the second shortest with 50 million base pairs), equivalent to from 3,000 down to 300 genes.
The two sex chromosomes are also very different, X having about 140 million base pairs and expressing 1,100 genes, Y having only
23 million base pairs and expressing a mere 78 genes. The total number of genes in the human genome is around 30,000.
A complete set of 23 human homologous chromosome pairs
Image courtesy of the National Human Genome Research Institute, Maryland
Each specific pair of chromosomes has its own distinct characteristics and can be identified under the microscope after
staining with a dye and observing the resulting banding. With one exception the chromosome pairs are called homologous because
they have the same length and the same sequence of genes. For example the 9th pair always contains the genes for melanin
production and for ABO blood type, while the 14th pair has two genes critical to the body's immune response. Even so the
individual chromosomes within each matching pair are not identical since one is inherited from each parent. A certain
gene at a particular locus in one chromosome may differ from the corresponding gene in the other chromosome, one being dominant
and the other recessive. The one exception relates to the male sex chromosomes, a combination of X and Y, which are not the same
length and are therefore not homologous.
A set of male human chromosomes showing typical banding
Various forms of the same gene (or of some other DNA sequence within the chromosome) are known as alleles. Differences in DNA
sequences at a specific chromosome locus are known as genetic polymorphisms. They can be categorized into various types, the
simplest being a difference in just a single nucleotide - a single nucleotide polymorphism.
When a normal somatic cell divides and replicates, the 23 homologous chromosome pairs (the genome) are duplicated through a
complex process known as mitosis. The two strands of DNA within each chromosome unravel and unzip themselves in order to
replicate, eventually producing a pair of sister chromatids - two brand new copies of the original single chromosome joined together.
Because the two chromosomes within each homologous pair are slightly different (one being inherited from each parent),
both must be passed on. Each pair of sister chromatids is therefore pulled apart and one chromatid is allocated to each daughter cell,
thus replicating the original homologous chromosome pair. Such cells are called diploid because they contain two (slightly different)
sets of genetic information.
The production of gametic cells involves a quite different process. Sperm and eggs are called haploid cells, meaning single,
because they contain only one set of genetic information - 22 single unpaired chromosomes and one sex chromosome. They are formed
through another complex process known as meiosis. It involves a deliberate reshuffling of the parental genome in order
to increase the genetic diversity within the resulting sperm or egg cells and consequently among any resulting offspring.
As before each chromosome pair is replicated in the form of a pair of sister chromatids. This time however, each half of each
chromatid embraces its opposite neighbour in a process called synapsis. An average of two or three segments of maternal and
paternal DNA are randomly exchanged between chromatids by means of molecular rearrangements called crossover and genetic recombination.
The new chromatid halves are not paired with their matching partners but are all separated to create four separate haploid cells,
each containing one copy of the full set of 23 chromosomes, and each having its own unique random mix of maternal and paternal DNA.
In the male adult this process forms four separate sperm cells, but in the female only one of the four cells becomes an ovum, the
other three forming small polar bodies that progressively decay.
During fertilization the two haploid cells - the sperm and the ovum or egg - interact to form a diploid zygote (zyg meaning
symmetrically arranged in pairs). In fact the only contribution that the sperm makes to the zygote is its haploid nucleus containing
its set of 23 chromosomes. The sex of the offspring is determined by the sex chromosome within the sperm, which can be either
X (female) or Y (male). Clearly the sex chromosome within the ovum has to be X. The X and the Y chromosomes are very different,
the Y being only one third the size of the X. During meiosis in the male, the X chromosome recombines and exchanges DNA with
the Y only at its ends. Most of the Y chromosome is therefore unaffected by crossover and recombination. This section is known
as the non-recombining part of the Y chromosome and it is passed down the male line from father to son relatively unchanged.
Scanning electron micrograph of an X and Y chromosome
Image courtesy of Indigo Instruments, Canada
Not all of the genetic material within the human cell resides inside the nucleus. Both egg and sperm cells contain small energy-producing
organelles within the cytoplasm called mitochondria that have their own genetic material for making several essential mitochondrial
proteins. However the DNA content is tiny in comparison with that in the cell nucleus - it consists of several rings of DNA totalling
about 16,500 base pairs, equivalent to just 13 genes. The genetic material in the nucleus is about 300,000 times larger. When
additional mitochondria are produced inside the cell, the mitochondrial DNA is replicated and copies are transferred to the
new mitochondria. Mitochondrial DNA, mtDNA for short, is important because during fertilization virtually no
mitochondria from the male cell enter the egg and those that do are tagged and destroyed. Consequently the offspring only inherit
the female mitochondria. mtDNA is therefore inherited through the female line.
Population genetics is a branch of mathematics that attempts to link changes in the overall history of a population to changes in
its genetic structure, a population being a group of interbreeding individuals of the same species sharing a common geographical
area. By analysing the nature and diversity of DNA within and between different populations we can gain insights into their
separate evolution and the extent to which they are or are not related to each other. We can gain insights into a population's
level of reproductive isolation, the minimum time since it was founded, how marriage partners were selected, past geographical
expansions, migrations, and mixings.
The science is based upon the property of the DNA molecule to occasionally randomly mutate during replication, creating the possibility
that the sequence of nucleotides in the DNA of one generation may differ slightly in the following generation. The consequence of
this is that individuals within a homogeneous population will in time develop different DNA sequences, the characteristic that we
have already identified as genetic polymorphism. Because mutations are random, two identical but isolated populations will tend
to change in different directions over time. This property is known as random genetic drift and its effect is greater in smaller
populations.
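The link between drift and population size can be illustrated with a toy simulation. The sketch below assumes a very simple model that is not taken from any of the studies discussed here: a population of fixed size in which each individual in a new generation inherits its variant from a randomly chosen member of the previous generation (often called a Wright-Fisher model). All names and parameter values are illustrative.

```python
import random
from statistics import pvariance


def simulate_drift(pop_size, generations, start_freq=0.5, rng=None):
    """Follow the frequency of one variant through successive generations.

    Assumed model: each individual of the next generation copies its variant
    from a randomly chosen member of the current generation.
    """
    rng = rng or random.Random()
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        carriers = sum(rng.random() < freq for _ in range(pop_size))
        freq = carriers / pop_size
        history.append(freq)
    return history


def spread_of_outcomes(pop_size, generations=20, replicates=100, seed=1):
    """Variance of the final frequency across many isolated populations."""
    rng = random.Random(seed)
    finals = [simulate_drift(pop_size, generations, 0.5, rng)[-1]
              for _ in range(replicates)]
    return pvariance(finals)


small = spread_of_outcomes(pop_size=10)
large = spread_of_outcomes(pop_size=1000)
print(f"variance of final frequency: N=10 -> {small:.3f}, N=1000 -> {large:.3f}")
```

Populations that start out identical end up far more scattered when they are small, which is the behaviour described above.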
To study genetic polymorphisms, geneticists look for specific genetic markers. These are clearly recognizable mutations in the
DNA whose frequency of incidence varies widely across populations from different geographical areas. In reality the vast majority
of human genetic sequences are identical, only around 0.1% of them being affected by polymorphisms.
There are several types of genetic marker. The simplest are single nucleotide polymorphisms (SNPs), mentioned above, where just
one nucleotide has been replaced with another (for example A replaces T or C replaces G). SNPs in combination along a stretch of DNA
are called haplotypes, shorthand for haploid genotypes. These have turned out to be valuable markers because they are genetically
relatively stable and are found at differing frequencies in many populations. Some are obviously evolutionarily related to each
other and can be classified into haplogroups (Hg). Another type of polymorphism is where short strands of DNA have been randomly
inserted into the genetic DNA. This results in so-called biallelic polymorphism, since the strand is either present or absent. These
are useful markers because the individuals that have the mutant insert can be traced back to a single common ancestor, while those
who do not have the insert represent the original ancestral state. Biallelic polymorphisms can be assigned to certain haplotypes.
A final type of marker is based upon microsatellites, very short sequences of nucleotides, such as GATA, that are repeated in tandem
numerous times. A polymorphism occurs if the number of repetitions increases or decreases. Microsatellite polymorphisms, sometimes
also called short-tandem-repeat polymorphisms, occur more frequently over time, providing a different tool to study the rate of
genetic change against time.
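To make the idea of a microsatellite allele concrete, the following Python sketch counts how many times a motif such as GATA is repeated in tandem within a sequence. The function name and the two example sequences are invented for illustration. Real genotyping is done in the laboratory rather than by string matching.

```python
def longest_tandem_run(sequence: str, motif: str = "GATA") -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Two hypothetical alleles that differ only in the number of GATA repeats.
allele_a = "CCT" + "GATA" * 9 + "TTG"
allele_b = "CCT" + "GATA" * 11 + "TTG"
print(longest_tandem_run(allele_a), longest_tandem_run(allele_b))  # 9 11
```

Two individuals carrying these alleles would be recorded as having 9 and 11 repeats at the locus, and it is this repeat number that is compared across populations.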
Of course the whole purpose of sexual reproduction is to deliberately scramble the DNA from both parents in order to create
a brand new set of chromosome pairs for their offspring that are not just copies of the parental chromosomes. Studies show
that about 85% of genetic variation in autosomal sequences occurs within rather than between populations.
However it is the genetic variation between populations that is of the greatest interest when we wish to study their history.
Because of this, population geneticists look for more stable pieces of DNA that are not disrupted by reproduction. These are
of two radically different types, namely the non-recombining part of the Y chromosome and the mitochondrial DNA or mtDNA. A
much higher 40% of the variations in the Y chromosome and 30% of the variations in mtDNA are found between populations. Each
provides a different perspective on the genetic evolution of a particular population.
Y Chromosome Polymorphisms
By definition the Y chromosome is only carried by the male line. Although smaller than the other chromosomes, the Y chromosome
is still enormous compared to the mtDNA. It carries so few genes because most of it is composed of "junk" DNA.
As such it is relatively unaffected by natural selection. The non-recombining part of the Y chromosome is passed on from father
to son with little change apart from the introduction of genetic polymorphisms as a result of random mutations. The only
problem with using the Y chromosome to study inheritance has been the practical difficulty of identifying a wide range
of polymorphisms within it, although the application of special HPLC techniques has overcome some of this limitation in recent years.
Y chromosome polymorphisms seem to be more affected by genetic drift and may give a better resolution between closely related
populations where the time since their point of divergence has been relatively short.
By contrast the mtDNA is carried by the female line. Although less than one thousandth the size of the DNA in the non-recombining
part of the Y chromosome, polymorphisms are about 10 times more frequent in mtDNA than in autosome chromosomes.
Techniques and Applications
Population genetics is a highly statistical science and different numerical methods can be used to calculate the various properties of
one or several populations. Our intention here is to cover the main analytical tools used in the published literature relating to
Karakalpak and the other Central Asian populations.
The genetic diversity of a population is the diversity of DNA sequences within its gene pool. It is calculated by a statistical
method known as the analysis of molecular variance (AMOVA) in the DNA markers from that population. It is effectively a summation of the
frequencies of individual polymorphisms found within the sample, mathematically normalized so that a diversity of 0 implies all the
individuals in that population have identical DNA and a diversity of 1 implies that the DNA of every individual is different.
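A diversity index with exactly these end points can be computed from haplotype frequencies alone. The sketch below uses one widely used estimator, Nei's gene diversity, as an illustration. It is an assumption that this matches the calculation in any particular study cited here, and the function name is invented.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples: haplotypes are just labels here.
print(gene_diversity(["h1", "h1", "h1", "h1"]))          # everyone identical
print(round(gene_diversity(["h1", "h2", "h3", "h4"]), 6))  # everyone different
print(round(gene_diversity(["h1", "h1", "h2", "h3"]), 6))  # somewhere in between
```

The first sample gives 0 and the second gives 1, matching the normalization described above.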
The genetic distance between two populations is a measure of the difference in their polymorphism frequencies. It is calculated
statistically by comparing the pairwise differences between the markers identified for each population, to the pairwise
differences within each of the two populations. This distance is a multi-dimensional not a linear measure. However it is normally
illustrated graphically in two dimensions. New variables are identified by means of an angular transformation, the first two of which
together account for the greatest proportion of the differences between the populations studied.
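The comparison of between-population and within-population differences can be sketched in a few lines of Python. The version below is a deliberately simple illustration (sometimes called a net pairwise distance), not the exact statistic used in the published studies. Haplotypes are represented as equal-length strings of marker states, and all names and data are hypothetical.

```python
from itertools import combinations, product


def differences(h1, h2):
    """Number of marker positions at which two haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def mean_within(pop):
    """Average pairwise difference inside one population (needs 2+ individuals)."""
    pairs = list(combinations(pop, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Average pairwise difference between individuals of two populations."""
    pairs = list(product(pop_x, pop_y))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def net_distance(pop_x, pop_y):
    """Between-population difference corrected for within-population diversity."""
    return mean_between(pop_x, pop_y) - (mean_within(pop_x) + mean_within(pop_y)) / 2


pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
print(net_distance(pop_a, pop_b))  # 2.5
```

Computing such a distance for every pair of populations yields a matrix, and it is this matrix that the principal coordinate plots reduce to two dimensions.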
Another property that can be measured statistically is kinship - the extent to which members of a population are related to
each other as a result of a common ancestor. Mathematically, a kinship coefficient is the probability that a randomly sampled
sequence of DNA from a randomly selected locus is identical across all members of the same population. A coefficient of 1
implies everyone in the group is related while a coefficient of 0 implies no kinship at all.
By making assumptions about the manner in which genetic mutations occur and their frequency over time it is possible to work backwards
and estimate how many generations (and therefore years) have elapsed from the most recent common ancestor, the individual to
whom all the current members of the population are related by descent. This individual is not necessarily the founder of the
population. For example if we follow the descent of the Y chromosome, this can only be passed down the male line from father to son.
If a male has no sons his non-recombining Y chromosome DNA is eliminated from his population forever. Over time, therefore, the
Y chromosomes of the population's ancestors will be progressively lost. There may well have been ancestors older than the most recent
common ancestor, even though we can find no signs for those ancestors in the Y chromosome DNA of the current population.
A similar situation arises with mtDNA in the female half of the population because some women do not have daughters.
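The kind of backward calculation described above can be sketched for a single microsatellite. The example assumes a stepwise mutation model, in which each mutation adds or removes one repeat, so that the average squared difference from the ancestral repeat number grows by roughly the mutation rate each generation. The ancestral repeat number, the mutation rate, and the generation length used below are placeholder assumptions, not values from the studies discussed here.

```python
def generations_since_ancestor(sampled_repeats, ancestral_repeats, mutation_rate):
    """Estimate elapsed generations from the average squared distance (ASD).

    Assumes single-step mutations at a constant rate per generation.
    """
    n = len(sampled_repeats)
    asd = sum((r - ancestral_repeats) ** 2 for r in sampled_repeats) / n
    return asd / mutation_rate


# Hypothetical repeat counts at one Y chromosome microsatellite.
repeats = [10, 10, 11, 9, 10, 10, 12, 10]
generations = generations_since_ancestor(repeats, ancestral_repeats=10,
                                         mutation_rate=0.002)
years = generations * 25  # assumed generation length in years
print(round(generations), round(years))
```

Because the result scales directly with the assumed mutation rate and generation length, modest changes to either assumption shift the estimated date considerably.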
In 1977 the American anthropologist Gordon T. Bowles published an analysis of the anthropometric characteristics of 519 different
populations from across Asia, including the Karakalpaks and two regional groups of Uzbeks. Populations were characterized by 9
standard measurements, including stature and various dimensions of the head and face. A multivariate analysis was used to separate the
different populations by their physical features.
Bowles categorized the populations across four regions of Asia (West, North, East, and South) into 19 geographical groups. He then
analysed the biological distances between the populations within each group to identify clusters of biologically similar peoples.
Central Asia was divided into Group XVII encompassing Mongolia, Sinkiang, and Kazakhstan and Group XVIII encompassing Turkestan and
Tajikistan. Each Group was found to contain three population clusters:
Anthropological Cluster Analysis of Central Asia
| Group | Cluster | Regional Populations |
| XVII | 1 | Eastern Qazaqs; Alai Valley Kyrgyz |
| | 2 | Aksu Rayon Uighur; Alma Ata Uighur |
| | | Alma Ata Qazaqs; T'ien Shan Kyrgyz |
| | | Total Turkmen |
Within geographical Group XVIII, the Karakalpaks clustered with the Uzbeks of Tashkent and the Uzbeks of Samarkand. The members of this
first cluster were much more heterogeneous than the other two clusters of neighbouring peoples. Conversely the Turkmen cluster had the
lowest variance of any of the clusters in the North Asia region, showing that different Turkmen populations are closely related.
The results of this study were re-presented by Cavalli-Sforza in a more readily understandable graphical form. The coordinates used are
artificial mathematical transformations of the original 9 morphological measurements, designed to identify the distances between different
populations in a simple two-dimensional format. The first two principal coordinates identify a clear division between the Uzbek/Karakalpaks,
and the Turkmen and Iranians, but show similarities between the Uzbek/Karakalpaks and the Tajiks, and also with the western Siberians.
Though not so close there are some similarities between the Uzbek/Karakalpaks and the Qazaqs, Kyrgyz, and Mongols:
Physical Anthropology of Asia redrawn by David Richardson after Bowles 1977
First and Second Principal Coordinates
The second and third principal coordinates maintain the similarity between Uzbek/Karakalpaks and Tajiks but emphasize the more
eastern features of the Qazaqs, Kyrgyz, and Mongols:
Physical Anthropology of Asia redrawn by David Richardson after Bowles 1977
Second and Third Principal Coordinates
The basic average morphology of the Uzbeks and Karakalpaks shows them to be of medium stature, with heads that have an average length but
an above average breadth compared to the other populations of Asia. Their faces are broad and are of maximum height. Their noses
are of average width but have the maximum length found in Asia.
Qazaqs have the same stature but have longer and broader heads. Their faces are shorter but broader, having the maximum breadth found
in Asia, while their noses too are shorter and slightly broader.
Some of these differences in features were noted by some of the early Russian visitors, such as N. N. Karazin, who observed the differences
between the Karakalpaks and the Qazaqs (who at that time were called Kirghiz) when he first entered the northern Aral delta:
"In terms of type, the Karakalpak people themselves differ noticeably from the Kirghizs: flattened Mongolian noses are already a rarity here,
cheek-bones do not stand out so, beards and eyebrows are considerably thicker - there is a noticeably strong predominance of the Turkish race."
The Central Asian Gene Pool
Western researchers tended to under-represent Central Asian populations in many of the earlier studies of population genetics.
Cavalli-Sforza, Menozzi, and Piazza, 1994
In 1994 Cavalli-Sforza and two of his colleagues published a landmark study of the worldwide geographic distribution of
human genes. In order to make global comparisons the study was forced to rely upon the most commonly available genetic
markers, and analysed classical polymorphisms based on blood groups, plasma proteins, and red cell enzymes. Sadly no information
was included for Karakalpaks or Qazaqs.
Results were analysed continent by continent. The results for the different populations of Asia grouped the Uzbeks, Turkmen,
and western Turks into a central cluster, located on the borderline between the Caucasian populations of the west and south
and the populations of Northeast Asia and East Asia:
Principal Component Analysis of Asian Populations
Redrawn by David Richardson after Cavalli-Sforza et al, 1994
Comas, Calafell, Pérez-Lezaun et al, 1998
In 1993-94 another Italian team collected DNA samples from four different populations close to the Altai: Qazaq highlanders
living close to Almaty, Uighur lowlanders in the same region, and two Kyrgyz communities - one in the southern highlands,
the other in the northern lowlands of Kyrgyzstan.
The data was used in two studies, both published in 1998. In the first, by Comas et al, mtDNA polymorphisms in these
four communities were compared with other Eurasian populations in the west (Europe, Middle East, and Turkey), centre (the Altai)
and the east (Mongolia, China, and Korea). The four Central Asian populations all showed high levels of sequence diversity -
in some cases the highest in Eurasia. At the same time they were tightly clustered together, almost exactly halfway between
the western and the eastern populations, the exception being that the Mongolians occupied a position close to this central
cluster. The results suggested that the Central Asian gene pool was an admixture of the western and eastern gene pools, formed
after the western and eastern Eurasians had diverged. The authors suggested that this diversity had possibly been enhanced by
human interaction along the Silk Road.
In the second, by Pérez-Lezaun et al, short-tandem-repeat polymorphisms in the Y chromosome were analysed for the
four Central Asian populations alone. Each of the four was found to be highly heterogeneous yet very different from the other
three, the latter finding appearing to contradict the mtDNA results. However the two highland groups had less genetic diversity
because each had very high frequencies for one specific polymorphism:
Y chromosome haplotype frequencies, with labels given to those shared by more than one population
From Pérez-Lezaun et al, 1998.
The researchers resolved the apparent contradiction between the two studies in terms of different migration patterns for men and women.
All four groups practised a combination of exogamy and patrilocal marriage - in other words couples within the same clan could not marry
and brides always moved from their own village to the village of the groom. Consequently the males, and their genes, were isolated and
localized, while the females were mobile and there were more similarities in their genes. The high incidence of a single marker in each
highland community was presumed to be a founder effect, supported by evidence that the highland Qazaq community had only been
established by lowland Qazaqs a few hundred years ago.
Zerjal, Spencer Wells, Yuldasheva, Ruzibakiev, and Tyler-Smith, 2002
In 2002 a joint Oxford University/Imperial Cancer Research Fund study was published, analysing Y chromosome polymorphisms in 15 different
Central Asian populations, from the Caucasus to Mongolia. It included Uzbeks from the eastern viloyat of Kashkadarya, Qazaqs
and Uighurs from eastern Kazakhstan, Tajiks, and Kyrgyz. Blood samples had been taken from 408 men, living mainly in villages, between 1993
and 1995. In the laboratory the Y chromosomes were initially typed with binary markers to identify 13 haplogroups. Following this,
microsatellite variations were typed in order to define more detailed haplotypes.
Haplogroup frequencies were calculated for each population and were illustrated by means of the following chart:
Haplogroup frequencies across Central Asia
From Zerjal et al, 2002.
Many of the same haplogroups occurred across the 5,000 km expanse of Central Asia, although with large variations in frequency and with
no obvious overall pattern. Haplogroups 1, 2, 3, 9, and 26 accounted for about 70% of the total sample.
Haplogroups (Hg) 1 and 3 were common in almost all populations, but the highest frequencies of Hg1 were found in Turkmen and Armenians,
while the highest frequencies of Hg3 were found in Kyrgyz and Tajiks. Hg3 was more frequent in the eastern populations, but was only
present at 3% in the Qazaqs. Hg3 is the equivalent of M17, which seems to originate from Russia and the Ukraine, a region not covered
by this survey - see Spencer Wells et al, 2001 below. Hg9 was very frequent in the Middle East and declined in importance across
Central Asia from west to east. However some eastern populations had a higher frequency - the Uzbeks, Uighurs, and Dungans.
Hg10 and its derivative Hg36 showed the opposite pattern, together accounting for 54% of haplogroups for the Mongolians and 73% for
the Qazaqs. Hg26, which is most frequently found in Southeast Asia, occurs with the highest frequencies among the Dungans (26%),
Uighurs (15%), Mongolians (13%), and Qazaqs (13%) in eastern Central Asia. Hg12 and Hg16 are widespread in Siberia and northern Eurasia
but are rare in Central Asia except for the Turkmen and Mongolians. Hg21 was restricted to the Caucasus region.
The most obvious observation is that virtually each population is quite distinct. As an example, the Uzbeks are quite different from the
Turkmen, Qazaqs, or Mongolians. Only two populations, the Kyrgyz from central Kyrgyzstan and the Tajiks from Pendjikent, show any
similarity.
The researchers measured the genetic diversity of each population using both haplogroup and microsatellite frequencies. Within Central
Asia, the Uzbeks, Uighurs, Dungans, and Mongolians exhibited high genetic diversity, while the Qazaqs, Kyrgyz, Tajiks, and Turkmen
showed low genetic diversity. These differences were explored by examining the haplotype variation within each haplogroup for each
population. Among the Uzbeks, for example, many different haplotypes are widely dispersed across all chromosomes. Among the Qazaqs,
however, the majority of the haplotypes are clustered together and many chromosomes share the same or related haplotypes.
Low diversity coupled with high frequencies of population-specific haplotype clusters are typical of populations that have experienced
a bottleneck or a founder event. The most recent common ancestor of the Tajik population was estimated to date from the early part of
the 1st millennium AD, while the most recent common ancestors of the Qazaq and Kyrgyz populations were placed in the period 1200 to
1500 AD. The authors suggested that bottlenecks might be a feature of societies like the Qazaqs and Kyrgyz with small, widely dispersed
nomadic groups, especially if they had suffered massacres during the Mongol invasion. Of course these calculations have broad confidence
intervals and must be interpreted with caution.
Microsatellite haplotype frequencies were used to investigate the genetic distances among the separate populations. The best
two-dimensional fit produces a picture with no signs of general clustering on the basis of either geography or linguistics:
Genetic distances based on microsatellite haplotypes
From Zerjal et al, 2002.
The Kyrgyz (ethnically Turkic) do cluster next to the Tajiks (supposedly of Indo-Iranian origin), but both are well separated from the
neighbouring Qazaqs. The Turkmen, Qazaqs, and Georgians tend to be isolated from the other groups, leaving the Uzbeks in a somewhat
central position, clustered with the Uighurs and Dungans.
The authors attempted to interpret the results of their study in terms of the known history of the region. The apparently underlying
gradation in haplogroup frequencies from west to east was put down to the eastward agricultural expansion out of the Middle East
during the Neolithic, some of the haplogroup markers involved being more recent than the Palaeolithic. Meanwhile Hg3 (equivalent to M17 and
Eu19), which is widespread in Central Asia, was attributed to the migration of the pastoral Indo-Iranian "kurgan culture" eastwards from
the Ukraine in the late 3rd/early 2nd millennium BC. The mountainous Caucasus region seems to have been bypassed by this migration, which
seems to have extended across Central Asia as far as the borders of Siberia and China.
Later events also appear to have left their mark. The presence of a high number of low-frequency haplotypes in Central Asian populations
was associated with the spread of Middle Eastern genes, either through merchants associated with the early Silk Route or the later spread
of Islam. Uighurs and Dungans show a relatively high Middle Eastern admixture, including higher frequencies of Hg9, which might indicate
their ancestors migrated from the Middle East to China before moving into Central Asia.
High frequencies of Hg10 and its derivative Hg36 are found in the majority of Altaic-speaking populations, especially the Qazaqs, but
also the Uzbeks and Kyrgyz. Yet its contribution west of Uzbekistan is low or undetectable. This feature is associated with the
progressive migrations of nomadic groups from the east, from the Hsiung-Nu, to the Huns, the Turks, and the Mongols. Of course Central
Asians have not only absorbed immigrants from elsewhere but have undergone expansions, colonizations and migrations of their own,
contributing their DNA to surrounding populations. Hg1, the equivalent of M45 and its derivative markers, is believed to have originated
in Central Asia and is found throughout the Caucasus and in Mongolia.
The Karakalpak Gene Pool
Spencer Wells et al, 2001
The first examination of Karakalpak DNA appeared as part of a widespread study of Eurasian Y chromosome diversity published by
Spencer Wells et al in 2001. It included samples from 49 different Eurasian groups, ranging from western Europe, Russia,
the Middle East, the Caucasus, Central Asia, South India, Siberia, and East Asia. Data on 12 other groups was taken from the literature.
In addition to the Karakalpaks, the Central Asian category included seven separate Uzbek populations selected from Ferghana to Khorezm,
along with Turkmen from Ashgabat, Tajiks from Samarkand, and Qazaqs and Uighurs from Almaty. The study used biallelic markers that were
then assigned to 23 different haplotypes. To illustrate the results the latter were condensed into 7 evolutionary-related groups.
The study found that the Uzbek, Karakalpak, and Tajik populations had the highest haplotype diversity in Eurasia, the Karakalpaks having
the third highest diversity of all 49 groups. The Qazaqs and Kyrgyz had a significantly lower diversity.
This diversity is obvious from the chart comparing haplotype frequencies across Eurasia:
Distribution of Y chromosome haplotype lineages across various Eurasian populations
From Spencer Wells et al, 2001.
Uzbeks have a fairly balanced haplotype profile, while populations in the extreme west and east are dominated by one specific haplotype
lineage - the M173 lineage in the extreme west and the M9 lineage in the extreme east and Siberia.
The Karakalpaks are remarkably similar to the Uzbeks:
Distribution of Y chromosome haplotype lineages in Uzbeks and Karakalpaks
From Spencer Wells et al, 2001.
The main differences are that Karakalpaks have a higher frequency of M9 and M130 and a lower frequency of M17 and M89 haplotype
lineages. M9 is strongly linked to Chinese and other far-eastern peoples, while M130 is associated with Mongolians and Qazaqs.
On the other hand, M17 is strong in Russia, the Ukraine, the Czech and Slovak Republics as well as in Kyrgyz populations, while M89
has a higher frequency in the west. It seems that compared to Uzbeks, the Karakalpak gene pool has a somewhat higher frequency of
haplotypes that are associated with eastern as opposed to western Eurasian populations.
In fact the differences between Karakalpaks and Uzbeks are no more pronounced than between the Uzbeks themselves. Haplotype frequencies
for the Karakalpaks tend to be within the ranges measured across the different Uzbek populations:
Comparison of Karakalpak haplotype lineage frequencies to other ethnic groups in Central Asia
|| M130 || M89 || M9 || M45 || M173 || M17 || Total |
||0 - 7||7-18||19-34||5-21||4-11
Statistically Karakalpaks are genetically closest to the Uzbeks from Ferghana, followed by those from Surkhandarya, Samarkand, and
finally Khorezm. They are furthest from the Uzbeks of Bukhara, Tashkent, and Kashkadarya.
These results also show the distance between the Karakalpaks and the other peoples of Central Asia and its neighbouring regions.
Next to the Uzbeks, the Karakalpaks are genetically closest to the Tatars and Uighurs. However they are quite distant from the Turkmen,
Qazaqs, Kyrgyz, Siberians, and Iranians.
The researchers produced a "neighbour-joining" tree, which clustered the studied populations into eight categories according to the
genetic distances between them. The Karakalpaks were classified into cluster VIII along with Uzbeks, Tatars, and Uighurs - the
populations with the highest genetic diversity. They appear sandwiched between the peoples of Russia and the Ukraine and the
Mongolians and Qazaqs.
Neighbour-joining tree of 61 Eurasian Populations
Karakalpaks are included in cluster VIII along with Uzbeks, Tatars, and Uighurs
From Spencer Wells et al, 2001.
Spencer Wells and his colleagues did not attempt to explain why the Karakalpak gene pool is similar to Uzbek but is different from the
Qazaq, a surprising finding given that the Karakalpaks lived in the same region as the Qazaqs of the Lesser Horde before migrating
into Khorezm. Instead they suggested that the high diversity in Central Asia might indicate that its population is among the oldest
in Eurasia. M45 is the ancestor of haplotype M173, the predominant group found in Western Europe, and is thought to have arisen in
Central Asia about 40,000 years ago. M173 arose about 30,000 years ago, just as modern humans began their migration from Central
Asia into Europe during the Upper Palaeolithic. M17 (also known as the Eu19 lineage) has its origins in eastern Europe and the Ukraine
and may have been initially introduced into Central Asia following the last Ice Age and re-introduced later by the south-eastern migration
of the Indo-Iranian "kurgan" culture.
Comas et al, 2004
At the beginning of 2004 a complementary study was published by David Comas and colleagues, based on the analysis of mtDNA haplogroups from 12 Central
Asian and neighbouring populations, including Karakalpaks, Uzbeks, and Qazaqs. The sample size for each population was only 20, dropping to 16 for Dungans and
Uighurs, so that errors in the results for individual populations could be high.
The study reconfirmed the high genetic diversity within Central Asian populations. However a high proportion of sequences originated elsewhere,
suggesting that the region had experienced "intense gene flow" in the past.
The haplogroups were divided into three types according to their origins: West Eurasian, East Asian, and Indian. Populations showed a
gradation from the west to the east with the Karakalpaks occupying the middle ground, with half of their haplogroups having a western
origin and the other half having an eastern origin. Uzbek populations contained a small Indian component.
Mixture of western and eastern mtDNA haplogroups across Central Asia
|Population||West Eurasian||East Asian||Total|
The researchers found that two of the haplogroups of East Asian origin (D4c and G2a) not only occurred at higher frequencies
in Central Asia than in neighbouring populations but appeared in many related but diverse forms. These may have originated as
founder mutations some 25,000 to 30,000 years ago, expanded as a result of genetic drift and subsequently become dispersed into
the neighbouring populations. Their incidence was highest in the Qazaqs, and second highest in the Turkmen and Karakalpaks.
The majority of the other lineages separate into two types with either a western or an eastern origin. They do not overlap,
suggesting that they were already differentiated before they came together in Central Asia. Furthermore the eastern group contains
both south-eastern and north-eastern components. One explanation for their admixture in Central Asia is that the region was originally
inhabited by Western people, who were then partially replaced by the arrival of Eastern people. There is genetic evidence from
archaeological sites in eastern China of a drastic shift, between 2,500 and 2,000 years ago, from a European-like population to
the present-day East Asian population.
The presence of ancient Central Asian sequences suggests it is more likely that the people of Central Asia are a mixture of two
differentiated groups of peoples who originated in west and east Eurasia respectively.
Chaix and Heyer et al, 2004
The most interesting study of Karakalpak DNA so far was published by a team of French workers in the autumn of 2004. It was based on
blood samples taken during two separate expeditions to Karakalpakstan in 2001 and 2002, organized with the assistance of IFEAC, the
Institut Français d'Etudes sur l'Asie Centrale, based in Tashkent. The samples consisted of males belonging to five different ethnic
groups: Qon'ırat Karakalpaks (sample size 53), On To'rt Urıw Karakalpaks (53), Qazaqs (50), Khorezmian Uzbeks (40), and Turkmen (51).
The study was based on the analysis of Y chromosome haplotypes from DNA extracted from white blood cells. In addition to providing
samples for DNA analysis, participants were also interviewed to gather information on their paternal lineages and tribal and clan
affiliations.
Unfortunately the published results only focused on the genetic relationships between the tribes, clans and lineages of these five
ethnic groups. However before reviewing these important findings it is worth looking at the more general aspects that emerged from
the five samples. These were summarized by Professor Evelyne Heyer and Dr R Chaix at a workshop on languages and genes held in France
in 2005, where the results from Karakalpakstan were compared with the results from similar expeditions to Kyrgyzstan, the Bukhara,
Samarkand, and Ferghana Valley regions of Uzbekistan, and Tajikistan as well as with some results published by other research teams.
In some cases comparisons were limited by the fact that the genetic analysis of samples from different regions was not always done
according to the same protocols.
The first outcome was the reconfirmation of the high genetic diversity among Karakalpaks and Uzbeks:
Y Chromosome Diversity across Central Asia
|Population||Region||Sample Size|| Diversity |
|Karakalpak On To'rt Urıw||Karakalpakstan||54||0.89|
|Tajik Kamangaron||Ferghana Valley||30||0.98|
|Tajik Richtan||Ferghana Valley||29||0.98|
|Kyrgyz Andijan||Uzbek Ferghana Valley||46||0.82|
|Kyrgyz Jankatalab||Uzbek Ferghana Valley||20||0.78|
|Kyrgyz Doboloo||Uzbek Ferghana Valley||22||0.70|
The high diversities found in Uighur and Tajik communities also agreed with earlier findings. Qon'ırat Karakalpaks had somewhat
greater genetic diversity than On To'rt Urıw Karakalpaks. Some of these figures are extremely high. A diversity of zero implies
a population where every individual is identical. A diversity of one implies the opposite, the haplotypes of every individual
being different.
The second, more important finding concerned the Y chromosome genetic distances among different Central Asian populations. As
usual this was presented in two dimensions:
Genetic distances between ethnic populations in Karakalpakstan and the Ferghana Valley
From Chaix and Heyer et al, 2004.
The researchers concluded that Y chromosome genetic distances were strongly correlated to geographic distances. Not only are Qon'ırat
and On To'rt Urıw populations genetically close, both are also close to the neighbouring Khorezmian Uzbeks. Together they give the
appearance of a single population that has only relatively recently fragmented into three separate groups. Clearly this situation is
mirrored with the two Tajik populations living in the Ferghana Valley and also with two of the three Kyrgyz populations from the same
region. Although close to the local Uzbeks, the two Karakalpak populations have a slight bias towards the local Qazaqs.
The study of the Y chromosome was repeated for the mitochondrial DNA, to provide a similar picture for the female half of the same
populations. The results were compared to other studies conducted on other groups of Central Asians. We have redrawn the chart showing
genetic distances among populations, categorizing different ethnic groups by colour to facilitate comparisons:
Genetic distances among ethnic populations in Central Asia
Based on mitochondrial DNA polymorphisms
From Heyer, 2005.
The French team concluded that, in this case, genetic distances were not related to either geographical distances or to linguistics.
However this is not entirely true because there is some general clustering among populations of the same ethnic group, although by
no means as strong as that observed from the Y chromosome data. The three Karakalpak populations highlighted in red consist of the
On To'rt Urıw (far right), the Qon'ırat (centre), and the Karakalpak sample used in the Comas 2004 study (left). The Uzbeks are shown in green
and those from Karakalpakstan are the second from the extreme left, the latter being the Uzbeks from Samarkand. A nearby group of
Uzbeks from Urgench in Khorezm viloyati appear extreme left. There is some relationship between the mtDNA of the Karakalpak
and Uzbek populations of the Aral delta therefore, but it is much weaker than the relationship between their Y chromosome DNA. On the
other hand the Qazaqs of Karakalpakstan, the uppermost yellow square, are very closely related to the Karakalpak Qon'ırat according to
the mtDNA data.
These results are similar to those that emerged from the Italian studies of Qazaq, Uighur, and Kyrgyz Y chromosome and mitochondrial
DNA. Ethnic Turkic populations are generally exogamous. Consequently the male DNA is relatively isolated and immobile because men
traditionally stay in the same village from birth until death. They had to select their wives from other geographic regions
and sometimes married women from other ethnic groups. The female DNA within these groups is consequently more diversified. The results
suggest that in the delta, some Qon'ırat men have married Qazaq women and/or some Qazaq men have married Qon'ırat women.
Let us now turn to the primary focus of the Chaix and Heyer paper. Are the tribes and clans of the Karakalpaks and other ethnic groups
living within the Aral delta linked by kinship? Y chromosome polymorphisms were analysed for each separate lineage, clan, tribe, and
ethnic group using short tandem repeats. The resulting haplotypes were used to calculate a kinship coefficient at each respective
level.
Within the two Karakalpak samples the Qon'ırat were all Shu'llik and came from several clans, only three of which permitted the computation
of kinship: the Qoldawlı, Qıyat, and Ashamaylı clans. However none of these clans had recognized lineages. The Khorezmian Uzbeks have also
long ago abandoned their tradition of preserving genealogical lineages.
The On To'rt Urıw were composed of four tribes, four clans, and four lineages:
- Qıtay tribe
- Qıpshaq tribe, Basar clan
- Keneges tribe, Omır and No'kis clans
- Man'g'ıt tribe, Qarasıraq clan
The Qazaq and the Turkmen groups were also structured along tribal, clan, and lineage lines.
The results of the study showed that lineages, where they were still maintained, exhibited high levels of kinship, the On To'rt Urıw having
by far the highest. People belonging to the same lineage were therefore significantly more related to each other than people selected at
random in the overall global population. Put another way, they share a common ancestor who is far more recent than the common ancestor for
the population as a whole:
Kinship coefficients for five different ethnic populations, including the Qon'ırat and the On To'rt Urıw.
From Chaix and Heyer et al, 2004.
The kinship coefficients at the clan level were lower, but were still significant in three groups - the Karakalpak Qon'ırat, the Qazaqs,
and the Turkmen. However for the Karakalpak On To'rt Urıw and the Uzbeks, men from the same clan were only fractionally more related to
each other than were men selected randomly from the population at large. When we reach the tribal level we find that the men in all five
ethnic groups show no genetic kinship whatsoever.
In these societies the male members of some but not all tribal clans are partially related to varying degrees, in the sense that they are
the descendants of a common male ancestor. Depending on the clan concerned this kinship can be strong, weak, or non-existent. However the
members of different clans within the same tribe show no such interrelationship at all. In other words, tribes are conglomerations of
clans that have no genetic links with each other apart from those occurring between randomly chosen populations. It suggests that such tribes
were formed politically, as confederations of unrelated clans, and not organically as a result of the expansion and sub-division of an
initially genetically homogenous extended family group.
By assuming a constant rate of genetic mutation over time and a generation time of 30 years, the researchers were able to calculate the
number of generations (and therefore years) that have elapsed since the existence of the single common ancestor. This was essentially the
minimum age of the descent group and was computed for each lineage and clan. However the estimated ages computed were very high. For example,
the age of the Qon'ırat clans was estimated at about 460 generations or 14,000 years (late Ice Age), while the age of the On To'rt Urıw lineages
was estimated at around 200 generations or 6,000 years (early Neolithic). Clearly these results are ridiculous. The explanation is that each
group included immigrants or outsiders who were clearly unrelated to the core population.
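The conversion from generations to years behind these figures is simple arithmetic. The sketch below only reproduces that step, using the 30-year generation time stated above; the function name is illustrative, and the estimation of the number of generations from the mutation rate is not shown.

```python
# Generation time assumed by the researchers, as stated in the text.
GENERATION_YEARS = 30


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert an age expressed in generations into years."""
    return generations * generation_years


# Initial (unrestricted) estimates quoted in the text.
qonirat_clans_years = generations_to_years(460)       # about 14,000 years
on_tort_uriw_lineages_years = generations_to_years(200)  # about 6,000 years

print(qonirat_clans_years, on_tort_uriw_lineages_years)
```

The first value, 13,800 years, is the "about 14,000 years" quoted for the Qon'ırat clans.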
The calculation was therefore modified, restricting the sample to those individuals who belonged to the modal haplogroup of the descent group.
This excluded about 17% of the men in the initial sample. Results were excluded for those descent groups that contained fewer than three
individuals.
|Descent Group||Population||Number of generations||Age in years||95% Confidence Interval|
|Clans||Qon'ırat||35||1,058||454 - 3,704|
|Clans||Qazaqs||20||595||255 - 2,083|
|Clans||Turkmen||102||3,051||1,307 - 10,677|
|Lineages||On To'rt Urıw||13||397||170 - 1,389|
|Lineages|| || ||415||178 - 1,451|
|Lineages|| || ||516||221 - 1,806|
The age of the On To'rt Urıw and other lineages averaged about 15 generations, equivalent to about 400 to 500 years. The age of the clans
varied more widely, from 20 generations for the Qazaqs, to 35 generations for the Qon'ırat, and to 102 generations for the Turkmen.
This dates the oldest common ancestor of the Qazaq and Qon'ırat clans to a time some 600 to 1,200 years ago. However the common
ancestor of the Turkmen clans lived some 3,000 years ago. The high age of the Turkmen clans was the result of the occurrence of a
significantly mutated haplotype within the modal haplogroup. It was difficult to judge whether these individuals were genuinely related
to the other clan members or were themselves recent immigrants.
These figures must be interpreted with considerable caution. Clearly the age of a clan's common ancestor is not the same as the age of the
clan itself, since that ancestor may have had ancestors of his own, whose lines of descent have become extinct over time. The calculated
ages therefore give us a minimum limit for the age of the clan and not the age of the clan itself.
In reality however, the uncertainty in the assumed rate of genetic mutation gives rise to extremely wide 95% confidence intervals. The
knowledge that certain Karakalpak Qon'ırat clans are most likely older than a time ranging from 450 to 3,700 years is of little practical
use to us. Clearly more accurate models are required.
Chaix, R.; Quintana-Murci, L.; Hegay, T.; Hammer, M. F.; Mobasher, Z.; Austerlitz, F.; and Heyer, E., 2007
The latest analysis of Karakalpak DNA comes from a study examining the genetic differences between various pastoral and farming populations
in Central Asia. In this region these two fundamentally different economies are organized according to quite separate social traditions:
- pastoral populations are classified into what their members claim to be descent groups (tribes, clans, and lineages), practise exogamous
marriage (where men must marry women from clans that are different to their own), and are organized on a patrilineal basis (children being
affiliated to the descent group of the father, not the mother).
- farmer populations are organized into nuclear and extended families rather than tribes and often practise endogamous marriage (where men
marry women from within the same clan, often their cousins).
The study aims to identify differences in the genetic diversity of the two groups as a result of these two different lifestyles. It examines
the genetic diversity of:
- maternally inherited mitochondrial DNA in 12 pastoral and 9 farmer populations, and
- paternally inherited Y-chromosomes in 11 pastoral and 7 farmer populations.
The diversity of mtDNA was examined by investigating one of two short segments, known as hypervariable segment number 1 or HVS-1. This and HVS-2
have been found to contain the highest density of neutral polymorphic variations between individuals.
The diversity of the Y chromosome was examined by investigating 6 short tandem repeats (STRs) in the non-recombining region of the chromosome.
This particular study sampled mtDNA from 5 different populations from Karakalpakstan: On To'rt Urıw Karakalpaks, Qon'ırat Karakalpaks,
Qazaqs, Turkmen, and Uzbeks. Samples collected as part of other earlier studies were used to provide mtDNA data on 16 further populations
(one of which was a general group of Karakalpaks) and Y chromosome data on 20 populations (two of which were On To'rt Urıw and Qon'ırat
Karakalpaks sampled in 2001 and 2002). The sample size for each population ranged from 16 to 65 individuals.
Both Karakalpak arıs were classified as pastoral, along with Qazaqs, Kyrgyz, and Turkmen. Uzbeks were classified as farmers, along with
Tajiks, Uighurs, Kurds, and Dungans.
Results of the mtDNA Analysis
The results of the mtDNA analysis are given in Table 1, copied from the paper.
Table 1. Sample Descriptions and Estimators of Genetic Diversity from the mtDNA Sequence
|Population ||n ||Location ||Long ||Lat ||H ||π ||D ||pD ||Ps ||C |
|Karakalpaks ||20 ||Uzbekistan ||58 ||43 ||0.99 ||5.29 ||-1.95 ||0.01 ||0.90 ||1.05 |
|Karakalpaks (On To'rt Urıw) ||53 ||Uzbekistan/Turkmenistan border ||60 ||42 ||0.99 ||5.98 ||-1.92 ||0.01 ||0.70 ||1.20 |
|Karakalpaks (Qon'ırat) ||55 ||Karakalpakstan ||59 ||43 ||0.99 ||5.37 ||-2.01 ||0.01 ||0.82 ||1.15 |
|Qazaqs ||50 ||Karakalpakstan ||63 ||44 ||0.99 ||5.23 ||-1.97 ||0.01 ||0.88 ||1.11 |
|Qazaqs ||55 ||Kazakhstan ||80 ||45 ||0.99 ||5.66 ||-1.87 ||0.01 ||0.69 ||1.25 |
|Qazaqs ||20 || ||68 ||42 ||1.00 ||5.17 ||-1.52 ||0.05 ||1.00 ||1.00 |
|Kyrgyz ||20 ||Kyrgyzstan ||74 ||41 ||0.97 ||5.29 ||-1.38 ||0.06 ||0.55 ||1.33 |
|Kyrgyz (Sary-Tash) ||47 ||South Kyrgyzstan, Pamirs ||73 ||40 ||0.97 ||5.24 ||-1.95 ||0.01 ||0.49 ||1.52 |
|Kyrgyz (Talas) ||48 ||North Kyrgyzstan ||72 ||42 ||0.99 ||5.77 ||-1.65 ||0.02 ||0.77 ||1.14 |
|Turkmen ||51 ||Uzbekistan/Turkmenistan border ||59 ||42 ||0.98 ||5.48 ||-1.59 ||0.04 ||0.53 ||1.42 |
|Turkmen ||41 ||Turkmenistan ||60 ||39 ||0.99 ||5.20 ||-2.07 ||0.00 ||0.73 ||1.21 |
|Turkmen ||20 || ||59 ||40 ||0.98 ||5.28 ||-1.71 ||0.02 ||0.75 ||1.18 |
|Dungans ||16 ||Kyrgyzstan ||78 ||41 ||0.94 ||5.27 ||-1.23 ||0.12 ||0.31 ||1.60 |
|Kurds ||32 ||Turkmenistan ||59 ||39 ||0.97 ||5.61 ||-1.35 ||0.05 ||0.41 ||1.52 |
|Uighurs ||55 ||Kazakhstan ||82 ||47 ||0.99 ||5.11 ||-1.91 ||0.01 ||0.62 ||1.28 |
|Uighurs ||16 ||Kyrgyzstan ||79 ||42 ||0.98 ||4.67 ||-1.06 ||0.15 ||0.63 ||1.23 |
|Uzbeks (North) ||40 ||Karakalpakstan ||60 ||43 ||0.99 ||5.49 ||-2.03 ||0.00 ||0.68 ||1.21 |
|Uzbeks (South) ||42 ||Surkhandarya, Uzbekistan ||67 ||38 ||0.99 ||5.07 ||-1.96 ||0.01 ||0.81 ||1.14 |
|Uzbeks (South) ||20 ||Uzbekistan ||66 ||40 ||0.99 ||5.33 ||-1.82 ||0.02 ||0.90 ||1.05 |
|Uzbeks (Khorezm) ||20 ||Khorezm, Uzbekistan ||61 ||42 ||0.98 ||5.32 ||-1.62 ||0.04 ||0.70 ||1.18 |
|Tajiks (Yagnobi) ||20 || ||71 ||39 ||0.99 ||5.98 ||-1.76 ||0.02 ||0.90 ||1.05 |
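Key: in the original table the pastoral populations are shaded grey and the farmer populations are left white. Here the pastoral populations are the rows from Karakalpaks to Turkmen and the farmer populations are the rows from Dungans to Tajiks.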
The table includes the following parameters:
- sample size, n, the number of individuals sampled in each population. Individuals had to be unrelated to any other member of the same sample
for at least two generations.
- the geographical longitude and latitude of the population sampled.
- heterozygosity, H, the proportion of different alleles occupying the same position in each mtDNA sequence. It measures the frequency of
heterozygotes for a particular locus in the genetic sequence and is one of several statistics indicating the level of genetic variation or
polymorphism within a population. When H=0, all alleles are the same and when H=1, all alleles are different.
- the mean number of pairwise differences, π, measures the average number of nucleotide differences between all pairs of HVS-1 sequences.
This is another statistic indicating the level of genetic variation within a population, in this case measuring the level of mismatch
between sequences.
- Tajima’s D, D, measures the frequency distribution of alleles in a nucleotide sequence and is based on the difference between two estimations
of the population mutation rate. It is often used to distinguish between a DNA sequence that has evolved randomly (D=0) and one that has experienced directional selection favouring a single allele. It is consequently used as a test for natural selection. However it is also influenced by population history and negative values of D can indicate high rates of population growth.
- the probability that D is significantly different from zero, pD.
- the proportion of singletons, Ps, measures the relative number of unique polymorphisms in the sample. The higher the proportion of singletons,
the greater the population has been affected by inward migration.
- the mean number of individuals carrying the same mtDNA sequence, C, is an inverse measure of diversity. The more individuals with the same
sequence, the less diversity within the population and the higher proportion of individuals who are closely related.
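To make these parameters concrete, here is a minimal Python sketch of how H, π, Ps, and C could be computed from a list of aligned sequences. It is not the authors' code. It assumes H is Nei's unbiased haplotype diversity, that Ps is the number of sequences seen only once divided by n, and that C is n divided by the number of distinct sequences; these readings are consistent with the tabulated values (for example Ps = 1.00 and C = 1.00 where every sequence is unique) but are not confirmed by the source. The toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(sequences):
    """Return H, pi, Ps and C for a list of equal-length sequences."""
    n = len(sequences)
    counts = Counter(sequences)

    # H: chance that two sequences drawn at random from the sample differ
    # (Nei's unbiased estimator, an assumption about the paper's definition).
    h = (n / (n - 1)) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean number of differing positions over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    # Ps: share of the sample made up of sequences observed exactly once.
    ps = sum(1 for c in counts.values() if c == 1) / n

    # C: mean number of individuals carrying the same sequence.
    c_mean = n / len(counts)

    return {"H": h, "pi": pi, "Ps": ps, "C": c_mean}


# Invented sample of 20 individuals: 19 distinct sequences, one of them twice.
toy = [format(i, "05b").replace("0", "A").replace("1", "G") for i in range(19)]
toy.append(toy[0])

stats = diversity_stats(toy)
print({name: round(value, 2) for name, value in stats.items()})
```

With 18 unique sequences and one shared pair in a sample of 20, the sketch gives H = 0.99, Ps = 0.90, and C = 1.05 after rounding, the same pattern as the first Karakalpak row of the table.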
The table shows surprisingly little differentiation between pastoral and farmer populations. Both show high levels of within population
genetic diversity (for both groups, median H=0.99 and π is around 5.3). Further calculations of genetic distance between populations, Fst
(not presented in the table but given graphically in the paper's online supplementary data), showed a correspondingly low level of genetic differentiation
among pastoral populations as well as among farmer populations.
Both groups of populations also showed a significantly negative Tajima’s D, which the authors attribute to a high rate of demographic growth in
neutrally evolving populations.
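For readers who want to see how a negative D arises, the following Python sketch implements the standard Tajima (1989) statistic for a set of aligned sequences. It is a textbook formulation, not the software used in the paper, and the two toy samples are invented. An excess of rare variants, as expected after population growth, pulls π below the estimate based on the number of variable sites and so makes D negative.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for equal-length sequences; None when it is undefined."""
    n = len(sequences)
    if n < 2:
        return None

    # S: number of segregating (variable) sites in the alignment.
    s = sum(1 for site in zip(*sequences) if len(set(site)) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None

    # Difference between the two estimates of the mutation parameter,
    # scaled by its expected standard deviation.
    return (pi - s / a1) / math.sqrt(variance)


# Invented samples: every variant is rare in the first, common in the second.
excess_singletons = ["AAAA", "AAAT", "AATA", "ATAA"]
balanced = ["AA", "AA", "TT", "TT"]

d_rare = tajimas_d(excess_singletons)
d_common = tajimas_d(balanced)
print(d_rare, d_common)
```

The first sample gives a negative D and the second a positive one, which is the direction of the contrast the authors rely on.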
Supplementary data made available online showed a weak correlation between genetic distance, Fst, and geographic distance for both pastoral and
farmer populations.
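As an illustration of the geographic side of such a comparison, the Python sketch below computes the great-circle distance between two sampling locations from the longitude and latitude columns of Table 1, together with a plain Pearson correlation helper. The pairwise Fst values are not reproduced on this page, so no correlation with genetic distance is calculated here; the haversine formula, the Earth radius constant, and the function names are assumptions of this sketch, and the paper's own procedure may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# (Long, Lat) pairs as listed in Table 1.
on_tort_uriw = (60, 42)        # Karakalpaks (On To'rt Urıw)
qazaqs_kazakhstan = (80, 45)   # Qazaqs, Kazakhstan

distance_km = great_circle_km(*on_tort_uriw, *qazaqs_kazakhstan)
print(round(distance_km))
```

Given a matching list of pairwise genetic distances, `pearson` could be applied to the two lists; assessing significance would normally need a permutation approach because pairwise distances are not independent.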
Results of the Y chromosome Analysis
The results of the Y chromosome analysis are given in Table 2, also copied from the paper:
Table 2. Sample Descriptions and Estimators of Genetic Diversity from the Y chromosome STRs
|Population ||n ||Location ||Long ||Lat ||H ||π ||r ||Ps ||C |
|Karakalpaks (On To'rt Urıw) ||54 ||Uzbekistan/Turkmenistan border ||60 ||42 ||0.86 ||3.40 ||1.002 ||0.24 ||2.84 |
|Karakalpaks (Qon'ırat) ||54 ||Karakalpakstan ||59 ||43 ||0.91 ||3.17 ||1.003 ||0.28 ||2.35 |
|Qazaqs ||50 ||Karakalpakstan ||63 ||44 ||0.85 ||2.36 ||1.004 ||0.16 ||2.78 |
|Qazaqs ||38 ||Almaty, KatonKaragay, Karatutuk, Rachmanovsky Kluchi, Kazakhstan ||68 ||42 ||0.78 ||2.86 ||1.004 ||0.26 ||2.71 |
|Qazaqs ||49 ||South-east Kazakhstan ||77 ||40 ||0.69 ||1.56 ||1.012 ||0.22 ||3.06 |
|Kyrgyz ||41 ||Central Kyrgyzstan (Mixed) ||74 ||41 ||0.88 ||2.47 ||1.004 ||0.41 ||1.86 |
|Kyrgyz (Sary-Tash) ||43 ||South Kyrgyzstan, Pamirs ||73 ||40 ||0.45 ||1.30 ||1.003 ||0.12 ||4.78 |
|Kyrgyz (Talas) ||41 ||North Kyrgyzstan ||72 ||42 ||0.94 ||3.21 ||1.002 ||0.39 ||1.78 |
|Mongolians ||65 ||Ulaanbaatar, Mongolia ||90 ||49 ||0.96 ||3.37 ||1.009 ||0.38 ||1.81 |
|Turkmen ||51 ||Uzbekistan/Turkmenistan border ||59 ||42 ||0.67 ||1.84 ||1.006 ||0.27 ||3.00 |
|Turkmen ||21 ||Ashgabat, Turkmenistan ||59 ||40 ||0.89 ||3.34 ||1.006 ||0.48 ||1.62 |
|Dungans ||22 ||Alexandrovka and Osh, Kyrgyzstan ||78 ||41 ||0.99 ||4.13 ||1.005 ||0.82 ||1.10 |
|Kurds ||20 ||Bagyr, Turkmenistan ||59 ||39 ||0.99 ||3.59 ||1.009 ||0.80 ||1.11 |
|Uighurs ||33 ||Almaty and Lavar, Kazakhstan ||79 ||42 ||0.99 ||3.72 ||1.007 ||0.67 ||1.22 |
|Uighurs ||39 ||South East Kazakhstan ||79 ||43 ||0.99 ||3.79 ||1.008 ||0.77 ||1.15 |
|Uzbeks (North) ||40 ||Karakalpakstan ||60 ||43 ||0.96 ||3.42 ||1.005 ||0.48 ||1.54 |
|Uzbeks (South) ||28 ||Kashkadarya, Uzbekistan ||66 ||40 ||1.00 ||3.53 ||1.008 ||0.93 ||1.04 |
|Tajiks (Yagnobi) ||22 ||Penjikent, Tajikistan ||71 ||39 ||0.87 ||2.69 ||1.012 ||0.45 ||1.69 |
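Key: in the original table the pastoral populations are shaded grey and the farmer populations are left white. Here the pastoral populations are the rows from Karakalpaks to Turkmen and the farmer populations are the rows from Dungans to Tajiks.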
This table also includes the sample size, n, and longitude and latitude of the population sampled, as well as the heterozygosity, H, the mean
number of pairwise differences, π, the proportion of singletons, Ps, and the mean number of individuals carrying the same Y STR haplotype, C.
In addition it includes a statistical computation of the demographic growth rate, r.
In contrast to the results obtained from the mtDNA analysis, both the heterozygosity and the mean pairwise differences computed from the Y chromosome
STRs were significantly lower in the pastoral populations than in the farmer populations. Thus Y chromosome diversity has been lost in the pastoral
populations.
Conversely calculations of the genetic distance, Rst, among the populations within each of the two groups showed that pastoral populations were more
highly differentiated than farmer populations. The supplemental data given online demonstrate that this is not a result of geographic distance,
there being no perceived correlation between genetic and geographic distance in either population group.
Finally the rate of demographic growth was found to be lower in pastoral than in farmer populations.
At first sight the results are counter-intuitive. One would expect that the diversity of mtDNA in pastoral societies would be higher than in
farming societies, because the men in those societies are marrying brides who contribute mtDNA from clans other than their own.
Similarly one would expect no great difference in Y chromosome diversity between pastoralists and farmers because both societies are patrilineal.
Leaving aside the matter of immigration, the males who contribute the Y chromosome are always selected from the local sampled population.
To understand the results, Chaix et al investigated the distribution of genetic diversity within individual populations using a statistical
technique called multi-dimensional scaling analysis or MDS. This attempts to sort or resolve a sample into its different component parts, illustrating
the results in two dimensions.
The example chosen in the paper focuses on the Karakalpak On To'rt Urıw arıs. The MDS analysis of the Y chromosome data
resolves the sample of 54 individuals into clusters, the members of each cluster having exactly the same STR haplotype:
Multidimensional Scaling Analysis based on the Matrix of Distance between Y STR Haplotypes
in a Specific Pastoral Population: the Karakalpak On To'rt Urıw.
Thus the sample contains 13 individuals from the O'mir clan of the Keneges tribe with the same haplotype (shown by the large cross), 10 individuals
of the Qarasıyraq clan of the Man'g'ıt tribe with the same haplotype (large diamond), and 10 individuals from the No'kis clan of the Keneges
tribe with the same haplotype (large triangle). Other members of the same clans have different haplotypes, as shown on the chart. Those close to the
so-called "identity core" group may have arisen by mutation. Those further afield might represent immigrants or adoptions.
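The idea of an "identity core" can be illustrated with a short Python sketch that finds the most common Y STR haplotype within one descent group and measures how far the remaining haplotypes lie from it. This is not the MDS procedure used in the paper. The repeat counts are invented, the distance is a simple sum of absolute repeat differences, and the one-step threshold separating likely mutation from likely immigration or adoption is an arbitrary assumption for illustration.

```python
from collections import Counter


def identity_core(haplotypes):
    """Return the most common haplotype in the group and how many men carry it."""
    modal, count = Counter(haplotypes).most_common(1)[0]
    return modal, count


def step_distance(h1, h2):
    """Sum of absolute differences in repeat number across the STR loci."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify(haplotypes, near_steps=1):
    """Label each haplotype by its distance from the group's identity core."""
    core, _ = identity_core(haplotypes)
    labels = []
    for hap in haplotypes:
        distance = step_distance(hap, core)
        if distance == 0:
            labels.append("core")
        elif distance <= near_steps:
            labels.append("near core")  # compatible with recent mutation
        else:
            labels.append("distant")    # possible immigrant or adoption
    return labels


# Invented repeat counts at 6 STR loci for six men of one hypothetical clan.
clan = [(14, 12, 23, 10, 13, 16)] * 4 + [
    (14, 12, 24, 10, 13, 16),  # one repeat away from the core
    (15, 11, 21, 11, 14, 13),  # nine repeats away from the core
]

core_haplotype, core_size = identity_core(clan)
labels = classify(clan)
print(core_haplotype, core_size, labels)
```

Restricting a sample to the men labelled "core" corresponds to keeping only those who carry the modal haplotype of their descent group.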
No such clustering is observed following the MDS analysis of the mtDNA data for the same On To'rt Urıw arıs:
Multidimensional Scaling Analysis based on the Number of Differences between the Mitochondrial Sequence
in the Same Pastoral Population: the Karakalpak On To'rt Urıw.
Every individual in the sample, including those from the same clan, has a different HVS-1 sequence.
Similar MDS analyses of the different farmer populations apparently showed very few "identity cores" in the Y chromosome data and a total absence
of clustering in the mtDNA data, just as in the case of the On To'rt Urıw.
The overall conclusion was that the existence of "identity cores" was specific to the Y chromosome data and was mainly restricted to the pastoral
populations. This is reflected in the tables above, where we can see that the mean number of individuals carrying the same mtDNA sequence ranges
from about 1 to 1½ and shows no difference between pastoral and farming populations. On the other hand the mean number of individuals carrying
the same STR haplotype is low for farming populations but ranges from 1½ up to almost 5 for the pastoralists. Pastoral populations also have
a lower number of Y chromosome singletons.
Chaix et al point to three reinforcing factors to explain the existence of "identity cores" in pastoral as opposed to farming populations:
- pastoral lineages frequently split and divide with closely related men remaining in the same sub-group, thereby reducing Y chromosome diversity,
- small populations segmented into lineages can experience strong genetic drift, creating high frequencies of specific haplotypes, and
- random demographic uncertainty in small lineage groups can lead to the extinction of some haplotypes, also reducing diversity.
Together these factors reduce overall Y chromosome diversity.
To explain the similar levels of mtDNA diversity in pastoral and farmer populations, Chaix et al point to the complex rules connected with
exogamy. Qazaq men for example must marry a bride who has not had an ancestor belonging to the husband's own lineage for at least 7 generations,
while Karakalpak men must marry a bride from another clan, although she can belong to the same tribe. Each pastoral clan, therefore, is gaining
brides (and mtDNA) from external clans but is losing daughters (and mtDNA) to external clans. Such continuous and intense migration reduces mtDNA
genetic drift within the clan. This in turn keeps diversity at a level similar to that observed in farmer populations, which is in any event
already high. The process of two-way female migration effectively isolates the mtDNA structure of pastoral societies from their social structure.
One aspect overlooked by the study is that, until recent times, Karakalpak clans were geographically isolated in villages located in specific parts
of the Aral delta and therefore tended always to intermarry with one of their neighbouring clans.
In effect, the two neighbouring clans behaved like a single population, with females moving between clans in every generation. How such social
behaviour affected genetic structure was not investigated.
The Uzbeks were traditionally nomadic pastoralists and progressively became settled agricultural communities from the 16th century onwards. The
survey provided an opportunity to investigate the effect of this transition in lifestyle on the genetic structure of the Uzbek Y chromosome.
Table 2 above shows that the genetic diversity found among Uzbeks, as measured by heterozygosity and the mean number of pairwise differences,
was similar to that of the other farmer populations, as was the proportion of singleton haplotypes. Equally the mean number of individuals carrying
the same Y STR haplotype was low (1 to 1½), indicating an absence of the haplotype clustering (or "identity cores") observed in pastoral
populations. The pastoral "genetic signature" must have been rapidly eroded, especially in the case of the northern Uzbeks from Karakalpakstan,
who only settled from the 17th century onwards.
Two reasons are proposed for this rapid transformation: first, the early collapse and integration of the Uzbek descent groups following their
initial settlement; and second, their mixing with traditional Khorezmian farming populations, which led to the creation of genetic admixtures of
the two groups.
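The two diversity measures cited for Table 2 can be sketched as follows. This assumes the common textbook estimators: Nei's unbiased gene diversity for heterozygosity, and, for the mean number of pairwise differences, the average number of loci at which two sampled haplotypes differ. The study's exact estimators may differ, and the sample haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

def mean_pairwise_differences(haplotypes):
    """Average number of loci at which two sampled haplotypes differ."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

# Hypothetical Y STR haplotypes: repeat counts at three loci
sample = [(14, 12, 23), (14, 12, 23), (15, 12, 23), (14, 13, 24)]

print(round(gene_diversity(sample), 3))
print(mean_pairwise_differences(sample))
```

Both values rise as haplotypes become more numerous and more different from one another, and both fall to zero when every sampled individual carries the same haplotype.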
Of course the Karakalpak On To'rt Urıw have been settled farmers for just as long as many Khorezmian Uzbeks and cannot in any way be strictly
described as pastoralists. Indeed the majority of Karakalpak Qon'ırats have also been settled for much of the 20th century. However both
have strictly maintained their traditional pastoralist clan structure and associated system of exogamous marriage. So although their lifestyles have
changed radically, their social behaviour to date has not.
Discussion and Conclusions
The Karakalpaks and their Uzbek and Qazaq neighbours have no comprehensive recorded history, just occasional historical reports coupled
with oral legends which may or may not relate to certain historical events in their past. We therefore have no record of where or when the
Karakalpak confederation emerged and for what political or other reasons.
In the absence of solid archaeological or historical evidence, many theories have been advanced to explain the origin of the Karakalpaks.
Their official history, as taught in Karakalpak colleges and schools today, claims that the Karakalpaks are the descendants of the original
endemic nomadic population of the Khorezm oasis, most of whom were forced to leave as a result of the Mongol invasion in 1221 and the
subsequent desiccation of the Aral delta following the devastation of Khorezm by Timur in the late 14th century, only returning in
significant numbers during the 18th century. We fundamentally disagree with this simplistic picture, which endures uncritically with
high-ranking support because it purports to establish an ancient Karakalpak origin and justifies tenure of the current homeland.
While population genetics cannot unravel the full tribal history of the Karakalpaks per se, it can give us important clues to
their formation and can eliminate some of the less likely theories that have been proposed.
The two arıs of the Karakalpaks, the Qon'ırat and the On To'rt Urıw, are very similar to each other genetically, especially in the
male line. Both are equally close to the Khorezmian Uzbeks, their southern neighbours. Indeed the genetic distances between the different
populations of Uzbeks scattered across Uzbekistan are no greater than the distances between many of them and the Karakalpaks. This
suggests that Karakalpaks and Uzbeks have very similar origins. If we want to find out about the formation of the Karakalpaks we should
look towards the emergence of the Uzbek (Shaybani) Horde and its eastwards migration under the leadership of Abu'l Khayr, who united much
of the Uzbek confederation between 1428 and 1468.
Like the Uzbeks, the Karakalpaks are extremely diverse genetically. One only has to spend time with them to realize that some look European,
some look Caucasian, and some look typically Mongolian. Their DNA turns out to be an admixture, roughly balanced between eastern and
western populations. Two of their main genetic markers have far-eastern origins, M9 being strongly linked to Chinese and other Far Eastern
peoples and M130 being linked to the Mongolians and Qazaqs. On the other hand, M17 is strong in Russia, the Ukraine, and Eastern Europe,
while M89 is strong in the Middle East, the Caucasus, and Russia. M173 is strong in Western Europe and M45 is believed to have originated
in Central Asia, showing that some of their ancestry goes back to the earliest inhabitants of that region. In fact the main difference
between the Karakalpaks and the Uzbeks is a slight difference in the mix of the same markers. Karakalpaks have a somewhat greater bias
towards the eastern markers. One possible cause could be the inter-marriage between Karakalpaks and Qazaqs over the past 400 years, a theory
that gains some support from the close similarities in the mitochondrial DNA of the neighbouring female Karakalpak Qon'ırat and Qazaqs
of the Aral delta.
After the Uzbeks, Karakalpaks are next closest to the Uighurs, the Crimean Tatars, and the Kazan Tatars, at least in the male line. However
in the female line the Karakalpaks are quite different from the Uighurs and Crimean Tatars (and possibly from the Kazan Tatars as well).
There is clearly a genetic link with the Tatars of the lower Volga through the male line. Of course the Volga region has been closely linked
through communications and trade with Khorezm from the earliest days.
The Karakalpaks are genetically distant from the Qazaqs and the Turkmen, and even more so from the Kyrgyz and the Tajiks. We know that the
Karakalpaks were geographically, politically, and culturally very close to the Qazaqs of the Lesser Horde prior to their migration into
the Aral delta and were even once ruled by Qazaq tribal leaders. From their history, therefore, one might have speculated that the Karakalpaks
may have been no more than another tribal group within the overall Qazaq confederation. This is clearly not so. The Qazaqs have a quite
different genetic history, being far more homogeneous and genetically closer to the Mongolians of East Asia. However as we have seen, the
proximity of the Qazaqs and Karakalpaks undoubtedly led to intermarriage and therefore some level of genetic exchange.
Karakalpak Y chromosome polymorphisms show different patterns from mtDNA polymorphisms in a similar manner to that identified in certain
other Central Asian populations. This seems to be associated with the Turkic traditions of exogamy and so-called patrilocal marriage.
Marriage is generally not permissible between couples belonging to the same clan, so men must marry women from other clans, or tribes, or
in a few cases even different ethnic groups. After the marriage the groom stays in his home village and his bride moves from her village
to his. The result is that the male non-recombining part of the Y chromosome becomes localized as a result of its geographical isolation,
whereas the female mtDNA benefits from genetic mixing as a result of the albeit short-range migration of young brides from different clans.
One of the most important conclusions is the finding that clans within the same tribe show no sign of genetic kinship, whether the tribe
concerned is Karakalpak, Uzbek, Qazaq, or Turkmen. Indeed among the most settled ethnic groups, the Uzbeks and Karakalpak On To'rt Urıw,
there is very little kinship even at clan level. It seems that settled agricultural communities soon lose their strong tribal identity and
become more open-minded to intermarriage with different neighbouring ethnic groups. Indeed the same populations place less importance on their
genealogy and no longer maintain any identity according to lineage.
It has generally been assumed that most Turkic tribal groups like the Uzbeks were formed as confederations of separate tribes and this is
confirmed by the recent genetic study of ethnic groups from Karakalpakstan. We now see that this extends to the tribes themselves, with an
absence of any genetic link between clans belonging to the same tribe. Clearly they too are merely associations of disparate groups, formed
because of some historical reason other than descent. Possible causes for such an association of clans could be geographic or economic, such
as common land use or shared water rights; military, such as a common defence pact or the construction of a shared qala; or perhaps political,
such as common allegiance to a strong tribal leader.
The history of Central Asia revolves around migrations and conflicts and the formation, dissolution, and reformation of tribal confederations,
from the Saka Massagetae and the Sarmatians, to the Oghuz and Pechenegs, the Qimek, Qipchaq, and Karluk, the Mongols and Tatars, the White and
Golden Hordes, the Shaybanid and Noghay Hordes, and finally the Uzbek, Qazaq, and Karakalpak confederations. Like making cocktails from
cocktails, the gene pool of Central Asia was constantly being scrambled, more so on the female line as a result of exogamy and patrilineal
The same tribal and clan names occur over and over again throughout the different ethnic Qipchaq-speaking populations of Central Asia,
but in different combinations and associations. Many of the names predate the formation of the confederations to which they now belong,
relating to earlier Turkic and Mongol tribal factions. Clearly tribal structures are fluid over time, with some groups withering or
being absorbed by others, while new groups emerge or are added.
When Abu'l Khayr Sultan became khan of the Uzbeks in 1428-29, their confederation consisted of at least 24 tribes, many with smaller
subdivisions. The names of 6 of those tribes occur among the modern Karakalpaks. A 16th century list, based on an earlier document,
gives the names of 92 nomadic Uzbek tribes, at least 20 of which were shared by the later breakaway Qazaqs. 13 of the 92 names also
occur among the modern Karakalpaks.
Shortly after his enthronement as the Khan of Khorezm in 1644-45, Abu'l Ghazi Khan reorganized the tribal structure of the local Uzbeks
into four tüpe:
|Tüpe||Main Tribes||Secondary Tribes
|On Tort Urugh||On To'rt Urıw||Qan'glı|
|Durman, Yüz, Ming|
Shaykhs, Burlaqs, Arabs
| || ||Uyg'ır|
8 out of the 11 tribal names associated with the first three tüpe are also found within the Karakalpak tribal structure.
Clearly there is greater overlap between the Karakalpak tribes and the local Khorezmian Uzbek tribes than with the Uzbek tribes in general.
The question is whether these similarities pre-dated the Karakalpak migration into the Aral delta or are a result of later Uzbek influences.
We know that the Qon'ırat were a powerful tribe in Khorezm for Uzbeks and Karakalpaks alike. They were mentioned as one of the Karakalpak
"clans" on the Kuvan Darya [Quwan Darya] by Gladyshev in 1741 along with the Kitay, Qipchaq, Kiyat, Kinyagaz-Mangot (Keneges-Man'g'ıt), Djabin, Miton,
and Usyun. Munis recorded that Karakalpak Qon'ırat, Keneges, and Qıtay troops supported Muhammad Amin Inaq against the Turkmen in 1769.
Thanks to Sha'rigu'l Payzullaeva we have a comparison of the Qon'ırat tribal structure in the Aral Karakalpaks, the Surkhandarya Karakalpaks,
and the Khorezmian Uzbeks, derived from genealogical records:
The different status of the same Qon'ırat tribal groups among the Aral and Surkhandarya Karakalpaks and the Khorezmian Uzbeks
|Tribal group||Aral Karakalpaks||Surkhandarya Karakalpaks||Khorezmian Uzbeks|
|Qostamg'alı||clan||branch of tribe|| |
|Qanjıg'alı||tiıre||branch of tribe||tube|
|Shu'llik||division of arıs||clan|| |
|Tartıwlı||tiıre||branch of tribe||clan|
|Sıyraq||clan||branch of clan|| |
|Qaramoyın||tribe||branch of clan|| |
A tube is a branch of a tribe among the Khorezmian Uzbeks and a tiıre is a branch of a
clan among the Aral Karakalpaks.
The Karakalpak enclave in Surkhandarya was already established in the first half of the 18th century, some Karakalpaks fleeing
to Samarkand and beyond following the devastating Jungar attack of 1723. Indeed it may even be older - the Qon'ırat have a
legend that they came to Khorezm from the country of Zhideli Baysun in Surkhandarya. This suggests that some Karakalpaks had
originally travelled south with factions from the Shaybani Horde in the early 16th century. The fact that the Karakalpak
Qon'ırats remaining in that region have a similar tribal structure to the Khorezmian Uzbeks is powerful evidence that the tribal
structure of the Aral Karakalpaks had broadly crystallized prior to their migration into the Aral delta.
The Russian ethnographer Tatyana Zhdanko was the first academic to make an in-depth study of Karakalpak tribal structure. She
not only uncovered the similarities between the tribal structures of the Uzbek and Karakalpak Qon'ırats in Khorezm but also the
closeness of their respective customs and material and spiritual cultures. She concluded that one should not only view the
similarity between the Uzbek and Karakalpak Qon'ırats in a historical sense, but should also see the commonality of their
present-day ethnic relationships. B. F. Choriyev added that "this kind of similarity should not only be sought amongst the Karakalpak
and the Khorezmian Qon'ırats but also amongst the Surkhandarya Qon'ırats. They all have the same ethnic history."
Such ethnographic studies provide support to the findings that have emerged from the recent studies of Central Asian genetics.
Together they point towards a common origin of the Karakalpak and Uzbek confederations. They suggest that each was formed out of
the same melange of tribes and clans inhabiting the Dasht-i Qipchaq following the collapse of the Golden Horde, a vast expanse
ranging northwards from the Black Sea coast to western Siberia and then eastwards to the steppes surrounding the lower and middle
Syr Darya, encompassing the whole of the Aral region along the way.
Of course the study of the genetics of present-day populations gives us the cumulative outcome of hundreds of thousands of years of
complex human history and interaction. We now need to establish a timeline, tracking genetic changes in past populations using the
human skeletal remains retrieved from Saka, Sarmatian, Turkic, Tatar, and early Uzbek and Karakalpak archaeological burial sites. Such
studies might pinpoint the approximate dates when important stages of genetic intermixing occurred.
Sha'rigu'l Payzullaeva recalls an interesting encounter at the Regional Studies Museum in No'kis during the month of August 1988. Thirty-eight
elderly men turned up together to visit the Museum. Each wore a different kind of headdress, some with different sorts of taqıya,
others with their heads wrapped in a double kerchief. They introduced themselves as Karakalpaks from Jarqorghan rayon in Surkhandarya
viloyati, just north of the Afghan border. One of them said "Oh daughter, we are getting old now. We decided to come here to see our
homeland before we die."
During their visit to the Museum they said that they would travel to Qon'ırat rayon the following day. Sha'rigu'l was curious to know why
they specifically wanted to visit Qon'ırat. They explained that it was because most of the men were from the Qon'ırat clan.
One of the men introduced himself to Sha'rigu'l: "My name is Mirzayusup Khaliyarov, the name of my clan is Qoldawlı." After discovering that
Sha'rigu'l was also Qoldawlı his eyes filled with tears and he kissed her on the forehead.
References
Bowles, G. T., The People of Asia, Weidenfeld and Nicolson, London, 1977.
Comas, D., Calafell, F., Mateu, E., Pérez-Lezaun, A., Bosch, E., Martínez-Arias, R., Clarimon, J., Facchini, F.,
Fiori, G., Luiselli, D., Pettener, D., and Bertranpetit, J., Trading Genes along the Silk Road: mtDNA Sequences and
the Origin of Central Asian Populations, American Journal of Human Genetics, 63, pages 1824 to 1838, 1998.
Cavalli-Sforza, L. L., Menozzi, P., and Piazza, A., The History and Geography of Human Genes, Princeton University Press,
Chaix, R., Austerlitz, F., Khegay, T., Jacquesson, S., Hammer, M. F., Heyer, E., and Quintana-Murci, L., The Genetic or
Mythical Ancestry of Descent Groups: Lessons from the Y Chromosome, American Journal of Human Genetics, Volume 75, pages
1113 to 1116, 2004.
Chaix, R., Quintana-Murci, L., Hegay, T., Hammer, M. F., Mobasher, Z., Austerlitz, F., and Heyer, E., From Social to Genetic
Structures in Central Asia, Current Biology, Volume 17, Issue 1, pages 43 to 48, 9 January 2007.
Comas, D., Plaza, S., Spencer Wells, R., Yuldaseva, N., Lao, O., Calafell, F., and Bertranpetit, J., Admixture, migrations,
and dispersals in Central Asia: evidence from maternal DNA lineages, European Journal of Human Genetics, pages 1 to 10, 2004.
Heyer, E., Central Asia: A common inquiry in genetics, linguistics and anthropology, Presentation given at the conference
entitled "Origin of Man, Language and Languages", Aussois, France, 22-25 September, 2005.
Heyer, E., Private communications to the authors, 14 February and 17 April, 2006.
Krader, L., Peoples of Central Asia, The Uralic and Altaic Series, Volume 26, Indiana University, Bloomington, 1971.
Passarino, G., Semino, O., Magri, C., Al-Zahery, N., Benuzzi, G., Quintana-Murci, L., Andelinovic, S., Bulic-Jakus, F., Liu, A.,
Arslan, A., and Santachiara-Benerecetti, A., The 49a,f Haplotype 11 is a New Marker of the EU19 Lineage that Traces Migrations
from Northern Regions of the Black Sea, Human Immunology, Volume 62, pages 922 to 932, 2001.
Payzullaeva, Sh., Numerous Karakalpaks, many of them! [in Karakalpak], Karakalpakstan Publishing, No'kis, 1995.
Pérez-Lezaun, A., Calafell, F., Comas, D., Mateu, E., Bosch, E., Martínez-Arias, R., Clarimón, J., Fiori, G.,
Luiselli, D., Facchini, F., Pettener, D., and Bertranpetit, J., Sex-Specific Migration Patterns in Central Asian Populations,
Revealed by Analysis of Y-Chromosome Short Tandem Repeats and mtDNA, American Journal of Human Genetics, Volume 65, pages 208
to 219, 1999.
Spencer Wells, R., The Journey of Man, A Genetic Odyssey, Allen Lane, London, 2002.
Spencer Wells, R., et al, The Eurasian Heartland: A continental perspective on Y-chromosome diversity, Proceedings
of the National Academy of Sciences of the USA, Volume 98, pages 10244 to 10249, 28 August 2001.
Underwood, J. H., Human Variation and Human Micro-Evolution, Prentice-Hall Inc., New Jersey, 1979.
Underhill, P. A., et al, Detection of Numerous Y Chromosome Biallelic Polymorphisms by Denaturing High-Performance
Liquid Chromatography, Genome Research, Volume 7, pages 996 to 1005, 1997.
Zerjal, T., Spencer Wells, R., Yuldasheva, N., Ruzibakiev, R., and Tyler-Smith, C., A Genetic Landscape Reshaped by Recent
Events: Y Chromosome Insights into Central Asia, American Journal of Human Genetics, Volume 71, pages 466 to 482, 2002.