Indo-European Daughter Languages: Tocharian
by Edward Dawson & Peter Kessler, 15 September 2017.
Updated 15 January 2019
The Tocharians are perhaps the most mysterious of
all of the Indo-European branches. Thankfully, recent DNA evidence
has provided a vital ingredient when it comes to telling their
story but, despite this, it is a somewhat complicated story.
The core Indo-Europeans began to separate into
definite proto languages around 3000 BC, during an expansion phase
which is known as the Yamnaya horizon. These proto languages soon
became unintelligible to each other, although this fragmenting
process excludes the Anatolian branch of IEs who had already headed
southwards from the Pontic-Caspian steppe (see the feature, A
History of Indo-Europeans, Migrations and Language, for more details).
The western or centum language section of
Indo-Europeans (IEs) would evolve into Celtic, Italic, Venetic,
Illyrian, Ligurian, Vindelician/Liburnian and Raetic branches. This
group appears to be associated with a specific Y-DNA haplogroup
called R1b. A related Y-DNA haplogroup - R1a - is associated with
eastern or satem IE languages. It's the Indo-Iranian/Indo-Aryan,
Baltic, and Slavic groups which fall into this latter grouping.
Map 3 from the earlier feature on Indo-European (IE) language and
migration shows IE migration out of the Pontic-Caspian steppe by
around 3000 BC, with the centum-speaking Tocharians
apparently being edged ever eastwards by satem speakers
who were also expanding into the east.
Two groups, however, do not fit perfectly into that tidy pair of
east and west IE boxes. One of these involves the Germanic language
speakers, who appear to have been founded by R1a/satem people
but with a very mixed subsequent heritage. The other anomaly, one
which appears early in the Yamnaya horizon, involves a western group
which apparently decided to be different from all the others and
head eastwards. It is this group which evolved into the Tocharian
branch of Indo-Europeans.
This eastwards migration by the Tocharians can be
referred to as their u-turn migration.
A favourite current theory is that the satem
(eastern) languages evolved in the core Indo-Europeans on the
Caspian steppe after the departure both of the West IEs and the
Tocharians. Both of these latter divisions would have been left
with an older, centum version of the language which did
not receive the same later influences that the satem version did.
If this is correct then a u-turn theory in which
the Tocharians initially headed west and then changed direction to
head east would be a very realistic one because the Anatolians were
the first to detach themselves from the Indo-European core, and they
also spoke a centum language which did not show those later
influences. In fact, they left early enough to miss even some of the
later centum influences.
Another theory, based on the DNA evidence, points
toward IEs around 3000 BC being divided into two main groups. These
would have been steppe dwellers who were speaking centum
dialects and who bore the R1b Y-chromosome and, to the north of them
in the forests and forest-steppe, satem dialect speakers who
bore the R1a Y-chromosome. The problem for their centum
neighbours is that in this theory the satem group moved south
once they had the benefit of horse riding, and they proceeded to
occupy swathes of the former group's territory. It seems very
unlikely that this process occurred peacefully!
West IEs in the east?
Turning more specifically to the Tocharians,
it was the increasing realisation that they had a very odd
history which confirmed their West Indo-European origins,
despite their being the most easterly of IEs. It has become likely
that they were amongst those centum speakers who may have been
forced out by satem speakers appropriating their territory.
However, where the Tocharians are concerned, it's
never quite that simple.
Their language showed elements of both eastern
satem/R1a and western centum/R1b influences. Working
out how this may have happened is the tricky part of any examination
of the Tocharians, but an intriguing possibility is that they ended
up being a hybrid people who were made up of various elements of
multiple Indo-European groups, scooping up more followers as they
passed through West IE, South IE, and East IE groups.
Tocharian is, at its core, a centum language
- just like Indo-European languages in the west - despite its Far
Eastern setting. The most reasonable likelihood for the hybridisation
process is that a specific group took over other groups, and they
all adopted the most dominant language variant whilst also picking
up influences from the later arrivals. The key to understanding who
conquered whom lies in the male lineage and therefore in the Y-DNA.
A vital tool in helping to solve the Tocharian
mystery was the discovery of the 'Tarim mummies', a series of
mummified bodies discovered in the Tarim Basin which includes the
Takla Makan Desert (Taklamakan) in its territory. A DNA analysis
of twelve of the earliest mummies has shown that eleven of them
were Caucasoid men who possessed Y-DNA belonging to the R1a group,
making them eastern, satem speakers. For this region and
time such a finding would be very normal.
From this fact it can be postulated that a group
of nomadic satem/R1a types, most likely a group of IEs who
were closely related to the later Indo-Iranians, conquered other
groups as they progressed eastwards. They may have overcome many
small groups, including a more sizable population of centum/R1b
types, as they also headed east. Therefore the original
centum-speaking Tocharians would seem to have fallen under
the control of a more dominant group of satem speakers -
easy enough with the Tocharians passing through the eastern steppe
which was already full of satem speakers.
The predominance of R1a (eleven out of twelve
mummies) in the limited sample points to R1a satem males
being responsible for mating with centum-speaking women.
That finding makes it likely that the women were either brides
from centum groups, or that they had been captured in
raids or warfare.
Tocharians in relation to archaeological
Most studies of IE sequencing put the separation
of Tocharian after that of Anatolian and before any other branch.
The rather notable migration from around 3500 BC which created the
Afanasevo culture meets that expectation, with a section of the
Volga-Ural steppe population making its way eastwards across
Kazakhstan, covering a distance of more than two thousand kilometres
to reach the Altai Mountains.
This, then, was the Tocharian migration in its
original form. Whether its people were satem-speaking men who
had already collected a population of centum-speaking wives
either as prizes or through trade and intermarriage, or
centum-speakers who were later dominated by
satem-speakers is unclear. What would have happened, though,
was that these wives would have raised any children they had, and
would have taught them their own language alongside whatever basic
satem influences they may have needed. These early Tocharians
were already centum-speaking hybrids.
Although that is only a theory, it's the most likely
theory. What is certain is that, alongside the hybridisation
process, Tocharians also borrowed heavily from other languages,
probably during their subsequent migrations. We find Sanskrit words
which they adopted due to their later adherence to the Buddhist religion,
such words coming from Indo-Aryans who were themselves an offshoot
of the Indo-Iranians - both satem-speakers. Could Tocharian
be heavily hybridised in the manner of modern English with its large
French vocabulary, and its religious-adopted Latin vocabulary? It
certainly seems possible.
Burial mounds in the modern Russian region of Khakassia can be
marked with small standing stones as shown here, with this area
being a core part of the territory of the Afanasevo culture.
The United Sites of Indo-Europeans website rounds off much of
the discussion with the following (with additions in italics
for the text which was not written by a native English-speaker):
This group is perhaps the least studied in all
of the Indo-European macro-family. It consists of two dead
languages, Tocharian A (or Agnean) and Tocharian B (or Kuchanian).
These were spoken in the first millennium AD in East Turkestan,
in several cases in which inscriptions and texts written in
these languages were found.
The routes and methods used in Tocharic
migrations from the Middle East to East Asia are still
unknown. The languages show many borrowings from early Iranian
languages, archaic Finno-Ugric (of the Uralic family), and even
Tibetan-like forms, but the structure itself shows much similarity
with Germanic languages primarily, and also with Balto-Slavic
Linguists think that Tocharians moved
through Central Asia from west to east and, on their way, had a
large number of linguistic contacts which were reflected
in their tongue. Before these migrations, it being a dialect
in the proto-Indo-European community, Tocharians must have
communicated closely with future Anatolians and Italo-Celts.
In truth, the Y-DNA results from the Tarim mummies
were quite a surprise. Whilst the general expectation was that they
would be R1b types (centum-speakers), they were anything but
that, being R1a satem types. As discussed above, this means
that the Tocharian males were descended from the satem-speaking
forest and forest-steppe IEs, not the steppe-dwelling,
centum-speaking IEs as was generally expected.
The sense of surprise at the result was despite
the fact that Central Asia was dominated by satem-speaking
Indo-Iranians, while the only centum speakers were the
Tocharians themselves. Primarily the expectation existed because
Indo-Iranians don't seem to have reached as far east as the Tarim
Basin. Simply put, no one expected the Tocharians themselves to
have satem-speaking influences.
However, in language terms, there doesn't appear
to be any evidence of those words in Tocharian A which are used in
Asha (Arte/Rte). This is possibly because the Tocharians separated
from other Indo-Europeans prior to the formulation of Asha; or
alternatively that they never had it or were a military elite which
did not include priests among them.
Asha is the modern term for the philosophical
practice of adherence to the truth of what is, what exists. The
word 'Asha' comes from Zoroastrianism. Its ancient names were Rte
among Indians (Indo-Aryan Hindus), and Arte among Iranians. There
are also linguistic pointers toward the philosophy existing amongst
early Germans under the name of Istwae. All of these names are the
verb 'to be', used as nouns.
In addition, the language of Tocharian A seems to
have more in common with Celto-Italic languages than it does with the
Avestan/Vedic of Indo-Iranian and Indo-Aryan satem languages.
Many familiar words are contractions, with sounds having been
dropped - a common enough Celto-Italic practice. These contracted
words can come about as a result of a population using a hybrid
language, or they can result from sheer laziness. The latter, if true,
would be another pointer towards a lack of the Asha philosophy, as
Asha is extremely precisionist in character.
With that examination of Tocharian A in mind, the
theory which sees a satem military elite taking over another,
centum-speaking tribe (or at least its women) seems to be the
only rational explanation for the creation of the hybrid Tocharians
of recorded history. And the take-over happened early enough that
Asha did not yet exist. That date of approximately 3000 BC - or
perhaps a bit later - still looks reasonable for the separation of
Tocharians from other Indo-Europeans, with their dominance by
satem-speaking Indo-Iranian East IEs following on relatively
soon after that.
Tocharian tongues survived for a good three or
four thousand years. By AD 500 they could still be found in Xinjiang
(early home of the Göktürks of this same period), and in the caravan
cities of the Silk Road. By this time they had divided into two
or three quite distinctive languages, all of which exhibited archaic
Indo-European traits. Despite their long journey to the Altai
Mountains, along the Chinese border, and then towards Central Asia,
they were able to maintain a strong identity... and a strong
Here's a perfect example of
why Tocharian is so odd: 'wäl, walo', meaning a prince (IE *wal-,
meaning 'strong, powerful'); 'wäl', meaning 'to die'. The words
'wal, walo', meaning 'strong', can be extended to mean a prince
or king, and this is the Celtic form. The Germanic word for Celts
probably derives from it. The word 'wal', meaning 'to die', is
the Germanic usage, cognate with English and German 'fall', and
the Norse 'valr', seen in 'valkyrie'. All of this shows that
Tocharian simply must be a hybrid language.
Yardley, John & Heckel, Waldemar -
Epitome of the Philippic History of Pompeius Trogus: Books
11-12, Volume 1, Marcus Junianus Justinus
Anthony, David W - The Horse, the Wheel,
and Language: How Bronze-Age Riders from the Eurasian Steppes
Shaped the Modern World
Pokorny, J - Indo-European Etymological
Dictionary, online database which updates Pokorny's
Indogermanisches Etymologisches Wörterbuch
Ancient History Encyclopaedia
Geochronology - Indo-European Chronology
- Countries and Peoples
Linguistics Research Center, University of
Texas at Austin
Peering at the Tocharians through Language
Studies in the History and Language of the
Sarmatians
United Sites of Indo-Europeans
Maps and text copyright © Edward Dawson & P L Kessler.
An original feature for the History Files.