Language Change


All modern languages started as regional dialects.

Old English  

The dates:

  1. Old English: 449-1066
  2. Middle English: 1066-1500
  3. Modern English: 1500-1957
  4. Modem English: 1975-??

Historical events:

  1. Old
    1. 449: Saxons invade Britain (bringing along their Germanic language)
    2. 6th century: religious literature
    3. 8th century Beowulf: pretty unreligious
    4. 1066: Norman Conquest (Battle of Hastings)
  2. Middle
    1. 1387 Canterbury Tales
    2. 1476 Caxton's printing press
  3. Modern
    1. 1564 Birth of Shakespeare
    2. 1611 King James Bible (a translation into the English spoken by the common people at that time)
        Translator's preface to the 1611 Edition

Old English sounded pretty different:

    The Lord's prayer in Olde English
Great English Vowel Shift

One of the great differences separating Middle English from Modern English is the shift of a whole set of long or tense vowels, affecting the pronunciation of a huge percentage of the words of the language.

Dates: 1400-1600. There is a lot of disagreement about when it really got going, when it was completed, whether it happened in stages, and whether all the vowels changed simultaneously (over, say, 50 years).

Look for a list of changes here.

Here's the basic picture. Most vowels raised.

Some examples affected by the change

Another set, showing some other changes that have happened, and some of the unstressed vowels that were lost in a separate change (Chaucer here means Middle English; Shakespeare means Modern English, before and after the change, respectively):

The shift affected only long or tense vowels. So fish ([ɪ]) and sack ([æ]) stayed the same (short, lax vowels).


William Caxton set up the first printing press in England in 1476 (the press itself had been invented by Gutenberg around 1440).

This is when spelling conventions began to be fixed. (Lots of variation in handwritten manuscripts before this).

This is before the Great Vowel Shift really got underway, so the spelling of English often reflects a state before the shift.

  Suppose we want to know whether French, Italian, Spanish, and Portuguese are related languages.

We look up some words with related meanings AND related pronunciations.

    Correspondence Sets
    French     Italian  Spanish  Portuguese  English   Correspondence
    cher       caro     caro     caro        "dear"    [sh]-[k]-[k]-[k]
    champ      campo    campo    campo       "field"   [sh]-[k]-[k]-[k]
    chandelle  candela  candela  candela     "candle"  [sh]-[k]-[k]-[k]

A correspondence is a set of sounds occurring in corresponding positions of cognates in related languages.

The fact that the same correspondence set shows up consistently in all four languages argues for their relatedness. We can hypothesize that a k was the Proto-language source for the first sound in all these words. We write *k, with an asterisk, for reconstructions to distinguish them from ordinary phonetic representations. What they stand for is some single phoneme in the Proto language whose exact pronunciation we can't usually be sure of.

In this case of course we know more than usual.

We know from historical record that French, Italian, Spanish, and Portuguese are all descended from Latin (All started out as regional dialects).

And we know all the words in question are cognate with Latin words that were written with a c, a Latin symbol that usually corresponded to a [k] sound.

So the method is supported by this test case.
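
The first-sound correspondence in the table above can be pulled out mechanically. The following is only a sketch: the helper `initial_sound` and its crude spelling-to-sound rule (French "ch" is [sh], "c" before a back vowel is [k]) are assumptions made up for this illustration and work only for these few words.

```python
# Cognate sets in the order French, Italian, Spanish, Portuguese.
COGNATES = {
    "dear": ["cher", "caro", "caro", "caro"],
    "field": ["champ", "campo", "campo", "campo"],
    "candle": ["chandelle", "candela", "candela", "candela"],
}


def initial_sound(word):
    """Hypothetical spelling-to-sound rule, valid only for this data."""
    if word.startswith("ch"):
        return "sh"
    if word.startswith("c"):
        return "k"
    return word[0]


def initial_correspondence(forms):
    """Build a string such as '[sh]-[k]-[k]-[k]' from one cognate set."""
    return "-".join("[" + initial_sound(w) + "]" for w in forms)


correspondences = {
    gloss: initial_correspondence(forms) for gloss, forms in COGNATES.items()
}

for gloss, corr in correspondences.items():
    print(gloss, corr)

# The same correspondence recurring in every cognate set is the evidence
# for relatedness that the text describes.
same_everywhere = len(set(correspondences.values())) == 1
print("one recurring correspondence:", same_everywhere)
```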


Here are the steps in historical reconstruction of a Proto language.

  1. Find correspondence sets
  2. Posit proto sounds (using "majority rules", or common sense exceptions). Do this in two steps, the clear cases first, then the hard cases....
  3. Write down the sound change rules for each of the languages
  4. Reconstruct proto forms.

We illustrate the method of reconstruction with some hypothetical data:

    Hypothetical Data
    Language A Language B Language C Language D Correspondence
    hono hono fono vono [h]-[h]-[f]-[v]
    hari hari fari veli [h]-[h]-[f]-[v]
    rahima rahima rafima levima [r]-[r]-[r]-[l]
    hor hor for vol [h]-[h]-[f]-[v]

First pass reconstruction for each correspondence set:

    Reconstruction I
    Language A Language B Language C Language D Proto Language
    h h f v  
    a a a e  
    a a a a  
    i i i i *i
    o o o o *o
    r r r l *r
    n n n n *n
    m m m m *m
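
The first pass can be sketched in code. This is a minimal illustration, not a general algorithm: it assumes one letter per sound and cognates of equal length (true of this hypothetical data), and the rule for deferring hard cases (no strict majority, or two correspondence sets competing for the same proto sound) is an assumption chosen to reproduce the table above.

```python
from collections import Counter

# Hypothetical cognates for Languages A, B, C, D.
COGNATES = [
    ("hono", "hono", "fono", "vono"),
    ("hari", "hari", "fari", "veli"),
    ("rahima", "rahima", "rafima", "levima"),
    ("hor", "hor", "for", "vol"),
]


def correspondence_sets(cognates):
    """Collect distinct correspondences by lining words up position by position."""
    sets = []
    for forms in cognates:
        for column in zip(*forms):
            if column not in sets:
                sets.append(column)
    return sets


def majority_sound(column):
    """Return the sound found in more than half the languages, else None."""
    sound, count = Counter(column).most_common(1)[0]
    return sound if count > len(column) / 2 else None


def first_pass(sets):
    """Assign proto sounds to the clear cases only; hard cases stay None."""
    candidates = {col: majority_sound(col) for col in sets}
    tally = Counter(c for c in candidates.values() if c is not None)
    return {
        col: "*" + c if c is not None and tally[c] == 1 else None
        for col, c in candidates.items()
    }


sets = correspondence_sets(COGNATES)
proto = first_pass(sets)
for col, sound in proto.items():
    print(" ".join(col), sound if sound else "(deferred)")
```

The three deferred sets (h-h-f-v and the two sets involving a) are exactly the ones left blank in the first pass.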

Reconstruction: Second Pass

    Reconstruction II
    Language A  Language B  Language C  Language D  Proto Language  Rule
    h           h           f           v           *f              A,B: f -> h; D: f -> v
    a           a           a           e           *a              D: a -> e / __ r,v
    a           a           a           a           *a
    i           i           i           i           *i
    o           o           o           o           *o
    r           r           r           l           *r              D: r -> l
    n           n           n           n           *n
    m           m           m           m           *m

Why we reconstruct an f:

    Knowledge of sound changes across languages: f -> h is common cross-linguistically. h -> f is not. In addition we know of examples of

    Proto Language
    Language A Language B Language C Language D Proto Language
    hono hono fono vono *fono
    hari hari fari veli *fari
    rahima rahima rafima levima *rafima
    hor hor for vol *for
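
One way to check a reconstruction is to run the sound change rules forward from the proto forms and see whether the attested words come out. The sketch below does that for the hypothetical data. The rule ordering for Language D (f -> v first, then a -> e before r or v, then r -> l) is an assumption; the rules in the table do not state an order.

```python
import re

PROTO_FORMS = ["fono", "fari", "rafima", "for"]

# Ordered (pattern, replacement) rules per language.
RULES = {
    "A": [(r"f", "h")],
    "B": [(r"f", "h")],
    "C": [],  # Language C keeps the proto sounds unchanged
    "D": [(r"f", "v"), (r"a(?=[rv])", "e"), (r"r", "l")],
}

ATTESTED = {
    "A": ["hono", "hari", "rahima", "hor"],
    "B": ["hono", "hari", "rahima", "hor"],
    "C": ["fono", "fari", "rafima", "for"],
    "D": ["vono", "veli", "levima", "vol"],
}


def derive(proto, rules):
    """Apply each rule in order to a proto form."""
    form = proto
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


derived = {
    lang: [derive(p, rules) for p in PROTO_FORMS] for lang, rules in RULES.items()
}

for lang in RULES:
    status = "ok" if derived[lang] == ATTESTED[lang] else "MISMATCH"
    print(lang, derived[lang], status)
```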

Using these methods linguists like Jones, Bopp, Rask, and Grimm reconstructed a proto language called Proto Indo-European, or just Indo-European, a language which flourished about 6,000 years ago (but may be as much as 8,000 years old).

Their correspondence data looked like this:

Sanskrit   Avestan   Greek    Latin    Gothic     English

pita                 pater    pater    fadar      father 
padam                poda     pedem    fotu       foot 
bhratar              phrater  frater   brothar    brother 
bharami    barami    phero    fero     baira      bear 
jivah      jivo               wiwos    qius       quick 
sanah      hano      henee    senex    sinista    senile 
virah      viro               wir      wair       were(wolf) 
                     tris     tres     thri       three
                     deka     decem    taihun     ten
           satem     he-katon centum   hund(rath) hundred

From which they derived PIE (Proto Indo-European) forms like:

*p@ter-         father
*ped-           foot
*bhrater-       brother
*bher-          carry
*gwei-          live
*sen-           old
*wi-ro-         man     (derived from *wei@- vital force)
*trei-          three
*dekm-          ten
*dkm-tom-       hundred (derived from *dekm- ten)

Here's the language tree they also devised:

Some things to note:

  1. Note the absence of some European languages, for example, Hungarian and Finnish. The reason is that these languages are NOT Indo-European!
  2. Dead languages that are the ancestors of multiple languages: Latin, Sanskrit
  3. Languages with no surviving close relatives: Armenian
  4. Dead language families: Anatolian (note especially Hittite), Tocharian (very little remains)
  5. Many languages of the Mid East and South Asia are Indo-European (Hindi, Punjabi, Gujarati [India], Pashto [Afghanistan], Persian or Farsi [Iran], Kurdish [Turkey, Iraq])
  6. Germanic. English's closest relative: probably Frisian.
  7. Every node in this tree was a language. The ones in all caps are languages we have no written records of, which exist only in reconstructions. So Latin would be called "Proto-Romance" if we had no written Latin.

A quote from here:

One real triumph of this method of reconstruction was the Laryngeal 
Hypothesis: it was known that there were some troublesome places in 
Indo-European where the sound changes seemed not to be behaving in 
their usual regular way; things were happening to vowels and 
sometimes consonants that couldn't be easily explained based on what 
we saw in the attested languages. Ferdinand de Saussure in the late 
19th century said that there had to be a set of three segments in the 
proto-language that had not survived in any of the daughter languages 
-- he was fairly conservative about claiming what they must have 
been, but he called them laryngeals and pointed out the precise 
locations where they must have occurred. Many years later, when a 
bunch of texts in Turkey were finally decoded and we knew we were 
looking at the ancient Anatolian language Hittite, the oldest 
attested Indo-European language -- voila: there were the laryngeals, 
exactly where Saussure had predicted they must be just on the basis 
of careful reconstruction.

The term language family is reserved for the largest group of related languages for which we can reliably construct a family tree. Indo-European is a language family. There is no larger group of languages we can reliably relate the Indo-European languages to.

It's important to realize we don't really know what Proto Indo-European (PIE) sounded like. That's why we call it a Proto-language (unlike Latin, which is the ancestor of several modern languages, but which we also have plenty of direct evidence about).

There was no Indo-European writing system. Probably writing hadn't yet been invented when PIE was spoken, at least not in that part of the world.

Homeland. Theories include India, South Asia, Afghanistan, and parts of the former Soviet Union. Some theories of the Indo-European homeland

There are no reconstructable words for camel or tiger. There are reconstructable words for wolf and birch. Somewhere with wolves and birches? Some reconstructed PIE words


  1. Indo-European Community: approx. 6,000--4,000 B.C.
  2. Indo-Europeans in Europe: by about 2500 B.C. (archeological evidence: big migrations INTO Europe seem to have ended by about this time).

Other Language Families

There are many other language families for which extensive reconstruction of a family tree has been performed.

At a minimum there are 4000 languages in the world, at a max 8000.

The number of distinct language families is much harder to give. Linguists disagree a lot on this.

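Maybe there's just one language family in the end. Proposals for much larger groupings (such as Nostratic, which would link Indo-European with several other families) remain controversial.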

More conventionally, here are some examples:

  1. Uralic: Hungarian, Finnish, and Estonian go here. Many, many other languages spoken across Asia (many in the former Soviet Union).
  2. Afro-Asiatic: Northern Africa and the Middle East. Includes the Semitic languages (Hebrew, Arabic, the Semitic languages of Ethiopia) as well as Berber.
  3. Sino-Tibetan: Asia. Includes Chinese, Burmese, Tibetan.
  4. Niger-Congo: most of the languages of Africa, including Swahili and Zulu.
  5. Austronesian: Madagascar (near Africa), Hawaii, New Zealand, Asia (the Philippines, Malaysia). These people got around!
Grimm's Law  

As an example of the kind of success the fathers of Indo-European studies had in their reconstruction efforts, we cite Grimm's Law.

First the subject matter: Grimm's Law is about a historical sound change linking Proto-Indo-European with Germanic.

The effects of this sound change can thus directly be observed in Germanic languages like German, English, and Gothic (a dead language, but very important for historical linguists; see Indo-European tree above).

Here are the changes in table form:

    Grimm's Law
    PIE bh dh gh b d g p t k
    Germanic b d g p t k f th x (or h)
First we assume PIE had something like voiced aspirated stops. Sanskrit had them, Hindi still does, but they're pretty much gone elsewhere, and Grimm's Law was one of the first big steps.

This looks like a lot to remember. But something very regular is going on.

    Grimm's Law
    PIE Voiced Aspirated Stop Voiced Stop Voiceless Stop
    Germanic Voiced Stop Voiceless Stop Voiceless Fricative
In every case the location remains the same or close to it. The manner or voicing changes.

A mnemonic that helps. Think of a chain. Start with the voiced aspirates going to voiced stops: xh -> x. Now the voiced stops in Germanic are getting crowded and need somewhere to go. So they become voiceless. Now the voiceless stops in Germanic are crowded and they need somewhere to go. So they become fricatives. So what happens to the old Germanic fricatives? Well, there weren't any. This is how they got their start. And the chain ends here.
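
The chain can be sketched as a lookup table. The key point is that each sound shifts exactly one step: the output of one change is not fed into the next, otherwise every bh would run all the way down the chain to f. The sketch below assumes the PIE form has already been split into segments by hand (so that "bh" is one unit), and it writes the velar fricative as "x" as in the table above.

```python
# One step of Grimm's Law per segment (PIE -> Germanic).
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",   # voiced aspirated stop -> voiced stop
    "b": "p", "d": "t", "g": "k",      # voiced stop -> voiceless stop
    "p": "f", "t": "th", "k": "x",     # voiceless stop -> voiceless fricative
}


def apply_grimm(segments):
    """Shift every segment once, in parallel; other sounds pass through."""
    return [GRIMM.get(seg, seg) for seg in segments]


# Hand-segmented consonant skeletons of some PIE roots cited in these notes.
examples = {
    "foot": ["p", "e", "d"],
    "brother": ["bh", "r", "a", "t", "e", "r"],
    "three": ["t", "r", "e", "i"],
}

shifted = {gloss: apply_grimm(segs) for gloss, segs in examples.items()}
for gloss, segs in shifted.items():
    print(gloss, "".join(segs))
```

Only the consonants are modeled here; the vowels of the Germanic words changed for other reasons, so the output matches English in its consonants (f-t, b-r-th-r, th-r) rather than letter for letter.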

Sanskrit is often conservative with respect to these changes, so often the simplest way to see evidence of the change is to compare a Sanskrit word with a Germanic cognate, and pretty much any Germanic language will do.

    Grimm's Law
              bh -> b   d -> t   p -> f   d -> t
    Sanskrit  bhratar   dva-     pita     padam
    Gothic    brothar   twai     fadar    fotu
    Meaning   brother   two      father   foot
of Sound Changes

Historical sound changes are supposed to happen to every word.

By and large they do. One complication is borrowed words. Words borrowed into English after the Great Vowel Shift don't shift.

Sometimes they have exceptions that get explained.

There are exceptions to Grimm's Law. We see one in the Indo-European correspondence data above. The middle "t" of Sanskrit "pita" corresponds to a "d" in Gothic "fadar". But wait! "t" is supposed to go to "th". Exception!

This particular exception is quite systematic and has to do with stress. It later got explained by Karl Verner with a different Law (called Verner's Law, not our problem right now).

Sometimes there are outright exceptions. But surprisingly few.

    "I" stands for a back unrounded vowel
    Paviotso (=YP) Monachi (=NM) Gloss
    mupi mupi "nose"
    tama tawa "tooth"
    piwI piwI "heart"
    sawa?pono sawa?pono "a feminine name"
    nImI nIwI "liver"
    tamano tawano "springtime"
    pahwa pahwa "aunt"
    kuma kuwa "husband"
    wowa?a wowa?a "Indians living to the west"
    mIhI mIhI "porcupine"
    noto noto "throat"
    tapa tape "sun"
    ?atapI ?atapI "jaw"
    papi?i papi?i "older brother"
    patI petI "daughter"
    nana nana "man"
    ?atI ?etI "bow" "gun"


  1. Find correspondence sets
  2. Posit proto sounds (using "majority rules", or common sense when possible). Do this in two steps, the clear cases first, then the hard cases....
  3. Write down the sound change rules for each of the languages
  4. Reconstruct proto forms.

Correspondence sets:

    Paviotso (=YP)  Monachi (=NM)  Environment
    m               w              after vowel (a, I, u)
    m               m              elsewhere (initially)
    w               w              after back/front vowel, after consonant (h), initially
    p               p
    t               t
    s               s
    n               n
    h               h              medial only
    k               k
    ?               ?
    u               u
    i               i
    I               I
    o               o
    a               e              before t, #
    a               a              everywhere?
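
The environments in this table can be checked by lining the two languages up sound by sound. The sketch below is an illustration only: it assumes one character per sound and equal-length cognates (true of this data set), and it looks only at the preceding sound, with "#" standing for the beginning of a word.

```python
# (Paviotso, Monachi) cognate pairs from the data above.
PAIRS = [
    ("mupi", "mupi"), ("tama", "tawa"), ("piwI", "piwI"),
    ("sawa?pono", "sawa?pono"), ("nImI", "nIwI"), ("tamano", "tawano"),
    ("pahwa", "pahwa"), ("kuma", "kuwa"), ("wowa?a", "wowa?a"),
    ("mIhI", "mIhI"), ("noto", "noto"), ("tapa", "tape"),
    ("?atapI", "?atapI"), ("papi?i", "papi?i"), ("patI", "petI"),
    ("nana", "nana"), ("?atI", "?etI"),
]


def preceding_environments(pairs, yp_sound, nm_sound):
    """Sounds that precede a given YP:NM correspondence ('#' = word start)."""
    found = set()
    for yp, nm in pairs:
        for i, (a, b) in enumerate(zip(yp, nm)):
            if (a, b) == (yp_sound, nm_sound):
                found.add(yp[i - 1] if i > 0 else "#")
    return found


m_w = preceding_environments(PAIRS, "m", "w")
m_m = preceding_environments(PAIRS, "m", "m")
w_w = preceding_environments(PAIRS, "w", "w")

print("m:w after", sorted(m_w))
print("m:m after", sorted(m_m))
print("w:w after", sorted(w_w))
```

Since m:w and m:m never share an environment, they can come from one proto sound. The w:w set overlaps with both, so it needs a proto sound of its own.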

Reconstructed Sounds I

    Paviotso (=YP) Monachi (=NM) Proto
    m w  
    m m  
    w w  
    p p *p
    t t *t
    s s *s
    n n *n
    h h *h
    k k *k
    ? ? *?
    u u *u
    i i *i
    I I *I
    o o *o
    a e  
    a a  

Reconstructed Sounds II

    Paviotso (=YP)  Monachi (=NM)  Proto  Rule
    m               w              *m     NM: m -> w / Vowel __
    m               m              *m     elsewhere
    w               w              *w
    p               p              *p
    t               t              *t
    s               s              *s
    n               n              *n
    h               h              *h
    k               k              *k
    ?               ?              *?
    u               u              *u
    i               i              *i
    I               I              *I
    o               o              *o
    a               e              *e     YP: e -> a
    a               a              *a     elsewhere
Reconstructed forms

    Paviotso (=YP) Monachi (=NM) Proto form
    mupi mupi *mupi
    tama tawa *tama
    piwI piwI *piwI
    sawa?pono sawa?pono *sawa?pono
    nImI nIwI *nImI
    tamano tawano *tamano
    pahwa pahwa *pahwa
    kuma kuwa *kuma
    wowa?a wowa?a *wowa?a
    mIhI mIhI *mIhI
    noto noto *noto
    tapa tape *tape
    ?atapI ?atapI *?atapI
    papi?i papi?i *papi?i
    patI petI *petI
    nana nana *nana
    ?atI ?etI *?etI
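
As a final check, the two rules can be run forward from the reconstructed forms to see whether both languages come out right. This is a sketch under stated assumptions: the Monachi rule is taken to be "m becomes w after a vowel", matching the environment in the correspondence table, and the Paviotso rule is "e becomes a" everywhere.

```python
import re

# (proto form, Paviotso, Monachi) triples from the tables above.
DATA = [
    ("mupi", "mupi", "mupi"), ("tama", "tama", "tawa"),
    ("piwI", "piwI", "piwI"), ("sawa?pono", "sawa?pono", "sawa?pono"),
    ("nImI", "nImI", "nIwI"), ("tamano", "tamano", "tawano"),
    ("pahwa", "pahwa", "pahwa"), ("kuma", "kuma", "kuwa"),
    ("wowa?a", "wowa?a", "wowa?a"), ("mIhI", "mIhI", "mIhI"),
    ("noto", "noto", "noto"), ("tape", "tapa", "tape"),
    ("?atapI", "?atapI", "?atapI"), ("papi?i", "papi?i", "papi?i"),
    ("petI", "patI", "petI"), ("nana", "nana", "nana"),
    ("?etI", "?atI", "?etI"),
]


def to_paviotso(proto):
    """YP: e -> a (the two proto vowels merge)."""
    return proto.replace("e", "a")


def to_monachi(proto):
    """NM: m -> w after a vowel; word-initial m is untouched."""
    return re.sub(r"(?<=[aeiouI])m", "w", proto)


mismatches = [
    (proto, yp, nm)
    for proto, yp, nm in DATA
    if to_paviotso(proto) != yp or to_monachi(proto) != nm
]

print("forms checked:", len(DATA))
print("mismatches:", mismatches)
```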