Linguistics 581

Portuguese in xfst: pp
Terminology  

  1. Lexical: cantar+VERB+PresInd+1P+Sing
  2. Surface: canto ("I sing")

Note that the arrows in rewrite rules point from the lexical string toward the surface string. In xfst we call this direction "downward":

  1. Upward: Toward lexical string
  2. Downward: Toward surface string
  3. The top: Lexical string
  4. The bottom: Surface string (Sheesh!)
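
To see the two directions in action, here is a minimal sketch. It assumes a toy network (not part of the Portuguese exercise) that pairs just the one lexical string above with its surface string. The tags are written in double quotes so that xfst treats each one as a single multicharacter symbol.

```xfst
# Toy network for illustration only: the upper (lexical) side is on the
# left of .x. and the lower (surface) side is on the right.
read regex [ {cantar} "+VERB" "+PresInd" "+1P" "+Sing" ] .x. [ {canto} ] ;

# Downward: from the lexical string to the surface string.
apply down cantar+VERB+PresInd+1P+Sing
# expected: canto

# Upward: from the surface string to the lexical string.
apply up canto
# expected: cantar+VERB+PresInd+1P+Sing
```

In the Portuguese exercise the same two commands apply, with orthography on the top and pronunciation on the bottom.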

Portuguese Orthography  

Lexical: caso: Orthography
Surface: kazu: Pronunciation
When you are finished with this exercise, you will know all the rules for pronouncing Portuguese orthography (in Southern Brazilian Portuguese).

Two Approaches  

  1. The grammar is one great big (disjunctive) regular expression (after all, this defines a language), which you load using "read regex" (as used in the Trial run of xfst).
  2. Write an xfst script (recommended by more doctors!). Call it port-pron.xfst (or something with the extension ".xfst") and do "source port-pron.xfst". A script is just a sequence of xfst commands.
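
As a rough sketch of the script approach, a file such as port-pron.xfst might look like the following. The names CH and NH are made up for illustration, and the two rules are only a small fragment, not the full grammar.

```xfst
# port-pron.xfst (illustrative fragment only)
clear stack

# Give each rule a name so the final expression stays readable.
define CH [ c h -> %$ ] ;
define NH [ n h -> N ] ;

# Compile the composed rules and leave the result on the stack.
read regex CH .o. NH ;

# At the xfst prompt, load the script with:
#   source port-pron.xfst
```

With the first approach, the same composed expression would be typed (or read from a file) as a single "read regex" command, without the intermediate definitions.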

Observation:

  1. xfst allows you to keep more than one FS-network around at a time. This is done by use of a stack.
  2. The network on top of the stack is active and is the one accessed by the "up" and "down" commands.
  3. Some commands like "compose" access the top two networks on the stack. Compose also changes the stack, making its result the top of the new stack.
  4. There are commands for shifting the stack around or operating on the stack: union net, concatenate net, intersect net. [Section 3.4.3]
  5. The moral: If you take the script option on the Portuguese problem, make sure that your script ends with the network that represents the entire grammar on top of the stack. For instance, this can be done with a final composition whose result is the entire grammar.
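
The stack behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the names RuleCH and RuleNH are hypothetical, and the stack is cleared first so that exactly two networks are on it when "compose net" runs. These two rules do not feed each other, so the order of the operands does not change the result here. Check the xfst documentation for operand order before composing order-sensitive rules this way.

```xfst
clear stack

define RuleCH [ c h -> %$ ] ;
define RuleNH [ n h -> N ] ;

# Put both rule networks on the stack (define alone does not push them).
push defined RuleNH
push defined RuleCH
print stack

# Compose them; the result replaces the operands on the stack.
compose net
print stack

# "apply down" now uses the composed network on top of the stack.
apply down ninho
# expected: niNo
```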
Pronunciation  

J palatalized d, like the "j" in "judge"
C palatalized t, like the "ch" in "church"
$ alveopalatal sibilant, like the phoneme spelled "sh" in English, always written with a preceding escape character "%"
L palatalized l, spelled "lh", like Italian "gl"
N palatal nasal, spelled "nh"
R trill, spelled "rr" inside words and "r" at the beginning of a word

Orthography  

I am not dealing with the issues of producing actual Portuguese orthography in this exercise. Therefore I will write the vowel "a" with an acute accent as two characters "a'" and "a" with a tilde accent as "a~". I will write cedilla ("c" with a curlicue) as "k".

Test Data  

A set of Portuguese words is in the file portuguese-data.

When you have your Portuguese fst loaded, test it with this data:

    xfst[1]: apply down < portuguese-data

This can even be the last line of your script.

Rule Ordering  

The pronunciation of orthographic "c":

  • "c": always pronounced /s/ before "i" or "e"
  • "c": always pronounced /$/ as part of digraph "c h"
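  • "c": pronounced /k/ elsewhere

The rules are composed in order, so the specific rules (the digraph rule and the rule for "c" before "e" or "i") must come before the general elsewhere rule: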

    [ c h -> %$ ]  # "%" is an escape character allowing us to handle "special" characters
                  # like "$"
    .o.
    [ c -> s || _ [ e | i ] ]
    .o.
    [ c -> k ] # An elsewhere rule
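
A quick way to convince yourself that the order matters is to compile the cascade both ways and compare. The following sketch uses only the three "c" rules, so the outputs are not full pronunciations. The test strings are made-up fragments, not entries from the data file.

```xfst
# Intended order: specific rules first, elsewhere rule last.
clear stack
read regex [ c h -> %$ ] .o. [ c -> s || _ [ e | i ] ] .o. [ c -> k ] ;
apply down cha
# expected: $a
apply down ce
# expected: se
apply down ca
# expected: ka

# Wrong order: the elsewhere rule applies first and leaves no "c"
# for the more specific rules to match.
clear stack
read regex [ c -> k ] .o. [ c h -> %$ ] .o. [ c -> s || _ [ e | i ] ] ;
apply down cha
# expected: kha
apply down ce
# expected: ke
```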
    
Deletion  

Use the symbol "0" to perform deletions:

    [ n h -> N ]
    .o.
    [ h -> 0 ] # Deletion rule.  Use "0".
               # The elsewhere rule for "h"
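
As a small check of the deletion cascade, the two rules can be compiled on their own and tried on a couple of words. This sketch covers only "h", so every other letter passes through unchanged and the outputs are not full pronunciations.

```xfst
clear stack

# The digraph rule must come first; otherwise the "h" in "nh" is deleted
# before it can be used.
read regex [ n h -> N ] .o. [ h -> 0 ] ;

apply down banho
# expected: baNo
apply down hora
# expected: ora
```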
    
Complete solutions  

Soln 1 from back of book.

Soln 2 from back of book: Preferred.