Terminology
|
 
|
- Lexical: cantar+VERB+PresInd+1P+Sing
- Surface: canto ("I sing")
Note that the arrows in rewrite rules point from
the lexical string toward
the surface string.
In xfst this direction is called "downward":
- Upward: Toward lexical string
- Downward: Toward surface string
- The top: Lexical string
- The bottom: Surface string (Sheesh!)
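A minimal sketch of the two directions, assuming a toy one-word transducer built with the cross-product operator (this network is only an illustration, not part of the exercise). The lexical string is the upper side and the surface string is the lower side, so "apply up" takes a surface string and "apply down" takes a lexical string:

```xfst
clear stack
# Upper (lexical) side .x. lower (surface) side
read regex [ {cantar} %+VERB %+PresInd %+1P %+Sing ] .x. [ {canto} ] ;
# Upward: from the surface string to the lexical string
apply up canto
# Downward: from the lexical string to the surface string
apply down cantar+VERB+PresInd+1P+Sing
```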
|
Portuguese Orthography
|
 
|
Lexical: caso: Orthography
Surface: kazu: Pronunciation
When you are finished with this exercise you will
know all the rules for pronouncing Portuguese
orthography (in Southern Brazilian Portuguese).
|
Two Approaches
|
 
|
- The grammar is one great big (disjunctive) regular expression
(after all, this defines a language), which you load
using "read regex" (as in the trial
run of xfst).
- Write an xfst script (recommended by more doctors!).
Call it port-pron.xfst
(or something with the extension ".xfst") and do "source port-pron.xfst". A script
is just a sequence of xfst commands.
Observation:
- xfst allows you to keep more than one FS-network around
at a time. This is done by use of a stack.
- The network on top of the stack is active and
is the one accessed by the "up" and "down" commands.
- Some commands like "compose" access the top
two networks on the stack. Compose also changes the stack,
making its result
the top of the new stack.
- There are commands for shifting the stack around
or operating on the stack: union net, concatenate net,
intersect net. [Section 3.4.3]
- The moral: If you take the script option
on the Portuguese problem, you should make sure that your
script ends with the network that represents the entire grammar on top of the stack.
For instance, this can be done with a compose command whose result
is the entire grammar.
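A script along these lines might look like the sketch below. The names CH, CSoft and CElse are hypothetical, and only the three rules for "c" are included, so this is a skeleton rather than a solution. Each "define" names a rule network without touching the stack; the final "read regex" composes them and pushes the result, so the whole (partial) grammar ends up on top:

```xfst
# port-pron.xfst (skeleton; rule names are made up for illustration)
clear stack
define CH    [ c h -> %$ ] ;
define CSoft [ c -> s || _ [ e | i ] ] ;
define CElse [ c -> k ] ;
# Compose the rules in order and push the result onto the stack
read regex CH .o. CSoft .o. CElse ;
# Check that exactly one network is on the stack
print stack
```

Loading it with "source port-pron.xfst" should leave a single network on the stack, ready for "apply down".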
|
Pronunciation
|
 
|
J | palatalized d, like the "dg" in "judge" |
C | palatalized t, like the "ch" in "church" |
$ | alveopalatal sibilant, like the phoneme spelled "sh" in English; always written with a preceding escape character "%" |
L | palatalized l, spelled "lh", like Italian "gl" |
N | palatal nasal, spelled "nh" |
R | trill, spelled "rr" inside words and "r" at the beginning of a word |
|
Orthography
|
 
|
I am not dealing with the issues of producing actual
Portuguese orthography in this exercise. Therefore
I will write the vowel "a"
with an acute accent as two characters "a'"
and "a" with a tilde accent as "a~". I will write
cedilla ("c" with a curlicue) as "k".
|
Test Data
|
 
|
A set of Portuguese words in: portuguese-data.
When you have your Portuguese fst loaded,
test it with this data:
xfst[1]: apply down < portuguese-data
This can even be the last line of your script.
|
Rule ordering
|
 
|
The pronunciation of orthographic "c":
"c": always pronounced /s/ before "i" or "e"
"c": always pronounced /$/ as part of the digraph "c h"
"c": pronounced /k/ elsewhere
[ c h -> %$ ] # "%" is an escape character allowing us to handle "special" characters
# like "$"
.o.
[ c -> s || _ [ e | i ] ]
.o.
[ c -> k ] # An elsewhere rule
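The effect of the ordering can be checked with a small experiment, sketched here with only the three "c" rules loaded (the test words chato and cedo are my own choices; caso comes from the example above). In the first network the digraph rule applies first and the elsewhere rule last; in the second the elsewhere rule is moved to the front.

```xfst
clear stack
# Intended order: digraph rule, contextual rule, elsewhere rule
read regex [ c h -> %$ ] .o. [ c -> s || _ [ e | i ] ] .o. [ c -> k ] ;
apply down chato
apply down cedo
apply down caso
clear stack
# Elsewhere rule first: it rewrites every "c" before the other rules see it
read regex [ c -> k ] .o. [ c h -> %$ ] .o. [ c -> s || _ [ e | i ] ] ;
apply down chato
apply down cedo
```

With only these rules, the first network maps chato, cedo and caso to $ato, sedo and kaso. The second maps chato and cedo to khato and kedo, because no "c" is left for the later rules to match.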
|
Deletion
|
 
|
Use the symbol "0" to perform deletions:
[ n h -> N ]
.o.
[ h -> 0 ] # Deletion rule. Use "0".
# The elsewhere rule for "h"
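The same kind of check works for deletion. This sketch loads only the two rules above; the test words banho and hora are my own choices. The digraph rule has to come before the deletion rule, otherwise the "h" of "nh" is deleted before it can be used.

```xfst
clear stack
# Digraph rule first, then delete any remaining "h"
read regex [ n h -> N ] .o. [ h -> 0 ] ;
apply down banho
apply down hora
clear stack
# Deletion first: the "h" of "nh" disappears before the digraph rule applies
read regex [ h -> 0 ] .o. [ n h -> N ] ;
apply down banho
```

With only these rules, the first network maps banho to baNo and hora to ora, while the second maps banho to bano.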
|
Complete solutions
|
 
|
Soln 1 from back of book.
Soln 2 from back of book: Preferred.
|