Rule Based Writing Systems
Yule's Rules for the Design of Scripts
by Steve Bett and Valerie Yule
All writing systems have rules. Some writing systems are more elegant than others: they have a simpler set of rules governing how letters are combined to represent strings of segmented speech sounds. The simplest orthographic rule would be the alphabetic rule for grapheme-phoneme correspondence.
In most writing systems there are no rules governing what shapes are used to represent different sound categories, other than historical precedent and distinctiveness. This article explores rules that go beyond those that have traditionally been used to define a script and a writing system (WS).
The rules of a writing system are often referred to as its orthography or spelling system. There are at least two kinds of spelling rules, which limit the alternatives for representing sounds and for representing words. These rules can be thought of as answers to basic questions:
What is a legitimate way to represent the sound?
All WSs have a few ambiguous situations where there is more than one way to represent a sound. In the better systems this is limited to the diphthongs or vowel blends and sounds that are represented by digraphs (2-graphs). The better writing systems minimize these ambiguities and approximate the ideal of one and only one symbol per sound. Such writing systems (most of which interpret the ideal to include using a combination of symbols - digraphs) are called regular and/or phonemic.
What is a legitimate way to represent the word?
A phonemic (or broad) transcription of speech is all that is needed for a writing system. Users typically already know the words and speech patterns, so all that is required is a mnemonic. Phonetic (or narrow) transcriptions indicate stress and syllables, and capture other nuances of sound that enable linguists to describe subtle differences in dialects.
A digraphic notational system is inherently ambiguous. If the graph SH is used to represent the phoneme /sh/ and its component graphs S and H are used to represent other phonemes, there is an ambiguity. Should the letters be interpreted as in BISHOP or as in MISHAP?
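The BISHOP/MISHAP ambiguity can be made concrete with a small sketch. The grapheme inventory below is a hypothetical toy set chosen only for these two words, not any published orthography; the function simply lists every way a spelling can be divided into units from that inventory.

```python
def segmentations(word, graphemes):
    """Return every way to split `word` into units drawn from `graphemes`."""
    if not word:
        return [[]]
    results = []
    for g in graphemes:
        if word.startswith(g):
            for rest in segmentations(word[len(g):], graphemes):
                results.append([g] + rest)
    return results

# Hypothetical toy inventory: single letters plus the digraph "sh".
GRAPHEMES = ["sh", "s", "h", "b", "m", "p", "i", "o", "a"]

for word in ("bishop", "mishap"):
    for parse in segmentations(word, GRAPHEMES):
        print(word, "->", "-".join(parse))
```

Both words come back with two parses, one treating SH as a unit and one treating S and H separately. Nothing in the spelling itself tells the reader which parse is intended, which is the ambiguity described above.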
Digraphs provide an easy solution to the problem of representing the sound segments in a language with a limited set of basic characters. The traditional "alphabet" consists of 22 to 26 characters. About 20 of the characters have an associated sound that is consistent across several languages. To use this character set to represent the 35 basic sounds and 20 or so diphthongs or blends found in the English language, letter combinations were used to represent new sound categories. In addition, the same letter was used to represent more than one sound.
Since TO (traditional orthography) is a digraphic system, notational systems that retain the digraphs, or ligatures designed to look like the digraphs, will have the closest resemblance and introduce the least amount of "visual shock." Logically, it makes sense to replace the digraph [sh] with a single graph such as [S]. The phrase "shoe shine" might become something like "Soo Syn." This may be logical and consistent, but to someone brought up on a steady diet of TO, it looks very odd.
The English writing system has a great variety of ways to represent a given sound. There is an average of 13 or more different ways to represent each sound in a word. This is why traditional English orthography is called irregular. There are about 40 different sounds in English and over 560 ways of representing them.
The legitimate way to represent a sound is defined by the rules of the system. While any mark or combination of marks can be used to represent a sound category, practical notational systems typically attempt to stay with the old marking conventions. The biggest obstacle for the orthographer is trying to represent an inconsistent system with a consistent one. No matter what symbol pattern is taken as the model and made the universal spelling pattern, it will not "look right" in every situation. Since 60% of English words contain an irregularity, a regular representation will look odd 60% of the time.
English is standardized: there is usually only one way to represent (i.e., spell) a particular word.
TO has a variety of spelling patterns which expands the graphemic possibilities and the confusion.
Those who have studied English spelling patterns, such as Paul Hanna, have concluded that TO has too many graphemic options. (patterns for the vowel sounds in *TOOL)
Compare: throw now/ thought out.
Here the roles of ou and ow are reversed. To be consistent, this would have to be spelled either "throw nou | thawght out" or "throu now | thought owt". A proposed alternative to the chaos of TO, labeled NF, is shown in the last row below:
Why now and row don't sound alike: When analyzed individually, English words make sense. *Now is an abbreviation for [nah-owe], which could be represented as [naa-ow] or [no-ow]. *Row and *owe use the w to lengthen the sound and distinguish it from the o in *bought. The [w] or [uu] as a semi-vowel is pronounced as in hook. *awe + *hook, when combined, sound like /owe/. IPA uses u to represent the sound in *hook and *put. Thus, *owe is transcribed as a diphthong: /ou/.
No single consistent pattern will look quite right.
It is possible to come up with a variety of consistent representations of sound categories (phonemes).
Sentences translated into the Nu Folik notation
| throw | out | now | caught | or | fought | that's | not | hard |
| thro | aot | nao | cawt | or | fawt | thqts | naht | hard |
| throu | aut | nau | cot | or | fot | thats | nqt | |
| I | bought | a | hot | new | coat | for | cool | weather |
| Ai | bawt | a | hat | nu | cot | for | cu'l | wethr |
| He | grew | fond | of | the | wand | he | found | |
| Hee | gru | fand | 'v | th | wand | hee | faond | |
| hit | heat | hot | hat | hold | hike | hound | fawn | |
| hit | heet | hat | hqt/haet | hold | haik | haond | fawn |
The less phonemic the writing system, the less it follows an alphabetic rule and the greater the need to memorize the dictionary, i.e., resort to lexical spellings.
AUTO CONVERSION WITH BTRSPL: It is now possible to use a computer program to convert an article written in traditional English orthography to several simplified and regularized spelling systems. The problem with early efforts to reform spelling was that the materials one might want to read were not available in the new notation. Now, as long as the material exists in digital form, it can be read in the orthography of choice.
The chief advantage of a phonemic WS is that it allows someone unfamiliar with the language to read it aloud (and be understood by a native speaker). The spelling system for a phonemic WS is very simple and can be learned in a few hours, since there are fewer than 60 pairs to be associated and some of the associations are already familiar.
More information on BTRSPL and instructions for downloading.
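The general idea of automatic conversion can be illustrated with a word-lookup sketch. This is not BTRSPL's actual method, which is not described here; the `RESPELL` table and the `convert` function are hypothetical names, and the word pairs are copied from the first two rows of the Nu Folik table earlier in this article.

```python
import re

# Word pairs copied from the first two rows of the Nu Folik table in this
# article; a real converter would need a full pronouncing dictionary.
RESPELL = {
    "throw": "thro",
    "out": "aot",
    "now": "nao",
    "caught": "cawt",
    "fought": "fawt",
    "not": "naht",
}

def convert(text, table=RESPELL):
    """Replace each known word with its respelling; leave other words alone."""
    def swap(match):
        word = match.group(0)
        new = table.get(word.lower())
        if new is None:
            return word
        # Preserve an initial capital.
        return new.capitalize() if word[0].isupper() else new
    return re.sub(r"[A-Za-z']+", swap, text)

print(convert("Throw out now caught or fought"))
```

Words missing from the table pass through unchanged, so a partial table yields a mix of old and new spellings. A lookup of this kind works word by word, which is why it depends on the text being available in digital form.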
INDEPENDENT COMPONENTS OF A WS: Language, character sets, and orthography are all separate components in a writing system. One can use the same character sets with different languages and different orthographies. One can use the same orthography with different character sets. Standard English employs at least 3 different character sets: lower case, upper case, and cursive. Changing the shape of a character need not change the orthography.
The concept of an alphabet is somewhat complicated. It is more than the character set. It also includes mapping rules which can be simple or complicated. In one sense, Portuguese and English use the same alphabet, i.e., they use the same character set and 80% of the characters are associated with more or less the same sound categories. The letters are also in the same "alphabetical" order.
On the other hand, English written according to the Portuguese orthography would be difficult for those conditioned to traditional English orthography to read. Example: Da hai tcheir in da cho waz peintad blu 'end uait. There would also be a few gaps in the Portuguese orthography: some English speech sounds could only be approximated (e.g., d for th; end or aend for and). If an alphabet includes the mapping rules, it is difficult to claim that Portuguese and English share the same alphabet.
Ch is a legitimate way in English to represent /sh/, as in *machine. It is almost always the correct way to represent this sound in Portuguese and rarely the correct way in English. *Soke is a legitimate way to represent the sound /souk/, but the correct way to spell this particular word is "soak". English is noted for having a number of graphemic options for any given sound. The average number of options, according to Dewey, is 9. In a given case, only one of the graphemic options is correct.
EI, AE, A...E, and A are all legitimate ways to represent the /ei/ sound in English. Any one of these representations can refer to 5 or more other sounds. The letter A, for instance, can represent the sounds /ae, ei, ah, awe, .../.
Writing systems differ according to the language units represented by each sign:
One meaningful unit (morpheme) <······> one sign
Usually referred to as a logographic system, e.g., Chinese, Arabic numerals.
Logographic systems are usually easy to speed read but difficult to learn (1,000-4,000 signs).
The traditional English writing system (TO) is often characterized as morpho-phonemic.
The plural -s is an example of a morpheme. It has 4 pronunciations: /es/, /ez/, /s/, and /z/:
cats, dogz, horsez, plus a few irregulars such as mice instead of /mausez/.
These changes do not have to be noted even in a phonemic script because they are the effects of the contiguous phoneme: one has to work extra hard to pronounce dogs with an /s/.
One syllable <······> one sign
Usually referred to as syllabaries. Ideal for languages with few syllables. Relatively easy to learn.
Typically around 80 signs (5 vowels x 16 consonants). Examples: Cree, Japanese.
An English syllabary might have as many as 400 signs.
One significant speech sound (phoneme) <······> one sign
Usually called an alphabetic system, typically 30-40 signs. Examples: Finnish, Italian, Portuguese.
Alphabetic systems do not require many signs: 12 to 80 are sufficient, depending on the language.
An alphabetic system that used digraphs would require fewer signs. 50 sounds could be referenced with as few as 15 signs. The problem with digraphs is that, without ligatures, they would have an ambiguous interpretation.
A phone is any segment of speech cut out of the continuum for some purpose.
A phoneme is a meaningful sound segment or sound category.
There are no pure writing systems; they are all mixes. TO has a syllabic R and L, and this is also a feature of some more phonemic scripts. Some languages formally recognize syllabic consonants.
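The remark above that the spoken form of the plural -s follows from the contiguous phoneme can be sketched as a small rule. The phoneme labels and sets below are hypothetical ASCII stand-ins, and the sketch covers only the three forms seen in cats, dogz, horsez, not the /es/ variant also listed.

```python
# Hypothetical ASCII phoneme labels; "th" is the voiceless sound in "myth".
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
VOICELESS = {"p", "t", "k", "f", "th"}

def plural_suffix(final_phoneme):
    """Pick the spoken form of plural -s from the stem's last phoneme."""
    if final_phoneme in SIBILANTS:
        return "ez"   # horse -> horsez
    if final_phoneme in VOICELESS:
        return "s"    # cat -> cats
    return "z"        # dog -> dogz (voiced consonants and vowels)

for stem, last in (("cat", "t"), ("dog", "g"), ("hors", "s")):
    print(stem + plural_suffix(last))
```

Because the choice is fully predictable from the preceding sound, a script can write one sign for the plural and leave the pronunciation to the reader.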
The scripts or character sets used in a writing system can also be based on rules.
Rules can be designed to make a script easier to learn, easier to write, and less ambiguous. Rules, when consistently applied, can make a writing system easier to use (and easier to teach).
Not all rules make sense. One rule, namely the alphabetic rule, is enough to achieve a spelling accuracy rate of 90% or greater. Traditional English spelling (TO) requires over 100 rules to achieve this accuracy rate. According to Dewey, the forty-one English speech sounds can be spelled 461 different ways in TO but only 41 ways using a phonemic alphabet. DoubleSpell (dBLspel) uses about 70 spellings of 35 basic phonemes plus a number of diphthongs and blends.
A phonemic alphabet is one that applies the same alphabetic rule in transcribing speech sounds over 90% of the time. TO applies the same alphabetic rule 50% to 70% of the time, depending on how this is calculated. 49% of the most frequent 100,000 words in English contain at least one irregularity. TO simply has too many graphemic options: There are just too many ways to spell a word that are sometimes right but often wrong.
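The counts quoted above can be turned into an average number of spellings per sound. This is a rough sketch: it uses only the figures given in the text, and it assumes that the "about 70 spellings" for DoubleSpell apply to the 35 basic phonemes alone, which the text does not state precisely.

```python
# (sounds, spellings) pairs as quoted in the surrounding text.
counts = {
    "TO (Dewey)": (41, 461),
    "phonemic alphabet": (41, 41),
    "DoubleSpell (approx.)": (35, 70),
}

def spellings_per_sound(sounds, spellings):
    """Average number of spellings available for each sound."""
    return spellings / sounds

for name, (sounds, spellings) in counts.items():
    print(f"{name}: {spellings_per_sound(sounds, spellings):.1f} spellings per sound")
```

On these figures TO offers roughly eleven spellings per sound, against one for a strictly phonemic alphabet.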
The fewer the number of rules to achieve a particular goal, the better. Consistent application of the alphabetic rule eliminates the need for rules such as "no silent letters" and "no duplicate letters". The first three rules in the list below are there because they indicate why some non-phonemic scripts are better than others.
Most of those in the Simplified Spelling Society would be satisfied if the English writing system could be reformed to the point where it was on par with the best or most phonemic writing systems used by other European countries. In one sense, this is not a particularly ambitious goal. The dictionaries already have a pronunciation guide so all that is required is just to spell the word according to the guide. To do this there needs to be a symbol for 35 basic phonemes. Such systems exist.
The more ambitious goal is to reform the Roman alphabet. This extended reform takes three routes:
(1) Augmenting the inventory of shapes so there are enough unique shapes to represent all English speech sounds without resorting to digraphs (or 2-letter combinations).
(2) Re-forming the shapes of the letters so the ones for similar sounds actually look similar. Some similarity is already there: the upper case P looks something like a vertically flipped (mirror image) variant of b. The lower case r looks something like a vertically flipped letter L. The s looks something like a horizontally flipped z. This type of relationship would have to be extended to other affinities: td, mn, ... (Pitman used these relationships to reduce his shorthand character set, e.g., p and b both had the same shape; one was just bolder than the other.)
(3) Literally re-forming the shapes used for letters so they are easier to recognize as corresponding to the assigned letter name. The purpose of these reforms would be to make the alphabet easier to learn, remember, and use.
Such mnemonics have been used by Laubach and other literacy experts to teach the Roman alphabet. There is a difference, however, between saying A is for the sound in *apple and deforming a picture of an apple so it looks something like the letter a, and saying that /ae/ is *ax and the letter ax looks like an ax. The difference shows up when we move to one of the other five sounds that are commonly represented by the A shapes: A is for *ape ... vs. /ei/ is *avian and the letter avian looks like a bird. (In one of the Laubach texts for Spanish speakers, A is for *ala (wing) because *avia was not in common use except as a brand name for a shoe.)
Using A for *apple and *ape is a little confusing compared to [letter image] is ax and [letter image] is the letter avian.
The observation that the shapes of our letters are not ideal is not a new one. Some of the earliest scholars on symbol systems thought there should be something more than an arbitrary connection between the shapes of the letters and the sounds they signify. In 1668, Bishop Wilkins remarked, "[Letter shapes] should be the most simple and facil and yet elegant and comely... There should be some kind of correspondence between the figure [or shape] and the nature and kind of the letters which they express." Two correspondence rules are proposed at the end of the following chart.
| Rules (Fig. 2) | TO | CS | NS | PBA | PMF |
| Few silent letters
CS retains a few "magic" e's NS uses a silent e as a marker |
X | X | X | X | |
| No silent letters |
|
|
|
|
|
| Greater regularity (Consistency)
e.g., gem = jem, igh = y, phone = fon |
|
|
|
|
|
| No duplicate letters (e.g., c, q, x in TO) |
|
|
|
||
| Phonemic (the alphabetic rule) | X | X | X | ||
| No Ambiguity (distinctive shapes - no digraphs) | X | X | |||
| Single stroke (monoline) letter shapes | X | X | |||
| Similar sounds have similar shapes | X | X | |||
| Name suggests sound (acrophonic rule) | X | ||||
| Name suggests shape (pictographic rule) | X | ||||
| TO=Traditional Orthography, CS=Cut Spelling, NS=New Spelling, PBA=Shavian, PMF=Monofon | |||||
Criteria
- Beauty, elegance (technical excellence)
- Simplicity
Simplicity is an important consideration. User communities will simplify unless constrained. Typically, it is the writing system, and the scholars who maintain it, that slow change in the direction of simplification. Scholars have little control over the way people speak, so the language does change over time and, unless reformed and adjusted, the spelling system continues to deviate from the alphabetic principle.
William Smalley, in Orthographic Studies (1964, p. 34f.), listed the following criteria:
- Maximize motivation for the learner and acceptance by his society. Technical excellence is not enough.
- Maximize the representation of speech. The fullest representation of the actual spoken language is the ideal. Phonetically accurate, however, does not mean representing each sound in the language. The distinctions needed by a native speaker are not the same as those needed by someone unfamiliar with the language.
- Maximize the ease of learning. Complexity is bad.
- Maximize transfer. If the symbol is used with a different value in other writing systems, there is negative transfer.
- Maximize ease of production. Is the script easy to type and print?
Internationl English Speling (a transliteration project)
1. Omiting letrs surplus to representation of meaning and pronunsiation. TION=sh'n
a. omiting doubl leters (letter becomes letr)
b. incorporating the syllabic use of letters
2. Consistent consonant spelings. (phone=fon, gem=jem, ...)
3. Reduction of vowel spellings from around 318 to about 40. (Are there that many? See Dewey) If so it seems unlikely that this amount of reduction is feasible.
| | | New Spelling | New Follick |
| ate | a:t | AE, a | A, ei, ey |
| eat | i:t | EE, e | E, ie, ee |
| i | ai | IE, i | I, ai, i..e |
| o | ou | au, ou, o | O, ow, oh |
| ute | iu:t | uet | U, iu, yu |
4. Choosing the sight words (logograms) to be retained. (See Anglic's list of 40: the, of, want)
5. Gaining acceptance of the proposed alternative spellings from at least one dictionary publisher
6. Deciding what clipped spelling forms can be mixed with TO without causing distress
7. Developing an implementation plan based on assessments of 'what the market wil bear' or how much change the general public is likely to tolerate
- which means some temporary mixing with old TO forms acording to th situation. cf the thousands of alternativ spellings alredy in dictionarys
A Non Roman Augmented Script (Shaw Alphabet)
One of the best known attempts to re-form the Roman alphabet is known as Shavian, a new character set developed by Kingsley Read and supported by a grant from G.B. Shaw. The Shaw alphabet, or Shavian, was a constructed alphabet that went beyond the rules that constrained the shapes in other phonemic scripts.
The rules and features that account for the success of Shavian are ones that tend to divorce it from TO. Taking his cue from statements made by Shaw and the rules of the Shaw alphabet competition, Read made no attempt to use history as a guide in assigning sound values to his symbol set. The resulting disconnect from traditional sound-shape associations reflected Shaw's wishes: "The new alphabet must be so different from the old that no one could possibly mistake the new spelling for the old." (Shaw, 1941, p. 39) Shaw was perhaps overly concerned that those who used simplified spelling and Roman characters would be seen as ignorant.
Shavian is a rule based constructed script. When most people use the term: rule based script, they are usually referring to the alphabetic rule (e.g. Yule, 1982). When an English passage is written in a script that adheres to the alphabetic rule, it can usually be read (slowly) but it doesn't look much like traditional English (TO).
Tthx faaloe.ng iz riten ue.zng x script caald Neu Spelng
As shown above, several rules were used in the construction of Shavian, including the rule that similar sounds should have similar shapes. Following this rule in addition to the alphabetic rule typically leads to an even more non-traditional look. One cannot drop or suspend the rules used to construct Shavian if one hopes to retain its reported benefits.
Some of those who have studied the subject (notably Wilkins, Pitman, Read, and Shaw) thought that the shape of the sound sign should derive from some internal logic and not be constrained by the imperfections of any existing orthography or letter set.
Wilkins (1668) after listing the problems with all existing orthographies, recommended: [Letter shapes] should be the most simple and facil and yet elegant and comely. They must be sufficiently distinguished from one another. There should be some kind of correspondence between the figure [or shape] and the nature and kind of the letters which they express.
References (starter bibliography)
Bett, Steve T. (1997) Rule Based Writing Systems. (unpublished but available on the web)
Bodmer, Frederick. (1944) The Loom of Language. NY: W.W. Norton.
Dewey, Godfrey. (1971) English Spelling: Roadblock to Reading. New York: Teachers College Press.
Gleason, H.A. (1961) An Introduction to Descriptive Linguistics. NY: Henry Holt & Co. (The Phoneme)
Hockett, Charles F. (1958) A Course in Modern Linguistics. NY: Macmillan (Signalling via sound: Phonology).
Pitman, Sir James, and John St. John. (1960?) Alphabets and Reading. London: Pitman.
Smalley, William. (1964) Orthographic Studies (Vol. VI): Articles on New Writing Systems. United Bible Societies.
Wilkins, John. (1668) Visible Speech, from An Essay Towards a Real Character and Philosophical Language. London: Royal Society.
© 1997 BETA Bett Educational Technology Associates div. of OUI, Inc.