SAMPA

http://www.phon.ucl.ac.uk/home/sampa/home.htm
SAMPA is a computer-readable phonetic alphabet.
CKS is also a computer-readable phonetic alphabet, but it contains digraphs.
Unigraf is a computer-readable phonetic alphabet that looks better than SAMPA in print and is easier to learn.
Unlike SAMPA, neither of these English-specific notations can represent other languages.


 


SAMPA (Speech Assessment Methods Phonetic Alphabet) is a machine-readable phonetic alphabet. It was originally developed under the ESPRIT project 1541, SAM (Speech Assessment Methods) in 1987-89 by an international group of phoneticians, and was applied in the first instance to the European Communities languages Danish, Dutch, English, French, German, and Italian (by 1989); later to Norwegian and Swedish (by 1992); and subsequently to Greek, Portuguese, and Spanish (1993). Under the BABEL project, it has now been extended to Bulgarian, Estonian, Hungarian, Polish, and Romanian (1996). Under the aegis of COCOSDA it is hoped to extend it to cover many other languages (and in principle all languages). Now added (1997): Croatian, Russian.

Unless and until ISO 10646/Unicode is implemented internationally, SAMPA and the proposed X-SAMPA (Extended SAMPA) constitute the best international collaborative basis for a standard machine-readable encoding of phonetic notation.

Note about Unicode: Some fonts and browsers are now capable of handling WGL4, the subset of Unicode needed for the orthography of all the languages of Europe. Test yours by looking at this page, or download an up-to-date browser and a WGL4 font. Unicode SAMPA pages are now available with correct local orthography, for those with this capacity, for Bulgarian, Greek, Hungarian, Polish, and Romanian.

SAMPA basically consists of a mapping of symbols of the International Phonetic Alphabet onto ASCII codes in the range 33..126, the 7-bit printable ASCII characters. Associated with the coding (mapping) are guidelines for the transcription of the languages to which SAMPA has been applied. Unlike other proposals for mapping the IPA onto ASCII, SAMPA is not one single author's scheme, but represents the outcome of collaboration and consultation among speech researchers in many different countries. The SAMPA transcription symbols have been developed by or in consultation with native speakers of every language to which they have been applied, but are standardized internationally.
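
The mapping idea can be illustrated with a minimal Python sketch. It covers only a handful of the symbols tabulated on this page, and the names IPA_TO_SAMPA and ipa_to_sampa are assumptions made for the illustration, not part of any SAMPA software. Lower-case Latin letters are passed through unchanged, as the SAMPA recommendations specify.

```python
# Illustrative subset only: IPA code points mapped to the SAMPA symbols
# listed in this page's tables. Names are hypothetical.
IPA_TO_SAMPA = {
    "\u0283": "S",  # esh
    "\u0292": "Z",  # ezh
    "\u03b8": "T",  # theta
    "\u00f0": "D",  # eth
    "\u014b": "N",  # eng
    "\u026a": "I",  # small cap I
    "\u028a": "U",  # upsilon
    "\u028c": "V",  # turned v
    "\u0259": "@",  # turned e (schwa)
    "\u00e6": "{",  # ae ligature
    "\u02d0": ":",  # IPA length mark
}

def ipa_to_sampa(ipa_text):
    """Recode an IPA string as SAMPA, one character at a time."""
    out = []
    for ch in ipa_text:
        if ch in IPA_TO_SAMPA:
            out.append(IPA_TO_SAMPA[ch])
        elif "a" <= ch <= "z":
            # IPA symbols that coincide with lower-case letters stay the same.
            out.append(ch)
        else:
            raise ValueError(f"no SAMPA mapping in this sketch for {ch!r}")
    return "".join(out)

print(ipa_to_sampa("\u0283\u026ap"))       # Eng. ship  -> SIp
print(ipa_to_sampa("\u03b8\u026a\u014b"))  # Eng. thing -> TIN
```

A complete converter would need the full symbol table and the language-specific transcription guidelines, which this sketch does not attempt.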

A SAMPA transcription is designed to be uniquely parsable. As with the ordinary IPA, a string of SAMPA symbols does not require spaces between successive symbols.
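
As a rough illustration of parsing without spaces, the following Python sketch splits a SAMPA string into symbols. It assumes that every base symbol is a single ASCII character and that the length mark and the nasalization tilde attach to the symbol before them, which holds for the symbols tabulated on this page but may not cover every language-specific convention. The function name tokenize_sampa is hypothetical.

```python
# Hypothetical helper, not part of any SAMPA tool.
SUFFIX_MARKS = {":", "~"}   # length mark and nasalization follow their symbol
STRESS_MARKS = {'"', "%"}   # primary and secondary stress stand alone

def tokenize_sampa(text):
    """Split a space-free SAMPA string into a list of symbols."""
    tokens = []
    for ch in text:
        if not (33 <= ord(ch) <= 126):
            raise ValueError(f"not a printable ASCII character: {ch!r}")
        if ch in SUFFIX_MARKS:
            if not tokens or tokens[-1] in STRESS_MARKS:
                raise ValueError(f"{ch!r} has no symbol to attach to")
            tokens[-1] += ch      # e.g. "A" + ":" becomes "A:"
        else:
            tokens.append(ch)
    return tokens

print(tokenize_sampa("strVt"))   # ['s', 't', 'r', 'V', 't']
print(tokenize_sampa("bO~"))     # ['b', 'O~']
```

Because no symbol is a prefix of another under these assumptions, a single left-to-right pass is enough and no backtracking is needed.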

SAMPA has been applied not only by the SAM partners collaborating on EUROM 1, but also in other speech research projects (e.g. BABEL, Onomastica) and by Oxford University Press.

In its basic form SAMPA was seen as catering essentially for segmental transcription, particularly of a traditional phonemic or near-phonemic kind. Prosodic notation was not adequately developed. This shortcoming has now been remedied by a proposed parallel system of prosodic notation, SAMPROSA. It is important that prosodic and segmental transcriptions be kept distinct from one another, on separate representational tiers (because certain symbols have different meanings in SAMPROSA from their meaning in SAMPA: e.g. H denotes a labial-palatal semivowel in SAMPA, but High tone in SAMPROSA).
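
One way to picture the separate tiers is a small Python sketch in which the same character is looked up together with the tier it sits on. The class and dictionary names are hypothetical, and only the single clash mentioned above (H) is recorded.

```python
from dataclasses import dataclass, field

# Only the clash named in the text is listed; all names are hypothetical.
SYMBOL_MEANINGS = {
    ("segmental", "H"): "labial-palatal semivowel",  # SAMPA
    ("prosodic", "H"): "High tone",                  # SAMPROSA
}

@dataclass
class TieredTranscription:
    segmental: list = field(default_factory=list)  # SAMPA symbols
    prosodic: list = field(default_factory=list)   # SAMPROSA symbols

    def describe(self, tier, index):
        """Interpret a symbol according to the tier it belongs to."""
        symbol = getattr(self, tier)[index]
        return SYMBOL_MEANINGS.get((tier, symbol), "not covered in this sketch")

t = TieredTranscription(segmental=["H", "i", "t"], prosodic=["H"])
print(t.describe("segmental", 0))  # labial-palatal semivowel
print(t.describe("prosodic", 0))   # High tone
```

Keeping the tiers in separate fields means the meaning of H never has to be guessed from context.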

A recent proposal for an extended version of the segmental alphabet, X-SAMPA, would extend the presently agreed conventions so as to make provision for every symbol on the Chart of the International Phonetic Association, including all diacritics. In principle this would make it possible to produce a machine-readable phonetic transcription for every known human language.

The present SAMPA recommendations (as devised for the basic six languages) are set out in the following table. All IPA symbols that coincide with lower-case letters of the Latin alphabet remain the same; all other symbols are recoded within the ASCII range 37..126. In this current WWW document the IPA symbols cannot be shown, but the columns indicate respectively a SAMPA symbol, its ASCII/ANSI number, the shape of the corresponding IPA symbol, and the symbol's meaning or use.

Vowel Phoneme Table for British English (RP) with key words
IPA and SAMPA notation shown for the 21 essential sounds for RP English
(Jones was searching for the minimum number of phonemes and did not include 3 that are listed below)
6 checked, 6 unchecked, 5-6 diphthongs, 4-6 ending with schwa

Checked (short)       Free (long)           Diphthongs        With schwa
ae  {                 a:  A                 ai  aI            a  a@ | ai  aI@
at, ax, ask, cat      alms, want, star      5 eye, ice, bite  are, care / ire, fire
e  E                                        ei  eI            e  e@  e6
edje, get, elbow      3 her, girl, urban    ace, ape, vein    air, care, there
i  I                  i:  i                 oi  oI            i:  i@
it, in, index, ill    eel, east, very       oil, boy, loyal   ear, fear, deer
Q                     : turned c  o         ou  ou            o@
ox, cot               awe, call, cost       oh, oat, low      for, four, floor, more
u  U                  u:  u                 ju  yu            u  u@
hook, put, book       ooze, zulu, zoo       you, few, fuse    your, sure
^  V                  @                     au  Au            au  Au@
up, cut               ago, sofa, unit       out, down         our, flower, power
The IPA symbols turned e, turned a, and turned c are unavailable in ASCII and Latin-1.

Vowels

CKS   SAMPA  ASCII  Key    IPA symbol       Description

a     A       65    alms   script a         open back unrounded, Cardinal 5, Eng. start
a.    {      123    ax     æ ligature       near-open front unrounded, Eng. trap
-     6       54    -      turned a         open schwa, Ger. besser
o.    Q       81    ox     turned script a  open back rounded, Eng. lot
e.    E       69    edje   epsilon          open-mid front unrounded, C3, Fr. même
a'    @       64    ago    turned e         schwa, Eng. banana
'r    3       51    her    rev. epsilon     long mid central, Eng. nurse, her, urban
i.    I       73    in     small cap I      lax close front unrounded, Eng. kit
o     O       79    awe    turned c         open-mid back rounded, Eng. thought, awe, all
o'    2       50    oat    ø                close-mid front rounded, Fr. deux, oat, silo
-     9       57    -      oe ligature      open-mid front rounded, Fr. neuf
-     &       38    -      s.c. OE lig.     open front rounded
u'    U       85    hook   upsilon          lax close back rounded, Eng. foot, hook, put
-     }      125    -      barred u         close central rounded, Swedish sju
u.    V       86    up     turned v         open-mid back unrounded, Eng. strut
-     Y       89    -      small cap Y      lax [y], Ger. hübsch
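
The ASCII column of the vowel table can be checked mechanically. The short Python sketch below copies the SAMPA symbols and their listed codes from the table and compares each code with Python's built-in ord(); the variable names are arbitrary.

```python
# SAMPA vowel symbols and the ASCII numbers listed for them in the table.
LISTED_CODES = {
    "A": 65, "{": 123, "6": 54, "Q": 81, "E": 69, "@": 64, "3": 51, "I": 73,
    "O": 79, "2": 50, "9": 57, "&": 38, "U": 85, "}": 125, "V": 86, "Y": 89,
}

# Collect any symbol whose listed number differs from its real code point.
mismatches = {
    symbol: (listed, ord(symbol))
    for symbol, listed in LISTED_CODES.items()
    if ord(symbol) != listed
}

print(mismatches)  # {} means every listed number is correct
```

The same check can be applied to the consonant and diacritic tables by extending the dictionary.
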
Consonants
B       66      beta            voiced bilabial fricative, Sp. cabo
C       67      ç               voiceless palatal fricative, Ger. ich
D       68      ð               voiced dental fricative, Eng. then
G       71      gamma           voiced velar fricative, Sp. fuego
L       76      turned y        palatal lateral, It. famiglia 
J       74      left-tail n     palatal nasal, Sp. año 
N       78      eng             velar nasal, Eng. thing  
R       82      inv. s.c. R     vd. uvular fric. or trill, Fr. roi
S       83      esh             voiceless palatoalveolar fricative, Eng. ship
T       84      theta           voiceless dental fricative, Eng. thin
H       72      turned h        labial-palatal semivowel, Fr. huit
Z       90      ezh (yogh)      vd. palatoalveolar fric., Eng. measure
?       63      dotless ?       glottal stop, Ger. Verein, also Danish stød
Length, stress and tone marks
:       58      colon           length mark    
"       34      vertical stroke primary stress 
%       37      low vert. str.  secondary stress             
`       96      (see note)      falling tone                 
'       39      (see note)      rising tone
Note: The SAMPA tone mark recommendations were based on the IPA as it was up to 1989-90. Since then, however, the IPA has changed its symbols for falling and rising tones. These SAMPA tone marks may now be considered obsolete, having in practice been superseded by the SAMPROSA proposals.

Vowels (CKS and SAMPA notation) (Duplicate)

CKS   SAMPA  ASCII  IPA descrip.     Explanation  [need a familiar English equivalent]

a     A       65    script a         open back unrounded, Cardinal 5, Eng. start
a.    {      123    æ ligature       near-open front unrounded, Eng. trap
-     6       54    turned a         open schwa, Ger. besser
o.    Q       81    turned script a  open back rounded, Eng. lot  lQt
e.    E       69    epsilon          open-mid front unrounded, C3, Fr. même
a'    @       64    turned e         schwa, Eng. banana
'r    3       51    rev. epsilon     long mid central, Eng. nurse
i.    I       73    small cap I      lax close front unrounded, Eng. kit
o     O       79    turned c         open-mid back rounded, Eng. thought
o'    2       50    ø                close-mid front rounded, Fr. deux   doe?
-     9       57    oe ligature      open-mid front rounded, Fr. neuf  noof?
-     &       38    s.c. OE lig.     open front rounded
u'    U       85    upsilon          lax close back rounded, Eng. foot  fUt
-     }      125    barred u         close central rounded, Swedish sju
u.    V       86    turned v         open-mid back unrounded, Eng. strut   strVt, Vp
-     Y       89    small cap Y      lax [y], Ger. hübsch

Diacritics
(shown with another symbol as an example)
=n      61      inferior stroke         syllabic consonant, Eng. garden
O~      126     superior tilde          nasalization, Fr. bon

Chart of 25 consonants (one cell with a double entry)
voiced lenis | unvoiced fortis | voiced lenis | unvoiced fortis | voiced lenis | unvoiced fortis
b p h `w  hw z s
k g j ch zh sh
d t .l    l .r    r ð  dh th  o
v f .m   m n  | 3 ng w   [uu] [i]
Note: In CKS, digits within a word stand for letters, not numbers: 2 = zh, 5 = sh, 3 = ng.
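
The digit convention can be sketched in Python as a simple substitution. The function name expand_cks_digits is hypothetical, and the input below is a made-up placeholder string rather than a real CKS spelling.

```python
# Digit-to-sound correspondences exactly as given in the CKS note.
CKS_DIGITS = {"2": "zh", "5": "sh", "3": "ng"}

def expand_cks_digits(word):
    """Replace CKS digit letters with the digraphs they stand for."""
    return "".join(CKS_DIGITS.get(ch, ch) for ch in word)

# Placeholder input, not a real CKS word.
print(expand_cks_digits("x5y3"))  # xshyng
```

Other digits are left untouched here, since the note defines only 2, 5 and 3.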

Quality Chart

Consonants     bilabial   labio-dental   dental/alveolar   palatal   velar
Plosive        b  p                      d  t                        k  g
Nasal          m                         n
Fricative                 f  v           ð  s  th          j  ch     5 [sh]  2 [zh]
Frictionless   w                         r  l              y

The phonemic notation of individual languages
These pages provide a brief outline of the phonemic distinctions in various languages: Bulgarian, Croatian, Danish, Dutch, English, Estonian, French, German, Greek, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, and Swedish.
Extensions
These pages provide extensions of the basic segmental SAMPA: SAMPROSA (prosodic), X-SAMPA (other symbols, mainly segmental).


For queries please contact John Wells by e-mail at j.wells@ucl.ac.uk or at
John Wells
Department of Phonetics and Linguistics, 
University College London, 
Gower Street, 
London WC1E 6BT.
+44 171 380 7175

Last revised 1998 07 01