Sounds: The Sound Change Applier

This page describes a simple program which can apply a set of sound changes to a lexicon. You can use sounds to help work out a reconstruction for actual languages, to create plausible descendants of a conlang, or in fact to make any structured set of lexical changes to a database of words.

I suggest using the more powerful SCA² instead. It has many more features, works in your browser, and supports Unicode.

The program is available in three forms. Please note: right-click on the links to the executables, pick Save Target As, and save them to your disk.

C source code, which you can compile on any system (I've used it successfully on Windows, Unix, and MacOS) and modify for your own use.
A Windows executable.
A Mac executable, which you can download in either MacBinary II or BinHex (hqx) format.

Basic operation

sounds reads a text file containing a lexicon, applies a set of sound changes described in another text file, and outputs the results to a third file (and to the screen).

For instance, given the lexicon in latin.lex and the sound changes in port.sc (the first two lists below), sounds produces the output file port.out (the third list):

lector
doctor
focus
jocus
districtus       
civitatem
adoptare
opera
secundus

V=aeiou
C=ptcqbdgmnlrhs     
F=ie
B=ou
S=ptc
Z=bdg
s//_#
m//_#
e//Vr_#
v//V_V
u/o/_#
gn/nh/_
S/Z/V_V
c/i/F_t
c/u/B_t
p//V_t
ii/i/_
e//C_rV

leitor 	[lector]
doutor 	[doctor]
fogo 	[focus]
jogo 	[jocus]
distrito 	[districtus]
cidade 	[civitatem]
adotar 	[adoptare]
obra 	[opera]
segundo 	[secundus]
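To make the pipeline concrete, here is a minimal Python sketch of a sounds-like applier; it is not the program's own C source. It assumes that each rule makes one left-to-right pass over a word, with environments checked against the word as it stood before that rule; that a target which is a single variable, paired with a variable replacement, maps one-for-one (as in S/Z/V_V); that any other replacement is copied literally, an empty one deleting the match; and that optional (...) elements are not needed. The names PORT_SC, LATIN, parse_sc, matches, apply_rule, and apply_all are hypothetical. Under these assumptions it reproduces the nine outputs above.

```python
# Hypothetical sketch of a sounds-like applier (not the original C program).
PORT_SC = """\
V=aeiou
C=ptcqbdgmnlrhs
F=ie
B=ou
S=ptc
Z=bdg
s//_#
m//_#
e//Vr_#
v//V_V
u/o/_#
gn/nh/_
S/Z/V_V
c/i/F_t
c/u/B_t
p//V_t
ii/i/_
e//C_rV
"""

LATIN = ["lector", "doctor", "focus", "jocus", "districtus",
         "civitatem", "adoptare", "opera", "secundus"]


def parse_sc(text):
    """Return ({variable: characters}, [(x, y, environment), ...])."""
    variables, rules = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue                              # blank line or comment
        if len(line) > 1 and line[1] == "=":
            variables[line[0]] = line[2:]         # e.g. V=aeiou
        else:
            rules.append(tuple(line.split("/")))  # x/y/z
    return variables, rules


def matches(pattern, s, pos, variables):
    """True if pattern (literals, variables, #) matches s starting at pos."""
    if pos < 0 or pos + len(pattern) > len(s):
        return False
    for p, ch in zip(pattern, s[pos:]):
        allowed = variables.get(p, p)             # a variable's set, or the literal itself
        if ch not in allowed:
            return False
    return True


def apply_rule(word, rule, variables):
    """Apply one x/y/z rule in a single left-to-right pass."""
    x, y, env = rule
    before, after = env.split("_")
    s = "#" + word + "#"                          # '#' marks the word boundaries
    out, i = [], 0
    while i < len(s):
        if (x and matches(x, s, i, variables)
                and matches(before, s, i - len(before), variables)
                and matches(after, s, i + len(x), variables)):
            if len(x) == 1 and x in variables and y in variables:
                out.append(variables[y][variables[x].index(s[i])])  # S/Z: one-for-one
            else:
                out.append(y)                     # literal replacement; empty y deletes
            i += len(x)
        else:
            out.append(s[i])
            i += 1
    return "".join(out)[1:-1]                     # strip the boundary markers


def apply_all(word, variables, rules):
    for rule in rules:                            # rules apply in the order listed
        word = apply_rule(word, rule, variables)
    return word


if __name__ == "__main__":
    variables, rules = parse_sc(PORT_SC)
    for word in LATIN:
        print(f"{apply_all(word, variables, rules)} [{word}]")
```

Run as a script, it prints leitor [lector], doutor [doctor], and so on, one word per line.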

The command line

The program can be run with command line parameters (all of which are optional):

sounds latin port control-parameters

The control parameters can be any of the following: -p, -b, -l, -f.

-p tells sounds to print out which rules apply to each word, for example: s-> /_# applies to secundus at 7

-b prints the output with the original word in brackets (suitable for use as the basis of a lexicon with etymologies): leitor [lector]
Without this parameter, the output looks like this: lector --> leitor

-l (which overrides -b if present) omits the source word from the output, leaving only the output words, like this:
leitor
The resulting output file is suitable for use as a new .lex file. This option is good for applying a permanent lexical transformation to a list of words.

-f directs output to the output file only, rather than to both the screen and the file. This option is useful for very long vocabulary lists.
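The -p line shown above can be imitated for literal rules with a regular-expression search. This Python sketch (trace_rule is a hypothetical name) copies the message layout from that single example line and assumes the position is a 0-based index into the word, which fits the example: the final s of secundus is at index 7. The real program's messages for other rules may well differ.

```python
import re


def trace_rule(word, rule):
    """Report where a literal x/y/z rule matches, in the style of the -p example."""
    x, y, env = rule.split("/")
    before, after = env.split("_")
    pattern = re.escape(x)
    if before:
        pattern = "(?<=" + re.escape(before) + ")" + pattern
    if after:
        pattern += "(?=" + re.escape(after) + ")"
    padded = "#" + word + "#"                     # '#' marks the word boundaries
    return [
        f"{x}->{y} /{env} applies to {word} at {m.start() - 1}"  # -1 undoes the pad
        for m in re.finditer(pattern, padded)
    ]


print(trace_rule("secundus", "s//_#"))
```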

The first two non-control parameters are taken as filenames: the first gives the name of the .lex file, containing the lexicon; the second gives the name of the .sc file, containing the sound changes or lexical rules to apply. The extensions should be left off.

If filenames are given, the program will run once, against the selected files, and then exit.
If no filenames are given, the program will ask for the input files, produce the output, and ask for more files, continuing till you enter q to quit.

Output will be printed to the screen (unless you use -f) and also written to a file name.out, where name is the name of the .sc file: in the example above, port.out.
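The command-line conventions above can be summarized in a short Python sketch with hypothetical names (parse_command_line, format_entry). It models only the rules stated here: the first two non-flag arguments name the .lex and .sc files without extensions, the output file is named after the .sc file, and -l wins over -b. It does not read any files or reproduce the interactive prompt.

```python
def parse_command_line(args):
    """Split args into the .lex/.sc filenames and the control flags."""
    flags = {a for a in args if a.startswith("-")}
    names = [a for a in args if not a.startswith("-")]
    lex, sc = (names + [None, None])[:2]      # missing names -> None (interactive mode)
    if "-l" in flags:
        mode = "l"                            # -l overrides -b
    elif "-b" in flags:
        mode = "b"
    else:
        mode = "default"
    return {
        "lex": lex + ".lex" if lex else None,  # extensions are left off on the command line
        "sc": sc + ".sc" if sc else None,
        "out": sc + ".out" if sc else None,    # output is named after the .sc file
        "mode": mode,
        "trace": "-p" in flags,
        "file_only": "-f" in flags,
    }


def format_entry(source, result, mode):
    """Lay out one output line in the three styles described above."""
    if mode == "l":
        return result
    if mode == "b":
        return f"{result} [{source}]"
    return f"{source} --> {result}"
```

For example, parse_command_line(["latin", "port", "-b"]) names latin.lex, port.sc, and port.out, and format_entry("lector", "leitor", "b") gives leitor [lector].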

I find myself running the program multiple times, tweaking the rules or the vocabulary in between runs.

Common Windows problems

To use command line parameters you have to have a command line. That means running the program in a command window. Look under Programs/Accessories and run Command Prompt. Now type cd plus the name of the directory where you downloaded sounds-- e.g. cd c:\downloads\. As with all commands in the command prompt, hit Enter. Now you can run sounds as described above.
If sounds says your file could not be read in and you're sure it's there-- you probably have file extensions turned off, and what you think is a .lex or .sc file is really a Notepad (.txt) file. The easiest thing to do is to re-save the file as a real .lex file. In Notepad, for instance, change the "Save as type" dropdown to "All files" instead of "Text documents". Then it won't add .txt to your file name. If you've done this right, the file won't have the Notepad icon, and if you double-click it Windows will ask what app to open it with; select Notepad.

The `.lex` file

The .lex file is just a text file, consisting of a list of words, one per line.

The `.sc` file

The key to the operation of sounds is the .sc file. This text file contains two things: definitions of variables, and a set of rules or sound changes.

Variable definitions should come first, one per line; then sound changes, one per line. A line beginning with * will be taken as a comment and ignored.
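A minimal Python sketch of reading the two input files as described: a .lex file is one word per line; in a .sc file, lines starting with * are comments, lines of the form X=chars define one-character variables, and everything else must be an x/y/z rule. Skipping blank lines and raising ValueError on a malformed rule are this sketch's own choices, not documented behaviour of sounds; parse_lex and parse_sc are hypothetical names.

```python
def parse_lex(text):
    """A .lex file is one word per line (blank lines are skipped here)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_sc(text):
    """Return ({name: characters}, [(x, y, before, after), ...])."""
    variables, rules = {}, []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("*"):
            continue                              # blank line or comment
        if len(line) > 1 and line[1] == "=":
            variables[line[0]] = line[2:]         # one-character name, e.g. V=aeiou
            continue
        parts = line.split("/")
        if len(parts) != 3 or parts[2].count("_") != 1:
            raise ValueError(f"line {number}: expected x/y/z with one _ in z: {line!r}")
        before, after = parts[2].split("_")
        rules.append((parts[0], parts[1], before, after))
    return variables, rules


SAMPLE = """\
* Latin to Portuguese (fragment)
V=aeiou
F=ie
s//_#
c/i/F_t
"""
print(parse_sc(SAMPLE))
# ({'V': 'aeiou', 'F': 'ie'}, [('s', '', '', '#'), ('c', 'i', 'F', 't')])
```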

Sound change format

Hopefully the format of the rules will be familiar to any linguist. For instance, here's one sound change:
c/g/V_V
This rule says to change c to g between vowels. (We'll see how to generalize this rule below.)

More generally, a sound change looks like this:
x/y/z
where x is the thing to be changed, y is what it changes to, and z is the environment.

The z part must always contain an underscore _, representing the part that changes. That can be all there is, as in
gn/nh/_
which tells the program to replace gn with nh unconditionally.

The character # represents the beginning or end of the word. So
u/o/_#
means to replace u with o, but only at the end of the word.

The middle (y) part can be blank, as in
s//_#
This means that s is deleted when it ends a word.
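For rules made only of literal letters (no variables), the x/y/z format maps neatly onto a regular expression: the part of z before _ becomes a lookbehind, the part after it a lookahead, and # can be matched by padding the word with # at both ends. This Python sketch is only an illustration of the format (apply_literal_rule is a hypothetical name, and nothing suggests sounds itself uses regular expressions); it applies the three example rules from this section.

```python
import re


def apply_literal_rule(word, rule):
    """Apply an x/y/z rule whose parts are plain letters (plus # in z)."""
    x, y, env = rule.split("/")
    if env.count("_") != 1:
        raise ValueError("the environment must contain exactly one _")
    before, after = env.split("_")
    pattern = re.escape(x)
    if before:
        pattern = "(?<=" + re.escape(before) + ")" + pattern   # left context
    if after:
        pattern += "(?=" + re.escape(after) + ")"              # right context
    padded = "#" + word + "#"            # so that # can match the word edges
    # A function replacement keeps y literal; an empty y deletes the match.
    return re.sub(pattern, lambda m: y, padded)[1:-1]


print(apply_literal_rule("secundus", "s//_#"))   # secundu
print(apply_literal_rule("focu", "u/o/_#"))      # foco
print(apply_literal_rule("agnus", "gn/nh/_"))    # anhus
```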

Variables

The environment (the z part) can contain variables, like V above. These are defined at the top of the file. I use capital letters for this, though this is not a requirement. Variables can only be one character long. You can define any variables needed to state your sound changes. E.g. you could define S to be any stop, or K for any coronal, or whatever.

So the variable definition and rule
F=ie
c/i/F_t
means that c changes to i after a front vowel and before a t.

You can use variables in the first two parts as well. For instance, suppose you've defined
S=ptc
Z=bdg
S/Z/V_V
This means that the stops ptc change to their voiced equivalents bdg between vowels. In this usage, the variables must correspond one for one-- p goes to b, t goes to d, etc. Each character in the replacement variable (here Z) gives the transformed value of each character in the input variable (here S). Make sure the two variable definitions are the same length!
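The one-for-one correspondence amounts to an index lookup: find the character's position in the input variable and take the character at the same position in the replacement variable. This hypothetical Python helper hard-codes the V_V environment and checks the neighbours in the unchanged word; it is an illustration, not code from sounds.

```python
def apply_correspondence(word, source, target, vowels):
    """S/Z/V_V: replace each character of `source` found between vowels
    with the character at the same position in `target`."""
    if len(source) != len(target):
        raise ValueError("variable definitions must be the same length")
    out = list(word)
    for i in range(1, len(word) - 1):
        ch = word[i]
        if ch in source and word[i - 1] in vowels and word[i + 1] in vowels:
            out[i] = target[source.index(ch)]   # p->b, t->d, c->g
    return "".join(out)


print(apply_correspondence("ciitate", "ptc", "bdg", "aeiou"))   # ciidade
```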

A variable can also be set to a fixed value, or deleted. E.g.
Z//V_V
says to delete voiced stops between vowels.

Rule order

Rules apply in the order they're listed. So, with the word opera and the rules
p/b/V_V
e//C_rV
the first rule voices the p, resulting in obera; the second deletes an e between a consonant and an intervocalic r, resulting in obra.
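A small Python sketch shows that swapping the two rules changes the result: once the e is gone, the p is no longer between vowels, so it never voices. It assumes a regex-based applier with the variables V and C from the example .sc file; the helper names are hypothetical.

```python
import re

# Variables from the example .sc file.
VARIABLES = {"V": "aeiou", "C": "ptcqbdgmnlrhs"}


def to_regex(part):
    """Turn rule text into a regex: a variable becomes a character class."""
    return "".join(
        "[" + VARIABLES[c] + "]" if c in VARIABLES else re.escape(c) for c in part
    )


def apply_rule(word, rule):
    x, y, env = rule.split("/")
    before, after = env.split("_")
    pattern = to_regex(x)
    if before:
        pattern = "(?<=" + to_regex(before) + ")" + pattern
    if after:
        pattern += "(?=" + to_regex(after) + ")"
    return re.sub(pattern, lambda m: y, "#" + word + "#")[1:-1]


def apply_in_order(word, rules):
    for rule in rules:                 # each rule sees the previous rule's output
        word = apply_rule(word, rule)
    return word


print(apply_in_order("opera", ["p/b/V_V", "e//C_rV"]))  # obra
print(apply_in_order("opera", ["e//C_rV", "p/b/V_V"]))  # opra
```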

The -p command line parameter can assist in debugging rules, since it causes the output to show exactly what rules applied to each word.

Optional elements in the environment

One or more elements in the environment can be marked as optional with parentheses. E.g.
u/ü/_C(C)F
says to change u to ü when it's followed by one or two consonants and then a front vowel.
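One way to handle optional elements, assumed here rather than taken from the sounds source, is to expand the environment into every version with each parenthesized element kept or dropped, and let the rule apply if any version matches. A Python sketch with hypothetical names, using C and F as defined in the examples above:

```python
import itertools
import re

VARIABLES = {"C": "ptcqbdgmnlrhs", "F": "ie"}


def expand_optional(env):
    """List every environment produced by keeping or dropping each (...) group."""
    pieces = re.split(r"\(([^()]*)\)", env)   # fixed, optional, fixed, ...
    fixed, optional = pieces[0::2], pieces[1::2]
    expansions = []
    for keep in itertools.product([True, False], repeat=len(optional)):
        text = fixed[0]
        for kept, opt, tail in zip(keep, optional, fixed[1:]):
            text += (opt if kept else "") + tail
        expansions.append(text)
    return expansions


def to_regex(part):
    return "".join(
        "[" + VARIABLES[c] + "]" if c in VARIABLES else re.escape(c) for c in part
    )


def apply_optional_rule(word, x, y, env):
    """Apply x/y/env; each expansion becomes one regex alternative."""
    alternatives = []
    for expanded in expand_optional(env):
        before, after = expanded.split("_")
        alt = re.escape(x)
        if before:
            alt = "(?<=" + to_regex(before) + ")" + alt
        if after:
            alt += "(?=" + to_regex(after) + ")"
        alternatives.append(alt)
    return re.sub("|".join(alternatives), lambda m: y, "#" + word + "#")[1:-1]


print(expand_optional("_C(C)F"))                       # ['_CCF', '_CF']
print(apply_optional_rule("murti", "u", "ü", "_C(C)F"))  # mürti
```

Expanding first also keeps every lookbehind a fixed width, which Python's re module requires.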

How to use it

The program is simple-minded and yet powerful... in fact it's powerful in part because it's simple-minded. You can do a lot with these basic pieces.

Input orthography

For instance, you may wonder whether the .lex file should be based on spellings or phonemes. It doesn't matter: the program applies its changes to whatever you give it. In my example I used conventional spellings, but I could just as easily have used a phonemic rendering. Similarly, I wrote the rules to output orthographic Portuguese, simply to make for an easy example. It would be better to output a phonetic representation. This would help us realize that we really need a sound change
k/s/_F
that would handle the change from civitatem with /k/ to cidade with /s/.

The program will handle whatever you put into the .lex and .sc files, including accented characters. If the language you're working with requires a special font, simply edit the source and output files with an editor, using that font. This would allow you to use (say) an IPA font.

To improve my Latin-to-Portuguese file, for instance, I would certainly want to handle vowel length and stress. I might use accented vowels for this. Of course the program knows nothing about phonetics, so you have to remember to define the variables to match how you've set up the .lex file. If you use accented vowels, you will want to change the definition of V.

Using digraphs

Though sound changes can refer to digraphs, variables can't include them. For instance, the following rule is intended to delete a glide i that follows a postvocalic consonant and precedes a vowel:
i//VC_V
However, it won't affect (say) achior, because the C will not match the digraph ch. You could write extra rules to handle the digraphs; but it's often more convenient to use an orthography where every phoneme corresponds to a single character.

You can write transformation rules at the beginning of your sound change list to transform digraphs in the input file:
ph/f/_
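The effect of collapsing digraphs first can be sketched in Python. The helpers are hypothetical, and ç is an arbitrary stand-in for ch; any character not otherwise used in the lexicon would do.

```python
VOWELS = "aeiou"
CONSONANTS = "ptcqbdgmnlrhsç"   # C from the example .sc file, plus ç standing in for ch


def collapse_digraphs(word, table):
    """Rewrite each digraph as one character, like ph/f/_ at the top of a .sc file."""
    for digraph, single in table:
        word = word.replace(digraph, single)
    return word


def delete_glide(word):
    """i//VC_V: drop an i preceded by vowel+consonant and followed by a vowel."""
    out = []
    for k, ch in enumerate(word):
        if (ch == "i" and 2 <= k < len(word) - 1
                and word[k - 2] in VOWELS and word[k - 1] in CONSONANTS
                and word[k + 1] in VOWELS):
            continue
        out.append(ch)
    return "".join(out)


DIGRAPHS = [("ch", "ç"), ("ph", "f")]
print(delete_glide("achior"))                               # achior: the c before h is not a V
print(delete_glide(collapse_digraphs("achior", DIGRAPHS)))  # açor
```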

Using `sounds` for conlang development

To create a child language from a parent, create a .lex file containing the vocabulary of the parent, then a .sc file containing the sound changes you want to apply. Now run sounds to generate the child language's vocabulary.

For an example, you can download a vocabulary of Methaiun and the sound changes for Kebreni (right-click!). You can compare this to the Kebreni grammar in Virtual Verduria.

For me, there is a peculiar, intense pleasure in creating a daughter language with a particular feel to it, merely by altering the set of sound changes. All I can think of to compare it to is creating new animals indirectly, by mutating their DNA.

What sort of sound changes should you use? You can examine the history of any language family for ideas. Some common changes that can form part of your repertoire (with some sample sounds rules):

Lenition. Stops become fricatives; unvoiced consonants become voiced; stops erode into glottal stops, or h, or disappear. The intervocalic position is especially prone to change.
S/Z/V_V
Palatalization. Consonants can palatalize before or after a front vowel i e, perhaps ending up as an affricate or fricative.
k/ç/_F
Monophthongization. Diphthongs tend to simplify. This rule is fun to apply after letting the vanished sounds affect adjoining consonants.
i//CV_C
Assimilation. Consonants change to match the place or type of articulation of an adjoining consonant.
D=td
m/n/_D
Nasalization. A nasal consonant can disappear, after nasalizing the previous vowel.
Â=âêîôû
N=mn
V/Â/_N
N//Â_
Umlaut. A vowel changes to match the frontness of the next vowel in the word.
u/ü/_C(C)i
Vowel shifts. One vowel can migrate into a free area of the vowel space, perhaps dragging others behind it.
a/&/_
o/a/_
u/o/_
Tonogenesis. One way tones can originate is for voiced consonants to induce the next vowel to be pronounced in a low pitch.
Z=bdgzvmnlr
V=aiu
L=áíú
V/L/Z_
Loss of unstressed syllables.
A=áéíóú
V//AC(C)_
Loss of final sounds. This can really mess up your carefully worked out inflectional system.
V//_#
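Because rules apply in order, a chain shift like the vowel-shift example has to be listed so that each vowel moves out of the way before another moves into its old place; reversing the list collapses all three vowels into &. A Python sketch for unconditional x/y/_ rules only (apply_unconditional is a hypothetical name):

```python
def apply_unconditional(word, rules):
    """Apply rules of the form x/y/_ (no environment) in the order given."""
    for rule in rules:
        x, y, env = rule.split("/")
        if env != "_":
            raise ValueError(f"{rule}: this sketch only handles unconditional rules")
        word = word.replace(x, y)
    return word


SHIFT = ["a/&/_", "o/a/_", "u/o/_"]
print(apply_unconditional("luna", SHIFT))        # lon&
print(apply_unconditional("luna", SHIFT[::-1]))  # l&n&
```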

The beauty part of using sounds is that your language will illustrate the Neo-Grammarian principle: sound changes apply uniformly whenever their conditions are met. You may choose to edit the results by hand, however, to simulate the complications of real languages. Analogy can regularize the grammar; words may be borrowed from another dialect where different changes applied; words may be reborrowed from the parent language by scholars.

I pay particular attention to the havoc the sound changes are likely to wreak on the inflectional system. E.g. if a case distinction is maintained in some words and lost in others, it may spread to the second category by analogy.

Sound changes can also result in homonyms. For instance, if you voice intervocalic consonants, meta and meda will merge. You can simply live with this, but if the merger is particularly awkward, the users of the language are likely to invent a new word to replace one of the homonyms. E.g. Latin American Spanish has innovated cocinar 'to cook', since the original cocer has merged with coser 'to sew'.
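Mergers like meta and meda are easy to miss in a long output file. This short Python sketch groups source words by their result and reports any result reached from more than one source. find_mergers is a hypothetical helper, and the pairs below are written out by hand to stand for the output of an intervocalic voicing rule; they were not produced by sounds.

```python
from collections import defaultdict


def find_mergers(pairs):
    """Given (source, result) pairs, return results shared by two or more sources."""
    sources_by_result = defaultdict(list)
    for source, result in pairs:
        sources_by_result[result].append(source)
    return {r: s for r, s in sources_by_result.items() if len(s) > 1}


# Hand-written results of voicing intervocalic stops.
pairs = [("meta", "meda"), ("meda", "meda"), ("lupa", "luba")]
print(find_mergers(pairs))   # {'meda': ['meta', 'meda']}
```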

Using `sounds` to find spelling rules

I've also used sounds to model the spelling rules of English. Here the input file lists the spellings of several thousand English words, and the "sound changes" are rules for turning those spellings into a phonetic representation of how the words sound.

Most people think English spelling is hopeless; but in fact the rules predict the correct pronunciation of the word 60% of the time, and make only minor errors (e.g. insufficient vowel reduction) another 35% of the time.

Here's a discussion of the rules, including the sounds input and output files.