SCA2 Help

The Sound Change Applier 2 is an updated version of my C program which applies a set of sound changes to a lexicon. You can use it to help work out a reconstruction for actual languages, to create plausible descendants of a conlang, or in fact to make any structured set of lexical changes to a database of words.

This version is written in Javascript, which means it runs in your browser. The advantages are that it supports Unicode, it’ll run on all systems, and you don’t have to mess with ASCII or command lines anymore.

For changes since the old SCA, see New stuff below.


Try it out! With the default inputs, hit Apply. You should get an output like this:
As if by magic, a selection of Latin words has turned into Portuguese.

The controls

Here’s what the controls do.

Output format sets how each line of the output looks. The first option just prints each output word; this is good for generating a new list of words (e.g. as input for the next round of changes). The second is suitable for use in a dictionary, with the etymology in brackets. The third gives the input and output words in order. (See Including a gloss below for how to add glosses.)

Show differences from last run, if checked, will boldface any changes from the last run when you hit Apply. This can be very useful to see what the effect of a changed rule is. (Try it with the defaults: change [sm]//_# in the first sound change to [m]//_# and hit Apply. You should see several of the words change, now retaining their final s.)

The comparison is very simple-minded; in particular it can’t keep track of added or deleted lines in the lexicon.

Note that if you hit Apply without making any changes, all the bolding is removed (since in fact nothing changed between runs).
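To picture why added or deleted lines confuse the comparison, here is a rough Python sketch. It assumes the comparison is purely positional (line N of this run against line N of the last run); that is an assumption for illustration, not SCA²'s actual Javascript, and changed_flags is a made-up name.

```python
def changed_flags(previous, current):
    """Return one flag per output line: True if the line would be shown in bold."""
    flags = []
    for index, line in enumerate(current):
        # Assumption: line N is compared only with line N of the previous run.
        has_old = index < len(previous)
        flags.append(not has_old or previous[index] != line)
    return flags


last_run = ["fogo", "obra", "sima"]

# One word changed: only that line is flagged.
one_change = changed_flags(last_run, ["fogo", "opra", "sima"])

# Nothing changed: no flags at all, so all bolding disappears.
no_change = changed_flags(last_run, last_run)

# One word added at the top: every line shifts, so every line is flagged.
one_added = changed_flags(last_run, ["simaka", "fogo", "obra", "sima"])
```

With a positional check like this, inserting a single word at the top of the lexicon makes every later line look changed.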

Report which rules apply prints a report in the Output section listing every time a rule applies, like this:

u/o/_# applies to districtu at 8

This is useful for understanding why a rule applies (or doesn’t) when you expected the opposite.

Rewrite on output controls whether the rewrite rules should be reversed when writing the output lexicon.

Apply applies the sound changes to the input lexicon, generating the output lexicon. We’ll talk about exactly what that means below.

Javascript, to protect your computer, cannot read or write files. Instead, copy and paste your text into the input boxes, and copy the output back out to save it.

Help me! brings up this help file.

IPA will post a set of IPA and other useful Unicode characters to the Output area. You can then copy and paste a character into any of the input boxes.

On Safari and Firefox, Undo will work as it should: you can make a change, hit Apply, and if you don’t like the results, click on the text box you changed and select Undo. This doesn’t work on IE.

Defining sound changes

The Sound Changes box contains the rules for modifying the input lexicon. Hopefully the format of the rules will be familiar to any linguist. For instance, here’s one sound change:

c/g/V_V

This rule says to change c to g between vowels. (We’ll see how to generalize this rule below.)

More generally, a sound change looks like this:

target/replacement/environment

that is, the target string is changed to the replacement string within the given environment.

Optionally you can use → in place of the first slash. So the above rule can also be written

c→g/V_V

The environment must always contain an underline _, representing the part that changes. That can be all there is, as in

gn/nh/_

which tells the program to replace gn with nh unconditionally.

The character # represents the beginning or end of the word. So

u/o/_#

means to replace u with o, but only at the end of the word.

The replacement string can be blank, as in

s//_#

This means that s is deleted when it ends a word.
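For readers who think in code, the rule shape described above (a target, a replacement, and an environment with _ and #) can be mimicked with regular expressions. This is only a Python sketch under simplifying assumptions: it handles literal characters and # but no categories, and apply_rule is a hypothetical helper, not part of SCA².

```python
import re


def apply_rule(word, rule):
    """Apply one literal rule written as target/replacement/environment."""
    # Allow the arrow as an alternative to the first slash.
    target, replacement, env = rule.replace("→", "/", 1).split("/")
    before, after = env.split("_")
    # '#' marks a word edge: '^' on the left side, '$' on the right side.
    left = "".join("^" if ch == "#" else re.escape(ch) for ch in before)
    right = "".join("$" if ch == "#" else re.escape(ch) for ch in after)
    pattern = "(" + left + ")" + re.escape(target)
    if right:
        # The following context is only checked, never consumed.
        pattern += "(?=" + right + ")"
    # Keep the preceding context and swap the target for the replacement.
    return re.sub(pattern, lambda m: m.group(1) + replacement, word)


final_u = apply_rule("districtu", "u/o/_#")
final_s = apply_rule("rosas", "s//_#")
everywhere = apply_rule("regnu", "gn/nh/_")
with_arrow = apply_rule("amicu", "c→g/i_u")
```

A blank replacement falls out naturally here: the matched target is simply replaced by an empty string.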


The environment can contain variables, like V above. These are defined in the Categories box. I use capital letters for this, though this is not a requirement. Variables can only be one character long (unless you use rewrite rules). You can define any variables needed to state your sound changes. E.g. you could define S to be any stop, or K for any coronal, or whatever.

So, with a category F defined as the front vowels, the rule

c/i/F_t

means that c changes to i after a front vowel and before a t.

You can use variables in the first two parts as well. For instance, suppose you’ve defined

S=ptc
Z=bdg
S/Z/V_V

This means that the stops ptc change to their voiced equivalents bdg between vowels. In this usage, the variables must correspond one for one: p goes to b, t goes to d, etc. Each character in the replacement variable (here Z) gives the transformed value of each character in the input variable (here S). If the replacement category is shorter than the target category, the matching input will be deleted.
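The one-for-one correspondence can be sketched in Python as an index lookup: find the matched sound's position in the target category and take the character at the same position in the replacement category. The contents of V below are an assumption (the text doesn't list them), and CATEGORIES, to_regex and apply_rule are illustrative names only, not SCA²'s own code.

```python
import re

# S and Z follow the text (ptc -> bdg); V is an assumed vowel category.
CATEGORIES = {"V": "aeiou", "S": "ptc", "Z": "bdg"}


def to_regex(part, edge):
    """Translate rule text to a regex: categories become character classes."""
    pieces = []
    for ch in part:
        if ch == "#":
            pieces.append(edge)
        elif ch in CATEGORIES:
            pieces.append("[" + re.escape(CATEGORIES[ch]) + "]")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


def apply_rule(word, rule):
    target, replacement, env = rule.split("/")
    before, after = env.split("_")
    pattern = "(" + to_regex(before, "^") + ")(" + to_regex(target, "") + ")"
    if after:
        pattern += "(?=" + to_regex(after, "$") + ")"

    def substitute(match):
        sound = match.group(2)
        if target in CATEGORIES and replacement in CATEGORIES:
            # Same position in the replacement category; nothing if it is shorter.
            position = CATEGORIES[target].index(sound)
            mapped = CATEGORIES[replacement]
            new = mapped[position] if position < len(mapped) else ""
        else:
            # A fixed (possibly blank) replacement applies to every member.
            new = replacement
        return match.group(1) + new

    return re.sub(pattern, substitute, word)


voiced = apply_rule("opera", "S/Z/V_V")
voiced_c = apply_rule("focus", "S/Z/V_V")
deleted = apply_rule("cadere", "Z//V_V")
```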

A variable can also be set to a fixed value, or deleted. E.g.

Z//V_V

says to delete voiced stops between vowels.

Rule order

Rules apply in the order they’re listed. So, with the word opera and the rules

S/Z/V_V
e//C_rV

the first rule voices the p, resulting in obera; the second deletes an e between a consonant and an intervocalic r, resulting in obra.
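To see that the order really matters, here is a small Python sketch that runs the two changes described above on opera, first in the stated order and then reversed. The regular expressions are hand-written stand-ins for the rules (with aeiou assumed as the vowels and "anything that is not a vowel" standing in for a consonant), not SCA²'s own machinery.

```python
import re

# p, t, c map to b, d, g, as in the stop-voicing example.
VOICE = str.maketrans("ptc", "bdg")

RULES = [
    # Voice a stop between vowels.
    (r"(?<=[aeiou])[ptc](?=[aeiou])", lambda m: m.group(0).translate(VOICE)),
    # Delete e after a consonant and before r plus a vowel.
    (r"(?<=[^aeiou])e(?=r[aeiou])", ""),
]


def apply_all(word, rules):
    """Apply each rule in turn; each rule sees the output of the one before."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


in_order = apply_all("opera", RULES)
reversed_order = apply_all("opera", RULES[::-1])
```

In the reversed order the e is deleted first, so the p is no longer between vowels when the voicing rule runs, and the result is opra rather than obra.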

Optional elements in the environment

One or more elements in the environment can be marked as optional with parentheses. E.g.

u/ü/_C(C)F

says to change u to ü when it’s followed by one or two consonants and then a front vowel.
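An optional element behaves much like a regular-expression group followed by a question mark. The Python sketch below translates an environment of one consonant, one optional consonant, and a front vowel that way. The category contents and the names env_to_regex and front_u are assumptions for illustration only.

```python
import re

# Assumed category contents, for illustration only.
CATEGORIES = {"C": "bcdfghjklmnpqrstvwxz", "F": "ie"}


def env_to_regex(part):
    """Translate the text after the underline, treating (...) as optional."""
    pieces = []
    for ch in part:
        if ch == "(":
            pieces.append("(?:")   # open a group for the optional element
        elif ch == ")":
            pieces.append(")?")    # close it and mark it optional
        elif ch in CATEGORIES:
            pieces.append("[" + CATEGORIES[ch] + "]")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


def front_u(word):
    """Change u to ü before one or two consonants plus a front vowel."""
    pattern = "u(?=" + env_to_regex("C(C)F") + ")"
    return re.sub(pattern, "ü", word)


one_consonant = front_u("lumine")
two_consonants = front_u("fulmine")
back_vowel = front_u("luna")
three_consonants = front_u("ulstri")
```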

New stuff

In addition to Unicode support, the IPA chart, and rewrite rules:

SCA² treats spaces as word boundaries. So if you have a rule

k/s/#_

then it will not only turn kima to sima, but kima kimaka to sima simaka.
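In regular-expression terms, treating spaces as word boundaries means "word-initial" covers both the start of the text and the position right after a space. A minimal Python sketch of a change that turns kima into sima (initial_k_to_s is a made-up name, and this is not how SCA² itself is written):

```python
import re


def initial_k_to_s(text):
    """Change k to s at the start of every space-separated word."""
    # "(?<![^ ])" succeeds at the start of the text or right after a space.
    return re.sub(r"(?<![^ ])k", "s", text)


single_word = initial_k_to_s("kima")
two_words = initial_k_to_s("kima kimaka")
```

The k in the middle of kimaka is left alone, since it follows a letter rather than a boundary.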

Epenthesis is supported by leaving the target part of the rule blank. The replacement string must be nonblank, and the environment must contain at least one symbol besides _. For instance

/j/_kt

will insert j before every instance of kt.

Simple metathesis is supported by the special replacement string \\. For instance

nt/\\/_V

will turn all instances of nt before a vowel to tn. (To be precise, the input string is reversed; it can be of any length.)
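Both special forms are easy to picture in code: epenthesis is a zero-width match that inserts text, and metathesis is a string reversal. The Python sketch below shows each under the assumption that the vowels are aeiou; insert_before and metathesize are hypothetical helper names.

```python
import re


def insert_before(word, inserted, following):
    """Epenthesis: insert text before every occurrence of `following`."""
    # The lookahead matches an empty position, so nothing is removed.
    return re.sub("(?=" + re.escape(following) + ")", inserted, word)


def metathesize(word, target, vowels="aeiou"):
    """Metathesis: reverse `target` wherever it stands before a vowel."""
    pattern = re.escape(target) + "(?=[" + vowels + "])"
    return re.sub(pattern, lambda m: m.group(0)[::-1], word)


epenthesis = insert_before("faktu", "j", "kt")
swapped = metathesize("canta", "nt")
not_before_vowel = metathesize("cant", "nt")
```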

Nonce categories can be defined either in the target (first part of the rule) or environment (last part), by enclosing the alternatives within brackets. Examples:

k/s/_[ie] Change k to s before either i or e.
[ao]u/o/_ Either au or ou is changed to o.
m/n/_[dt#] Change m to n before dentals and word-finally.

With SCA1 I found myself writing a lot of similar rules; nonce categories let them be combined.

Nonce categories in the environment (only) can include other categories:

k/g/_[VL] Change k to g before any member of categories V or L.

Nonce categories in the environment can include the word boundary #.
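A nonce category in the environment amounts to a list of alternatives, where each alternative is a literal character, a whole category, or the word boundary. Here is a rough Python sketch covering only the part after the underline; the contents of V and L are assumed, and nonce_to_regex and apply_before are illustrative names rather than anything in SCA².

```python
import re

# Assumed category contents, for illustration only.
CATEGORIES = {"V": "aeiou", "L": "lr"}


def nonce_to_regex(nonce):
    """Turn a bracketed nonce category such as [dt#] into regex alternatives."""
    alternatives = []
    for ch in nonce.strip("[]"):
        if ch == "#":
            # Word-final: followed by a space or by the end of the text.
            alternatives.append("(?= |$)")
        elif ch in CATEGORIES:
            alternatives.append("[" + CATEGORIES[ch] + "]")
        else:
            alternatives.append(re.escape(ch))
    return "(?:" + "|".join(alternatives) + ")"


def apply_before(text, target, replacement, nonce):
    """Replace `target` when it is followed by any member of the nonce category."""
    pattern = re.escape(target) + "(?=" + nonce_to_regex(nonce) + ")"
    return re.sub(pattern, replacement, text)


before_front = apply_before("kina keta kata", "k", "s", "[ie]")
before_dental_or_final = apply_before("tam dum camdo", "m", "n", "[dt#]")
before_categories = apply_before("akla akta", "k", "g", "[VL]")
```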

Degemination can be accomplished using the special character ². (Note that this is the first character shown in the IPA display.)

m//_²   Change mm to m.
M//_²   With a category M that contains m and n: change mm to m and nn to n, but leave mn and nm alone.
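Degemination corresponds closely to a regular-expression backreference: match a sound, then require the same sound again. A minimal Python sketch of that idea (degeminate is a made-up helper, not SCA²'s code):

```python
import re


def degeminate(text, sounds):
    """Collapse a doubled sound into one, for each sound listed in `sounds`."""
    # \1 must repeat exactly what the group matched, so mn and nm never match.
    pattern = "([" + re.escape(sounds) + r"])\1"
    return re.sub(pattern, r"\1", text)


only_m = degeminate("summa", "m")
m_and_n = degeminate("annu summa", "mn")
mixed_clusters = degeminate("omni amnis", "mn")
```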
Finally, SCA² now supports extended category substitution. The target must still begin with a category; however, other material may occur after it. And the replacement string may contain any number of characters, with a category string given at any point. Examples:
Bi/Dj/_   Instances of B plus i are changed to the corresponding member of D plus j.
Nd/bM/_V   Instances of N plus d before a vowel are changed to b plus the corresponding member of M; note that this is a more complicated metathesis.

You can do gemination on category substitution, like this:

M/M²/_

This will geminate all members of category M.

You can use a special wildcard to match anything. This allows you to test for something earlier or later on in the word. E.g. this rule will change a member of S to Z if there is a vowel V anywhere following it:

S/Z/_…V

(This is a new feature and may still have bugs.)

The symbol is the third character in the IPA list. I didn’t use * because a) it’s very computery and b) people may have used it in their sound changes and I didn’t want to break them.
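The wildcard is roughly what a regex writer would spell as "any run of characters". The Python sketch below voices a stop (ptc to bdg, as in the earlier stop example) only if a vowel follows somewhere later in the same word. It assumes that the wildcard may match nothing at all and that it does not cross a space; both are guesses for illustration, and voice_if_vowel_follows is a hypothetical name.

```python
import re

# p, t, c map to b, d, g.
VOICE = str.maketrans("ptc", "bdg")


def voice_if_vowel_follows(text):
    """Voice a stop when a vowel occurs anywhere later in the same word."""
    # "[^ ]*" plays the wildcard: any run of non-space characters, possibly empty.
    pattern = "[ptc](?=[^ ]*[aeiou])"
    return re.sub(pattern, lambda m: m.group(0).translate(VOICE), text)


vowel_later = voice_if_vowel_follows("pats")
no_vowel = voice_if_vowel_follows("spt")
stops_at_space = voice_if_vowel_follows("tap ka")
```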

Including a gloss

It can be convenient to include a gloss in your lexicon which isn’t affected by the sound changes. This is done by separating the gloss with a space plus the special character ‣ (this is the second character in the text shown by the IPA button). For instance:

focus ‣ fire

Here’s the output you’ll get from that (with the default sound changes), in each of the output formats:

fogo ‣ fire
focus → fogo ‣ fire
fogo ‣ fire [focus]

No sound changes will apply to anything after ‣, but rewrite rules do apply, so if you use this option I recommend using non-English characters for the rewrite rules (e.g. use χ rather than x for kh).
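In code terms, protecting the gloss means splitting each line at the marker, changing only the left half, and gluing the gloss back on untouched. A minimal Python sketch; toy_change is a deliberately silly stand-in for a real set of sound changes, and both function names are hypothetical.

```python
def apply_keeping_gloss(line, change):
    """Apply `change` to the word only; anything from ' ‣' onward is kept as is."""
    word, marker, gloss = line.partition(" ‣")
    return change(word) + marker + gloss


def toy_change(word):
    # Stand-in for real sound changes: turn every c into g.
    return word.replace("c", "g")


with_gloss = apply_keeping_gloss("focus ‣ coal fire", toy_change)
without_gloss = apply_keeping_gloss("focus", toy_change)
```

The c in the gloss survives because the change never sees that part of the line.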

Rule exceptions

Sometimes you'd like to say that a rule applies in environment e1, except for environment e2. You can generally handle this by writing more rules, but SCA² also allows you to state this directly by adding e2 after another slash, e.g.

k/s/_F/#s_   k changes to s before a front vowel, but not after word-initial s.
M/N/#_/_CF   Category M changes to category N word initially, but not before another consonant followed by a front vowel.

Because of the difficulty of lining up the _ in both environments, the exception environment can't include optional characters (those in parentheses) before the underline. (They can occur after it.)
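One way to picture an exception is as a second environment test at the same position: the rule fires only where the main environment fits and the exception does not. The Python sketch below handles literal targets only, assumes F contains i and e, and uses made-up names (side_to_regex, fits, apply_rule); it is an illustration, not SCA²'s implementation.

```python
import re

# Assumed front-vowel category, for illustration only.
CATEGORIES = {"F": "ie"}


def side_to_regex(part, edge):
    """Translate one side of an environment; '#' becomes the given anchor."""
    pieces = []
    for ch in part:
        if ch == "#":
            pieces.append(edge)
        elif ch in CATEGORIES:
            pieces.append("[" + CATEGORIES[ch] + "]")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


def fits(word, start, end, env):
    """Check whether word[start:end] sits in the environment `env`."""
    before, after = env.split("_")
    # The left side must match the text that ends just before the target.
    left_ok = re.search(side_to_regex(before, "^") + "$", word[:start])
    # The right side must match the text that starts just after the target.
    right_ok = re.match(side_to_regex(after, "$"), word[end:])
    return bool(left_ok and right_ok)


def apply_rule(word, rule):
    target, replacement, env, exception = rule.split("/")
    out, i = [], 0
    while i < len(word):
        end = i + len(target)
        if (word.startswith(target, i) and fits(word, i, end, env)
                and not fits(word, i, end, exception)):
            out.append(replacement)
            i = end
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


plain = apply_rule("kina", "k/s/_F/#s_")
excepted = apply_rule("skina", "k/s/_F/#s_")
s_not_initial = apply_rule("askina", "k/s/_F/#s_")
```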

Rewrite rules

These allow you to apply global substitutions to the input and output. The most important use is to allow digraphs.

If you use digraphs, you must follow the rules in this section. SCA² won’t handle digraphs properly on its own.

Rules with digraphs will work so long as they can be treated as sequences of characters; a rule whose target is sh, for instance, simply matches s followed by h.
But you can’t define categories with digraphs. E.g. this was probably intended to define three fricatives kh sh zh

F=khshzh

but in fact it defines the F category as k h s h z h, which won’t do what you expect at all.

The old SCA required that you use single characters instead. E.g. you might write

That still works, but you can use rewrite rules instead. E.g. define some rules like this:
Now you can use kh zh sh ng in any of the other input boxes— categories, sound changes, input lexicon. The SCA will apply the rewrite rules to provide single characters it can work with, and then apply them again backwards to provide output using digraphs.
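The forward-then-backward behavior can be sketched as two passes of plain string replacement. In the Python below, the single characters paired with kh, sh, zh and ng are hypothetical choices made for this example, and rewrite_in and rewrite_out are made-up names; SCA² itself is not written this way.

```python
# Hypothetical digraph-to-character pairs, chosen only for this illustration.
REWRITES = [("kh", "x"), ("sh", "š"), ("zh", "ž"), ("ng", "ŋ")]


def rewrite_in(text):
    """Replace each digraph with its single character before processing."""
    for digraph, single in REWRITES:
        text = text.replace(digraph, single)
    return text


def rewrite_out(text):
    """Apply the same pairs backwards to restore digraphs in the output."""
    for digraph, single in REWRITES:
        text = text.replace(single, digraph)
    return text


internal = rewrite_in("khasha")
restored = rewrite_out(internal)
category = rewrite_in("F=khshzh")

# A plain x that was already in the input comes back out as kh.
surprise = rewrite_out(rewrite_in("taxi"))
```

The last line shows why it is safer to pick replacement characters that never occur in your own text: the backward pass cannot tell an x that came from kh apart from one that was there all along.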

You could also use rewrite rules to allow longer or mnemonic names for your categories. E.g.

Now you could write sound changes like
(The category names still have to be unique— you can’t use F to define both front vowels and fricatives. But recall that you can use any Unicode character now for category names.)

A warning, though: so that they operate quickly, the rewrite rules are global and non-contextual. The results may surprise you if you didn’t realize your transcription system was ambiguous. E.g. don’t use kh both for IPA /x/ and for the cluster /k h/.

If you need contextual rewrite rules... just use SCA²! Add your rewrite rules at the top and bottom of the file, with the appropriate context specifications.

Sometimes you want the rewrite rules to apply only to the input. (For instance, the orthography may only apply to the parent language.) In that case, make sure Rewrite on output is unchecked.