SCA² - sound change applier

SCA² Help

The Sound Change Applier 2 is an updated version of my C program which applies a set of sound changes to a lexicon. You can use it to help work out a reconstruction for actual languages, to create plausible descendants of a conlang, or in fact to make any structured set of lexical changes to a database of words.

This version is written in JavaScript, so it runs in your browser. The advantages are that it supports Unicode, it runs on all systems, and you no longer have to mess with ASCII or command lines.

For changes since the old SCA, see New stuff below. The newest features are intermediate results and file uploading/downloading.

Example

Try it out! With the default inputs, hit Apply. You should get an output like this:

leitor
doutor
fogo
jogo
distrito
cidade
adotar
obra
segundo

As if by magic, a selection of Latin words has turned into Portuguese.

The controls

Here’s what the controls do.

Output format sets how each line of the output should look. The first option just prints each output word; this is good for generating a new list of words (e.g. as input for the next round of changes). The second is suitable for use in a dictionary, with the etymology in brackets. The third gives the input and output words in order. (See Including a gloss, below, for how to add glosses.)

Show differences from last run, if checked, will boldface any changes from the last run when you hit Apply. This can be very useful to see what the effect of a changed rule is. (Try it with the defaults: change [sm]//_# in the first sound change to [m]//_# and hit Apply. You should see several of the words change, now retaining their final s.)

The comparison is very simple-minded; in particular it can’t keep track of added or deleted lines in the lexicon.

Note that if you hit Apply without making any changes, all the bolding is removed (since in fact nothing changed between runs).
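To picture why added or deleted lines confuse the comparison, here is a hypothetical sketch (not SCA²'s own code) of a purely positional check. The helper `changedLines` compares the previous and current output line by line and returns the indices that differ; an identical run returns nothing, which is why a second Apply removes all bolding.

```javascript
// Hypothetical sketch: compare two runs line by line, by position only.
// Returns the indices of lines that differ from the previous run.
function changedLines(previous, current) {
  const changed = [];
  for (let i = 0; i < current.length; i++) {
    // No alignment is attempted: line i is compared only with line i.
    if (previous[i] !== current[i]) changed.push(i);
  }
  return changed;
}

// One rule changed: only the affected word is flagged.
console.log(changedLines(["obra", "fogo"], ["obras", "fogo"]));
// First lexicon entry deleted: every later line now looks "changed".
console.log(changedLines(["obra", "fogo", "jogo"], ["fogo", "jogo"]));
```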

Report which rules apply prints a report in the Output section listing every time a rule applies, like this:

u/o/_# applies to districtu at 8

This is useful for understanding why a rule applies (or doesn’t) when you expected the opposite.
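A report like this can be produced by recording where each match of a rule starts. In this sketch the rule is compiled by hand into a regular expression (`/u$/g` stands in for `u/o/_#`), `reportApplications` is a hypothetical helper, and positions are assumed to be 0-based, which agrees with the `districtu at 8` example.

```javascript
// Hypothetical sketch: list every position where a compiled rule matches.
// The pattern must have the g flag, as matchAll requires.
function reportApplications(ruleText, pattern, word) {
  const reports = [];
  for (const match of word.matchAll(pattern)) {
    // match.index is the 0-based position of the matched target.
    reports.push(`${ruleText} applies to ${word} at ${match.index}`);
  }
  return reports;
}

// u/o/_# written by hand as "u at the end of the word".
console.log(reportApplications("u/o/_#", /u$/g, "districtu"));
```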

Rewrite on output controls whether the rewrite rules should be reversed when writing the output lexicon.

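Files include input lexicon controls whether the input lexicon is saved along with everything else when you use the file-saving buttons (see Saving files).

Show intermediate results and Intermediate results only let you see the lexicon partway through the sound changes, not just at the end (see Intermediate results).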

Apply applies the sound changes to the input lexicon, generating the output lexicon. We’ll talk about exactly what that means below.

Browse and Download load and save files; Parse .sc and Back to .sc are an alternative that works through the clipboard.

Help me! brings up this help file.

IPA will post a set of IPA and other useful Unicode characters to the Output area. You can then copy and paste a character into any of the input boxes.

On Safari and Firefox, Undo will work as it should: you can make a change, hit Apply, and if you don’t like the results, click on the text box you changed and select Undo. This doesn’t work on IE.

Defining sound changes

The Sound Changes box contains rules for modifying the input lexicon. Hopefully the format of the rules will be familiar to any linguist. For instance, here’s one sound change:

c/g/V_V

This rule says to change c to g between vowels. (We’ll see how to generalize this rule below.)

More generally, a sound change looks like this:

target/replacement/environment

that is, the target string is changed to the replacement string within the given environment.
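As a rough illustration of this format, the sketch below splits a rule into its three parts. It is not SCA²'s own code: `parseRule` is a hypothetical name, it assumes no slash occurs inside a part, and it does not handle the exception form described under Rule exceptions. It also accepts the arrow spelling described next.

```javascript
// Hypothetical sketch: split target/replacement/environment.
function parseRule(rule) {
  // Only the first separator may be written as an arrow: c→g/V_V.
  const parts = rule.replace("→", "/").split("/");
  if (parts.length !== 3) {
    throw new Error(`Expected target/replacement/environment: ${rule}`);
  }
  const [target, replacement, environment] = parts;
  // The environment must always mark the changing part with _.
  if (!environment.includes("_")) {
    throw new Error(`Environment must contain _: ${rule}`);
  }
  return { target, replacement, environment };
}

console.log(parseRule("c/g/V_V"));
console.log(parseRule("s//_#")); // the replacement is the empty string
```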

Optionally you can use → in place of the first slash. So the rule above can also be written as:

c→g/V_V

The environment must always contain an underline _, representing the part that changes. That can be all there is, as in

gn/nh/_

which tells the program to replace gn with nh unconditionally.

The character # represents the beginning or end of the word. So

u/o/_#

means to replace u with o, but only at the end of the word.

The replacement string can be blank, as in

s//_#

This means that s is deleted when it ends a word.
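One way to picture these rules is to compile each into a regular expression whose environment becomes lookbehind and lookahead assertions, with # mapped to `^` before the underline and `$` after it. This is a sketch under stated assumptions, not SCA²'s actual algorithm: the categories `V` and `C` are made up for the example, category names are single characters, targets are literal, and a single global replace is only an approximation of how the program scans a word.

```javascript
// Hypothetical categories for this example.
const categories = { V: "aeiou", C: "ptcqbdgmnlrhs" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert one side of the environment into a regex fragment.
function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "#") out += boundary; // word edge
    else if (ch in cats) out += `[${cats[ch]}]`; // any member of the category
    else out += escapeRegex(ch); // literal character
  }
  return out;
}

function applyRule(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${sideToRegex(before, cats, "^")})`;
  source += escapeRegex(target);
  if (after) source += `(?=${sideToRegex(after, cats, "$")})`;
  // A function replacer keeps "$" in a replacement from being special.
  return word.replace(new RegExp(source, "g"), () => replacement);
}

console.log(applyRule("secundo", "c/g/V_V", categories)); // segundo
console.log(applyRule("districtu", "u/o/_#", categories)); // districto
console.log(applyRule("focus", "s//_#", categories)); // focu
```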

Rule order

Rules apply in the order they’re listed. So, with the word opera and the rules

p/b/V_V
e//C_rV

the first rule voices the p, resulting in obera; the second deletes an e between a consonant and an intervocalic r, resulting in obra.
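The effect of ordering is easy to check by running the same two rules both ways. This sketch uses a simplified regex approximation of rule application (not SCA²'s own code); the categories `V` and `C` and the helper names are assumptions for the example. In the listed order the result is obra; reversed, the e is deleted first, the p is no longer between vowels, and the result is opra.

```javascript
// Hypothetical categories for this example.
const categories = { V: "aeiou", C: "ptcqbdgmnlrhs" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function applyRule(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${sideToRegex(before, cats, "^")})`;
  source += escapeRegex(target);
  if (after) source += `(?=${sideToRegex(after, cats, "$")})`;
  return word.replace(new RegExp(source, "g"), () => replacement);
}

// Each rule sees the output of the rule before it.
function applyAll(word, rules, cats) {
  return rules.reduce((current, rule) => applyRule(current, rule, cats), word);
}

const rules = ["p/b/V_V", "e//C_rV"];
console.log(applyAll("opera", rules, categories)); // obra
console.log(applyAll("opera", [...rules].reverse(), categories)); // opra
```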

Optional elements in the environment

One or more elements in the environment can be marked as optional with parentheses. E.g.

u/ü/_C(C)F

says to change u to ü when it’s followed by one or two consonants and then a front vowel.
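In regular-expression terms, a parenthesized element maps naturally onto an optional non-capturing group. This is a sketch only: the categories `C` and `F` below are hypothetical, and the conversion handles just literals, categories, #, and parentheses.

```javascript
// Hypothetical categories: C consonants, F front vowels.
const categories = { C: "ptkbdgmnlrs", F: "ie" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "(") out += "(?:"; // start of an optional element
    else if (ch === ")") out += ")?"; // the group may be absent
    else if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function applyRule(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${sideToRegex(before, cats, "^")})`;
  source += escapeRegex(target);
  if (after) source += `(?=${sideToRegex(after, cats, "$")})`;
  return word.replace(new RegExp(source, "g"), () => replacement);
}

console.log(applyRule("kuti", "u/ü/_C(C)F", categories)); // küti
console.log(applyRule("kupti", "u/ü/_C(C)F", categories)); // küpti
console.log(applyRule("kuptri", "u/ü/_C(C)F", categories)); // kuptri (three consonants)
```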

New stuff

In addition to Unicode support, the IPA chart, and rewrite rules:

SCA² treats spaces as word boundaries. So if you have a rule

k/s/#_

then it will not only turn kima to sima, but kima kimaka to sima simaka.
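The effect is the same as splitting each lexicon line on spaces and applying the rule to every piece. In this sketch the helper `applyToLine` is hypothetical, and `k/s/#_` is written by hand as a function that changes a word-initial k.

```javascript
// Hypothetical sketch: treat each space-separated piece as its own word,
// so # matches at every word edge rather than only at the ends of the line.
function applyToLine(line, applyToWord) {
  return line.split(" ").map(applyToWord).join(" ");
}

// k/s/#_ written by hand as "k at the start of the word".
const kToS = (word) => word.replace(/^k/, "s");

console.log(applyToLine("kima kimaka", kToS)); // sima simaka
console.log(kToS("kima kimaka")); // sima kimaka (without splitting)
```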

Epenthesis is supported by leaving the target part of the rule blank. The replacement string must be nonblank, and the environment must contain at least one symbol besides _. For instance

/j/_kt

will insert j before every instance of kt.
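With a regular expression, an empty target becomes a zero-width match, so the replacement is inserted at each position where the environment holds. The sketch below (hypothetical `epenthesize`, literal environments only, no categories or #) also shows why the environment needs a symbol besides _: with nothing to match, the empty pattern would fire between every pair of characters.

```javascript
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: insert text at every position matching the environment.
function epenthesize(word, insertion, environment) {
  if (insertion === "") throw new Error("Replacement must be nonblank");
  const [before, after] = environment.split("_");
  // An environment of just _ would match at every position in the word.
  if (before === "" && after === "") {
    throw new Error("Environment needs a symbol besides _");
  }
  let source = "";
  if (before) source += `(?<=${escapeRegex(before)})`;
  if (after) source += `(?=${escapeRegex(after)})`;
  return word.replace(new RegExp(source, "g"), () => insertion);
}

console.log(epenthesize("lakte", "j", "_kt")); // lajkte
```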

Simple metathesis is supported by the special replacement string \\. For instance

nt/\\/_V

will turn all instances of nt before a vowel to tn. (To be precise, the input string is reversed; it can be of any length.)
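A sketch of the reversal, under the same simplified regex approach (not SCA²'s own code; the category `V` is an assumption). Note that in JavaScript source the two-character replacement `\\` has to be written as `"\\\\"`.

```javascript
// Hypothetical category for this example.
const categories = { V: "aeiou" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function applyRule(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${sideToRegex(before, cats, "^")})`;
  source += escapeRegex(target);
  if (after) source += `(?=${sideToRegex(after, cats, "$")})`;
  // "\\\\" in JavaScript source is the two-character string \\ .
  const reverse = replacement === "\\\\";
  return word.replace(new RegExp(source, "g"), (match) =>
    reverse ? [...match].reverse().join("") : replacement
  );
}

console.log(applyRule("ponte", "nt/\\\\/_V", categories)); // potne
console.log(applyRule("astra", "str/\\\\/_V", categories)); // artsa
```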

Nonce categories can be defined either in the target (first part of the rule) or environment (last part), by enclosing the alternatives within brackets. Examples:

k/s/_[ie] Change k to s before either i or e.
[ao]u/o/_ Either au or ou is changed to o.
m/n/_[dt#] Change m to n before dentals and word-finally.

With the SCA1 I found myself writing a lot of similar rules, and nonce categories let them be combined.

Nonce categories in the environment (only) can include other categories:

k/g/_[VL] Change k to g before any member of categories V or L.

Nonce categories in the environment can include the word boundary #.
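A nonce category can be read as a set of alternatives, each of which may be a literal, a category, or #. The sketch below turns `[...]` into a non-capturing alternation, which lets # (an edge, not a character) sit alongside ordinary letters. The categories `V` and `L` and the function names are hypothetical, and unlike SCA² this sketch does not forbid categories inside target brackets.

```javascript
// Hypothetical categories: V vowels, L liquids.
const categories = { V: "aeiou", L: "lr" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function toRegex(text, cats, boundary) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "[") {
      const end = text.indexOf("]", i);
      if (end === -1) throw new Error(`Unclosed [ in ${text}`);
      // Each alternative may be a literal, a category, or the boundary #.
      const options = [...text.slice(i + 1, end)].map((c) =>
        c === "#" ? boundary : c in cats ? `[${cats[c]}]` : escapeRegex(c)
      );
      out += `(?:${options.join("|")})`;
      i = end;
    } else if (ch === "#") {
      out += boundary;
    } else if (ch in cats) {
      out += `[${cats[ch]}]`;
    } else {
      out += escapeRegex(ch);
    }
  }
  return out;
}

function applyRule(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${toRegex(before, cats, "^")})`;
  source += toRegex(target, cats, "");
  if (after) source += `(?=${toRegex(after, cats, "$")})`;
  return word.replace(new RegExp(source, "g"), () => replacement);
}

console.log(applyRule("kira", "k/s/_[ie]", categories)); // sira
console.log(applyRule("auro", "[ao]u/o/_", categories)); // oro
console.log(applyRule("fum", "m/n/_[dt#]", categories)); // fun
console.log(applyRule("akla", "k/g/_[VL]", categories)); // agla
```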

Degemination can be accomplished using the special character ². (Note that this is the first character shown in the IPA display.)

m//_² Change mm to m.

M=mn M//_² Change mm to m and nn to n, but leave mn and nm alone.
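Since ² means "the same character again", a regex backreference is a natural way to model it. This hypothetical sketch captures one member of the target set and deletes it only when the identical character follows, so mm and nn shrink while mn and nm are untouched.

```javascript
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: members is a single character or a category's members.
function degeminate(word, members) {
  const cls = [...members].map(escapeRegex).join("");
  // Capture one member and require the identical character right after it;
  // deleting the captured copy leaves a single one.
  const pattern = new RegExp(`([${cls}])(?=\\1)`, "g");
  return word.replace(pattern, "");
}

console.log(degeminate("flamma", "m")); // flama (mm becomes m)
console.log(degeminate("anno", "mn")); // ano (nn becomes n)
console.log(degeminate("somnu", "mn")); // somnu (mn left alone)
```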

Finally, SCA² now supports extended category substitution. The target must still begin with a category, but other material may follow it. The replacement may contain any number of characters, with a category at any point. Examples:

Bi/Dj/_ Instances of B plus i are changed to the corresponding member of D plus j.

Nd/bM/_V Instances of N plus d before a vowel are changed to b plus the corresponding member of M; note that this is a more complicated metathesis.

You can do gemination on category substitution, like this:

M/M²/_

This will geminate all members of category M.
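Category substitution works by position: the index of the matched member selects the corresponding member of each category in the replacement. This sketch assumes equal-length categories, reads ² in the replacement as "repeat the character just produced", and uses made-up categories (`B`, `D`, `N`, `M`, `V`) purely for illustration; `substitute` is a hypothetical name, not part of SCA².

```javascript
// Hypothetical categories for illustration only.
const categories = { B: "ptk", D: "bdg", N: "mn", M: "lr", V: "aeiou" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function substitute(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const source = cats[target[0]];
  if (!source) throw new Error(`Target must begin with a category: ${rule}`);
  const [before, after] = environment.split("_");
  let pattern = "";
  if (before) pattern += `(?<=${sideToRegex(before, cats, "^")})`;
  // Capture the category member; anything after it is literal.
  pattern += `([${source}])${escapeRegex(target.slice(1))}`;
  if (after) pattern += `(?=${sideToRegex(after, cats, "$")})`;
  return word.replace(new RegExp(pattern, "g"), (match, member) => {
    const index = source.indexOf(member);
    let out = "";
    for (const ch of replacement) {
      if (ch === "²") out += out.slice(-1); // geminate the last character
      else if (ch in cats) out += cats[ch][index]; // corresponding member
      else out += ch;
    }
    return out;
  });
}

console.log(substitute("pia", "Bi/Dj/_", categories)); // bja
console.log(substitute("ando", "Nd/bM/_V", categories)); // abro
console.log(substitute("ala", "M/M²/_", categories)); // alla
```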

You can use a special wildcard … to match anything. This allows you to test for something earlier or later on in the word. E.g. this rule will change a member of S to Z if there is a vowel V anywhere following it:

S/Z/_…V
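In a regex approximation the wildcard is simply `.*` (any run of characters, including none) inside the lookahead. The categories `S` and `Z` below are invented for the example, and `substitute` is a hypothetical helper that repeats a simplified category-substitution sketch with one extra case for ….

```javascript
// Hypothetical categories for illustration only.
const categories = { S: "sf", Z: "zv", V: "aeiou" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "…") out += ".*"; // anything, including nothing
    else if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function substitute(word, rule, cats) {
  const [target, replacement, environment] = rule.split("/");
  const source = cats[target[0]];
  const [before, after] = environment.split("_");
  let pattern = "";
  if (before) pattern += `(?<=${sideToRegex(before, cats, "^")})`;
  pattern += `([${source}])${escapeRegex(target.slice(1))}`;
  if (after) pattern += `(?=${sideToRegex(after, cats, "$")})`;
  return word.replace(new RegExp(pattern, "g"), (match, member) => {
    const index = source.indexOf(member);
    return [...replacement].map((ch) => (ch in cats ? cats[ch][index] : ch)).join("");
  });
}

console.log(substitute("fast", "S/Z/_…V", categories)); // vast (no vowel after s)
console.log(substitute("fasta", "S/Z/_…V", categories)); // vazta
```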

The … symbol is the third character in the IPA list. I didn’t use * because a) it’s very computery and b) people may have used it in their sound changes and I didn’t want to break them.

Including a gloss

It can be convenient to include a gloss in your lexicon which isn’t affected by the sound changes. This is done by separating the gloss with a space plus the special character ‣ (this is the second character in the text shown by the IPA button). For instance:

focus ‣ fire

Here’s the output you’ll get from that (with the default sound changes), in each of the output formats:

fogo ‣ fire
focus → fogo ‣ fire
fogo ‣ fire [focus]

No sound changes will apply to anything after ‣, but rewrite rules do apply, so if you use this option I recommend using non-English characters for the rewrite rules (e.g. use χ rather than x for kh).
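Keeping the gloss out of the sound changes amounts to cutting the line at the first space-plus-‣ and reattaching the rest afterward. In this sketch `applyWithGloss` is a hypothetical helper, and `toyChanges` is a made-up stand-in for the default rules that just happens to turn focus into fogo.

```javascript
// Hypothetical sketch: apply sound changes to the word only.
function applyWithGloss(line, applyChanges) {
  const cut = line.indexOf(" ‣");
  if (cut === -1) return applyChanges(line);
  // Everything from " ‣" onward is copied through unchanged.
  return applyChanges(line.slice(0, cut)) + line.slice(cut);
}

// Made-up stand-in for the default sound changes.
const toyChanges = (word) => word.replace(/c/g, "g").replace(/us$/, "o");

console.log(applyWithGloss("focus ‣ fire", toyChanges)); // fogo ‣ fire
console.log(applyWithGloss("focus ‣ cat", toyChanges)); // fogo ‣ cat (gloss untouched)
```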

Rule exceptions

Sometimes you'd like to say that a rule applies in environment e₁, except for environment e₂. You can generally handle this by writing more rules, but SCA² also allows you to state this directly by adding e₂ after another slash, e.g.

k/s/_F/#s_ k changes to s before a front vowel, but not after word-initial s.

M/N/#_/_CF Category M changes to category N word initially, but not before another consonant followed by a front vowel.
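A simple way to model an exception is to find matches for the main environment and then, at each match, test whether both sides of the exception environment also hold; if so, the match is left alone. This sketch (hypothetical `applyWithException`, made-up category `F`, literal targets only) covers the first example above.

```javascript
// Hypothetical category: F front vowels.
const categories = { F: "ie" };

function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function sideToRegex(side, cats, boundary) {
  let out = "";
  for (const ch of side) {
    if (ch === "#") out += boundary;
    else if (ch in cats) out += `[${cats[ch]}]`;
    else out += escapeRegex(ch);
  }
  return out;
}

function applyWithException(word, rule, cats) {
  const [target, replacement, environment, exception] = rule.split("/");
  const [before, after] = environment.split("_");
  let source = "";
  if (before) source += `(?<=${sideToRegex(before, cats, "^")})`;
  source += escapeRegex(target);
  if (after) source += `(?=${sideToRegex(after, cats, "$")})`;
  // Exception sides are anchored to the match: before must end there,
  // after must start there. Both must hold to block the rule.
  const [exBefore, exAfter] = (exception ?? "_").split("_");
  const exBeforeRe = new RegExp(`${sideToRegex(exBefore, cats, "^")}$`);
  const exAfterRe = new RegExp(`^${sideToRegex(exAfter, cats, "$")}`);
  return word.replace(new RegExp(source, "g"), (match, offset) => {
    const blocked =
      exception !== undefined &&
      exBeforeRe.test(word.slice(0, offset)) &&
      exAfterRe.test(word.slice(offset + match.length));
    return blocked ? match : replacement;
  });
}

console.log(applyWithException("kiki", "k/s/_F/#s_", categories)); // sisi
console.log(applyWithException("ski", "k/s/_F/#s_", categories)); // ski
```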

Because of the difficulty of lining up the _ in both environments, the exception environment can't include optional characters (those in parentheses) before the underline. (They can occur after it.)

Rewrite rules

These allow you to apply global substitutions to the input and output. The most important use is to allow digraphs.

If you use digraphs, you must follow the rules in this section. SCA² won’t handle digraphs properly on its own.

Rules with digraphs will work so long as they can be treated as sequences of characters. For instance, these all work fine:

c/ch/_a
sh/zh/V_V
u/o/_ng

But you can’t define categories with digraphs. E.g. this was probably intended to define the three fricatives kh sh zh:

F=khshzh

but in fact it defines the F category as k h s h z h, which won’t at all do what you expect.

The old SCA required that you use single characters instead. E.g. you might write

F=xßΩ

That still works, but you can use rewrite rules instead. E.g. define some rules like this:

kh|x
zh|ž
sh|š
ng|ŋ

Now you can use kh zh sh ng in any of the other input boxes: categories, sound changes, input lexicon. The SCA will apply the rewrite rules to provide single characters it can work with, and then apply them again backwards to provide output using digraphs.
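The round trip can be sketched as plain global string substitution. This is an illustration under assumptions, not SCA²'s code: the helper names are hypothetical, and applying the rules in reverse order on output is a guess at a sensible convention. The first example shows the category line `F=khshzh` becoming three single characters.

```javascript
// The rewrite rules from the example above, as [from, to] pairs.
const rewriteRules = [["kh", "x"], ["zh", "ž"], ["sh", "š"], ["ng", "ŋ"]];

// Input direction: replace every occurrence of each digraph, in listed order.
function rewriteIn(text, rules) {
  return rules.reduce((t, [from, to]) => t.split(from).join(to), text);
}

// Output direction: undo the substitutions (reverse order is an assumption).
function rewriteOut(text, rules) {
  return [...rules].reverse().reduce((t, [from, to]) => t.split(to).join(from), text);
}

console.log(rewriteIn("F=khshzh", rewriteRules)); // F=xšž
console.log(rewriteOut(rewriteIn("khasha", rewriteRules), rewriteRules)); // khasha
```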

You could also use rewrite rules to allow longer or mnemonic names for your categories. E.g.

<front>|F

Now you could write sound changes like

i/ü/_<front>

(The category names still have to be unique; you can’t use F to define both front vowels and fricatives. But recall that you can now use any Unicode character for category names.)

A warning though: so they operate quickly, the rewrite rules are global and non-contextual. The results may surprise you if you didn’t realize your transcription system was ambiguous. E.g. don’t use kh both for IPA /x/ and for the cluster /k h/.

If you need contextual rewrite rules... just use SCA²! Add them as ordinary sound changes at the top and bottom of your sound changes, with the appropriate environments.

Sometimes you want the rewrite rules to apply only to the input. (For instance, the orthography may only apply to the parent language.) In that case, make sure Rewrite on output is unchecked.

Intermediate results

Sometimes you’d like to see the intermediate results for all words. E.g., maybe you want Old Portuguese on the road from Latin to Portuguese.

You do this by including the special line -* at the appropriate point in your sound changes file. (I encourage you to complete the line with a comment, e.g. -* Old Portuguese.)

To see the intermediate results, check Show intermediate results. E.g. if you add -* just after the rule gn/nh/_ in the default sound changes, you get output that looks like this:

opera; obra
secundo; segundo

If you check Intermediate results only, you’ll get only the intermediate results. In the above case, you’d get

opera
secundo

You can have multiple -* commands, though I can’t vouch for the readability of the results. This can be useful for debugging, as you can see before-and-after results for a block of rules.
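Conceptually, each -* line takes a snapshot of the word at that point in the rule list. In this sketch the sound changes are modeled as word-to-word functions with marker strings between them; `runWithStages` and the two stand-in rules are hypothetical. The stages and the final output are then joined with "; " in the same style as the output above.

```javascript
// Hypothetical sketch: run rules in order, recording the word at each -*.
function runWithStages(word, changes) {
  const stages = [];
  let current = word;
  for (const change of changes) {
    if (typeof change === "string") {
      // "-*" (optionally followed by a comment) records the current form;
      // any other line starting with "-" is just a comment.
      if (change.startsWith("-*")) stages.push(current);
      continue;
    }
    current = change(current);
  }
  return { stages, output: current };
}

// Made-up stand-ins for p/b/V_V and e//C_rV.
const changes = [
  (w) => w.replace(/p/g, "b"),
  "-* Old stage",
  (w) => w.replace(/e/, ""),
];

const { stages, output } = runWithStages("opera", changes);
console.log([...stages, output].join("; ")); // obera; obra
console.log(stages.join(" ")); // obera (intermediate results only)
```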

Saving files

To save your work, hit Download. This will group the categories, rewrite rules, sound changes, and input lexicon into a single text. Browsers may differ on what happens next. Firefox properly lets me save it as a file, or open it in a text editor. Safari just puts the text in a new window which you can save.

Browse lets you read in a file, and distribute it properly to the input fields. To do this, it applies some simple rules:

Lines containing = go in Categories.
Lines containing | go in Rewrite Rules.
Lines containing / or beginning with - go in Sound changes.
Everything else goes in Input Lexicon.
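These rules can be sketched as a first-match classifier checked in the order listed. The function and field names are hypothetical, and dropping blank lines is an assumption. The last example shows the corollary: a comment that doesn't begin with - ends up in the lexicon.

```javascript
// Hypothetical sketch of the Browse distribution rules, checked in order.
function classifyLine(line) {
  if (line.includes("=")) return "categories";
  if (line.includes("|")) return "rewriteRules";
  if (line.includes("/") || line.startsWith("-")) return "soundChanges";
  return "inputLexicon";
}

function distribute(text) {
  const fields = { categories: [], rewriteRules: [], soundChanges: [], inputLexicon: [] };
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // assumption: blank lines are dropped
    fields[classifyLine(line)].push(line);
  }
  return fields;
}

const file = "V=aeiou\nkh|x\nc/g/V_V\n-* Old Portuguese\nfocus ‣ fire";
console.log(distribute(file));
console.log(classifyLine("Old Portuguese rules")); // inputLexicon
```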

Corollary: if you have comments in your sound changes, make sure they begin with -.

If you don’t want the input lexicon to go in your file, uncheck Files include input lexicon. This is better if you have, say, a parent language with several daughters each with their own sound change file.

Did you accidentally erase all your work? Hit Undo parse/upload, which will restore the state of your input fields before the last Browse or Parse command.

At the bottom of the Output section you'll see a link Download output lexicon. Click this to save your output lexicon as a text file.

If your browser doesn’t support these, I’ve kept the old methods, which involved consolidating everything into the Sound Changes field. It was then up to you, using cut and paste, to save these in a file.

Parse .sc will parse the consolidated text in the Sound Changes text box into the appropriate input boxes.

Back to .sc will collect the text in the input boxes and place them in Sound Changes.
