The Sound Change Applier 2 is an updated version of my C program which applies a set of sound changes to a lexicon. You can use it to help work out a reconstruction for actual languages, to create plausible descendants of a conlang, or in fact to make any structured set of lexical changes to a database of words.
This version is written in JavaScript, which means it runs in your browser. The advantages are that it supports Unicode, it’ll run on all systems, and you don’t have to mess with ASCII or command lines anymore.
Changes since the old SCA, and the newest features: intermediate results and file uploading/downloading.
As if by magic, a selection of Latin words has turned into Portuguese.
leitor doutor fogo jogo distrito cidade adotar obra segundo
Output format specifies how each line of the output should look. The first option just prints each output word; this is good for generating a new list of words (e.g. as input for the next round of changes). The second is suitable for use in a dictionary with the etymology in brackets. The third gives the input and output words in order. (See here for how to add glosses.)
Show differences from last run, if checked, will boldface any changes from the last run when you hit Apply. This can be very useful to see what the effect of a changed rule is. (Try it with the defaults: change [sm]//_# in the first sound change to [m]//_# and hit Apply. You should see several of the words change, now retaining their final s.)
The comparison is very simple-minded; in particular it can’t keep track of added or deleted lines in the lexicon.
Note that if you hit Apply without making any changes, all the bolding is removed (since in fact nothing changed between runs).
Report which rules apply prints a report in the Output section listing every time a rule applies, like this:
u/o/_# applies to districtu at 8
This is useful for understanding why a rule applies (or doesn’t) when you expected the opposite.
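A report line like the one above could be produced along these lines. This is only a sketch in Python, not the tool's own code: the `report` function is hypothetical, the rule is hand-translated into a regular expression, and reading the number as a zero-based position in the word is an inference from the single example given.

```python
import re

def report(rule_text, pattern, word):
    """List one line per place where the (hand-translated) rule matches."""
    # m.start() is the zero-based index of the match within the word.
    return [f"{rule_text} applies to {word} at {m.start()}"
            for m in re.finditer(pattern, word)]

# u/o/_# translated by hand: u at the end of the word.
for line in report("u/o/_#", r"u$", "districtu"):
    print(line)
```

An empty list then means the rule never matched, which is the case you usually want to investigate.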
Rewrite on output controls whether the rewrite rules should be reversed when writing the output lexicon.
Files include input lexicon affects the file sharing buttons.
Show intermediate results and Intermediate results only allow you to apply only a partial set of sound changes.
Apply applies the sound changes to the input lexicon, generating the output lexicon. We’ll talk about exactly what that means below.
Browse / Download are used for loading and saving files; Parse .sc/Back to .sc are an alternative using the clipboard.
Help me! brings up this help file.
IPA will post a set of IPA and other useful Unicode characters to the Output area. You can then copy and paste a character into any of the input boxes.
On Safari and Firefox, Undo will work as it should: you can make a change, hit Apply, and if you don’t like the results, click on the text box you changed and select Undo. This doesn’t work on IE.
c/g/V_V
This rule says to change c to g between vowels. (We’ll see how to generalize this rule below.)
More generally, a sound change looks like this:
target/replacement/environment
that is, the target string is changed to the replacement string within the given environment.
Optionally you can use → in place of the first slash. So the above rule can also be written
c→g/V_V
The environment must always contain an underline _, representing the part that changes. That can be all there is, as in
gn/nh/_
which tells the program to replace gn with nh unconditionally.
The character # represents the beginning or end of the word. So
u/o/_#
means to replace u with o, but only at the end of the word.
The replacement string can be blank, as in
s//_#
This means that s is deleted when it ends a word.
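To make the target/replacement/environment format concrete, here is a minimal Python sketch of how such a rule might be interpreted. It is not how SCA² itself is implemented: `apply_rule` and its `categories` argument are hypothetical names, and the sketch covers only literal targets, the # boundary, and one-character category variables in the environment.

```python
import re

def apply_rule(rule, word, categories):
    """Apply one target/replacement/environment rule to a single word."""
    # Accept the arrow as an alternative to the first slash.
    target, replacement, env = rule.replace("→", "/", 1).split("/")
    before, after = env.split("_")

    def to_regex(part):
        pieces = []
        for ch in part:
            if ch in categories:
                # A category variable matches any one of its members.
                pieces.append("[" + re.escape(categories[ch]) + "]")
            else:
                # Everything else, including #, is matched literally.
                pieces.append(re.escape(ch))
        return "".join(pieces)

    pattern = re.escape(target)
    if before:
        pattern = "(?<=" + to_regex(before) + ")" + pattern
    if after:
        pattern = pattern + "(?=" + to_regex(after) + ")"

    # Wrap the word in # so that # in the environment can match either edge.
    marked = "#" + word + "#"
    return re.sub(pattern, lambda m: replacement, marked).strip("#")

cats = {"V": "aeiou"}
print(apply_rule("c/g/V_V", "focus", cats))
print(apply_rule("u/o/_#", "districtu", cats))
print(apply_rule("s//_#", "focus", cats))
```

The environment is turned into lookbehind and lookahead assertions so that only the target itself is replaced and neighbouring sounds stay available to later matches.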
The environment can contain variables, like V above. These are defined in the Categories box. I use capital letters for this, though this is not a requirement. Variables can only be one character long (unless you use rewrite rules). You can define any variables needed to state your sound changes. E.g. you could define S to be any stop, or K for any coronal, or whatever.
So the category definition and rule
F=ie
c/i/F_t
mean that c changes to i after a front vowel and before a t.
You can use variables in the first two parts as well. For instance, suppose you’ve defined
S=ptc
Z=bdg
S/Z/V_V
This means that the stops ptc change to their voiced equivalents bdg between vowels. In this usage, the variables must correspond one for one: p goes to b, t goes to d, etc. Each character in the replacement variable (here Z) gives the transformed value of each character in the input variable (here S). If the replacement category is shorter than the target category, the matching input will be deleted.
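The one-for-one correspondence can be sketched in Python as a lookup by position. The function name `substitute_category` and the vowel set are assumptions made for illustration; only the S/Z/V_V case is handled.

```python
def substitute_category(word, source, dest, vowels="aeiou"):
    """Replace each member of `source` with the member of `dest` at the
    same position, but only between two vowels (the V_V environment)."""
    out = []
    for i, ch in enumerate(word):
        between_vowels = (
            0 < i < len(word) - 1
            and word[i - 1] in vowels
            and word[i + 1] in vowels
        )
        if between_vowels and ch in source:
            k = source.index(ch)
            # A shorter replacement category deletes the unmatched members.
            out.append(dest[k] if k < len(dest) else "")
        else:
            out.append(ch)
    return "".join(out)

print(substitute_category("capita", "ptc", "bdg"))  # cabida
print(substitute_category("focu", "ptc", "bd"))     # fou: c has no partner
```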
A variable can also be set to a fixed value, or deleted. E.g.
Z//V_V
says to delete voiced stops between vowels.
Rules apply in the order they’re listed. So, with the word opera and the rules
p/b/V_V
e//C_rV
the first rule voices the p, resulting in obera; the second deletes an e between a consonant and an intervocalic r, resulting in obra.
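The effect of rule order can be checked with a small Python sketch. The two rules are translated by hand into regular expressions, and the consonant set in `C` is an assumption, since the categories actually in use depend on your Categories box.

```python
import re

V = "[aeiou]"
C = "[bcdfghjklmnpqrstvwxyz]"  # assumed consonant inventory

# (rule as written, hand-translated pattern, replacement)
RULES = [
    ("p/b/V_V", rf"(?<={V})p(?={V})", "b"),
    ("e//C_rV", rf"(?<={C})e(?=r{V})", ""),
]

def run(word, rules):
    """Apply each rule in turn; later rules see the output of earlier ones."""
    for _, pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

print(run("opera", RULES))        # obra
print(run("opera", RULES[::-1]))  # opra
```

With the rules reversed, the e is deleted first, so the p is no longer between vowels and never voices.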
One or more elements in the environment can be marked as optional with parentheses. E.g.
u/ü/_C(C)F
says to change u to ü when it’s followed by one or two consonants and then a front vowel.
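An optional element behaves much like `?` in a regular expression. The Python sketch below hand-translates this one rule; the function name `umlaut` and the members of `C` are assumptions, and `F` follows the F=ie definition used earlier.

```python
import re

C = "[bcdfghjklmnpqrstvwxz]"  # assumed consonant inventory
F = "[ie]"                    # front vowels, as in F=ie

def umlaut(word):
    """u/ü/_C(C)F: u becomes ü before one or two consonants and a front vowel."""
    return re.sub(rf"u(?={C}{C}?{F})", "ü", word)

print(umlaut("muri"))    # müri  (one consonant)
print(umlaut("murti"))   # mürti (two consonants)
print(umlaut("murtsi"))  # murtsi (three consonants, no match)
```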
SCA² treats spaces as word boundaries. So if you have a rule
k/s/#_
then it will not only turn kima to sima, but kima kimaka to sima simaka.
Epenthesis is supported by leaving the target part of the rule blank. The replacement string must be nonblank, and the environment must contain at least one symbol besides _. For instance
/j/_kt
will insert j before every instance of kt.
Simple metathesis is supported by the special replacement string \\. For instance
nt/\\/_V
will turn all instances of nt before a vowel to tn. (To be precise, the input string is reversed; it can be of any length.)
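Since metathesis here is simply reversal of the matched string, it can be sketched in a few lines of Python. `metathesize` is a hypothetical helper, and the vowel set is an assumption.

```python
import re

def metathesize(word, target, vowels="aeiou"):
    """Reverse every occurrence of `target` that stands before a vowel."""
    pattern = re.escape(target) + "(?=[" + vowels + "])"
    return re.sub(pattern, lambda m: m.group(0)[::-1], word)

print(metathesize("canta", "nt"))  # catna
print(metathesize("cant", "nt"))   # cant (no following vowel)
```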
Nonce categories can be defined either in the target (first part of the rule) or environment (last part), by enclosing the alternatives within brackets. Examples:
k/s/_[ie]    Change k to s before either i or e.
[ao]u/o/_    Either au or ou is changed to o.
m/n/_[dt#]    Change m to n before dentals and word-finally.
With the SCA1 I found myself writing a lot of similar rules, and nonce categories let them be combined.
Nonce categories in the environment (only) can include other categories:
k/g/_[VL] Change k to g before any member of categories V or L.
Nonce categories in the environment can include the word boundary #.
Degemination can be accomplished using the special character ². (Note that this is the first character shown in the IPA display.)
m//_²    Change mm to m.
M=mn
M//_²    Change mm to m and nn to n, but leave mn and nm alone.
Finally, SCA² now supports extended category substitution. The target must still begin with a category; however, other material may occur after it. And the replacement string may contain any number of characters, with a category string given at any point. Examples:
Bi/Dj/_    Instances of B plus i are changed to the corresponding member of D plus j.
Nd/bM/_V    Instances of N plus d before a vowel are changed to b plus the corresponding member of M; note that this is a more complicated metathesis.
You can do gemination on category substitution, like this:
M/M²/_
This will geminate all members of category M.
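The ² character acts like "the same sound again", which corresponds roughly to a backreference in a regular expression. The Python sketch below illustrates degemination and gemination for a category M=mn; the function names are hypothetical, and runs of three or more identical letters may be treated differently by the real tool.

```python
import re

def degeminate(word, members):
    """M//_²: a doubled member of the category collapses to a single one."""
    return re.sub("([" + re.escape(members) + r"])\1", r"\1", word)

def geminate(word, members):
    """M/M²/_: every member of the category is doubled."""
    return re.sub("([" + re.escape(members) + "])", r"\1\1", word)

print(degeminate("annmma", "mn"))  # anma
print(degeminate("amna", "mn"))    # amna (mn is not a geminate)
print(geminate("ama", "mn"))       # amma
```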
You can use a special wildcard … to match anything. This allows you to test for something earlier or later on in the word. E.g. this rule will change a member of S to Z if there is a vowel V anywhere following it:
S/Z/_…V
The … symbol is the third character in the IPA list. I didn’t use * because a) it’s very computery and b) people may have used it in their sound changes and I didn’t want to break them.
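The wildcard rule S/Z/_…V can be approximated in Python with a lookahead. This is a sketch under stated assumptions: the category contents (S=ptc, Z=bdg, V=aeiou) are taken from the earlier examples, the function name is hypothetical, and the wildcard is assumed to match any stretch of characters, including none, which the text here does not spell out.

```python
import re

def voice_before_later_vowel(word, S="ptc", Z="bdg", V="aeiou"):
    """Change a member of S to the matching member of Z if a vowel
    occurs anywhere later in the word."""
    table = dict(zip(S, Z))
    pattern = "[" + S + "](?=.*[" + V + "])"
    return re.sub(pattern, lambda m: table[m.group(0)], word)

print(voice_before_later_vowel("patak"))  # badak
print(voice_before_later_vowel("apt"))    # apt (no vowel follows p or t)
```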
focus ‣ fire
Here’s the output you’ll get from that (with the default sound changes), in each of the output formats:
fogo ‣ fire
fire → fogo ‣ fire
fogo ‣ fire [focus]
No sound changes will apply to anything after ‣, but rewrite rules do apply, so if you use this option I recommend using non-English characters for the rewrite rules (e.g. use χ rather than x for kh).
Because of the difficulty of lining up the _ in both environments, the exception environment can't include optional characters (those in parentheses) before the underline. (They can occur after it.)
k/s/_F/#s_    k changes to s before a front vowel, but not after word-initial s.
M/N/#_/_CF    Category M changes to category N word initially, but not before another consonant followed by a front vowel.
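An exception environment can be thought of as a negative condition checked at the same position as the main environment. The Python sketch below hand-translates k/s/_F/#s_ only; `k_to_s` is a hypothetical name and F is assumed to be the front vowels i and e.

```python
import re

FRONT = "ie"  # assumed members of F

def k_to_s(word):
    """k/s/_F/#s_: k becomes s before a front vowel, except after word-initial s."""
    marked = "#" + word + "#"  # make the word boundary an explicit character
    # Positive lookahead is the environment; negative lookbehind is the exception.
    pattern = r"(?<!\#s)k(?=[" + FRONT + "])"
    return re.sub(pattern, "s", marked).strip("#")

print(k_to_s("kia"))    # sia
print(k_to_s("skia"))   # skia (exception applies)
print(k_to_s("askia"))  # assia (s is not word-initial)
```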
If you use digraphs, you must follow the rules in this section. SCA² won’t handle digraphs properly on its own.
Rules with digraphs will work so long as they can be treated as sequences of characters. For instance, these all work fine:
c/ch/_a
sh/zh/V_V
u/o/_ng
But you can’t define categories with digraphs. E.g. this was probably intended to define three fricatives kh sh zh
F=khshzh
but in fact it defines the F category as k h s h z h, which won’t at all do what you expect.
The old SCA required that you use single characters instead. E.g. you might write
F=xßΩ
That still works, but you can use rewrite rules instead. E.g. define some rules like this:
kh|x
zh|ž
sh|š
ng|ŋ
Now you can use kh zh sh ng in any of the other input boxes: categories, sound changes, input lexicon. The SCA will apply the rewrite rules to provide single characters it can work with, and then apply them again backwards to provide output using digraphs.
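The forward-and-backward behaviour of rewrite rules amounts to plain global string replacement in each direction. Here is a Python sketch using the four rules above; the names `REWRITES`, `rewrite`, and `unrewrite` are made up for illustration.

```python
# Each pair is (what you type, the single character used internally).
REWRITES = [("kh", "x"), ("zh", "ž"), ("sh", "š"), ("ng", "ŋ")]

def rewrite(text, rules=REWRITES):
    """Apply the rewrite rules forwards, before the sound changes run."""
    for digraph, single in rules:
        text = text.replace(digraph, single)
    return text

def unrewrite(text, rules=REWRITES):
    """Apply the rewrite rules backwards when writing the output."""
    for digraph, single in rules:
        text = text.replace(single, digraph)
    return text

print(rewrite("F=khshzh"))  # F=xšž
print(rewrite("khasha"))    # xaša
print(unrewrite("xaša"))    # khasha
```

Because the replacement is applied to every box, a category line such as F=khshzh ends up defining three single-character members.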
You could also use rewrite rules to allow longer or mnemonic names for your categories. E.g.
<front>|F
Now you could write sound changes like
i/ü/_<front>
(The category names still have to be unique: you can’t use F to define both front vowels and fricatives. But recall that you can use any Unicode character now for category names.)
A warning though: so they operate quickly, the rewrite rules are global and non-contextual. The results may surprise you if you didn’t realize your transcription system was ambiguous. E.g. don’t use kh both for IPA /x/ and for the cluster /k h/.
If you need contextual rewrite rules... just use SCA²! Add your rewrite rules at the top and bottom of the file, with the appropriate context specifications.
Sometimes you want the rewrite rules to apply only to the input. (For instance, the orthography may only apply to the parent language.) In that case, make sure Rewrite on output is unchecked.
You do this by including the special line -* at the appropriate point in your sound changes file. (I encourage you to fill in the line as a comment, e.g. -* Old Portuguese.)
To see the intermediate results, check Show intermediate results. E.g. if you add -* just after the rule gn/nh/_ in the default sound changes, you get output that looks like this:
opera; obra
secundo; segundo
If you check Intermediate results only, you’ll get only the intermediate results. In the above case, you’d get
opera
secundo
You can have multiple -* commands, though I don’t attest to the readability of the results. This can be useful for debugging, as you can see before-and-after results for a block of rules.
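One way to picture the -* marker is as a point where the current form of each word is recorded. The Python sketch below is not the tool's implementation: the rule list is a small assumed subset written as hand-translated regular expressions (the real default sound changes are not reproduced here), and `run` is a hypothetical name.

```python
import re

MARK = "-*"

# Assumed rule subset, with the marker placed just after gn/nh/_.
RULES = [
    (r"gn", "nh"),                         # gn/nh/_
    MARK,
    (r"(?<=[aeiou])p(?=[aeiou])", "b"),    # p/b/V_V
    (r"(?<=[aeiou])c(?=[aeiou])", "g"),    # c/g/V_V
    (r"(?<=[^aeiou])e(?=r[aeiou])", ""),   # e//C_rV, C taken as any non-vowel
]

def run(word, rules):
    """Return (snapshots taken at each marker, final form)."""
    snapshots = []
    for rule in rules:
        if rule == MARK:
            snapshots.append(word)
        else:
            pattern, replacement = rule
            word = re.sub(pattern, replacement, word)
    return snapshots, word

for w in ["opera", "secundo"]:
    stages, final = run(w, RULES)
    print("; ".join(stages + [final]))  # intermediate results, then the final form
```

Printing only `stages` corresponds to the Intermediate results only option.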
Browse lets you read in a file, and distribute it properly to the input fields. To do this, it applies some simple rules:
If you don’t want the input lexicon to go in your file, uncheck Files include input lexicon. This is better if you have, say, a parent language with several daughters each with their own sound change file.
Did you accidentally erase all your work? Hit Undo parse/upload, which will restore the state of your input fields before the last Browse or Parse command.
At the bottom of the Output section you'll see a link Download output lexicon. Click this to save your output lexicon as a text file.
If your browser doesn’t support these, I’ve kept the old methods, which involved consolidating everything into the Sound Changes field. It was then up to you, using cut and paste, to save these in a file.
Parse .sc will parse the consolidated text in the Sound Changes text box into the appropriate input boxes.
Back to .sc will collect the text in the input boxes and place them in Sound Changes.