Numbers from 1 to 10 in Over 5000 Languages (One file)

Compiled by the irrepressible Mark Rosenfelder. Additions and corrections welcome.

The links on this page are to a single 1.1-megabyte file with all the numbers, displayed using Unicode. If your browser can't handle either of these things, click below.

This page with links to smaller non-Unicode files

Click here to see the entire collection, or click on the map to move to the languages for that area.

By family

Special collections


The Sources Page gives the sources for each language (and also lists languages I don't have, and connects the languages to other wide-scale classifications: Ruhlen, Voegelin & Voegelin, Campbell, and the Ethnologue).

I dearly appreciate everyone who's sent me numbers; but I want to particularly salute those whose kindness and hard work have been extraordinary: Jarel Deaton of Ohio, who is single-handedly responsible for more than a quarter of the numbers seen here; Eugene S.L. Chan of Hong Kong, who sent me his entire Austronesian database; and Carl Masthay of St. Louis and Pavel Petrov of Kaliningrad, who sent me their enormous, worldwide collection of numbers.

Special thanks to Claudia Griffith and the staff of the SIL Library in Duncanville, Texas, whose wonderful hospitality made a week of research in the summer of 2004 both pleasant and productive.

Some caveats

Bru muej bār paj pōn se:ng tepat tetekual tikeas mencit
Bru muoi bar p´i poun sau'ng tapoât tapul takual takêh muoi chít
Gurma yèn.dó lyé lwö.bà lèle: pá:nì pyêgà
Gurma n lè nlé nta nna nmu nluoba n lele nni n-ya ka piga

Language variations

People can get very excited about what's a language vs. what's a dialect. There is nothing inherent in the language variety to tell us what it is. Linguists sometimes use "language" to refer to a mutually intelligible group of dialects (but note that intelligibility can be partial).

Ordinary people generally call something a "language" if it has a prestigious standard form; but that's a fact about people's attitudes, not about language.

I generally rely on Voegelin & Voegelin, or on the original source for the numbers, in deciding whether to list something as a dialect (italicized). Some of my sources list multiple dialects; I usually try to pick the most widely spoken ones, and list others only if they're interestingly divergent.

Corollary: please don't complain to me about what's a dialect or a language-- you're arguing about nothing. (But feel free to send me additional dialects, or point out where I've messed up the names.)

Especially in the Amerind sections, I sometimes list older sources which may be of historical interest.


The mondo file linked from this page uses Unicode-- where the characters are available on my 2003 Mac and Windows computers. Annoyingly, the IPA characters are not available, so I still need some substitutions.

* indicates a reconstructed form
+ indicates a dead language (but some are undergoing revivals)

The picture shows the representations used for a number of IPA characters. Nonetheless, I haven't been able to retain all phonetic distinctions, and some have been lost-- for instance, the distinction between a circumflex (â) and a hachek.

For African tonal languages, a macron (¯) indicates a high level tone, not length, and is represented as _. | is another tone, usually low level.
For non-African languages, a macron indicates length and is represented as :.

? indicates the glottal stop (but if my sources spell it as an apostrophe or q, I follow them)
bold indicates a character which was dotted in the original source-- usually an emphatic or retroflex consonant
italic indicates open e and o and lax i and u, or a character that was italicized in the original source

Superscript numbers indicate a numbered toneme (e.g. 1 = first tone)
Appended numbers give tonal contours directly (e.g. 35 = high rising)
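
For anyone processing the file by program, the macron and glottal-stop substitutions above could be undone roughly as follows. This is only a sketch, not something used to build the collection: the function name `restore` and the `african_tonal` flag are made up for illustration, and it assumes each stand-in character directly follows the letter it modifies. It does not touch |, bold, italics, or tone numbers.

```python
import unicodedata

# Hypothetical helper, not part of the original page.
GLOTTAL_STOP = "\u0294"      # IPA glottal stop
COMBINING_MACRON = "\u0304"  # macron placed over the preceding letter

def restore(word: str, african_tonal: bool = False) -> str:
    """Map the ASCII stand-ins back to Unicode characters.

    Assumption: '_' (African tonal languages) and ':' (other languages)
    come right after the letter that carried the macron in the source.
    """
    out = word.replace("?", GLOTTAL_STOP)
    # The stand-in for the macron depends on the language group.
    marker = "_" if african_tonal else ":"
    out = out.replace(marker, COMBINING_MACRON)
    # Compose letter + combining macron into one character where possible.
    return unicodedata.normalize("NFC", out)

print(restore("se:ng"))                    # sēng
print(restore("a_", african_tonal=True))   # ā
```

Note that the result for a non-African language shows length and the result for an African tonal language shows a high level tone, even though both come out as a macron; the caller has to remember which kind of language the word came from.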

I use standard orthographies, where there is one, rather than phonetic transcriptions. This makes comparison a bit more difficult; but I prefer it, for two reasons. First, it reduces errors; even if I can correctly interpret a source's phonetic description, there can be orthographic irregularities that make a straight transcription ludicrous. Secondly, an orthography is generally closer to a phonemic representation, which is arguably what people have in their heads. 

Numbers about Numbers

Languages with more than a million native speakers are named in boldface.

Number of speakers is one of the least interesting attributes of a language; but there are so many languages here that some highlighting of the most common ones seems necessary. I used the high end of David Crystal's estimates.

How many languages aren't here? Well, there's almost 5000 living languages listed in Ruhlen's volume; I have numbers for about 83% of them, so there's at least a thousand more. (If the math doesn't seem to work out, note that I have plenty of dialects and conlangs not included in Ruhlen's list.) There are about 200 languages with more than a million speakers, all of which are in the list.

Am I going to do higher numbers? Or zero? Probably not, unless I do it for a subset of languages only. Many of the sources don't even have numbers above ten.

How was this done?

People sometimes ask me how I accumulated all these numbers, or how to do this sort of research.

The answer is simple: libraries. I have access to a few good university libraries, and when I can I visit others. You look in grammars, dictionaries, and books or journal articles surveying entire families.

And, if possible, find others who've been bitten by the same bug!