Date: 9-15-93

Files: README (this file), cmudict.0.1.Z (compressed), cmulex.0.1.Z,
cmudict.0.2.Z (compressed), cmudict.0.3.Z (compressed), cmulex.0.3.Z,
phoneset.0.1, phoneset.0.3.

This directory contains pronunciation dictionaries (cmudict.0.1.Z is
the first one we put out; cmudict.0.3.Z is the latest and most up-to-date)
containing approximately 100k words and their transcriptions; lists of the
words are in cmulex.0.1.Z and cmulex.0.3.Z. We use these dictionaries
at CMU in our speech understanding systems.

The phone set for this dictionary contains 39 phones, which can be found in
phoneset.0.3.

Stress is indicated by means of a numeral [012] attached to a vowel:
  0 = no stress
  1 = primary stress
  2 = secondary stress
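
As an illustration of the stress convention above, here is a minimal
Python sketch that separates a phone from its stress numeral. The
function names and the sample phone symbols are assumptions made for
this example; the actual phone symbols are listed in phoneset.0.3.

```python
import re

# Assumed token shape: an upper-case phone symbol, optionally followed by
# one stress numeral (0, 1 or 2). Phones without a numeral carry no stress mark.
STRESS_LABELS = {"0": "no stress", "1": "primary stress", "2": "secondary stress"}
_PHONE_RE = re.compile(r"^([A-Z]+)([012])?$")


def split_stress(phone):
    """Return (base_phone, stress_numeral or None) for one phone token."""
    match = _PHONE_RE.match(phone)
    if match is None:
        raise ValueError(f"unrecognised phone token: {phone!r}")
    return match.group(1), match.group(2)


def stress_pattern(phones):
    """Join the stress numerals of a transcription, in order, into a string."""
    return "".join(s for _, s in map(split_stress, phones) if s is not None)
```

For example, split_stress("AH0") separates the phone "AH" from the
numeral "0", which STRESS_LABELS maps to "no stress".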

Alternate transcriptions are identified with a numeral in parentheses as
part of the lexical entry.
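
The following Python sketch shows one way to read entries that carry
such a numeral. It assumes, without confirmation from this README, that
each line holds a lexical entry followed by whitespace-separated phones,
and that an alternate looks like WORD(2). The function names are
hypothetical.

```python
import re

# Assumed line layout: "WORD PH PH ..." or "WORD(2) PH PH ...".
_HEAD_RE = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_entry(line):
    """Return (word, variant_numeral or None, phones) for one dictionary line."""
    fields = line.split()
    if not fields:
        raise ValueError("empty dictionary line")
    head, phones = fields[0], fields[1:]
    match = _HEAD_RE.match(head)
    variant = match.group("variant")
    return match.group("word"), (int(variant) if variant else None), phones


def group_pronunciations(lines):
    """Map each word to the list of all its transcriptions, in file order."""
    pronunciations = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, _variant, phones = parse_entry(line)
        pronunciations.setdefault(word, []).append(phones)
    return pronunciations
```

Grouping by the bare word lets a caller look up every transcription of
a word without knowing how many alternates exist.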

We generated this dictionary using the following independent sources:
- a 20k+ general English dictionary, built by hand at CMU
  (extensively proofed and used).
- a 200k+ UCLA-proofed version of the Shoup dictionary.
- a 32k subset of the Dragon dictionary.
- a 53k+ dictionary of proper names, synthesiser-generated, unproofed.
- a 200k dictionary generated with Orator, unproofed.
- a 200k dictionary generated with MITalk, unproofed.

Entries that occur solely in copyrighted sources, such as the Dragon
dictionary, are not currently included in this dictionary. If you have
words and transcriptions that you would like included in this unrestricted
resource, please send them to Robert L. Weide (weide@cs.cmu.edu) and we
will consider them for an upcoming version.

All of the above sources were preprocessed and the transcriptions in the
current cmudict.0.1 were selected from the transcriptions in the sources or
a combination thereof. We have removed some potentially unreliable
transcriptions from this dictionary, including those based on only one
source, and will reintroduce them once we have verified the transcriptions.
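
To illustrate the idea of setting aside transcriptions supported by only
one source, here is a small Python sketch. It is not the combination
process actually used at CMU; the data layout, the function name and the
min_sources threshold are all assumptions made for this example.

```python
from collections import Counter


def supported_transcriptions(candidates, min_sources=2):
    """Keep transcriptions proposed by at least min_sources sources.

    candidates maps a source name to a dict of word -> list of phones.
    This is a hypothetical agreement filter, not the CMU procedure.
    """
    support = Counter()
    for entries in candidates.values():
        for word, phones in entries.items():
            # Each source counts at most once per (word, transcription) pair.
            support[(word, tuple(phones))] += 1

    kept = {}
    for (word, phones), count in support.items():
        if count >= min_sources:
            kept.setdefault(word, []).append(list(phones))
    return kept
```

With two sources that agree on one word and a third word found in only
one of them, only the agreed transcription is kept; the single-source
entry would be held back for verification.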

CMU does not guarantee the accuracy of this dictionary, nor its suitability
for any specific purpose. In fact, we expect a number of errors, omissions
and inconsistencies to remain in the current result. We intend to continually
update the dictionary as we make progress in correcting them. We will make
subsequent versions available via anonymous ftp, and those who would like
notification when updated versions are available should send email to
weide@cs.cmu.edu.

We welcome input from users: send comments and suggestions on the content of
the dictionary to Robert L. Weide (weide@cs.cmu.edu), and questions regarding
the combination process to Peter Jansen (pjj@cs.cmu.edu).

The Carnegie Mellon Pronouncing Dictionary [cmudict.0.1] is Copyright 1993
by Carnegie Mellon University. Use of this dictionary, for any research or
commercial purpose, is completely unrestricted.  If you make use of or
redistribute this material, we would appreciate acknowledgement of its origin.

Finally, if you add words to or correct words in this dictionary, we would
like the additions and corrections sent to us (weide@cs.cmu.edu) for
consideration in a subsequent version. All final entries will be approved by
Robert L. Weide and Peter Jansen, editors of the dictionary.
