What is Soundexing?
An old legal principle states that if a reasonable person
would use the same pronunciation for names that are spelled differently,
the names are the same. Robert C. Russell of Pittsburgh, Pennsylvania,
realized that it should be possible to apply this principle to
indexing, that is, to index names by their sounds rather than by their spelling.
Russell was issued patent number 1,261,167 on April 2, 1918, for
inventing “certain new and useful Improvements in Indexes”
that came to be known as “soundexing.”
“American” Soundex
The so-called “American” Soundex system is an improvement on Russell’s invention,
and was used by the National Archives and Records Administration to index the 1880,
1890, 1900, 1910, and 1920 U.S. Censuses. The Soundex code consists of the
first letter of the name followed by three digits selected from the table at
right, using three simple rules:
- Double letters are coded as one letter: Williams = W452
- Letters with the same code that are not separated by other letters are coded as one letter: Schmidt = S530
- Zeroes are added to the end of the code to make up three digits: Lee = L000
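The three rules can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation. The letter-to-digit table is the standard American Soundex table, supplied here as an assumption because the table itself is not reproduced in this text, and the names `CODES` and `soundex` are made up for the example. Only the three rules listed above are applied; versions that give H and W special treatment are not modeled.

```python
# Standard American Soundex letter groups (assumed; the table is not
# reproduced in the text). Letters not listed here carry no code.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the first letter plus three digits, per the three rules above."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    # The first letter's own code counts when collapsing neighbours,
    # which is why the C in Schmidt is dropped.
    previous = CODES.get(letters[0])
    for ch in letters[1:]:
        code = CODES.get(ch)  # None for uncoded letters
        # Rules 1 and 2: skip a letter whose code repeats the letter before it.
        if code is not None and code != previous:
            digits.append(code)
        previous = code
    # Rule 3: pad with zeroes, then keep the letter and three digits.
    return (letters[0] + "".join(digits) + "000")[:4]


for example in ("Williams", "Schmidt", "Lee"):
    print(example, "=", soundex(example))
```

Run as written, the loop prints the codes for the three example names from the rules: W452, S530, and L000.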
Daitch-Mokotoff Soundex
Although the Soundex is useful, many names that sound
the same are not coded the same: Carr is C600 but Kerr is K600,
for example. Additionally, the Soundex code adds only three significant
letters to the first letter of the name, so long names may be coded
the same as short ones (Peters and Peterson, for example). The Daitch-Mokotoff
Soundex system resolves these problems.
The Daitch-Mokotoff Soundex system is considerably
more complex than the “American” Soundex system. First, its code
is six digits long, providing more granularity. Second, it is based on
letter clusters rather than individual letters, and recognizes
multiple phonetic possibilities for those clusters when appropriate. Each cluster
consists of one or more letters and is assigned three values in
the range 0–9:
one value for when the cluster begins the name; one value for when
the cluster is followed by A, E, I, J, O, U, or Y; and one value
for all other cases. The exceptions are A, E, H, I, J, O, U, and Y,
which have no “all other cases” value. Finally, a name may have more than
one Daitch-Mokotoff Soundex code. The complete rules are available in “Soundexing and
Genealogy” by Gary Mokotoff.
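The idea of three values per cluster can be pictured as a small lookup structure. The sketch below is only an illustration of that idea, not the Daitch-Mokotoff algorithm itself. The names `ClusterValues`, `SAMPLE`, and `pick_value` are hypothetical, the two sample entries should be checked against the published table, and giving the "begins the name" case priority over the other two is an assumption.

```python
from typing import NamedTuple, Optional


class ClusterValues(NamedTuple):
    """The three position-dependent values for one letter cluster."""
    at_start: Optional[int]      # cluster begins the name
    before_vowel: Optional[int]  # cluster is followed by A, E, I, J, O, U, or Y
    otherwise: Optional[int]     # all other cases (None if there is no value)


# Two illustrative entries only; the full table is in the published rules.
SAMPLE = {
    "K": ClusterValues(5, 5, 5),
    "H": ClusterValues(5, 5, None),  # H has no "all other cases" value
}

FOLLOWERS = set("AEIJOUY")


def pick_value(values: ClusterValues, at_start: bool, next_letter: str) -> Optional[int]:
    """Choose which of the three values applies in a given context."""
    if at_start:  # assumed to take priority over the other two cases
        return values.at_start
    if next_letter.upper() in FOLLOWERS:
        return values.before_vowel
    return values.otherwise
```

For example, `pick_value(SAMPLE["H"], False, "T")` returns `None`, reflecting that H contributes nothing to the code in the "all other cases" position.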