Lineages, Inc.
 
 

What is Soundexing?

An old legal principle holds that if a reasonable person would pronounce differently spelled names the same way, the names are the same. Robert C. Russell of Pittsburgh, Pennsylvania, realized that this principle could be applied to indexing: names could be indexed by their sounds rather than their spelling. Russell was issued patent number 1,261,167 on April 2, 1918, for inventing “certain new and useful Improvements in Indexes,” a technique that came to be known as “soundexing.”

"American Soundex"
The so-called “American” Soundex system is an improvement on Russell’s invention, and was used by the National Archives and Records Administration to index the 1880, 1890, 1900, 1910, and 1920 U.S. Censuses. The Soundex code consists of the first letter of the name followed by three digits selected from the table at right, using three simple rules:

  1. Double letters are coded as one letter:
         Williams = W452
  2. Letters of the same code not separated by other letters are coded as one letter:
         Schmidt = S530
  3. Zeroes are added to the end of the code to make up three digits:
         Lee = L000
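
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The letter groups in CODES are the standard American Soundex groups, assumed here to match the table this page refers to, and the function name soundex is our own. Fuller descriptions of the system add a special rule for the letters H and W, which this sketch does not handle.

```python
# Standard American Soundex letter groups (assumed to match the page's table).
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
# Flatten the groups into a letter -> digit lookup.
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(name):
    """Return the four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Track the code of the previous letter (None for uncoded letters such
    # as vowels), starting with the code of the first letter itself.
    prev = CODES.get(first)
    for c in letters[1:]:
        code = CODES.get(c)
        # Rules 1 and 2: skip a letter whose code repeats the code of the
        # letter immediately before it.
        if code is not None and code != prev:
            digits.append(code)
        prev = code
    # Rule 3: pad with zeroes, then keep the first letter and three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Williams"))  # W452
print(soundex("Schmidt"))   # S530
print(soundex("Lee"))       # L000
```

Starting prev at the code of the first letter is what makes Schmidt come out as S530: the C shares a code with the initial S and is therefore skipped.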

Daitch-Mokotoff Soundex
Although the Soundex is useful, many names that sound the same are not coded the same: Carr is C600 but Kerr is K600, for example. Additionally, the Soundex code encodes only three significant letters after the first letter of the name, so long names may be coded the same as short ones (Peters and Peterson, for example, are both P362). The Daitch-Mokotoff Soundex system resolves these problems.

The Daitch-Mokotoff Soundex system is considerably more complex than the “American” Soundex system. First, its codes are six digits long, providing more granularity. Second, it is based on letter clusters rather than individual letters, and recognizes multiple phonetic possibilities for those clusters when appropriate. Each cluster consists of one or more letters and is assigned up to three values in the range 0–9: one for when the cluster begins the name; one for when the cluster is followed by A, E, I, J, O, U, or Y; and one for all other cases. The clusters A, E, H, I, J, O, U, and Y have no “all other cases” value. Finally, a name may have more than one Daitch-Mokotoff Soundex code. The complete rules are available in "Soundexing and Genealogy" by Gary Mokotoff.
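
The three-value idea can be pictured as a lookup table keyed by cluster. The sketch below is only an illustration of that data structure and of how a position selects one of the three values; it is not a Daitch-Mokotoff encoder. The names DM_EXCERPT, next_cluster, and cluster_value are hypothetical, and the handful of entries is a small excerpt as we understand the published table, so check them against Mokotoff's article before relying on them.

```python
# cluster -> (value at start of name, value before a listed letter, other cases)
# None means the cluster is not coded in that position.
# Assumption: a tiny excerpt only; verify against the published rules.
DM_EXCERPT = {
    "SCH": (4, 4, 4),
    "AI": (0, 1, None),
    "A": (0, None, None),
    "B": (7, 7, 7),
    "H": (5, 5, None),
    "S": (4, 4, 4),
}

# Letters that select the second value, as listed in the text above.
FOLLOWERS = set("AEIJOUY")


def next_cluster(name, pos):
    """Return the longest excerpt cluster starting at pos, or None."""
    for length in (3, 2, 1):
        candidate = name[pos:pos + length]
        if len(candidate) == length and candidate in DM_EXCERPT:
            return candidate
    return None


def cluster_value(name, pos):
    """Return (cluster, value) for the cluster at pos in an uppercase name."""
    cluster = next_cluster(name, pos)
    if cluster is None:
        return None, None
    at_start, before_listed, other = DM_EXCERPT[cluster]
    end = pos + len(cluster)
    if pos == 0:
        value = at_start
    elif end < len(name) and name[end] in FOLLOWERS:
        value = before_listed
    else:
        value = other
    return cluster, value


print(cluster_value("SCHMIDT", 0))
print(cluster_value("BAIER", 1))
print(cluster_value("BAHR", 2))
```

Trying the longest cluster first is a design choice made for this sketch so that SCH is matched as one unit rather than as S followed by other letters. A real encoder would also need the full table and the branching that produces multiple codes for one name.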




   
 
© 2004-2007 Lineages, Inc. All Rights Reserved.