There are couple of ideas I've found useful for scoping out this sort of thing.Given a table CREATE TABLE names (nm varchar(100) NOT NULL) and a sufficiently large tally table Numbers:Find all the characters used in the names:SELECT ASCII(c) AS code, c, COUNT(*) AS ctFROM ( SELECT SUBSTRING(nm, n, 1) AS c FROM names INNER JOIN junk..Numbers ON n BETWEEN 1 AND LEN(nm) ) AS AGROUP BY cORDER BY ct DESC
Create a name 'signature' by reducing consecutive letters to a single character. This needs a bit more work if you have many names with accented letters, etc.It's not necessary to create the view separately, but it may be useful for other queries -- for example, if you want to examine the names with unique signatures. The multiple character replacement scheme shown is good for up to 41 consecutive letter. To replace more requires another REPLACE: doing 21×'a' inside the 6×'a' one allow up to 461!CREATE VIEW NameSigASSELECT nm, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(nm, 'a','a'), 'b','a'), 'c','a'), 'd','a'), 'e','a'), 'f','a'), 'g','a'), 'h','a'), 'i','a'), 'j','a'), 'k','a'), 'l','a'), 'm','a'), 'n','a'), 'o','a'), 'p','a'), 'q','a'), 'r','a'), 's','a'), 't','a'), 'u','a'), 'v','a'), 'w','a'), 'x','a'), 'y','a'), 'z','a'), 'aaaaaa','a'), 'aaa','a'), 'aa','a'), 'aa','a') AS sigFROM NamesGOSELECT sig, COUNT(*)FROM NameSigGROUP BY sigORDER BY COUNT(*) DESC
On some sample data with about 85000 names, the latter returned:a a 82462a a a 919a a'a 514a a-a 407a-a a 69a a a a 5a a-a-a 4a-a a-a 3a a a-a 3a a a'a 3a a-a-a-a 2a a'a a 1a'a a 1a-a a'a 1a' a 1
Edited by - Arnold Fribble on 05/11/2003 16:39:36