Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Combining similar rows

Author  Topic 

mla275
Starting Member

3 Posts

Posted - 2009-07-06 : 15:14:14
Hey everyone, I don't even know if this is humanly...or databasely...possible. What I am trying to do is group similar rows together so Dell, Dell Inc., Dell Computers, etc. will all be counted as one item and tallied as such. I would prefer to do it dynamically so I don't have to maintain a list of synonyms, but I can see how it is probably necessary to do so. Below is my current query. Any assistance would be greatly appreciated. Thanks in advance.

SELECT manu, COUNT(*) AS Counter FROM (SELECT Manufacturer AS manu FROM ' + @table + ') AS A GROUP BY manu ORDER BY Counter DESC

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-07-06 : 15:50:52
Not sure if you are asking for a t-sql algorithm to group by an expression that will "common-ize" similar names or if you are looking for an architectual solution that will allow you to avoid the problem - like a normalized EntityID value. An expression will likely never be perfect. Because if IBM is sometimes International Business Machines or KBMG is also Bearing Point then a regular expression could not really deal with that.

Be One with the Optimizer
TG
Go to Top of Page

mla275
Starting Member

3 Posts

Posted - 2009-07-06 : 16:02:18
Hi TG, thanks for the reply. The data is populated based on the BIOS information for each computer when it logs onto the network, so some are reported as 'Gateway' others as 'Gateway Systems' or some other permutation. I can manipulate the data programatically after the query to get what I want, but I would much rather handle it directly in the query itself. I figured I would need to keep a library of synonyms ('Dell', 'Dell Inc', 'Dell Computers', etc) for each possibility, which in itself is a bit of a pain, which i why I thought possibly doing a match on the first 4 characters which would put all the Dells and Gateways together, but still leaves out IBM and any other acronymmed entries. If anyone has any ideas about how to approach the query, I am wide open to suggestions.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-07-06 : 16:15:15
If you have already collected a pretty good history of values then you may be able to pretty easily create a mapping table.
([BIOS_name] PK, [Normalized_name])
JOIN to the table by BIOS_name and GROUP BY Normalized_name.

Be One with the Optimizer
TG
Go to Top of Page

mla275
Starting Member

3 Posts

Posted - 2009-07-07 : 13:05:42
I was hoping that there was a more elegant way, but I guess T-SQL isn't to the point where it can read my mind yet...Anyway, thanks again
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-07-07 : 14:09:54
I think SQLSVR 2008 has mind-reading capability, but you have to store your thoughts in XML format, so it is probably not worth it.

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page
   

- Advertisement -