Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Rainmann
Starting Member
5 Posts |
Posted - 2011-08-22 : 16:21:49
|
| I have some data I need to clean up. Because of poor database design there are orphaned records. The orphans, however, can usually be matched back up to valid parent records. What I’m not sure of is how to best build the statement to fix up the data.First an example of a valid record:Table: TeamMaster TeamID: 1 TeamName: Green Bay PackersIn almost all cases the orphaned records are a subset of valid records and might look something like: BadTeamName: Green Bay PckrsWhat I’d like to do is take the bad team name and do a ‘like’ search over the valid records, trimming the bad team name by a single letter until I get a match.So my first query would look like: Select * from Teams where TeamName like 'Green Bay Pckrs%'and would return no matches.And my final query would look like: Select * from Teams where TeamName like 'Green Bay P%'and would find the valid record.I can do the above queries by trimming, but is there a better way? Also, my starting point will be a work table or result set in memory. I can do it via iteration but am not sure how to write set based operations to perform the same task.I’m aware that I might wind up with some records still orphaned, but this will greatly minimize the manual work needed. |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2011-08-22 : 17:35:03
|
Best way to start this type of thing is to get an idea of how many orphans and the nature of the mis-matches. select t.teamName, count(*)from teams tleft outer join teamMaster tm on tm.TeamName = t.TeamNamewhere tm.teamName is nullorder by t.teamName--to see by order of number of occurrences--order by count(*) desc I assume your "fixes" will be to set the TeamID, right? Your child table should just have the ID not the teamName repeated.You can first update the TeamID for all direct matches. Then do specific updates starting with the largest occurrence count.Then fix your model so that you have foreign key constraints that don't allow orphaned rows.Be One with the OptimizerTG |
 |
|
|
Rainmann
Starting Member
5 Posts |
Posted - 2011-08-23 : 09:56:57
|
| Yes to all the things you mentioned. Changing to Id's. Changing direct matches first. Adding proper constraints to prevent this from happening again in the future.In this particular table I have 40,000 out of 50,000 that can be replaced as direct matches. Of the 10,000 that remain there are only 113 distinct records that are causing an issue.However, there are many other tables with the same problem. I'm planning in implementing this same fix in all of the other tables as well. |
 |
|
|
|
|
|
|
|