LLatinsky
Starting Member
38 Posts |
Posted - 2007-03-20 : 17:46:01
Hello, I have a table with approximately 1 million records and a primary key (int, identity) that is referenced as a foreign key by more than 50 tables (some of the tables allow the foreign key to be null, some do not). The table was populated during a conversion, and it looks like some of the ids do not exist in any of the referencing tables, which violates the rule in this database that every record in this table must be referenced by at least one of the tables that reference it.

I am trying to delete the records that violate the rule (or at least see how many there are), but because the table is so big and each id has to be checked against more than 50 tables, it takes a long time. I have tried EXISTS, NOT IN, and LEFT JOIN with IS NULL - nothing is fast enough. I am frustrated enough that I am considering putting all distinct id values from the foreign key tables into one temporary table and then deleting from the primary key table using a LEFT JOIN where the temp table value is null, but I am not sure that will be faster. Does anybody have an idea what else I could try? Most of the referencing tables that hold large amounts of data have indexes on this fk.

Thank you
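For reference, the per-table check being described looks roughly like this (table and column names here are hypothetical, not from the poster's schema); with 50-plus correlated subqueries per row it is easy to see why none of the variants is fast:

```sql
-- dbo.Parent(ID) stands in for the master table; dbo.Child1, dbo.Child2, ...
-- stand in for the 50+ referencing tables.
SELECT COUNT(*)
FROM dbo.Parent AS P
WHERE NOT EXISTS (SELECT 1 FROM dbo.Child1 AS C WHERE C.ParentID = P.ID)
  AND NOT EXISTS (SELECT 1 FROM dbo.Child2 AS C WHERE C.ParentID = P.ID)
  -- ... one NOT EXISTS clause repeated for every referencing table
```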
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2007-03-21 : 02:15:53
How do you plan to enforce such a rule in the future? Even if you remove the parents with no children now, more will likely creep in later. Do you plan to clean it up periodically? If there's one thing I have learned about large data sets, it's that any rule that's not explicitly enforced will be broken.

I guess if I had to do this as a one-off, I would dump all the fk ids from the various 50 child tables into a single temp table, and then left join to that temp table to see which parents have no children. Consider using a pk with IGNORE_DUP_KEY=ON in the temp table so it doesn't get any bigger than it needs to be.

www.elsasoft.org
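A minimal sketch of that setup, assuming SQL Server 2005 syntax and hypothetical table/column names; IGNORE_DUP_KEY makes the insert silently discard duplicate ids instead of raising an error, so the temp table holds each id at most once:

```sql
CREATE TABLE #UsedIDs
(
    ID int NOT NULL,
    PRIMARY KEY (ID) WITH (IGNORE_DUP_KEY = ON)
)

-- Repeat one INSERT per child table; duplicate ids are dropped, not errored.
INSERT INTO #UsedIDs (ID)
SELECT ParentID FROM dbo.Child1 WHERE ParentID IS NOT NULL

-- Parents with no children in any table:
SELECT P.ID
FROM dbo.Parent AS P
    LEFT OUTER JOIN #UsedIDs AS U ON U.ID = P.ID
WHERE U.ID IS NULL
```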
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-03-21 : 03:08:11
Or:

    SELECT PK AS ID, 'MasterTable' AS TableName
    INTO #Temp
    FROM MasterTable

For all child tables:

    INSERT #Temp
    SELECT FK, '{TableName}'
    FROM {TableName}

Then:

    SELECT ID, COUNT(*)
    FROM #Temp
    GROUP BY ID
    HAVING COUNT(*) = 1

Peter Larsson
Helsingborg, Sweden
Kristen
Test
22859 Posts |
Posted - 2007-03-21 : 03:29:29
Assuming this is a production system with large tables, I would take a slightly more rigorous approach than Peso!

1) Pre-create the TEMP table (so as not to block TEMPDB access). If it's going to be around for a few days whilst you sort this out, make a permanent table in a "sandpit" database - don't make the table in the Live/Production database. Make sure the new table has a PK index.
2) Copy the IDs from the largest table into the TEMP table.
3) Copy any "new" IDs from each of the other tables into the TEMP table.
4) Report / examine / delete!

    CREATE TABLE #TEMP
    (
        ID int NOT NULL,
        PRIMARY KEY (ID)
    )
    GO
    INSERT INTO #TEMP (ID)
    SELECT DISTINCT ID
    FROM dbo.MyFirstTable
    GO
    INSERT INTO #TEMP (ID)
    SELECT DISTINCT O.ID
    FROM dbo.MyOtherTable AS O
        LEFT OUTER JOIN #TEMP AS T
            ON T.ID = O.ID
    WHERE T.ID IS NULL -- New IDs only
    GO

Then to get the ones that are "unused":

    SELECT COUNT(*)
    -- DELETE M -- Use this to delete the "Unused" records
    FROM dbo.TheMasterTable AS M
        LEFT OUTER JOIN #TEMP AS T
            ON T.ID = M.ID
    WHERE T.ID IS NULL -- IDs that don't exist in #TEMP

Take a Backup first!

Kristen
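If the delete ends up touching a large fraction of a million-row table, it may also be worth running it in batches to limit transaction log growth and lock duration. A sketch, assuming SQL Server 2005's DELETE TOP syntax and Kristen's table names:

```sql
-- Delete unused parents 10000 rows at a time until none remain.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) M
    FROM dbo.TheMasterTable AS M
        LEFT OUTER JOIN #TEMP AS T
            ON T.ID = M.ID
    WHERE T.ID IS NULL

    IF @@ROWCOUNT = 0 BREAK
END
```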
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-03-21 : 08:19:45
I have an article that gives some methods to do this, using a technique very similar to Kristen's:

http://weblogs.sqlteam.com/jeffs/archive/2004/10/07/2190.aspx

It explains the process step by step. Once you create the "translation" table, it is very easy to use the UPDATE statement given to update all tables that reference your primary table; all you need to do is change the table name and column name and keep running the update.

As Kristen mentioned: BACK UP everything and TEST THIS IN A DEVELOPMENT environment FIRST! If you add everything to a big SQL script as you go, you can test it all, generate a script, and when you are ready, just run the script on production.

- Jeff
http://weblogs.sqlteam.com/JeffS
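The repeated UPDATE Jeff describes presumably follows a pattern along these lines (the linked article has the exact statement; the table, column, and translation-table names below are hypothetical):

```sql
-- #Translate maps each redundant id (OldID) to the id it should be
-- consolidated to (NewID). Run once per referencing table, changing
-- only the table name and foreign key column name.
UPDATE C
SET C.ParentID = X.NewID
FROM dbo.Child1 AS C
    INNER JOIN #Translate AS X
        ON X.OldID = C.ParentID
```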
LLatinsky
Starting Member
38 Posts |
Posted - 2007-03-21 : 15:53:53
Thank you everybody for your replies and advice. The approach described in the web log is exactly what I am doing to consolidate all related records into one, and that code has been working for years with no problem. But first I wanted to delete the "orphans", if I may call them that, in order to take them out of the equation.

Unfortunately, there is no natural key in the pk table - only the combination of all columns makes a record unique - and I have code in place that inserts a new record only when it cannot find an existing one, but there are things out of my control that I have to take care of after the fact. I really didn't want to build a table of all the ids that are used throughout the database just to delete the unused ones, but I guess I don't have a choice.

Thank you again