delete records from a large table

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

delete records from a large table

Author

Topic

LLatinsky
Starting Member

38 Posts

Posted - 2007-03-20 : 17:46:01

Hello, I have a table with approximately 1 million records in it with a primary key (int, identity) that is referenced by more than 50 tables as a foreigh key (some of the tables allow the foreign key to be null, some not). This table was populated during a conversion, and it looks like some of the ids do not exist in any of the foreign key tables - which violates the rule in this database that if there is a record in this table it must exist in at least one table that references it. I am trying to delete the records that violate the rule (or at least see how many of them are there) but since the table is so big and I need to check whether it exists in more than 50 tables it takes time. I was trying exists, not in, left join with is null - nothing is fast enough. I am so frustrated that I am considering trying to put all distinct id values from the foreign key tables into one temporary table and then delete by using left join and the value is null from my primary key table. But I am not sure this will make it faster. Does anybody have an idea what else I could try? the majority of fk tables that have large amounts of data have indexes on this fk.
Thank you

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-03-21 : 02:15:53

How do you plan to enforce such a rule in the future? even if you remove the parents with no children now, more will likely creep in later. Do you plan to clean it up periodically? If there's one thing I have learned about large data sets, it's that any rule that's not explicitly enforced will be broken.

I guess if I had to do this as a one-off I would dump all the fk ids from the various 50 child tables into a single temp table, and then left join to that temp table to see which parents have no children.

consider using a pk with ignore_dup_key=on in the temp table so it doesn't get any bigger than it needs to be.

www.elsasoft.org

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-03-21 : 03:08:11

Or

SELECT PK AS ID, 'MasterTable' AS TableName INTO #Temp FROM MasterTable

For All Tables
INSERT #Temp SELECT FK, '{TableName}' FROM {TableName}

SELECT ID, COUNT(*) FROM #Temp GROUP BY ID HAVING COUNT(*) = 1

Peter Larsson
Helsingborg, Sweden

Kristen
Test

22859 Posts

Posted - 2007-03-21 : 03:29:29

Assuming this is a production system, with large tables, I would take a slight more rigorous approach than Peso!

1) Pre-create the TEMP table (so as not to block TEMPDB access). If its going to be around for a few days whilst you sort this out make a permanent table in a "sandpit" database - don't make the table in the Live/Production database. Make sure the new table has a PK index

2) Copy the IDs from the largest table into the TEMP table.

3) Copy any "new" IDs for either of the other tables into the TEMP table.

4) Report / examine / delete!


CREATE TABLE #TEMP
(
    ID int NOT NULL,
    PRIMARY KEY
    (
        ID
    )
)
GO

INSERT INTO #TEMP(ID)
SELECT DISTINCT ID
FROM dbo.MyFirstTable
GO
INSERT INTO #TEMP(ID)
SELECT DISTINCT ID
FROM dbo.MyOtherTable AS O
     LEFT OUTER JOIN #TEMP AS T
        ON T.ID = O.ID
WHERE T.ID IS NULL -- New IDs only
GO

Then to get the ones that are "unused"


SELECT COUNT(*)
-- DELETE M    -- Use this to delete the "Unused" records
FROM dbo.TheMasterTable AS M
    LEFT OUTER JOIN #TEMP AS T
        ON T.ID = M.ID
WHERE T.ID IS NULL -- IDs that don't exist in #TEMP

Take a Backup first!

Kristen

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-03-21 : 08:19:45

I have an article that gives some methods to do this, using a technique very similar to Kristen's:

http://weblogs.sqlteam.com/jeffs/archive/2004/10/07/2190.aspx

It explains the process step by step. Once you create the "translation" table, it is very easy to use the UPDATE statement given to update all tables that reference your primary table; all you need to do is change the table name and column name and keep running the update.

as Kristen mention: BACK UP everything and TEST THIS IN A DEVELOPMENT environment FIRST!

If you add everything to a big sql script as you go, you can test it all, generate a script, and when you are ready, just run the script on production.

- Jeff
http://weblogs.sqlteam.com/JeffS

LLatinsky
Starting Member

38 Posts

Posted - 2007-03-21 : 15:53:53

Thank you everybody for your replies and advice. The approach described in the web log is exactly what I am doing with consolidating all related records to one. This code has been working for years with no problem. But I just wanted to delete the "orphans" if I may call them this first in order to take them out of the equation. Unfortunately, there is no natural key in the pk table - the combination of all columns will make the record unique, and I have code in place that will insert a new record in it only when it cannot find an existing, but there are things that are out of my control that I have to take care of post factum. I really didn't want to create a table of all the ids that are used throughout the database in order to delete the unused ones, but I guess I don't have a choice. Thank you again

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources