basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-07 : 03:25:52
|
I have a table that has 200M records. We have to archive it in order to get better performance: basically, copy the records to an archive database and delete them from the source. What is the best way to tackle this? I would like some info or steps on how to do it. |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-07 : 03:59:29
|
You can use a stored procedure for this. Add a parameter, with a default value, to specify the period before which rows need to be archived. To automate it, add a SQL Server Agent job that calls this procedure on a convenient schedule.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/ |
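A minimal sketch of such a procedure, assuming a source table `dbo.BigTable` with a `CreatedDate` column and an archive database `ArchiveDB` (all names here are hypothetical placeholders, not from the thread):

```sql
-- Hypothetical sketch: archive rows older than @CutoffDays, then delete them.
CREATE PROCEDURE dbo.usp_ArchiveBigTable
    @CutoffDays int = 10    -- default period before which rows are archived
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Cutoff datetime;
    SET @Cutoff = DATEADD(DAY, -@CutoffDays, GETUTCDATE());

    -- Copy qualifying rows to the archive database
    INSERT INTO ArchiveDB.dbo.BigTable (ID, CreatedDate, Payload)
    SELECT ID, CreatedDate, Payload
    FROM dbo.BigTable
    WHERE CreatedDate < @Cutoff;

    -- Remove them from the source
    DELETE FROM dbo.BigTable
    WHERE CreatedDate < @Cutoff;
END
GO
```

A SQL Server Agent job step would then simply run `EXEC dbo.usp_ArchiveBigTable;` on whatever schedule suits the data growth.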
|
|
jassi.singh
Posting Yak Master
122 Posts |
Posted - 2011-12-07 : 06:33:44
|
Yes, I have done this as a project. Some pseudo code:

1) Create the same table structure in your destination database
2) Insert data from source to destination based on your archive rule
3) Finally, delete from the source database

Also, a TRANSACTION plays an important role here; otherwise everything will get messy. You can use a linked server to do this between different servers.

Java Database |
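The three steps above, wrapped in a transaction as suggested, might look roughly like this; `[ArchiveServer]` is a hypothetical linked server name and the table and column names are placeholders:

```sql
-- Hedged sketch: copy then delete inside one transaction, via a linked server.
BEGIN TRY
    BEGIN TRANSACTION;

    -- Step 2: copy qualifying rows to the archive server
    INSERT INTO [ArchiveServer].ArchiveDB.dbo.BigTable (ID, CreatedDate, Payload)
    SELECT ID, CreatedDate, Payload
    FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -10, GETUTCDATE());

    -- Step 3: delete from the source only after the copy succeeds
    DELETE FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -10, GETUTCDATE());

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- log or re-raise the error as appropriate
END CATCH
```

Note that a transaction spanning a linked server may be promoted to a distributed transaction, which needs MSDTC configured between the two servers.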
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-07 : 12:33:14
|
How do you avoid locking the table, given that there are 200 million records? I think if I do the insert and delete, it will lock the table and slow down the servers, and the log file is going to grow like crazy. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-07 : 12:41:48
|
Why not do the archiving outside of peak hours, so it doesn't affect other processing? It also doesn't need to be very frequent; depending on data growth it could even be a weekly activity. |
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-07 : 19:39:02
|
We don't want to delete all the records at once; we want to delete more records than we insert. How can I achieve this? My plan is to have a temp table between the source and destination tables, and to set up a SQL job that runs daily; eventually it will delete all the records older than GETUTCDATE() - 10.

source: 150 million
destination: 0
temp: 0

Steps:
1. From the source, insert the records into temp WHERE Date < GETUTCDATE() - 10
2. Insert from temp into the destination
3. Delete from the source where the records exist in temp
4. Drop temp

I don't know if this is a good solution, but if you have a better one I would like to hear it. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-08 : 01:33:11
|
"We don't want to delete all the records at once. We want to delete more records than inserted records"

That doesn't make much sense. How do you delete more records than you inserted? |
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-08 : 02:14:38
|
Yeah, that statement is confusing, so disregard it. Basically, I don't want to delete all the records at once because I don't want to lock the database. So, let's say we have 100 records, and 80 of them are more than 10 days old. I want to delete those 80.

1st day: I set up my job and run it tomorrow. The job deletes 20 records that are older than 10 days.
2nd day: the job again deletes 20 records older than 10 days.
3rd day: the job again deletes 20 records older than 10 days.
...and so on.

To do this I have these steps; what do you think?

Steps:
1. From the source, insert the records into temp WHERE Date < GETUTCDATE() - 10
2. Insert from temp into the destination
3. Delete from the source where the records exist in temp
4. Drop temp |
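The four steps above, capped per daily run with TOP so each job execution only moves a slice, might be sketched like this (table and column names are placeholders, and the TOP value would be far larger in practice):

```sql
-- Hedged sketch of the 4-step plan, limited to one daily batch.
-- 1) Stage a capped set of old rows in a temp table
SELECT TOP (100000) ID
INTO #ToArchive
FROM dbo.Source
WHERE [Date] < DATEADD(DAY, -10, GETUTCDATE());

-- 2) Copy the staged rows to the destination
INSERT INTO ArchiveDB.dbo.Destination (ID, [Date], Payload)
SELECT s.ID, s.[Date], s.Payload
FROM dbo.Source AS s
JOIN #ToArchive AS t ON t.ID = s.ID;

-- 3) Delete from the source only the rows that were staged
DELETE s
FROM dbo.Source AS s
JOIN #ToArchive AS t ON t.ID = s.ID;

-- 4) Drop the temp table
DROP TABLE #ToArchive;
```

Because the delete joins to the staged keys rather than re-evaluating the date filter, it cannot remove rows that arrived between steps 1 and 3.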
|
|
Kristen
Test
22859 Posts |
Posted - 2011-12-09 : 02:58:09
|
What we do is:

```sql
SELECT [T_ID] = IDENTITY(int, 1, 1),
       PrimaryKeyField1, PrimaryKeyField2, ...
INTO #TEMPTABLE
FROM MyTable
WHERE MyDeleteCriteria < @SomeValue AND ...
ORDER BY PrimaryKeyField1, PrimaryKeyField2, ...

DECLARE @intRowCount int, @intOffset int, @intBatchSize int

SELECT @intRowCount = 1,  -- Force first iteration
       @intOffset = 1,
       @intBatchSize = 20 -- Number of rows to delete on each iteration
                          -- (probably should be 10,000 rather than 20!)

WHILE @intRowCount >= 1
BEGIN
    DELETE D
    FROM #TEMPTABLE AS T
    JOIN MyTable AS D
        ON D.PrimaryKeyField1 = T.PrimaryKeyField1
       AND D.PrimaryKeyField2 = T.PrimaryKeyField2
       AND ...
    WHERE T_ID >= @intOffset AND T_ID <= @intOffset + @intBatchSize

    SELECT @intRowCount = @@ROWCOUNT,
           @intOffset = @intOffset + @intBatchSize

    WAITFOR DELAY '00:00:02' -- Delay 2 seconds to allow other processes to run
END
```

This has the benefit that the rows deleted in each iteration are clustered by the primary key, so they are likely to be on the same index page and locking/blocking will be minimal. (NOTE: I am assuming your primary key is the clustered index; if not, the ORDER BY should use the clustered index keys instead.) There will still be random access to secondary indexes, though.

The WAITFOR DELAY allows other processes to run in between iterations.

You can adjust the batch size to have minimal impact, and the delay to allow other processes to run (we adjust them dynamically, so that as load increases on the server the DELETE operation reduces its batch size). You can also have the loop abort once a certain time is reached, so that the routine only runs during the defined "quiet time". |
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-12 : 13:34:22
|
```sql
DELETE Source.dbo.[Transaction] WITH (TABLOCKX, HOLDLOCK)
FROM Source.dbo.[Transaction]
WHERE TransactionID IN (SELECT TransactionID FROM Archive.dbo.[Transaction])
```

Something happened to my process: I have records in my source table that have not been deleted. I'm running the query above to delete them, since they have already been moved to the destination table. However, it's taking forever to execute. Any ideas to make it faster? |
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-12 : 17:50:58
|
Kristen,

```sql
SELECT [T_ID] = IDENTITY(int, 1, 1),
       PrimaryKeyField1, PrimaryKeyField2, ...
INTO #TEMPTABLE
FROM MyTable
```

I think SELECT INTO doesn't work with IDENTITY; I get an error:

Msg 177, Level 15, State 1, Line 2
The IDENTITY function can only be used when the SELECT statement has an INTO clause. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-12-13 : 01:02:50
|
The part you posted looks fine. Can you post the whole query? It seems like you have some attached part that is causing the error. |
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-13 : 13:03:58
|
Wrote the code again. It seems to work. Let me know if there is anything I could change.

```sql
---------------------------------------------------------------------------------------
-- dbo.ComputerDocument
---------------------------------------------------------------------------------------
SET NOCOUNT ON

SELECT TOP 100000 ComputerDocument.ComputerDocumentID
INTO #ComputerDocument
FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
WHERE ComputerDocument.DateCompleted < GETUTCDATE() - 7

INSERT ORACLEArchive.dbo.ComputerDocument
SELECT ComputerDocument.ComputerDocumentID,
       ComputerDocument.TransactionID,
       ComputerDocument.DocumentID,
       ComputerDocument.DocType,
       ComputerDocument.PartnerID,
       ComputerDocument.[Status],
       ComputerDocument.DateCompleted,
       ComputerDocument.QueuedDate,
       ComputerDocument.PageStatus
FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
JOIN #ComputerDocument
    ON ComputerDocument.ComputerDocumentID = #ComputerDocument.ComputerDocumentID

DECLARE @BatchSize INT
SET @BatchSize = 10000

WHILE (@BatchSize <> 0)
BEGIN
    DELETE TOP (@BatchSize) ORACLE.dbo.ComputerDocument
    FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
    WHERE ComputerDocument.ComputerDocumentID IN
          (SELECT #ComputerDocument.ComputerDocumentID FROM #ComputerDocument)

    SELECT @BatchSize = @@ROWCOUNT
END

DROP TABLE #ComputerDocument
GO
``` |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-12-14 : 09:32:38
|
I would use separate variables for the batch size and the row count. Using @BatchSize to hold @@ROWCOUNT may cause the batch size to shrink from its initial value, for example if rows are missing at the time the delete runs, which would then make subsequent batches smaller, possibly very small and inefficient.

I'm not sure the lock hints you are using are sensible. I think they will cause blocking for prolonged periods of time.

You are not using a WAITFOR DELAY, so you aren't providing a decent opportunity for other processes to run; you may find this is not much different from just trying to delete all the rows "in one go".

I think you should use ORDER BY ComputerDocument.DateCompleted on your initial creation of #ComputerDocument; otherwise you will be deleting rows randomly within ORACLE.dbo.ComputerDocument, which I think will lead to confusion for any queries on that table.

My suggestion was to use an IDENTITY column in #ComputerDocument and use that to control the delete batches. Your approach of using DELETE TOP (@BatchSize) will become progressively more inefficient as the process proceeds, as fewer and fewer matching rows will exist in #ComputerDocument. |
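Applying those suggestions to the loop in the previous post might look like this sketch; it assumes #ComputerDocument was created with a T_ID IDENTITY column, and the variable names are placeholders:

```sql
-- Hedged sketch: separate row-count variable, IDENTITY-driven batches,
-- and a WAITFOR DELAY between iterations.
DECLARE @RowCount int, @Offset int, @BatchSize int

SELECT @RowCount = 1,    -- force the first iteration
       @Offset = 1,
       @BatchSize = 10000

WHILE @RowCount >= 1
BEGIN
    -- Delete only the slice of staged keys in the current T_ID window
    DELETE D
    FROM #ComputerDocument AS T
    JOIN ORACLE.dbo.ComputerDocument AS D
        ON D.ComputerDocumentID = T.ComputerDocumentID
    WHERE T.T_ID >= @Offset AND T.T_ID < @Offset + @BatchSize

    SELECT @RowCount = @@ROWCOUNT,
           @Offset = @Offset + @BatchSize   -- advance regardless of rows hit

    WAITFOR DELAY '00:00:02'   -- give other processes a chance to run
END
```

Because @Offset advances by the fixed @BatchSize each pass, each iteration touches a fresh, contiguous slice of #ComputerDocument rather than re-scanning the whole staged key set.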
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-14 : 13:18:51
|
Thanks K, for the insight... but I cannot use IDENTITY on the temp table, because I'm already copying the identity column from the source table.

```sql
SELECT TOP 100000 IDENTITY(int, 1, 1) AS [T_ID],
       TransactionDocument.TransactionDocumentID
INTO #TransactionDocument
...
```

Msg 8108, Level 16, State 1, Line 1
Cannot add identity column, using the SELECT INTO statement, to table '#TransactionDocument', which already has column 'TransactionDocumentID' that inherits the identity property. |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-12-14 : 13:56:18
|
In that case, CAST the identity column from the source table to INT, so that it loses the identity property. Better still, pre-create the table and then INSERT into it. |
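Both workarounds sketched, using the TransactionDocument names from the error message above (the filter condition is a placeholder):

```sql
-- Workaround 1: CAST strips the inherited identity property,
-- so SELECT INTO can add its own IDENTITY column.
SELECT TOP 100000
       IDENTITY(int, 1, 1) AS T_ID,
       CAST(TransactionDocument.TransactionDocumentID AS int) AS TransactionDocumentID
INTO #TransactionDocument
FROM dbo.TransactionDocument
WHERE TransactionDocument.DateCompleted < GETUTCDATE() - 7

-- Workaround 2: pre-create the temp table, then INSERT into it.
CREATE TABLE #TransactionDocument2 (
    T_ID int IDENTITY(1, 1),
    TransactionDocumentID int
)

INSERT INTO #TransactionDocument2 (TransactionDocumentID)
SELECT TOP 100000 TransactionDocument.TransactionDocumentID
FROM dbo.TransactionDocument
WHERE TransactionDocument.DateCompleted < GETUTCDATE() - 7
```

Pre-creating the table also lets you control the column types and add an index on T_ID if the batching loop needs one.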
|
|
basicconfiguration
Constraint Violating Yak Guru
358 Posts |
Posted - 2011-12-16 : 13:19:10
|
Four questions:

"You are not using a WAITFOR DELAY, so you aren't providing a decent opportunity for other processes to run..."

1. Sorry, stupid Q... what are the other processes? And why a 2-second delay rather than 1?

"My suggestion was to use an IDENTITY in the #ComputerDocument and use that to control the delete batches..."

2. Sorry, stupid Q... could you give me an example of this?

"I think you should use ORDER BY ComputerDocument.DateCompleted on your initial creation of #ComputerDocument..."

3. ORDER BY slows the whole thing down, which is not good. Why do I need to order by DateCompleted? Even if it deletes randomly, it will eventually delete all the rows that are < GETUTCDATE() - 7. What is the purpose of the ORDER BY?

4. I also wonder whether I need BEGIN TRANSACTION with COMMIT and ROLLBACK? |
|
|
|