
 Archive data


basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-07 : 03:25:52
I have a table that has 200M records. We have to archive it in order to achieve better performance: basically, copy the records to an archive database and delete them from the source. What is the best way to tackle this? I would like some info or steps on how to do it.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-12-07 : 03:59:29
You can use a stored procedure for this. Add a parameter, with a default value, that specifies the cutoff date before which rows should be archived. To automate it, add a SQL Server Agent job that calls the procedure on a convenient schedule.
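A minimal sketch of such a procedure; all database, table, and column names here are hypothetical placeholders, not from the original post:

```sql
-- Sketch only: SourceDB, ArchiveDB, dbo.MyTable, and the columns
-- are placeholder names. Assumes identical schemas on both sides.
CREATE PROCEDURE dbo.ArchiveOldRows
    @CutoffDate datetime = NULL
AS
BEGIN
    SET NOCOUNT ON

    -- Default: archive everything older than 10 days
    IF @CutoffDate IS NULL
        SET @CutoffDate = DATEADD(day, -10, GETUTCDATE())

    BEGIN TRANSACTION

    -- Copy qualifying rows to the archive database
    INSERT INTO ArchiveDB.dbo.MyTable (ID, SomeColumn, CreatedDate)
    SELECT ID, SomeColumn, CreatedDate
    FROM SourceDB.dbo.MyTable
    WHERE CreatedDate < @CutoffDate

    -- Remove them from the source
    DELETE FROM SourceDB.dbo.MyTable
    WHERE CreatedDate < @CutoffDate

    COMMIT TRANSACTION
END
```

A SQL Server Agent job step would then simply run `EXEC dbo.ArchiveOldRows` on the chosen schedule.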

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/


jassi.singh
Posting Yak Master

122 Posts

Posted - 2011-12-07 : 06:33:44
Yes, I have done this as a project.

I can give you some pseudo code:
1) Create the same table structure in your destination database
2) Insert data from source to destination based on your archive rule
3) Finally, delete the archived rows from the source database.

Also, a TRANSACTION plays an important role here; otherwise everything can get messy.

You can use a linked server to achieve this between different servers.
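The steps above can be sketched as follows; the linked server name (ARCHIVESRV) and all table and column names are hypothetical:

```sql
-- Step 1 (one-time): create the same table structure in the destination.
-- Steps 2 and 3 below are wrapped in a transaction so the copy and the
-- delete succeed or fail together.
BEGIN TRY
    BEGIN TRANSACTION

    -- Step 2: copy rows matching the archive rule to the destination
    -- (ARCHIVESRV is a linked server in this sketch)
    INSERT INTO ARCHIVESRV.ArchiveDB.dbo.MyTable (ID, SomeColumn, CreatedDate)
    SELECT ID, SomeColumn, CreatedDate
    FROM dbo.MyTable
    WHERE CreatedDate < DATEADD(day, -10, GETUTCDATE())

    -- Step 3: delete the archived rows from the source
    DELETE FROM dbo.MyTable
    WHERE CreatedDate < DATEADD(day, -10, GETUTCDATE())

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION
END CATCH
```

Note that a transaction spanning a linked server is promoted to a distributed transaction, which requires MSDTC to be configured on both servers.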



basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-07 : 12:33:14
How do you avoid locking the table, given there are 200 million records? I think if I do the insert and delete, it will lock the table, slow down the servers, and the log file is going to grow like crazy.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-12-07 : 12:41:48
Why not do the archiving outside of peak hours, so that it doesn't affect other processing? Also, it need not be very frequent; based on data growth it could even be a weekly activity.


basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-07 : 19:39:02
We don't want to delete all the records at once. We want to delete more records each day than we insert. How can I achieve this? My plan is to have a temp table between the source and destination tables, and set up a SQL job to run daily; eventually it will delete all the records older than GETUTCDATE() - 10.


source: 150 million
destination: 0
temp: 0


Steps:
1. From source, insert the records into temp WHERE Date < GETUTCDATE() - 10
2. Insert from temp to destination
3. Delete from source where records exist in temp
4. Drop temp

I don't know if this is a good solution, but if you have a better one I would like to hear it.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-12-08 : 01:33:11
quote:
We don't want to delete all the records at once. We want to delete more records than inserted records

That doesn't make much sense. How do you delete more records than you insert?



basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-08 : 02:14:38
Yeah, that statement is confusing, so disregard it.

Basically, I don't want to delete all the records at once because I don't want to lock the database. So, let's say we have 100 records and 80 of them are more than 10 days old; I want to delete those 80.
1st day: I set up my job and run it tomorrow. The job deletes 20 records that are older than 10 days.
2nd day: the job again deletes 20 records that are older than 10 days.
3rd day: the job again deletes 20 records that are older than 10 days.
And so on.

To do this I have these steps. What do you think?

Steps:
1. From source, insert the records into temp WHERE Date < GETUTCDATE() - 10
2. Insert from temp to destination
3. Delete from source where records exist in temp
4. Drop temp
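A sketch of those four steps in T-SQL; the database, table, and column names are placeholders:

```sql
-- Step 1: capture the rows to archive into a temp table
SELECT ID, SomeColumn, [Date]
INTO #Temp
FROM SourceDB.dbo.MyTable
WHERE [Date] < DATEADD(day, -10, GETUTCDATE())

-- Step 2: copy them to the destination
INSERT INTO DestDB.dbo.MyTable (ID, SomeColumn, [Date])
SELECT ID, SomeColumn, [Date]
FROM #Temp

-- Step 3: delete from source only the rows captured in the temp table
DELETE S
FROM SourceDB.dbo.MyTable AS S
WHERE EXISTS (SELECT 1 FROM #Temp AS T WHERE T.ID = S.ID)

-- Step 4: clean up
DROP TABLE #Temp
```

Deleting only rows that exist in #Temp, rather than re-evaluating the date filter in step 3, guards against deleting rows that arrived between steps 1 and 3.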

Kristen
Test

22859 Posts

Posted - 2011-12-09 : 02:58:09
What we do is:

SELECT [T_ID] = IDENTITY(int, 1, 1),
       PrimaryKeyField1, PrimaryKeyField2, ...
INTO #TEMPTABLE
FROM MyTable
WHERE MyDeleteCriteria < @SomeValue
  AND ...
ORDER BY PrimaryKeyField1, PrimaryKeyField2, ...

DECLARE @intRowCount int,
        @intOffset int,
        @intBatchSize int
SELECT @intRowCount = 1, -- Force first iteration
       @intOffset = 1,
       @intBatchSize = 20 -- Number of rows to delete on each iteration (probably should be 10,000 rather than 20!)

WHILE @intRowCount >= 1
BEGIN
    DELETE D
    FROM #TEMPTABLE AS T
         JOIN MyTable AS D
             ON D.PrimaryKeyField1 = T.PrimaryKeyField1
            AND D.PrimaryKeyField2 = T.PrimaryKeyField2
            AND ...
    WHERE T_ID >= @intOffset
      AND T_ID < @intOffset + @intBatchSize -- half-open range so each batch is exactly @intBatchSize rows
    SELECT @intRowCount = @@ROWCOUNT,
           @intOffset = @intOffset + @intBatchSize
    WAITFOR DELAY '00:00:02' -- Delay 2 seconds to allow other processes to run
END

This has the benefit that the rows deleted in each iteration are clustered by the primary key, so they are likely to be on the same index pages and locking/blocking will be minimal (NOTE: I am assuming your primary key is the clustered index; if not, the ORDER BY should be by the clustered index keys instead). There will still be random access to secondary indexes, though.

The use of WAITFOR DELAY allows other processes to run in between each iteration.

You can adjust the batch size to have minimal impact, and the DELAY to allow other processes to run (we adjust them dynamically, so that as load increases on the server the DELETE operation reduces its batch size). You can also have the loop abort when a certain time is reached, so that the routine only runs during the defined "quiet time".
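The quiet-time abort mentioned above can be sketched by adding a time check to the loop condition; the 06:00 cutoff is an arbitrary example, not from the original post:

```sql
-- Sketch: stop the batched delete loop once a cutoff time is reached
DECLARE @StopTime datetime,
        @intRowCount int
SELECT @StopTime = '2011-12-15 06:00', -- end of the quiet window (example value)
       @intRowCount = 1

WHILE @intRowCount >= 1 AND GETDATE() < @StopTime
BEGIN
    -- ... batched DELETE as in the loop above ...
    SELECT @intRowCount = @@ROWCOUNT
    WAITFOR DELAY '00:00:02'
END
```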

basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-12 : 13:34:22
DELETE S
FROM Source.dbo.[Transaction] AS S WITH (TABLOCKX, HOLDLOCK)
WHERE S.TransactionID IN (SELECT TransactionID FROM Archive.dbo.[Transaction])


Something happened to my process: I have records in my source table that have not been deleted. I'm running the above query to delete them, since they have already been moved to the destination table. However, it's taking forever to execute. Any ideas on how to make it faster?
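One common approach, sketched here on the assumption that TransactionID is indexed in both tables, is to drop the TABLOCKX/HOLDLOCK hints and delete in batches, so each statement takes fewer locks and the log can be reused between batches:

```sql
-- Delete already-archived rows in batches of 10,000
DECLARE @rc int
SET @rc = 1

WHILE @rc >= 1
BEGIN
    DELETE TOP (10000) S
    FROM Source.dbo.[Transaction] AS S
    WHERE EXISTS (SELECT 1
                  FROM Archive.dbo.[Transaction] AS A
                  WHERE A.TransactionID = S.TransactionID)

    SET @rc = @@ROWCOUNT
END
```

DELETE TOP (n) requires SQL Server 2005 or later; under the SIMPLE recovery model, the log space from each batch can be reused by the next one.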

basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-12 : 17:50:58
Kristen,

SELECT [T_ID] = IDENTITY(int, 1, 1),
PrimaryKeyField1, PrimaryKeyField2, ...
INTO #TEMPTABLE
FROM MyTable


I think SELECT INTO doesn't work with IDENTITY; I get an error.

Msg 177, Level 15, State 1, Line 2
The IDENTITY function can only be used when the SELECT statement has an INTO clause.
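For reference, Msg 177 is raised when the IDENTITY() function is used in a SELECT that has no INTO clause; the snippet as posted, with INTO #TEMPTABLE, is legal. A minimal reproduction and fix (sys.objects is used here only as a convenient source table):

```sql
-- This raises Msg 177: IDENTITY() used without an INTO clause
SELECT IDENTITY(int, 1, 1) AS T_ID, name
FROM sys.objects

-- This works: IDENTITY() is only legal together with SELECT ... INTO
SELECT IDENTITY(int, 1, 1) AS T_ID, name
INTO #Numbered
FROM sys.objects
```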

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-12-13 : 01:02:50
The part you posted looks fine. Can you post the whole query? It seems like some attached part is causing the error.


basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-13 : 13:03:58
I wrote the code again and it seems to work. Let me know if there is anything I could change.


---------------------------------------------------------------------------------------
-- dbo.ComputerDocument
---------------------------------------------------------------------------------------
SET NOCOUNT ON

SELECT TOP 100000 ComputerDocument.ComputerDocumentID INTO #ComputerDocument
FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
WHERE ComputerDocument.DateCompleted < GETUTCDATE()- 7


INSERT ORACLEArchive.dbo.ComputerDocument
SELECT ComputerDocument.ComputerDocumentID,
ComputerDocument.TransactionID,
ComputerDocument.DocumentID,
ComputerDocument.DocType,
ComputerDocument.PartnerID,
ComputerDocument.[Status],
ComputerDocument.DateCompleted,
ComputerDocument.QueuedDate,
ComputerDocument.PageStatus
FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
JOIN #ComputerDocument
ON ComputerDocument.ComputerDocumentID = #ComputerDocument.ComputerDocumentID


DECLARE @BatchSize INT
SET @BatchSize = 10000

WHILE (@BatchSize <> 0)
BEGIN

DELETE TOP (@BatchSize) ORACLE.dbo.ComputerDocument
FROM ORACLE.dbo.ComputerDocument WITH (TABLOCKX, HOLDLOCK)
WHERE ComputerDocument.ComputerDocumentID
IN (SELECT #ComputerDocument.ComputerDocumentID FROM #ComputerDocument)

SELECT @BatchSize = @@rowcount

END


DROP TABLE #ComputerDocument
GO

Kristen
Test

22859 Posts

Posted - 2011-12-14 : 09:32:38
I would use separate variables for BatchSize and RowCount. Using @BatchSize for @@ROWCOUNT may cause the batch size to shrink from its initial value (e.g. if rows are missing at the time the delete runs), which would then make subsequent batches smaller, possibly very small and inefficient.

I'm not sure the lock hints you are using are sensible. I think they will cause blocking for prolonged periods of time.

You are not using a WAITFOR DELAY, so you aren't providing a decent opportunity for other processes to run; you may find this is not much different from just trying to delete all the rows "in one go".

I think you should use ORDER BY ComputerDocument.DateCompleted in your initial creation of #ComputerDocument; otherwise you will be deleting rows randomly within ORACLE.dbo.ComputerDocument, which I think will lead to confusion in any queries of that table.

My suggestion was to use an IDENTITY column in #ComputerDocument and use that to control the delete batches. Your approach of using DELETE TOP (@BatchSize) will become progressively more inefficient as the process proceeds, as fewer and fewer matching rows will exist in #ComputerDocument.

basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-14 : 13:18:51
Thanks K, for the insight, but I cannot use IDENTITY on the temp table because I'm already copying an identity column from the source table.

SELECT TOP 100000
IDENTITY(int, 1, 1) as [T_ID],
TransactionDocument.TransactionDocumentID INTO #TransactionDocument
...



Msg 8108, Level 16, State 1, Line 1
Cannot add identity column, using the SELECT INTO statement, to table '#TransactionDocument', which already has column 'TransactionDocumentID' that inherits the identity property.

Kristen
Test

22859 Posts

Posted - 2011-12-14 : 13:56:18
In that case, CAST the identity column from the source table as INT, so it loses the identity property. Better still, pre-create the table and then INSERT into it.
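Both workarounds sketched below; the table and column names follow the earlier post. Msg 8108 occurs because SELECT INTO copies the identity property from the source column, and a table can only have one identity column:

```sql
-- Workaround 1: CAST strips the inherited identity property,
-- so the IDENTITY() function can be added alongside it
SELECT TOP 100000
       IDENTITY(int, 1, 1) AS T_ID,
       CAST(TransactionDocumentID AS int) AS TransactionDocumentID
INTO #TransactionDocument
FROM dbo.TransactionDocument

-- Workaround 2: pre-create the temp table, then INSERT into it
CREATE TABLE #TransactionDocument2
(
    T_ID int IDENTITY(1, 1),
    TransactionDocumentID int
)

INSERT INTO #TransactionDocument2 (TransactionDocumentID)
SELECT TOP 100000 TransactionDocumentID
FROM dbo.TransactionDocument
```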

basicconfiguration
Constraint Violating Yak Guru

358 Posts

Posted - 2011-12-16 : 13:19:10
4 questions:

quote:
You are not using a WAITFOR DELAY, so you aren't providing a decent opportunity for other processes to run, so you may find this is not much different from just trying to delete all the rows "in one go".

1. Sorry, stupid Q... What are the other processes? And why a 2-second delay and not 1?


quote:
My suggestion was to use an IDENTITY in the #ComputerDocument and use that to control the delete batches. Your approach of using DELETE TOP (@BatchSize) will become progressively more inefficient as the process proceeds, as fewer and fewer matching rows will exist in #ComputerDocument

2. Sorry, stupid Q... Could you give me an example of this?


quote:
I think you should use ORDER BY ComputerDocument.DateCompleted on your initial creation of #ComputerDocument - otherwise you will be deleting rows randomly within ORACLE.dbo.ComputerDocument which I think will lead to confusion in any queries of that table.

3. ORDER BY slows down the whole thing, which is not good. Why do I need to order by DateCompleted? Even if it's deleting randomly, it will eventually delete all the rows that are < GETUTCDATE() - 7. What is the purpose of the ORDER BY?


4. I also wonder if I need a BEGIN TRANSACTION with COMMIT and ROLLBACK?