Andre412
Starting Member
11 Posts
Posted - 2009-05-05 : 10:32:15
Hi, I'm new to this forum and this is my first post. I wanted to run something past you experts before I go ahead with what I have planned.

We have a few tables in our DB that are filling with too much data; one of them is sitting at 48 million records as I write this. The tables contain data about who is browsing our website. They are the basis of our pay-per-click engine, which identifies valid "human traffic" as opposed to bots, crawlers etc.

My plan is to archive the data generated by non-human traffic. The database is fully normalised with good referential integrity. I can flag the data I need to archive by checking each record in the base table against another table that identifies whether the record reflects a human or a bot. Once flagged, I plan to iterate through the records, copying the information across to the "archive db" and removing each record from the production table, which in turn will cascade and remove references from other tables.

My question is: is archive-then-delete a good approach to controlling the bloat of data that will never be used, or is there a preferred industry-standard approach? I would also like to know the impact on the server of adding a "bit" field to a table containing 40M records.

Hope that's enough detail for you all. Looking forward to your responses.
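A minimal sketch of the flag/archive/delete flow described above, written as three set-based statements. All object names (dbo.Traffic, dbo.UserAgents, Archive.dbo.Traffic, the IsBot flag) are hypothetical and not from the original post:

    -- Hypothetical schema: dbo.Traffic is the production table, dbo.UserAgents
    -- classifies each user agent as bot or human, Archive.dbo.Traffic is the copy

    -- 1. Flag the rows generated by non-human traffic
    UPDATE t
    SET    t.IsBot = 1
    FROM   dbo.Traffic AS t
    JOIN   dbo.UserAgents AS ua ON ua.UserAgentId = t.UserAgentId
    WHERE  ua.IsBot = 1

    -- 2. Copy the flagged rows into the archive database
    INSERT INTO Archive.dbo.Traffic (TrafficId, UserAgentId, RequestedAt)
    SELECT TrafficId, UserAgentId, RequestedAt
    FROM   dbo.Traffic
    WHERE  IsBot = 1

    -- 3. Remove them from production; ON DELETE CASCADE cleans up the child tables
    DELETE FROM dbo.Traffic
    WHERE  IsBot = 1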
pootle_flump
1064 Posts
Posted - 2009-05-06 : 04:30:53
This practice is broadly fine. The only two things I would say are:

Why "iterate through the records"? Why not perform this as a set-based operation?

Why flag the rows and then act upon them? Why bother with the bit column at all? You could perform the actions based on joins to this other table rather than writing bit column values (fewer processes, less space required, no normalisation problems).

Also, what version of SQL Server?
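A sketch of the set-based, join-driven alternative being suggested, reusing the hypothetical names from the earlier sketch; no flag column is written at all:

    -- Copy the bot rows straight into the archive, driven by the classification table
    INSERT INTO Archive.dbo.Traffic (TrafficId, UserAgentId, RequestedAt)
    SELECT t.TrafficId, t.UserAgentId, t.RequestedAt
    FROM   dbo.Traffic AS t
    JOIN   dbo.UserAgents AS ua ON ua.UserAgentId = t.UserAgentId
    WHERE  ua.IsBot = 1

    -- Then delete them, using the same join instead of a flag
    DELETE t
    FROM   dbo.Traffic AS t
    JOIN   dbo.UserAgents AS ua ON ua.UserAgentId = t.UserAgentId
    WHERE  ua.IsBot = 1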
Andre412
Starting Member
11 Posts
Posted - 2009-05-06 : 05:25:03
Hi P,

Thanks for responding. I was going to use the bit field as a flag because of the high level of transactions on this table: it is constantly being inserted into, so any lock on it must be held for the minimum amount of time. The best way I could think of was to create an insert trigger that cross-references the "browser details" against our list of ~5000 records and sets the bit. I was then going to run the "archive" stored procedure periodically as a job, moving the flagged records across to the archive DB and deleting them from production.

The iteration would take place through a cursor, since running a cursor has minimal impact on table locking, whereas running a large copy-and-delete would lock the table until the transaction was complete.

We are running SQL 2005 in our dev and production environments.

Unless you can confirm that a large "batch" operation using joins on nvarchar(max) fields on a table with 40 million records is faster than a cursor iterating one row at a time over a fully indexed bit field?
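A rough sketch of the kind of insert trigger being described. The trigger, table and column names are hypothetical, and the real bot-matching rule may well differ:

    -- Hypothetical: dbo.BotAgents is the ~5000-row list of known bot user agents
    CREATE TRIGGER trg_Traffic_FlagBots
    ON dbo.Traffic
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON

        -- Flag only the rows just inserted whose user agent appears in the bot list
        UPDATE t
        SET    t.IsBot = 1
        FROM   dbo.Traffic AS t
        JOIN   inserted      AS i ON i.TrafficId = t.TrafficId
        JOIN   dbo.BotAgents AS b ON b.HttpUserAgent = i.HttpUserAgent
    END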
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts
Posted - 2009-05-06 : 06:27:00
Why delete the records at all during busy times? If you have a bit field that marks the rows you don't care about, can't you index that field and then use it in your queries to cut down on execution times? Then schedule a job during your "quiet" times (if you have any) to remove the unneeded rows.

If that's just not possible, how about using a paged delete instead of row by bloody row? Do a loop and delete x rows at a time rather than 1 at a time. With some experimentation you will find a "sweet spot" for the maximum number of rows to be removed per batch.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
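A minimal sketch of the indexing suggestion. Names are hypothetical, and whether the optimiser actually uses an index on a low-selectivity bit column depends on the data distribution:

    -- Index the flag so reporting queries and the clean-up job can filter on it
    CREATE NONCLUSTERED INDEX IX_Traffic_IsBot
    ON dbo.Traffic (IsBot)

    -- Reporting queries then exclude the bot rows explicitly
    SELECT COUNT(*)
    FROM   dbo.Traffic
    WHERE  IsBot = 0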
Andre412
Starting Member
11 Posts
Posted - 2009-05-06 : 06:51:28
Hi Charlie,

Unfortunately there is no quiet period; this is a global website, and when Europe goes quiet the US wakes up. So deleting in a quiet time, although ideal, is not an option in this case. Paged delete sounds like a good idea; I will take a look now.

Just to update you all: a small amount of investigation revealed that no browser in the world performing a request presents an "http user agent" string longer than 450 characters (99.9% of the time it doesn't exceed 300), and an nvarchar of 450 can still be indexed, hence 450 is the preferred length. So I can now operate on a fully indexed set of fields, improving performance even further in the trigger.

Thanks for your responses so far, they have been very helpful.
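A sketch of the 450-character choice mentioned above (names are hypothetical): nvarchar(450) is the widest nvarchar that still fits SQL Server's 900-byte index key limit, since each nvarchar character takes two bytes.

    -- 450 characters x 2 bytes = 900 bytes, the maximum index key size
    CREATE TABLE dbo.BrowserDetails
    (
        BrowserDetailId INT IDENTITY(1,1) PRIMARY KEY,
        HttpUserAgent   NVARCHAR(450) NOT NULL
    )

    CREATE NONCLUSTERED INDEX IX_BrowserDetails_HttpUserAgent
    ON dbo.BrowserDetails (HttpUserAgent)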
Andre412
Starting Member
11 Posts
Posted - 2009-05-11 : 06:17:46
Hi Charlie,

I was taken off this issue last week in favour of a more pressing problem. I've just returned to it and cannot seem to find much information about paged deletes. Can you explain briefly how I can go about paged deleting?

Thanks in advance.

me http://drayblog.gotdns.com
company http://www.lowcarboneconomy.com
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts
Posted - 2009-05-11 : 07:11:05
oh -- I just meant that instead of performing a huge set-based delete in one go (for thousands of records and up) you could do a loop and do something like this:

    -- Setup the tables
    IF OBJECT_ID('tempdb..#seed') IS NOT NULL DROP TABLE #seed
    IF OBJECT_ID('tempdb..#del') IS NOT NULL DROP TABLE #del

    CREATE TABLE #seed ([a] INT)

    INSERT #seed ([a])
    SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
    UNION SELECT 10

    CREATE TABLE #del
        (
          [a] INT IDENTITY(1,1) PRIMARY KEY
        , [b] VARCHAR(50)
        )

    -- 10^6 = 1,000,000 rows of junk data
    INSERT #del ([b])
    SELECT CAST(CAST(NEWID() AS VARBINARY(16)) AS VARCHAR(16))
    FROM #seed a
    CROSS JOIN #seed b
    CROSS JOIN #seed c
    CROSS JOIN #seed d
    CROSS JOIN #seed e
    CROSS JOIN #seed f
    GO

    DECLARE @num INT
    DECLARE @rows INT
    DECLARE @error INT

    SET @rows = 1
    SET @num = 500        -- page size: rows deleted per transaction

    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION

        -- remove one page of the unwanted rows
        DELETE TOP (@num)
        FROM #del
        WHERE [a] > 500000

        SELECT @error = @@ERROR, @rows = @@ROWCOUNT

        IF @error = 0
            BEGIN
                COMMIT TRANSACTION
            END
        ELSE
            BEGIN
                ROLLBACK TRANSACTION
                RETURN -- @error
            END
    END

Here there is a table with 1 million records and we want to remove 500,000 of them. If you do them all at once it takes a long time. If you do them one at a time it takes a long time. However, if you wrap each page of deletes (set to 500 rows in this example) in its own transaction, it doesn't.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Andre412
Starting Member
11 Posts
Posted - 2009-05-13 : 20:37:35
Hi Charlie,

Thanks for your response, though for some reason I didn't see it until now. I went about it a slightly different way, based on this post: [url]http://www.sqlusa.com/bestpractices2005/hugeupdate/[/url]

The performance increase was such that I was shocked: it updated 17 million records in 35 minutes with minimal impact on the traffic. I was so impressed I wrote a quick blog post about it here: [url]http://drayblog.gotdns.com/index.php/2009/05/14/sql-update-updating-millions-of-records-nicely/[/url]

I'm sure there is further optimising to be done, but it was very late and I needed to finish. Your code snippet has given me some more ideas with regards to automation; thanks again.

me http://drayblog.gotdns.com
company http://www.lowcarboneconomy.com
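A sketch of a batched update in the same spirit as the approach linked above, reusing the hypothetical names from earlier; the idea is simply to keep each transaction, and therefore each set of locks, short-lived:

    DECLARE @batch INT
    DECLARE @rows  INT

    SET @batch = 10000    -- rows per batch; tune to find the sweet spot
    SET @rows  = 1

    WHILE @rows > 0
    BEGIN
        -- Flag one batch of not-yet-flagged bot rows
        UPDATE TOP (@batch) t
        SET    t.IsBot = 1
        FROM   dbo.Traffic AS t
        JOIN   dbo.UserAgents AS ua ON ua.UserAgentId = t.UserAgentId
        WHERE  ua.IsBot = 1
          AND  t.IsBot = 0   -- assumes the flag defaults to 0, so the loop terminates

        SET @rows = @@ROWCOUNT
    END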