alext
Starting Member - 1 Post
Posted - 2011-07-28 : 03:06:56
I am currently involved in developing a cost-sensitive solution that needs to run on SQL Server 2008 Express or Standard edition, where time-ordered data is logged into the database as fast as possible. Currently we use MERGE, with a table-valued parameter in a stored procedure, to pass chunks of data (around 50K rows at a time) into the database, and this has so far given us acceptable performance.

However, given the nature and potential size of the data, depending on the customer's requirements, we want the ability to split the data between files for simpler maintenance and backup, as well as to address potential performance issues once the tables become large. Since we cannot use SQL Server Enterprise to implement table partitioning, I am attempting to use a partitioned view to achieve something similar. The majority of data written to the database is inserted, not updated. The log data is used to generate KPIs whose evaluation time range can be defined by the user, but most queries are likely to fall within the last few months.

The main issue I have now is how to replace the MERGE in the stored procedure with a more conventional UPSERT, since MERGE doesn't appear to support partitioned views. I have found this article (http://bwunder.wordpress.com/2010/12/10/persisting-data-part-4-upsert-and-merge/) that is closest to what I want to do, but since I'm not a seasoned SQL expert, I'm struggling to adapt the example for our purposes. Any assistance, comments, suggestions, or pitfalls around modifying the stored procedure to use a standard UPDATE/INSERT (or a totally different approach) would be highly appreciated. From the results posted in the above link, it would appear that the performance of the UPSERT vs. the MERGE should be quite similar at the row counts we are using.
This is the basis of the current stored procedure:

USE [LogDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
--CREATE TYPE dbo.DataLogTableType AS TABLE
--  ( LogID int, Value float, Attribute int, [TimeStamp] datetime2 )  -- matches the structure of the LogData table
--
-- LogData has a PK on [LogID] and [TimeStamp]; no identity columns or other indexes.
CREATE PROCEDURE [dbo].[sp_Log_BulkInsertUpdate]
    @IncomingLogData DataLogTableType READONLY
AS
MERGE LogData AS TARGET
USING @IncomingLogData AS SOURCE
    ON (TARGET.TimeStamp = SOURCE.TimeStamp AND TARGET.LogID = SOURCE.LogID)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (LogID, Value, Attribute, TimeStamp)
    VALUES (SOURCE.LogID, SOURCE.Value, SOURCE.Attribute, SOURCE.TimeStamp)
WHEN MATCHED THEN
    UPDATE SET TARGET.Value = SOURCE.Value, TARGET.Attribute = SOURCE.Attribute;
SELECT @@ROWCOUNT
GO

The partitioning plan is to programmatically create time-ranged log tables on the fly, e.g. LogData20110101, LogData20110201, LogData20110301 if a monthly partition was chosen, update the partitioned view appropriately, and make that view (called LogDataView, for example) the target in the above stored procedure. A sample LogDataView would be:

SELECT Value, Attribute, [TimeStamp], LogID FROM LogData20110101
UNION ALL
SELECT Value, Attribute, [TimeStamp], LogID FROM LogData20110201
UNION ALL
SELECT Value, Attribute, [TimeStamp], LogID FROM LogData20110301

and each LogData table would have the appropriate CHECK constraint implemented on the [TimeStamp] column.
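For the view to accept inserts (an "updatable partitioned view"), each member table needs the partitioning column in its primary key and a non-overlapping CHECK constraint on that column. A minimal sketch of one monthly member table plus the view, assuming the column types from the table type above (the constraint names and exact date ranges are illustrative):

```sql
-- One monthly member table. The CHECK ranges must not overlap between
-- member tables, and [TimeStamp] must be part of the primary key for
-- the partitioned view to be updatable.
CREATE TABLE dbo.LogData20110101
(
    LogID       int       NOT NULL,
    Value       float     NOT NULL,
    Attribute   int       NOT NULL,
    [TimeStamp] datetime2 NOT NULL,
    CONSTRAINT PK_LogData20110101 PRIMARY KEY (LogID, [TimeStamp]),
    CONSTRAINT CK_LogData20110101_Range
        CHECK ([TimeStamp] >= '2011-01-01' AND [TimeStamp] < '2011-02-01')
);
GO

-- The view unions the member tables. SQL Server uses the CHECK
-- constraints both to route inserts to the right member table and to
-- skip irrelevant tables when a query filters on [TimeStamp].
CREATE VIEW dbo.LogDataView
AS
SELECT LogID, Value, Attribute, [TimeStamp] FROM dbo.LogData20110101
UNION ALL
SELECT LogID, Value, Attribute, [TimeStamp] FROM dbo.LogData20110201;
GO
```

When a new month's table is created on the fly, the view would be re-created (ALTER VIEW) with the extra UNION ALL branch; inserts through the view must supply values for every column.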
nigelrivett
Master Smack Fu Yak Hacker - 3385 Posts
Posted - 2011-07-31 : 15:23:29
The equivalent of a MERGE is: update the rows that exist, then insert the rows that do not exist. So:

UPDATE Target
SET Value = SOURCE.Value,
    Attribute = SOURCE.Attribute
FROM LogData Target
JOIN @IncomingLogData SOURCE
    ON Target.TimeStamp = SOURCE.TimeStamp AND Target.LogID = SOURCE.LogID

INSERT LogData (LogID, Value, Attribute, TimeStamp)
SELECT SOURCE.LogID, SOURCE.Value, SOURCE.Attribute, SOURCE.TimeStamp
FROM @IncomingLogData SOURCE
LEFT JOIN LogData Target
    ON Target.TimeStamp = SOURCE.TimeStamp AND Target.LogID = SOURCE.LogID
WHERE Target.TimeStamp IS NULL

You might find that the partitioning doesn't help, though, unless you are inserting into a new table. I would populate that table first, then index it, and then add it to the partitioned view.

Could it be that you just need to reassess the indexes? You have two columns in this join - if that is a clustered index, it could be slowing everything down.

If you use the new table as I suggested above, then you could bcp the new rows out to a file and then bcp them into the table, which should be much faster for a lot of data.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
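Putting the two statements together inside the original stored procedure, with the partitioned view as the target, might look like the sketch below. The SERIALIZABLE isolation level and explicit transaction are assumptions added here: with MERGE the update/insert pair was atomic, but as two separate statements another caller could insert a matching key between them, so the pair needs to run as one unit. LogDataView is the view name proposed in the question:

```sql
ALTER PROCEDURE [dbo].[sp_Log_BulkInsertUpdate]
    @IncomingLogData DataLogTableType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @rows int = 0;

    -- Serializable + a transaction so no other session can slip a
    -- matching row in between the UPDATE and the INSERT.
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;

    -- 1) Update the rows that already exist.
    UPDATE Target
    SET Value = SOURCE.Value,
        Attribute = SOURCE.Attribute
    FROM dbo.LogDataView Target
    JOIN @IncomingLogData SOURCE
        ON Target.[TimeStamp] = SOURCE.[TimeStamp]
       AND Target.LogID = SOURCE.LogID;

    SET @rows = @@ROWCOUNT;

    -- 2) Insert the rows that do not exist yet; the member tables'
    --    CHECK constraints route each row to the correct table.
    INSERT dbo.LogDataView (LogID, Value, Attribute, [TimeStamp])
    SELECT SOURCE.LogID, SOURCE.Value, SOURCE.Attribute, SOURCE.[TimeStamp]
    FROM @IncomingLogData SOURCE
    LEFT JOIN dbo.LogDataView Target
        ON Target.[TimeStamp] = SOURCE.[TimeStamp]
       AND Target.LogID = SOURCE.LogID
    WHERE Target.[TimeStamp] IS NULL;

    SET @rows = @rows + @@ROWCOUNT;

    COMMIT TRANSACTION;

    SELECT @rows;  -- same rowcount result the MERGE version returned
END
GO
```

Since the workload is mostly inserts into the newest partition, the mostly-empty UPDATE pass should be cheap; if contention becomes an issue, the serializable scope could be narrowed to just the current month's member table.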