Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 Partitioned data (Non Enterprise SQL), Table-Value

Author  Topic 

alext
Starting Member

1 Post

Posted - 2011-07-28 : 03:06:56
I am currently involved in developing a cost-sensitive solution that needs to run on either SQL Server 2008 Express or Standard edition, where time-ordered data is logged to the database as fast as possible.

Currently, we are using MERGE with a table-valued parameter in a stored procedure to pass chunks of data (around 50K rows at a time) into the database, and this has so far given us acceptable performance. However, given the nature and potential size of the data, and depending on the customer's requirements, we want the ability to split the data between files for simpler maintenance and backup, and to address the performance issues that may appear once the tables become large.

Given we cannot use SQL Enterprise to implement table partitioning, I am attempting to use a partitioned view to achieve something similar. The majority of data written to the database is likely to be inserted, not updated. Also, the log data is used to generate KPIs whose evaluation time range can be defined by the user, but the majority of queries are likely to be within the last few months.

The main issue now is how to replace the MERGE in the stored procedure with more conventional UPSERT syntax, since MERGE doesn't appear to support partitioned views as a target. I have found this article (http://bwunder.wordpress.com/2010/12/10/persisting-data-part-4-upsert-and-merge/), which is the closest to what I want to do, but since I'm not a seasoned SQL expert I'm struggling to adapt the example for our purposes. Any assistance, comments, suggestions or pitfalls on how to modify the stored procedure to use a standard UPDATE/INSERT (or a totally different approach) would be highly appreciated. From the results posted in the link above, the performance of the UPSERT and the MERGE should be quite similar at the row counts we are using.

This is the basis of current stored procedure:

USE [LogDB]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

--CREATE TYPE dbo.DataLogTableType AS TABLE
-- ( LogID int,Value float,Attribute int,[TimeStamp] DateTime2) -- matches the structure of the LogData table
--
--LogData table has PK of [LogID] and [TimeStamp], no identity fields or other indexes.


CREATE PROCEDURE [dbo].[sp_Log_BulkInsertUpdate]
@IncomingLogData DataLogTableType READONLY
AS

MERGE LogData AS Target
USING @IncomingLogData AS SOURCE
    ON (Target.[TimeStamp] = SOURCE.[TimeStamp] AND Target.LogID = SOURCE.LogID)

WHEN NOT MATCHED -- BY TARGET
THEN
    INSERT (LogID, Value, Attribute, [TimeStamp])
    VALUES (SOURCE.LogID, SOURCE.Value, SOURCE.Attribute, SOURCE.[TimeStamp])

WHEN MATCHED
THEN
    UPDATE SET
        Target.Value = SOURCE.Value,
        Target.Attribute = SOURCE.Attribute;

SELECT @@ROWCOUNT

GO


The partitioning plan is to programmatically create time-ranged log tables on the fly, e.g. LogData20110101, LogData20110201, LogData20110301 if a monthly partition were chosen, update the partitioned view accordingly, and make the view (called LogDataView, for example) the target in the stored procedure above.

A sample LogDataView would be:

SELECT Value, Attribute, [TimeStamp], LogID
FROM LogData20110101
UNION ALL
SELECT Value, Attribute, [TimeStamp], LogID
FROM LogData20110201
UNION ALL
SELECT Value, Attribute, [TimeStamp], LogID
FROM LogData20110301

and each LogData table would have the appropriate CHECK constraint implemented on the [TimeStamp] column.
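
For reference, here is a minimal sketch of what one monthly member table might look like (the names follow my convention above; the PRIMARY KEY and CHECK constraint shapes are what SQL Server requires before it will treat the UNION ALL view as a partitioned view and prune tables at query time):

```sql
-- Hypothetical member table for January 2011, matching LogData's structure.
CREATE TABLE dbo.LogData20110101
(
    LogID       int       NOT NULL,
    Value       float     NOT NULL,
    Attribute   int       NOT NULL,
    [TimeStamp] datetime2 NOT NULL,
    -- For an updatable partitioned view, the partitioning column
    -- must be part of the primary key.
    CONSTRAINT PK_LogData20110101 PRIMARY KEY (LogID, [TimeStamp]),
    -- The CHECK constraint defines this table's time range; with trusted
    -- constraints the optimizer can skip tables outside the queried range.
    CONSTRAINT CK_LogData20110101_TimeStamp
        CHECK ([TimeStamp] >= '20110101' AND [TimeStamp] < '20110201')
);
```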

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-07-31 : 15:23:29
The equivalent of a MERGE is:
update rows that exist
insert rows that do not exist

so
update Target
SET Value = SOURCE.Value,
Attribute = SOURCE.Attribute
from LogData Target
join @IncomingLogData SOURCE
ON TARGET.TimeStamp = SOURCE.TimeStamp AND TARGET.LogID = SOURCE.LogID

insert LogData
(LogID,Value,Attribute,TimeStamp)
select SOURCE.LogID,SOURCE.Value,SOURCE.Attribute,SOURCE.TimeStamp
from @IncomingLogData SOURCE
left join LogData Target
ON TARGET.TimeStamp = SOURCE.TimeStamp AND TARGET.LogID = SOURCE.LogID
where TARGET.TimeStamp is null
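
Dropping those two statements into the original procedure, but targeting the view rather than the base table, might look roughly like this. The transaction wrapper and the row-count variables are my additions (the counts have to be captured before COMMIT, which resets @@ROWCOUNT):

```sql
ALTER PROCEDURE dbo.sp_Log_BulkInsertUpdate
    @IncomingLogData DataLogTableType READONLY
AS
BEGIN
    DECLARE @updated int, @inserted int;

    BEGIN TRANSACTION;

    -- Update rows that already exist in any member table of the view.
    UPDATE Target
    SET    Value     = SOURCE.Value,
           Attribute = SOURCE.Attribute
    FROM   dbo.LogDataView AS Target
           JOIN @IncomingLogData AS SOURCE
             ON Target.[TimeStamp] = SOURCE.[TimeStamp]
            AND Target.LogID = SOURCE.LogID;
    SET @updated = @@ROWCOUNT;

    -- Insert rows that do not exist yet; the CHECK constraints on the
    -- member tables route each row to the correct monthly table.
    INSERT dbo.LogDataView (LogID, Value, Attribute, [TimeStamp])
    SELECT SOURCE.LogID, SOURCE.Value, SOURCE.Attribute, SOURCE.[TimeStamp]
    FROM   @IncomingLogData AS SOURCE
           LEFT JOIN dbo.LogDataView AS Target
             ON Target.[TimeStamp] = SOURCE.[TimeStamp]
            AND Target.LogID = SOURCE.LogID
    WHERE  Target.[TimeStamp] IS NULL;
    SET @inserted = @@ROWCOUNT;

    COMMIT TRANSACTION;

    SELECT @updated + @inserted;  -- total rows affected, like the MERGE version
END
```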

You might find that the partitioning doesn't help, though, unless you are inserting into a new table. I would populate that table, then index it, and then add it to the partitioned view. Could it be that you just need to reassess the indexes? You have two columns in this join - if that is a clustered index, it could be slowing everything down.

If you use the new table as I suggested above, you could bcp the new rows out to a file and then bcp them into the table, which should be much faster for a lot of data.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.