Query large data - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Query large data

Author

Topic

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 15:40:29

i have a client and they are very...hmm. how shall I say, "non comformative".

They are going to be inserting literally a couple of millions of rows of data in a single DB PER DAY

the DB is SQL Server 2008 R2 Standard.

There are 2 tables:

TableA has about 7/8 fields, about 3/4 of the fields are fixed length chars (of 7).
Table B containes about 4 fields which are the same from TableA (to make a form of a "composite" key, and also 1 field which is varchar(MAX) to hold large strings. This is simply for storage and retrieval.

What is the best and fastest way to query TableB?

We have:

TransDateTime
BranchNumber (char)
MachineNumber (char)
TransNumber (char)
DataString (varchar(MAX))

so there maybe a query where we need to retrieve a singular record based on all fields of the above except for the last field. The last field is our data field which we need.

any ideas? Currently querying a table which has 30mill rows takes... hours. literally.

Would it be better to split TableB into seperate tables based on TransDateTime ? (so per day when they are inserting data, a new table is made with the same schema but prefixed for that date and only data for that date is stored)

thoughts?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 15:51:11

I can query a table with a couple billion rows in under a second. Don't you have indexes in place?

You do not need to split the table up. Your table is considered small these days.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 16:21:10

Thanks fellow MVP.

There is a clustered index however it is still slow.

let me give you an example, a real example:

today at around 4:45pm I ran a query. The query is STILL executing as I write this. it is now 9:22pm

ok so there is an exception here, and that is my local dev box (they supplied) has 12GB of RAM and 98% is being used by SQL.

so currently its taking 3 hours and 50 mins to execute...and STILL currently is executing.

the query:


DECLARE @transDate datetime
SET @transDate = '2011-07-10 12:18:00'


	SELECT [DataString] FROM MySchema.TableB
	WHERE BranchNumber = '0630    '
	AND MachineNumber = '106    '
	AND TransNumber = '40     '

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 16:43:34

Show us the DDL of the table and be sure to include the clustered index. You likely need to add more indexes.

Have you checked the wait stats and/or blocking?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 16:58:41

the execution plan states that its 100% at the clustered index scan.

and as I write this, the query finally executed! it took 4 hours and 1 minute.


CREATE TABLE [MySchema].[TableA](
	[ProcessDateTime] [smalldatetime] NOT NULL,
	[TransactionDate] [smalldatetime] NOT NULL,
	[TransactionDateTime] [datetime] NOT NULL,
	[TransactionHour] [tinyint] NOT NULL,
	[BranchNumber] [char](8) NOT NULL,
	[TillNumber] [char](7) NOT NULL,
	[TransNumber] [char](7) NOT NULL
 CONSTRAINT [TableA_pk] PRIMARY KEY CLUSTERED 
(
	[TransactionDateTime] ASC,
	[BranchNumber] ASC,
	[TillNumber] ASC,
	[TransNumber] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
EXEC sys.sp_addextendedproperty @name=N'PKEngine_LastRowCount', @value=N'0' , @level0type=N'SCHEMA',@level0name=N'MySchema', @level1type=N'TABLE',@level1name=N'TableA'
GO


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [MySchema].[TableB](
	[TransactionDateTime] [datetime] NOT NULL,
	[BranchNumber] [char](8) NOT NULL,
	[TillNumber] [char](7) NOT NULL,
	[TransNumber] [char](7) NOT NULL,
	[StringData] [varchar](max) NOT NULL,
 CONSTRAINT [TableB_pk] PRIMARY KEY CLUSTERED 
(
	[TransactionDateTime] ASC,
	[BranchNumber] ASC,
	[TillNumber] ASC,
	[TransNumber] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2011-07-18 : 17:05:18

quote:

Originally posted by tech1


DECLARE @transDate datetime
SET @transDate = '2011-07-10 12:18:00'


	SELECT [DataString] FROM MySchema.TableB
	WHERE BranchNumber = '0630    '
	AND MachineNumber = '106    '
	AND TransNumber = '40     '

Isn't your WHERE clause missing the criteria for TransactionDateTime?
Since that is the first column or your key you need it to take advantage of the clustered index

Be One with the Optimizer
TG

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 17:05:51

A clustered index scan is a table scan, hence your problem.

For this query:

SELECT [DataString] FROM MySchema.TableB
WHERE BranchNumber = '0630 '
AND MachineNumber = '106 '
AND TransNumber = '40 '

You should add this covering index:
create index idx_BranchNo_MachineNo_TransNo_Inc_DataString
on myschema.tableb (BranchNumber, MachineNumber, TransNumber)
include (DataString)

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 17:13:18

Thanks.

sorry - I also did include the datetime previously and that alone took 2 hours and 56 mins.

tkizer: I will be using SQL Bulk copy to insert data. would this slow down the performance if I add the covering index?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 17:49:32

Yes, but not by much. Without the right index, your query is going to be slow and there is no way to fix that.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 17:52:57

thanks. will see what happens tomorrow.

I'll put in the index now, and since early in the morning it starts to do the BULK copy... will see how slow it will get. and then also compare it with the query. But my fear is now that the query will be much faster because the query is cached... ?? or would I be better now to restart SQL Server service and then apply the new index?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 18:07:02

You don't have to restart SQL to clear the cache. You can use DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS to achieve it.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 18:09:24

the FREEPROCCACHE didn't really work. I still see 13GB being used by SQL.

just running the query you had given me. its been 5 minutes now... still executing

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-07-18 : 18:12:35

FREEPROCCACHE clears the procedure cache, it doesn't release memory back to Windows.

Run these queries to see how many plans are cached:

http://sqlserverperformance.wordpress.com/2010/04/19/a-dmv-a-day-%E2%80%93-day-20/
http://sqlserverperformance.wordpress.com/2010/04/15/a-dmv-a-day-%E2%80%93-day-16/

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 18:15:49

If the index didn't fix your performance issue, then you've got something wrong. Check the lastwaittype column in sysprocesses for your spid. What is it? Is it changing? Is IO increasing?

Oh and most importantly what is your maxdop set to? This could be a parallelism issue. You could choose to add MAXDOP=1 to your query or change the server-side setting to 1 (or 2).

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 18:18:15

hmm. still running (17 mins now)

I will post back when its finished executing...

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 18:18:58

Please read my last post. We need you to get these answers while it is in this state.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 18:27:45

sure, no worries.

im not a SQL Guru but are you able to guide me on where to get the information from?

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 18:51:40

last wait type: SOS_SCHEDULER_YIELD

Parallelism:
Cost threshold for Parallelism: 5
Locks: 0
Max Degree of Parallelism: 0
Query Wait -1

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-07-18 : 18:56:15

Consider changing maxdop to 1 or 2 either at the query level or at the server level. You've got a CPU issue that is most likely due to parallelism.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tech1
Starting Member

49 Posts

Posted - 2011-07-18 : 18:57:40

the CPU has 8 cores. I will do that when it comes back alive....

tech1
Starting Member

49 Posts

Posted - 2011-07-19 : 06:50:34

ok so I applied that index. And it was doing the bulk insert. it was taking AGES. usually it takes around 20 mins/30 mins but it was taking more than 2 hours. as soon as I removed that index, bam - everything was speeding up.

we still then have the issue of querying data.

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources