
 table design for table with 10 million records


CrazyT
Yak Posting Veteran

73 Posts

Posted - 2010-04-14 : 15:34:11
I am creating a simple table with a lot of records.

Basic design:

[rowID] [int] IDENTITY(1,1) NOT NULL,
[anotherUniqueID] [int] NULL,

The other columns are all unique.

Is this the best design, or should I do something different?

I'm not used to dealing with tables with this many records and want to make sure it can be queried properly, etc.

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-04-14 : 15:48:55
It's indexes that you need to worry about. That will depend on the queries you're running against the data.

------------------------------------------------------------------------------------
Any and all code contained within this post comes with a 100% money back guarantee.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-04-14 : 16:10:36
There is really no special consideration for a table that small. 10 million rows can be queried in a few milliseconds with the proper indexes and query design. I have a table with a few hundred million rows that can be queried in less than 300 milliseconds 99% of the time.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

CrazyT
Yak Posting Veteran

73 Posts

Posted - 2010-04-14 : 16:26:14
If I were writing a CREATE TABLE statement, how would I include the indexes in it?


hanbingl
Aged Yak Warrior

652 Posts

Posted - 2010-04-14 : 16:47:41
At a minimum, define the primary key and foreign keys, then index commonly searched columns like names and dates. Who knows what queries the developers will come up with.

quote:
Originally posted by CrazyT

if i were making a create table statement how would i include the indexes in it.




tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-04-14 : 17:01:09
Typically you work with the developers to figure out what indexes are needed. Don't add an index to satisfy every query, just most queries. You don't want to overly index a table as that'll hurt inserts, updates, and deletes.

An index can be created via the CREATE INDEX command or you can use the GUI.
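For example (a sketch only; beyond the OP's rowID/anotherUniqueID, the table and column names here are made up), the primary key can be declared inline in the CREATE TABLE statement, while nonclustered indexes are usually added afterwards with CREATE INDEX:

```sql
-- Hypothetical table: customerName/createdDate are placeholder columns
CREATE TABLE dbo.BigTable
(
    rowID           int IDENTITY(1,1) NOT NULL,
    anotherUniqueID int NULL,
    customerName    varchar(100) NOT NULL,
    createdDate     datetime NOT NULL,

    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (rowID)
);

-- Nonclustered indexes to support whichever columns the queries
-- actually filter or join on
CREATE NONCLUSTERED INDEX IX_BigTable_customerName
    ON dbo.BigTable (customerName);

CREATE NONCLUSTERED INDEX IX_BigTable_createdDate
    ON dbo.BigTable (createdDate);
```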


Kristen
Test

22859 Posts

Posted - 2010-04-15 : 02:27:10
" the other columns are all unique"

Every column is unique? Kinda unusual ... but maybe it is just a "conversion" from IDs for various parties which refer to the same objects.

You need a Unique Constraint (which in SQL Server is enforced with an index) on all of those columns.

If some columns also allow NULLs, then use a Filtered Index (i.e. index only the NOT NULL values).

That will give you the best performance on any column that is involved in a JOIN to another table.
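On SQL Server 2008 and later, a filtered unique index looks like this (using the OP's nullable anotherUniqueID column; the index name is made up):

```sql
-- Unique only among rows where the column has a value;
-- NULL rows are excluded from the index entirely
CREATE UNIQUE NONCLUSTERED INDEX UQ_BigTable_anotherUniqueID
    ON dbo.BigTable (anotherUniqueID)
    WHERE anotherUniqueID IS NOT NULL;
```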

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-04-15 : 04:47:07
quote:
Originally posted by Kristen
You need a Unique Constraint (which in SQL will create an index) on all columns.

I wouldn't create an index on every column.


Kristen
Test

22859 Posts

Posted - 2010-04-15 : 04:57:12
If:

1) every column is UNIQUE, and
2) every column is used as a JOIN to another table,

then I would. Seems an improbable scenario though.

I would have second thoughts, though, if there are UPDATEs. A UNIQUE value which is an FKey to another table seems unlikely to change (although some may change from NULL to a value, and maybe the other direction too), but if they change like the wind then indexes may not be a good idea.

As ever ... "it depends" ... and if you ask two SQL gurus you'll get at least 3 answers!

Kristen
Test

22859 Posts

Posted - 2010-04-15 : 04:58:08
"I wouldn't create an index on every column"

Just in case that's open to interpretation: I meant a separate index on each column, not one index with keys for every column!

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-04-15 : 05:20:02
quote:
Originally posted by Kristen
Just in case open to interpretation then I meant a separate index on each column, not one index with keys for every column!


Just to be 100% clear, I wouldn't do either of those things. That's going to add quite a bit of overhead for each insert, update and delete.


Kristen
Test

22859 Posts

Posted - 2010-04-15 : 05:37:01
As I imagine it (by way of an example):

The table has, say, 10 columns, all UNIQUE, and JOINs to other tables which use each of those 10 (unique) column values.

I'm assuming that:
1a) SELECTs are far more common than INSERTs
1b) All columns participate in a WHERE or a JOIN reasonably often
2) DELETEs are very rare
3a) UPDATEs may occur which change NULL columns to a value (so they will cause an add to that index, but no changes to the other indexes)
3b) UPDATEs to other columns (to change an existing value) will be very rare

I don't see the problem with an index (on each column) in such circumstances.

I agree that if there are significant numbers of updates to existing (indexed) values, or significant numbers of deletes, or lots of column values that are never/rarely selected and never/rarely participate in JOIN or WHERE clauses, then an index would be a waste of time.

But would you still not index each column given my assumptions?

(A Unique Constraint is going to be required on each column, but that could be enforced by a Trigger to avoid using an index, e.g. for a column that is never used in a JOIN or a WHERE.)

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-04-15 : 06:26:54
quote:
Originally posted by Kristen

As I assume it (by way of an example)

The table has, say, 10 columns, all UNIQUE, and JOINs to other tables which use each of those 10 (unique) column values

I'm assuming that
1a) SELECTs are far more common than INSERTS
1b) All columns participate in a WHERE or a JOIN reasonably often
2) DELETEs are very rare
3a) UPDATES may occur which change NULL columns to a VALUE (so will cause an ADD to that index, but no changes to other indexes)
3b) UPDATES to other columns (to change an existing value) will be very rare

That's a lot of assumptions. There's nothing from the OP that indicates any of them are true.
quote:
But would you still not index each column given my assumptions?

It's the assumptions I have issue with. They very well may be correct. But until we know that, the suggestion of 8 additional indexes is a little premature.
quote:
(Unique Constraint is going to be required on each column, but that could be enforced by Trigger to avoid using an index - e.g. if a column is never used in a JOIN or a WHERE)


A trigger would be worse than an index. If there's no index on the column, then the trigger would have to do a table scan on every insert to check uniqueness against every record in the table.

The other thing to consider is that "unique" isn't always "unique". If I'd put a unique constraint on every column where the client has told me "Yeah, we'll never have 2 of those that are the same", I'd have a lot of broken code.


Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-04-15 : 06:30:54
Testing is king.

Do you have a UAT environment or similar where you can test out index changes? If so, then just create indexes as Kristen suggests and monitor them. Here's a bit of SQL that you can use to get the reads and writes for each index. If an index is written to far more often than it is read from, then it's probably not a great index (except, of course, if those few reads are vitally important).

/*** Index Usage Statistics ***************************************************
*
* Reports the reads and writes to the indexes of the current database.
*
* Reads indicate that the index has been used in queries.
*
* Writes indicate the effort of maintaining the index. Therefore the indexes
* with the highest read-to-write ratio are the indexes used the most by the
* system.
*
* Charlie (2009-Dec-18)
*
******************************************************************************/

-- Set this to limit the results to tables / indexes above a certain number of rows
DECLARE @rowThreshold INT
SET @rowThreshold = 1000

/*****************************************************************************/
SET NOCOUNT ON

SELECT
o.[name] AS [Table Name]
, i.[name] AS [Index Name]
, i.[index_id] AS [Index Id]
, s.[user_seeks] + s.[user_scans] + s.[user_lookups] AS [Total Reads]
, s.[user_updates] AS [Writes]
, rws.[rows] AS [Rows]

, CASE
WHEN s.[user_updates] < 1 THEN 100
ELSE 1.00 * (s.[user_seeks] + s.[user_scans] + s.[user_lookups]) / s.[user_updates]
END AS [Reads Per Write]

, 'DROP INDEX '
+ QUOTENAME(i.[name])
+ ' ON '
+ QUOTENAME(sc.[name])
+ '.'
+ QUOTENAME(OBJECT_NAME(s.[object_id]))
AS [Drop Statement]

FROM
sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON i.[index_id] = s.[index_id] AND s.[object_id] = i.[object_id]
JOIN sys.objects o ON s.[object_id] = o.[object_id]
JOIN sys.schemas sc ON o.[schema_id] = sc.[schema_id]

-- Rows for the object / index
JOIN (
SELECT
p.[index_Id]
, p.[object_ID]
, SUM(p.[rows]) AS [rows]
FROM
sys.partitions p
GROUP BY
p.[index_Id]
, p.[object_Id]
)
rws ON rws.[index_Id] = s.[index_Id] AND rws.[object_Id] = s.[object_Id]

WHERE
OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
AND s.[database_id] = DB_ID()
AND i.[type_desc] = 'nonclustered'
AND i.[is_primary_key] = 0
AND i.[is_unique_constraint] = 0
AND rws.[rows] > @rowThreshold
ORDER BY
[Reads Per Write] ASC
, [Rows] DESC



Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

Kristen
Test

22859 Posts

Posted - 2010-04-15 : 07:35:15
"That's a lot of assumptions"

Indeed. But if it is a table full of unique IDs (**IF**!!) then I think they are a reasonable bet.

No substitute for the O/P telling us the actual SP though. (Probably a poor choice of acronym in this forum! SP=Starting Price in Racing parlance ...)

CrazyT
Yak Posting Veteran

73 Posts

Posted - 2010-04-15 : 14:58:15
All the other columns are not unique. Typo on my part.

As for what is being done: I will retrieve 100 rows at a time (the primary key, the unique key for the data, and one other column), passing a start/end row with ROW_NUMBER(). Then all I need to do is update two columns in the database on each row.

So it's a simple select, but with a lot of records. Only 3 columns will have any data to index, with the primary key being one.
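The batching described above presumably looks something like this (a sketch; the table name and "someColumn" are placeholders):

```sql
-- Fetch one batch of 100 rows, numbered over the clustered key
DECLARE @startRow int, @endRow int
SET @startRow = 1
SET @endRow = 100

;WITH numbered AS
(
    SELECT rowID, anotherUniqueID, someColumn,
           ROW_NUMBER() OVER (ORDER BY rowID) AS rn
    FROM dbo.BigTable
)
SELECT rowID, anotherUniqueID, someColumn
FROM numbered
WHERE rn BETWEEN @startRow AND @endRow;
```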




DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-04-15 : 16:57:53
quote:
Originally posted by DBA in the making
The other thing to consider is that "unique" isn't always "unique".

quote:
Originally posted by CrazyT

all the other columns are not unique. typo on my part.

LOL. ;)
quote:
as for what is being done. I will retrieve a 100 rows at a time (primary key, unique key for the data, and a another column -- passing a start/end row with row_number(). Then all I need to do is update two columns in the database on each rows.

So its a simple select but with alot of records. Only 3 columns will have any data to index, with the primary key being one.


I don't think using row_number is going to be an efficient way of querying such a table. I just tried it on a table with 1.2m rows, and when selecting high row numbers, the number of reads was huge. The execution plan said it used a clustered index scan, which means it's reading each record sequentially from the clustered index.
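One common alternative is keyset (or "seek") paging: instead of numbering every row, each batch remembers the last key it saw and seeks directly past it. A sketch, assuming rowID is the clustered primary key and the other names are placeholders:

```sql
-- With rowID as the clustered key, this is an index seek, not a scan,
-- no matter how deep into the table the batch is
DECLARE @lastRowID int
SET @lastRowID = 0   -- 0 before the first batch; thereafter, the max rowID returned

SELECT TOP (100) rowID, anotherUniqueID, someColumn
FROM dbo.BigTable
WHERE rowID > @lastRowID
ORDER BY rowID;
```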


hanbingl
Aged Yak Warrior

652 Posts

Posted - 2010-04-15 : 17:03:29
Agreed! Leave the pagination to the programmers.

quote:
Originally posted by DBA in the making

I don't think using row_number is going to be an efficient way of querying such a table. [...]


CrazyT
Yak Posting Veteran

73 Posts

Posted - 2010-04-16 : 11:29:38
DBA in the making -- I need to retrieve 100 records at a time to pass off to threads to work on.

ROW_NUMBER() was the only way I could think of.

How would you propose doing that?



   
