SUM column drops clustered index? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

SUM column drops clustered index?

Author

Topic

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-13 : 09:51:32

Has anybody experienced that a SUM column in a simple SELECT statement with a GROUP BY in it, drops (ignores) the index (order) of the records?

I appologize, If there's allready an answer for this issue...

RH

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-13 : 09:52:56

Not sure what you mean.
The query isn't using an index that you expect it to?
You will have to give more info - the query and table structure.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-13 : 09:56:15

It's very simple, take any table and make a clustered index on a column. Then take this same table and use it in a agregate query without explixitly selecting the column with the index or stating an ORDER BY. The order of the selected records changes.

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-13 : 10:11:03

If there is no order by then the resultset can have any order - it can change with the same query just because of the plan the server has decided to use that time or due to changes in the way the data is stored.
You should not expect two queries to have any correspondence in the record order without an order by.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-13 : 16:56:43

Hmmm... I can't use the column for the order by statement, because the summation wouldn't work. I'll post some DDL, if you'll find the time, be my guest. Thanks for the reply.

RH

elwoos
Master Smack Fu Yak Hacker

2052 Posts

Posted - 2004-01-14 : 03:32:53

If you're just adding a column why would you need to know the order of it?

If you want to do the sum

1 +
2000 +
24

you don't rearrange it before you add up because there would be no benefit

steve

Steve no function beer well without

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-14 : 05:26:49

Okay, Its difficult to describe the problem in words so here's the DDL:

The table:
CREATE TABLE [dbo].[table_t] (
[t_id] [int] IDENTITY (1, 1) NOT NULL ,
[t_a] [int] NULL ,
[t_q] [decimal](18, 2) NULL ,
[t_d] [smallint] NULL ,
[t_p] [money] NULL ,
[t_dt] [datetime] NULL ,
[t_uid] [uniqueidentifier] NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[table_t] WITH NOCHECK ADD
CONSTRAINT [PK_table_t] PRIMARY KEY CLUSTERED
(
[t_id]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[table_t] ADD
CONSTRAINT [DF_table_t_t_q] DEFAULT (0) FOR [t_q],
CONSTRAINT [DF_table_t_t_d] DEFAULT (1) FOR [t_d],
CONSTRAINT [DF_table_t_t_p] DEFAULT (0) FOR [t_p],
CONSTRAINT [DF_table_t_t_dt] DEFAULT (getdate()) FOR [t_dt],
CONSTRAINT [DF_table_t_t_uid] DEFAULT (newid()) FOR [t_uid]
GO

The Data:
INSERT INTO table_t (t_a,t_q,t_d,t_p)
VALUES (1,1,1,50)
INSERT INTO table_t (t_a,t_q,t_d,t_p)
VALUES (1,2,1,51)
INSERT INTO table_t (t_a,t_q,t_d,t_p)
VALUES (1,3,1,52)
INSERT INTO table_t (t_a,t_q,t_d,t_p)
VALUES (1,4,1,53)

The view:
SELECT t_a, SUM(t_q * t_d) AS t_x, t_p, t_uid
FROM dbo.table_t
GROUP BY t_a, t_p, t_uid
HAVING (SUM(t_q * t_d) > 0)

The problem:.
Try the view with and without the "AND t_a = 1". See what happens if you remove the column t_uid from the table. I'm open to suggestions.

RH

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-01-14 : 07:21:25

I'm presuming the "and t_a = 1" is being applied to the having condition.
In this case...the difference in results is just a different sort order.....it still gives the same SET of results.

As stated above....no sort order => no guarantee what way the Query processer actions the data.
An ORDER BY 1,2 will resolve that.

Your statement "I can't use the column for the order by statement, because the summation wouldn't work" doesn't seem to hold up....in what way does it change the result set to make it unusable?

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-14 : 07:34:32

The sort order is the key issue here. Unless the "t_a" condition is added the query responds to the clustered index on the source table. I can see no reason why the second condition changes the sort order (overrides the clustered index) while all others don't!?
Did you try to delete the t_uid column in the table? Without the UID the view works just fine!!!???
I can't add any more columns (like t_id or t_dt) because of the grouping.

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-01-14 : 07:47:40

I think you're still missing the point....

Unless you have an ORDER BY, you cannot GUARANTEE the order of the results.
The situation that is working for you without the ORDER BY is totally ACCIDENTAL, and cannot be guaranteed to work (ie return the result IN THE ORDER you want them) 100% of the time.

What may be influencing the actual order of the datasets without the ORDER BY...is the QUERY PLAN...ie how the Query Processor decides to drive the production of the results....inclusion/exclusion of fields in a query can change how this plan looks....in particular if you include a field which is NOT in an index, the QP may decide not to traverse the table using an index, but to read each and every record bypassing the index....and thus bypassing any inherant ORDERING that may result from the index....

Also....going by your query...the GROUP BY is going to be redundant....because the very nature of the t_id field is that it is unique across the database/server....and the whole purpose of a GROUP BY statement is to support aggregation of data....which means working with DOUBLES.

I think you are going to have to a) supply a bigger sample data set (covering other values of t_id) and b) the desired result set....as you see it....

You may need to rethink what you are asking for....

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-14 : 08:11:09

Still
If there is no order by then the resultset can have any order - it can change with the same query just because of the plan the server has decided to use that time or due to changes in the way the data is stored.
You should not expect two queries to have any correspondence in the record order without an order by.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-14 : 08:57:09

OK. By looking at the execution plan I guess you're right. Still, can anyone tell me the solution to this "order" problem. Consider that the t_id column has unique values. The grouping HAS be on the t_uid column which can have duplicate records. The order though must in the same way the records were inserted (by t_id or t_dt).

Here's some more data to play with:

INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,4,1,40,'5BA4945D-7968-47B7-A050-80D09706F39A')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,3,1,35,'C44A14A8-11B2-408F-9FD4-E24201FF93A5')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,5,1,38,'41729C62-4A2D-422A-9865-090113FE515A')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,2,1,37,'398ACFB2-F4A0-41A7-88A7-CFF508840377')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,2,-1,40,'5BA4945D-7968-47B7-A050-80D09706F39A')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,3,-1,35,'C44A14A8-11B2-408F-9FD4-E24201FF93A5')
INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)
VALUES (1,4,-1,38,'41729C62-4A2D-422A-9865-090113FE515A')

Hope this makes sense...

Thanks in advance!

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-01-14 : 09:53:40

Show us YOUR expected results....having worked them out on paper....stay away from what SQL will calculate for you for the moment.

I presume you are working with the full set of sample data supplied....ie 11 rows.

ps....my understanding was that a UNIQUEIDENTIFIER ([t_uid] [uniqueidentifier])....would be a) unique to the table...and b) unique to teh datbase/server.....and that it would be system generated (if not supplied)....whereas you are forcing in duplicates...(that may not exist when it comes to real live data...or are you working with real-live-data?)

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-14 : 10:02:12

No, only the rows in the last post (7)!

The expected result:

t_a t_x t_p t_uid
1 2 40 5ba494...
1 0 35 c44a1...
1 1 38 41729c...
1 2 37 398acf...

Thanks for the time...

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-01-14 : 10:29:56

row #2 won't appear in my result because of the "HAVING (SUM(t_q * t_d) > 0)" condition...

I get....
t_a t_x t_p t_uid
----------- ---------------------------------------- --------------------- ------------------------------------
1 2.00 37.0000 398ACFB2-F4A0-41A7-88A7-CFF508840377
1 1.00 38.0000 41729C62-4A2D-422A-9865-090113FE515A
1 2.00 40.0000 5BA4945D-7968-47B7-A050-80D09706F39A

and re the expected ORDER of your results...why this order?...what (logical reason) makes #2 subservient (come after) #1, and #3 after #2, etc....

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-14 : 12:07:14

>> Still, can anyone tell me the solution to this "order" problem.
Include an order by if you want the results ordered - it is the only possible way of guaranteeing it.
If necessary use derived tables if you get into trouble with the aggregates and columns.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-15 : 02:19:52

Andrew, the order of records is CRUCIAL! Otherwise we wouldn't have this thread of posts. Imagine, you have to read records in the FIFO way (the first record which went in goes out first) so far so good, but there is still the need of grouping and summing some data and there's where the fun ends because I can't afford an order by.

But thanks for your time anyway (everybody).

mr_mist
Grunnio

1870 Posts

Posted - 2004-01-15 : 04:12:33

rihardh, Andrew was asking for more information so that he could answer your question. He needs to know what condition he can apply that will make one record appear before another record if the other ordering conditions are equal.

Saying "first in" has no meaning in a relational database sense, unless you have a date column (you could order by that) or some kind of incremental unique identifier (you could order by that.)

If you "can't afford an order by" then you wont get a reliable ordered output. Simple as.

-------
Moo. :)

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-15 : 04:25:04

If you look at the DDL of the table I posted, you'll see that there is an "id" as well as an "datetime" column.
Why I can't afford an order by is obvious at looking at my previous posts (the one with the expected result).
Your statement that a "first in doesn't make sense in a relational DB" makes no sense to me. If this was so obsolete there would be no need for a clustered index (which purpose is to physicaly sort data) on a table right?

And another thing, I would really appreciate it, if anyone could tell me, how a "quality" GROUP BY can be done with including a column with UNIQUE values for sorting purpose!?

Thanks in advance!

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-15 : 05:06:22

>> clustered index (which purpose is to physicaly sort data) on a table right?

No - you are assuming that the clustered index has something to do with the presentation of results which is not the case. It is a by-product of having a clustered index that the optimiser may choose a plan in which the clustered index affects the order but this cannot be relied upon as the plan may change due to the data distribution or setup of the server (adding processors may change the order too).
The clustered index is there to aid maintenance and processing speed. You should not be surprised that without an order by two queries produce different ordered results - or in fact if the same query run twice produces different ordered results.
It's the basis of a relational database that the server can process a command in any way it sees fit as long as the results are correct - and an index does not change the expectd results for a query.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rihardh
Constraint Violating Yak Guru

307 Posts

Posted - 2004-01-15 : 07:12:46

OK,OK,OK,... enough of SQL server theoretics.

My question last time: Is there a solution to MY problem (as described) or not?

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources