| Author |
Topic |
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-13 : 09:51:32
|
| Has anybody experienced that a SUM column in a simple SELECT statement with a GROUP BY in it, drops (ignores) the index (order) of the records?I appologize, If there's allready an answer for this issue...RH |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-01-13 : 09:52:56
|
| Not sure what you mean.The query isn't using an index that you expect it to?You will have to give more info - the query and table structure.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-13 : 09:56:15
|
| It's very simple, take any table and make a clustered index on a column. Then take this same table and use it in a agregate query without explixitly selecting the column with the index or stating an ORDER BY. The order of the selected records changes. |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-01-13 : 10:11:03
|
| If there is no order by then the resultset can have any order - it can change with the same query just because of the plan the server has decided to use that time or due to changes in the way the data is stored.You should not expect two queries to have any correspondence in the record order without an order by.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-13 : 16:56:43
|
| Hmmm... I can't use the column for the order by statement, because the summation wouldn't work. I'll post some DDL, if you'll find the time, be my guest. Thanks for the reply.RH |
 |
|
|
elwoos
Master Smack Fu Yak Hacker
2052 Posts |
Posted - 2004-01-14 : 03:32:53
|
| If you're just adding a column why would you need to know the order of it?If you want to do the sum1 +2000 +24you don't rearrange it before you add up because there would be no benefitsteveSteve no function beer well without |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-14 : 05:26:49
|
| Okay, Its difficult to describe the problem in words so here's the DDL:The table:CREATE TABLE [dbo].[table_t] ( [t_id] [int] IDENTITY (1, 1) NOT NULL , [t_a] [int] NULL , [t_q] [decimal](18, 2) NULL , [t_d] [smallint] NULL , [t_p] [money] NULL , [t_dt] [datetime] NULL , [t_uid] [uniqueidentifier] NULL ) ON [PRIMARY]GOALTER TABLE [dbo].[table_t] WITH NOCHECK ADD CONSTRAINT [PK_table_t] PRIMARY KEY CLUSTERED ( [t_id] ) ON [PRIMARY] GOALTER TABLE [dbo].[table_t] ADD CONSTRAINT [DF_table_t_t_q] DEFAULT (0) FOR [t_q], CONSTRAINT [DF_table_t_t_d] DEFAULT (1) FOR [t_d], CONSTRAINT [DF_table_t_t_p] DEFAULT (0) FOR [t_p], CONSTRAINT [DF_table_t_t_dt] DEFAULT (getdate()) FOR [t_dt], CONSTRAINT [DF_table_t_t_uid] DEFAULT (newid()) FOR [t_uid]GOThe Data:INSERT INTO table_t (t_a,t_q,t_d,t_p)VALUES (1,1,1,50)INSERT INTO table_t (t_a,t_q,t_d,t_p)VALUES (1,2,1,51)INSERT INTO table_t (t_a,t_q,t_d,t_p)VALUES (1,3,1,52)INSERT INTO table_t (t_a,t_q,t_d,t_p)VALUES (1,4,1,53)The view:SELECT t_a, SUM(t_q * t_d) AS t_x, t_p, t_uidFROM dbo.table_tGROUP BY t_a, t_p, t_uidHAVING (SUM(t_q * t_d) > 0)The problem:.Try the view with and without the "AND t_a = 1". See what happens if you remove the column t_uid from the table. I'm open to suggestions.RH |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-01-14 : 07:21:25
|
| I'm presuming the "and t_a = 1" is being applied to the having condition.In this case...the difference in results is just a different sort order.....it still gives the same SET of results.As stated above....no sort order => no guarantee what way the Query processer actions the data.An ORDER BY 1,2 will resolve that.Your statement "I can't use the column for the order by statement, because the summation wouldn't work" doesn't seem to hold up....in what way does it change the result set to make it unusable? |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-14 : 07:34:32
|
| The sort order is the key issue here. Unless the "t_a" condition is added the query responds to the clustered index on the source table. I can see no reason why the second condition changes the sort order (overrides the clustered index) while all others don't!?Did you try to delete the t_uid column in the table? Without the UID the view works just fine!!!???I can't add any more columns (like t_id or t_dt) because of the grouping. |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-01-14 : 07:47:40
|
| I think you're still missing the point....Unless you have an ORDER BY, you cannot GUARANTEE the order of the results.The situation that is working for you without the ORDER BY is totally ACCIDENTAL, and cannot be guaranteed to work (ie return the result IN THE ORDER you want them) 100% of the time.What may be influencing the actual order of the datasets without the ORDER BY...is the QUERY PLAN...ie how the Query Processor decides to drive the production of the results....inclusion/exclusion of fields in a query can change how this plan looks....in particular if you include a field which is NOT in an index, the QP may decide not to traverse the table using an index, but to read each and every record bypassing the index....and thus bypassing any inherant ORDERING that may result from the index....Also....going by your query...the GROUP BY is going to be redundant....because the very nature of the t_id field is that it is unique across the database/server....and the whole purpose of a GROUP BY statement is to support aggregation of data....which means working with DOUBLES.I think you are going to have to a) supply a bigger sample data set (covering other values of t_id) and b) the desired result set....as you see it....You may need to rethink what you are asking for.... |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-01-14 : 08:11:09
|
| StillIf there is no order by then the resultset can have any order - it can change with the same query just because of the plan the server has decided to use that time or due to changes in the way the data is stored.You should not expect two queries to have any correspondence in the record order without an order by.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-14 : 08:57:09
|
| OK. By looking at the execution plan I guess you're right. Still, can anyone tell me the solution to this "order" problem. Consider that the t_id column has unique values. The grouping HAS be on the t_uid column which can have duplicate records. The order though must in the same way the records were inserted (by t_id or t_dt). Here's some more data to play with:INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,4,1,40,'5BA4945D-7968-47B7-A050-80D09706F39A')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,3,1,35,'C44A14A8-11B2-408F-9FD4-E24201FF93A5')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,5,1,38,'41729C62-4A2D-422A-9865-090113FE515A')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,2,1,37,'398ACFB2-F4A0-41A7-88A7-CFF508840377')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,2,-1,40,'5BA4945D-7968-47B7-A050-80D09706F39A')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,3,-1,35,'C44A14A8-11B2-408F-9FD4-E24201FF93A5')INSERT INTO table_t (t_a,t_q,t_d,t_p,t_uid)VALUES (1,4,-1,38,'41729C62-4A2D-422A-9865-090113FE515A')Hope this makes sense...Thanks in advance! |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-01-14 : 09:53:40
|
| Show us YOUR expected results....having worked them out on paper....stay away from what SQL will calculate for you for the moment.I presume you are working with the full set of sample data supplied....ie 11 rows.ps....my understanding was that a UNIQUEIDENTIFIER ([t_uid] [uniqueidentifier])....would be a) unique to the table...and b) unique to teh datbase/server.....and that it would be system generated (if not supplied)....whereas you are forcing in duplicates...(that may not exist when it comes to real live data...or are you working with real-live-data?) |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-14 : 10:02:12
|
| No, only the rows in the last post (7)!The expected result:t_a t_x t_p t_uid 1 2 40 5ba494... 1 0 35 c44a1... 1 1 38 41729c... 1 2 37 398acf...Thanks for the time... |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-01-14 : 10:29:56
|
| row #2 won't appear in my result because of the "HAVING (SUM(t_q * t_d) > 0)" condition...I get....t_a t_x t_p t_uid ----------- ---------------------------------------- --------------------- ------------------------------------ 1 2.00 37.0000 398ACFB2-F4A0-41A7-88A7-CFF5088403771 1.00 38.0000 41729C62-4A2D-422A-9865-090113FE515A1 2.00 40.0000 5BA4945D-7968-47B7-A050-80D09706F39Aand re the expected ORDER of your results...why this order?...what (logical reason) makes #2 subservient (come after) #1, and #3 after #2, etc.... |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-01-14 : 12:07:14
|
| >> Still, can anyone tell me the solution to this "order" problem.Include an order by if you want the results ordered - it is the only possible way of guaranteeing it.If necessary use derived tables if you get into trouble with the aggregates and columns.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-15 : 02:19:52
|
| Andrew, the order of records is CRUCIAL! Otherwise we wouldn't have this thread of posts. Imagine, you have to read records in the FIFO way (the first record which went in goes out first) so far so good, but there is still the need of grouping and summing some data and there's where the fun ends because I can't afford an order by.But thanks for your time anyway (everybody). |
 |
|
|
mr_mist
Grunnio
1870 Posts |
Posted - 2004-01-15 : 04:12:33
|
| rihardh, Andrew was asking for more information so that he could answer your question. He needs to know what condition he can apply that will make one record appear before another record if the other ordering conditions are equal.Saying "first in" has no meaning in a relational database sense, unless you have a date column (you could order by that) or some kind of incremental unique identifier (you could order by that.) If you "can't afford an order by" then you wont get a reliable ordered output. Simple as.-------Moo. :) |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-15 : 04:25:04
|
| If you look at the DDL of the table I posted, you'll see that there is an "id" as well as an "datetime" column.Why I can't afford an order by is obvious at looking at my previous posts (the one with the expected result).Your statement that a "first in doesn't make sense in a relational DB" makes no sense to me. If this was so obsolete there would be no need for a clustered index (which purpose is to physicaly sort data) on a table right?And another thing, I would really appreciate it, if anyone could tell me, how a "quality" GROUP BY can be done with including a column with UNIQUE values for sorting purpose!?Thanks in advance! |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-01-15 : 05:06:22
|
| >> clustered index (which purpose is to physicaly sort data) on a table right?No - you are assuming that the clustered index has something to do with the presentation of results which is not the case. It is a by-product of having a clustered index that the optimiser may choose a plan in which the clustered index affects the order but this cannot be relied upon as the plan may change due to the data distribution or setup of the server (adding processors may change the order too).The clustered index is there to aid maintenance and processing speed. You should not be surprised that without an order by two queries produce different ordered results - or in fact if the same query run twice produces different ordered results.It's the basis of a relational database that the server can process a command in any way it sees fit as long as the results are correct - and an index does not change the expectd results for a query.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rihardh
Constraint Violating Yak Guru
307 Posts |
Posted - 2004-01-15 : 07:12:46
|
| OK,OK,OK,... enough of SQL server theoretics. My question last time: Is there a solution to MY problem (as described) or not? |
 |
|
|
Next Page
|