Problem with Distinct when the Datatype is Text

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Problem with Distinct when the Datatype is Text

Author

Topic

coolest
Starting Member

6 Posts

Posted - 2007-09-27 : 03:54:49

Please help me in this case,

We are having one base table and audit table, The Audit table table structure is it will have all the key columns from the base table and the audit timestamp and audit action. For ex


Base table

Key1 Key2 Nonkey1 Nonkey2

AA   BB   NK      NK

Audit Table

Key1 Key2 Audit_Time            Audit_Action


AA   BB   2007-09-27 00:00:00   Insert
AA   BB   2007-09-27 09:00:00   Update

The idea is whenever an update or insert is happening in Basetable one row with the corresponding Action will be inserted into the Audit table. This is done using a trigger.

And when we select the rows for the transaction we want to select from the base table, if we do a join between the base table and audit table , we will get two rows for the given example.

So for avoiding this i worte a query like

Select distinct A.* From BaseTable A, AuditTable B
Where A.Keys = B.keys

But i am having a problem when my base table is having Text Datatype.

Again I rewrite my query as

Select A.* from Basetable A
Where Exists ( Select 'X' From AuditTable B where A.keys = B.Keys)

But I feel , this will result in performance issues as it is like using a Corelated SUbquery. And Our Basetable and Audit table will be always huge.

IS there any alternative ways for acheving this??

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-09-27 : 04:52:08

Well, EXISTS is very smart about that.
It "stops" looking for more records when first match is found.
And if you look at the execution plan, it is very similar to your first attempt.

E 12°55'05.25"
N 56°04'39.16"

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2007-09-27 : 04:57:43

how are you capturing the Text column in your trigger? I didn't think that the inserted pseudo table supported text or blobs. anyway, that might be beside the point. do what peter says :)

-ec

coolest
Starting Member

6 Posts

Posted - 2007-09-27 : 06:03:28

Thanks for the replies.

We are not capturing Text in the Audit tables, we are capturing only the Key columns.

Is there any other better way to achieve the Same Functionality?

coolest
Starting Member

6 Posts

Posted - 2007-09-27 : 06:07:18

Thanks for the replies.

We are not capturing Text in the Audit tables, we are capturing only the Key columns.

Is there any other better way to achieve the Same Functionality?

Kristen
Test

22859 Posts

Posted - 2007-09-27 : 06:40:24

Won't there ALWAYS be a row in the Audit table for every row in the Base table (i.e. the one with the "Insert" action)

Kristen

coolest
Starting Member

6 Posts

Posted - 2007-09-27 : 06:46:31

Yes definetly there will be a row for insert in the Audit table, But for update also we will be having a row withe the Audit Action as Update. So if two updates happen on the row , there will be three rows in the Audit table and one row in the base table.

Kristen
Test

22859 Posts

Posted - 2007-09-27 : 08:35:09

In that case surely

Where Exists ( Select 'X' From AuditTable B where A.keys = B.Keys)

will always be TRUE, and is therefore redundant, or have I missed something?

Kristen

coolest
Starting Member

6 Posts

Posted - 2007-09-28 : 00:02:38

Actually , I have missed something. We are using the Audit table for reducing the load, It is actually sending the data to a Datawarehouse on monthly basis.Let me explain my scenario.

We will load the Transactional data by monthly wise, as we dont have the Timestamp in the Base table we are forced to use the Audit table to identify when a record is modified.

So let me assume , we are sending the data on Septempber 1st,2007, So i will write the query like,

Select A.* From Base table A
where exists
(Select 'X' From AuditTable B where A.keys = B.Keys and
B.Audit_Time > '2007-08-01 00:00:00') So that it will send the August month data only.

And when we run it in October, the query will be like

Select A.* From Base table A
where exists
(Select 'X' From AuditTable B where A.keys = B.Keys and
B.Audit_Time > '2007-09-01 00:00:00'), So it will fetch all the records modified and inserted in the Base table After September 1.

Now , i hope the requirement is clear.

So , Now when a record is inserted in the base table after september 1, it will be having a entry in the Audit table , and whenever it is modified it will have the corresponding entry in the Audit table. So if i assume , if it is modified 2 times, we will 2 reords in the Audit table and in total we will have 3 records in the Audit table.

So when we do a normal join with the base table 3 records will be selected as the Key columns will be always same.

For avoiding this duplicates, i want a better performant query.

Thanks
Anand

coolest
Starting Member

6 Posts

Posted - 2007-09-28 : 00:14:13

Actually , i am having something else in mind for better performance, but i dont know how to implement it.

Let me assume , i am having 10000 rows in my base table and only 10 rows inserted in the month of September.

The Query is

Select A.* from Basetable A
where exists (Select 'X' from Audit table B where A.keys = B.Keys and B.AuditTime > '2007-09-01 00:00:00')

So for identifying this 10 rows, my query will loop 10000 times as it is a Co-related Subquery based on Base Table.

What i am having in mind is , somehow i have to use the AuditTable in the Outer Query and fetch corresponding Non keyfields in the Basetable.So that now the query will loop only for 10 times.

As i explained earlier , I will be having all the Keycolumns in the Audit table and I need to select the distinct from the audit table and select the corresponding Nonkey fields from the Base Table.

Guide me in implementing this logic.

Thanks
Anand

Kristen
Test

22859 Posts

Posted - 2007-09-28 : 03:16:11

"What i am having in mind is , somehow i have to use the AuditTable in the Outer Query and fetch corresponding Non keyfields in the Basetable. So that now the query will loop only for 10 times."

I don't see why, the Query Optimiser is smart enough to correlate the data in the inner EXISTS with the outer query.

Make sure you have an index on AuditTable.AuditTime & keys (possible the other way round, try it an see which performs better).

Note that the perceived wisdom is to use the convention

EXISTS ( SELECT * FROM ...

this is written as giving the optimiser the choice of which column/index to base the Exists on, and although I think that SELECT 'X' and SELECT 1 etc. will work, because of the frequency of SELECT * usage in this construction there might be some shortcuts that the optimiser takes when it sees it

Also, you should leave the hyphens out of string dates like '2007-09-01 00:00:00' - there are configurations, would you believe!, where that won't work whereas '20070901 00:00:00', or even just '20070901', will work.

Kristen

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources