Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Poor performance in query execution

Author  Topic 

sameerv
Starting Member

29 Posts

Posted - 2002-10-20 : 07:26:45
Hi guys,

I'm facing a problem in the performance of a query.

I have 3 tables : A, B and C
Tables A and B have the following fields : BranchCode and BranchName
Branch Code is the primary key.
I have to retrieve all fields from table A wherein it contains a record with a duplicate BranchName i.e. a BranchName existing in table B.

e.g.

Table A :
BranchCode ='bc1' Branch Name ='TestBranch1'

Table B:
BranchCode ='bc1' Branch Name ='TestBranch1'

Result : The query should NOT return this record since even the BranchCodes are same which means the record in B can have its other fields updated from A.

Table A :
BranchCode ='bc1' Branch Name ='TestBranch1'

Table B:
BranchCode ='bc2' Branch Name ='TestBranch1'

Result : The query should return this record since the BranchCodes are not the same. Attempting to update table B (in the case that 'bc1' already exists in B) or inserting into table B (in the case that 'bc1' does not exist in B) will result in a duplicate Branch Name.

These records retrieved from A have to be inserted into table C which has the same structure as A.

Also, after inserting into C, these same records have to be deleted from A.
The queries to accomplish these are :

Insert Query:
Insert into C
SELECT A.*
FROM A , B
WHERE A.BranchName like B. BranchName and A.BranchCode not like B.BranchCode

Delete Query:
Delete A
from B
where
A.BranchName like B. BranchName and A.BranchCode not like B.BranchCode

Before running these queries table A has 3000 records and table B has 60000 records.
These queries take approximately 9 minutes EACH to execute.

Is there anyway I can optimize these queries to run more efficiently?

Also, is there any way I can mark the records returned by the first Select query so that I can simply delete them in the next query rather than to again filter them through the WHERE clause which takes time to execute.


robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-10-20 : 08:13:29
If you put an index on Branch Name (for both A and B tables) that might improve performance. It might also improve performance to use equal (=) and not equals (<>) comparisons instead of LIKE and NOT LIKE, although they are equivalent for this query, the equality operators will probably optimize better.

You should also be aware that you are essentially creating a cross join between these 2 tables (3,000 rows X 60,000 rows = 180 million rows) for your INSERT and DELETE operations. 9 minutes to process 180 million rows is pretty good performance if you ask me. Granted, you may not actually get that many, but since the optimizer is almost certainly scanning both tables it will generate work tables and intermediate results very close to this amount.

I'm not clear about something:
quote:
Table A :
BranchCode ='bc1' Branch Name ='TestBranch1'

Table B:
BranchCode ='bc1' Branch Name ='TestBranch1'

Result : The query should NOT return this record since even the BranchCodes are same which means the record in B can have its other fields updated from A.

Table A :
BranchCode ='bc1' Branch Name ='TestBranch1'

Table B:
BranchCode ='bc2' Branch Name ='TestBranch1'

Result : The query should return this record since the BranchCodes are not the same. Attempting to update table B (in the case that 'bc1' already exists in B) or inserting into table B (in the case that 'bc1' does not exist in B) will result in a duplicate Branch Name.
Return this record from which table? A or B? What if table B contains both bc2 and bc1 rows? It sounds like you want to return the row from Table A, but that row is invalid for one comparison and valid for another, and therefore can't be returned from A. This is exactly what I meant about the cross join, you can easily return the same row many multiple times without meaning to.

Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-10-20 : 10:05:27
Yes, it's the LIKE causing the problem: having a value from a column as the right operand means that the join has to be done as a nested loop and filter. Even without any indexing, using = and <> will mean that a hash join strategy can be used, which will be vastly quicker than a nested loop of index/table scans.


Go to Top of Page

sameerv
Starting Member

29 Posts

Posted - 2002-10-21 : 06:15:21
Hi Rob,
quote:
Return this record from which table? A or B? What if table B contains both bc2 and bc1 rows? It sounds like you want to return the row from Table A, but that row is invalid for one comparison and valid for another, and therefore can't be returned from A. This is exactly what I meant about the cross join, you can easily return the same row many multiple times without meaning to.



Table A is a table into which I bcp data from a text file.I have to then validate the records in A before inserting,updating into table B or rejecting and inserting into a log table i.e. table C in case the validation criteria fail.
It is the failed records from A that the query must return.
There are various validations that need to be checked of which checking for duplicate names is only 1 of them.

If bc1 exists in B and the Branch Name in A against the same code (bc1) differs from the value of Branch Name in B, then the Branch Name in A needs to be updated in B. But before this a check has to be made to verify whether the of the Branch Name from A does not exist in the Branch Name field of B in any of the records. If it exists this record from A will be inserted into the log Table C.
This same check is applied to each record in A independant of any other records.
So, Bc1 and Bc2 have no bearing on eachother, they are treated as separate records and processed as such.


Also, about the second part of my question.
To recap.. After inserting the failed records from A into C I have to immediately delete these records from A.

Currently, to delete the records from A I have to first identify the records by running the same WHERE clause which was executed in the previous step and which takes considerable time overhead.

Since this process has to be followed for all the other validations(approx.7) in the stored proc, I am actually running 14 retrieve Queries instead of 7.

Is there no way I can somehow mark these records during the first query (during insert into C) and then delete these from A directly.




Go to Top of Page

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-10-22 : 22:11:48
Why not use a temp table to store the A records that need to get deleted? You will avoid running WHERE clauses against such large sets. You can then do DELETEs and all other validation by using an INNER JOIN on your temp table. I'm not sure what you mean by saying you run 14 queries- in a scenario of filtering the same data 14 times a temp table will be useful. If you only want to 'mark' the records to delete them in the next step and that's it, this is not a solution.

Sarah Berger MCSD
Go to Top of Page
   

- Advertisement -