Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 IF...ELSE EXISTS in SELECT

Author  Topic 

MrQuizzles
Starting Member

20 Posts

Posted - 2010-01-13 : 16:52:39
Ok, another question:

I have two tables with the following structures

DataCatalog
-ID
-name

userDataAccess
-swkyasgn_id
-dataCatalogItem


There is a one-to-many relation from DataCatalog and userDataAccess. Each item in DataCatalog can have many items in userDataAccess.

What I need to do is, given a specific swkyasgn_id, list all items in DataCatalog and whether or not an entry exists for each item in userDataAccess

Example:


DataCatalog
ID name
-----------
1 item1
2 item2
3 item3
4 item4
5 item5

userDataAccess
swkyasgn_id dataCatalogItem
------------------------------
1 3
1 4
1 5
2 1
2 3
2 4


Desired Result Set given a swkyasgn_id of 1
name exists
---------------
item1 0
item2 0
item3 1
item4 1
item5 1

Desired Result Set given a swkyasgn_id of 2
name exists
---------------
item1 1
item2 0
item3 1
item4 1
item5 0


The query I'm trying right now is

SELECT name,
IF EXISTS(SELECT * FROM userDataAccess uda WHERE dc.ID = uda.dataCatalogItem AND uda.swkyasgn_id = 1)
(SELECT 1 AS accessible)
ELSE
(SELECT 0 AS accessible)
FROM DataCatalog dc


But that doesn't work. It doesn't like the IF statement, which isn't surprising considering I've rarely ever used them before. Can I plop them in the middle of SELECT statements like that? I don't know.

vijayisonly
Master Smack Fu Yak Hacker

1836 Posts

Posted - 2010-01-13 : 17:29:32
Sample data
declare @DataCatalog table
(ID int, [name] varchar(10))
insert @DataCatalog
select 1 , 'item1'
union all select 2, 'item2'
union all select 3, 'item3'
union all select 4 , 'item4'
union all select 5, 'item5'

declare @userDataAccess table
(swkyasgn_id int, dataCatalogItem int)
insert @userDataAccess
select 1, 3
union all select 1, 4
union all select 1 , 5
union all select 2 , 1
union all select 2 , 3
union all select 2 , 4


Query
select a.[name], case when b.dataCatalogItem is null then 0 else 1 end as [exists]
from @DataCatalog a
left join @userDataAccess b on a.ID = b.dataCatalogItem
and b.swkyasgn_id = 1
order by a.[name]


Result for id = 1
name       exists
---------- -----------
item1 0
item2 0
item3 1
item4 1
item5 1

Result for id = 2
name       exists
---------- -----------
item1 1
item2 0
item3 1
item4 1
item5 0
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-01-14 : 00:15:20
just to make it clear. you need to replace b.swkyasgn_id = 1 with b.swkyasgn_id = @id in previous query and you can pass values to @id like 1,2 etc to get result for each.
Go to Top of Page

MrQuizzles
Starting Member

20 Posts

Posted - 2010-01-14 : 12:10:33
Hmmm, it just occurred to me, though, that this is an On^2 operation. In the actual database, each table has about 1000 records, meaning that such a query will cause 1,000,000 checks.

I think I'm going to have to find a different way of getting what I want, meaning a bit of clever C#. I'm pretty sure I could get this done in linear time otherwise.
Go to Top of Page

vijayisonly
Master Smack Fu Yak Hacker

1836 Posts

Posted - 2010-01-14 : 12:20:59
Did you atleast try the query in your DB before coming to a conclusion?
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-01-14 : 14:21:04
> Horse > Water <> Drink

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2010-01-14 : 15:30:19
quote:
Originally posted by MrQuizzles

Hmmm, it just occurred to me, though, that this is an On^2 operation. In the actual database, each table has about 1000 records, meaning that such a query will cause 1,000,000 checks.

I think I'm going to have to find a different way of getting what I want, meaning a bit of clever C#. I'm pretty sure I could get this done in linear time otherwise.



If you are inferring that it is O(2) by looking at the query plan for Vijay's query and seeing the table scans and nested loops that should not stop you from trying it out. Vijay was demonstrating an example with table variables that are small in size and have no indexes. If you look at the query plan when running against the actual data, you might find it to be very different.
Go to Top of Page

MrQuizzles
Starting Member

20 Posts

Posted - 2010-01-15 : 14:52:36
Well, after further analysis, I'll admit I'm wrong. The database engine should be capable of performing the operation at something much closer to O(nlogn). Of course, I really don't know the inner workings of the database engine, so if anyone can tell me the actual cost of the operation, I'd be interested.


My plan in C# was to perform two queries with some intermediate processing. What I need to do is create a bunch of checkboxes that are either checked or not dependent on the database (yes, this will create quite a lot of checkboxes, but this is what I'm being asked for). I was going to first do a (SELECT name, ID FROM DataCatalog ORDER BY ID ASC) and generate the list of checkboxes with their associated names from that, setting the ID of the checkbox to the associated ID from the query. Then I would perform a second query (SELECT dataCatalogItem FROM userDataAccess WHERE swkyasgn_id = *some number* ORDER BY userDataAccess ASC) and iterate through the checkboxes, checking them if their ID shows up in the second query.

I figure the first query is linear, the creation of the checkboxes is linear, the second query is linear, and the checking of the boxes, due to its sequential nature (both the checkboxes and the result set are in the same sequence), is also linear, so the whole deal never breaks linear time.


So which method, then, would be quicker?
Theoretical efficiency points to the second method, but is it this way in practicality?



Also, from a purely academic standpoint, would joining the DataCatalog table to a derived table be quicker than the given query.

Is this:
select a.[name], case when b.dataCatalogItem is null then 0 else 1 end as [exists]
from DataCatalog a
left join userDataAccess b on a.ID = b.dataCatalogItem and b.swkyasgn_id = *some number*


Faster, slower or the same as this:

select a.[name], case when b.dataCatalogItem is null then 0 else 1 end as [exists]
from DataCatalog a
left join (SELECT * FROM userDataAccess WHERE swkyasgn_id = *some number*) AS b on a.ID = b.dataCatalogItem


and why?


I'm a computer science undergrad who's working as a web development intern, so I'm genuinely interested in knowing these things and I certainly don't mind having my mistakes pointed out to me as long as it's explained why they're wrong.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-01-16 : 05:17:13
I haven't read the thread, so I'm just looking at your last question comparing the two queries.

You can get SQL Server to show you the Query Plan for each. You might find that they are identical. SQL Server can also tell you the logical I/O involved in each, which would equate to "cost given an average world" ("Logical" I/O being different to "Physical", because in the real world all the data you need might be cached in memory anyway).

-- Comment in the SHOWPLAN to see the Query Plan, OR the STATISTICS to see the logical stats
-- SET SHOWPLAN_TEXT ON
GO
-- SET STATISTICS IO ON; SET STATISTICS TIME ON

*** put query here ***

SET STATISTICS IO OFF; SET STATISTICS TIME OFF
GO
SET SHOWPLAN_TEXT OFF
GO

If they are not the same I would expect (1) to be faster.

You have a SELECT * in your sub-select in (2) - I think the optimiser will fix that to the columns you actually need, but it might not (and you could obviously fix that yourself)

If (2) you are forcing how you want the query planner to work. A newer version of SQL, or patch, could come up with a better query plan to (1), or the query plan for (1) might change over time - as the number of rows, or the distribution of the data, or whether you add/drop indexes, impacts things.

Personally I would always use (1) and if it was slow 99.99% of the time I would "influence" (well "tune" would be a better word) it in other ways. Usually by adding indexes.
Go to Top of Page

MrQuizzles
Starting Member

20 Posts

Posted - 2010-01-19 : 13:56:40
Oooh, I did not see that I could get it to tell me the query plan. That will be a useful tool in the future. I'm fairly new to Transact-SQL, having been taught with MySQL. Both are ANSI SQL compliant, so they're largely the same. I do very much so enjoy using stored procedures for queries that include user input, though. I'll have to consider Postgres for future open source projects.


The queries don't end up being identical. It seems that the creation of the subquery table causes it to compute a scalar for that table, which is causing that query to cost ever so slightly more.
Go to Top of Page
   

- Advertisement -