Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Very Strange join behavior

Author  Topic 

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 13:10:46
Ok maybe someone smarter than me (not difficult) can help me out :)

Two queries:

#1:

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (select distinct top 53 load_id from PCI_Load')


#2:

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (select distinct top 54 load_id from PCI_Load')


#1 Produces a correct left outer join, any values in PCI_Template_NR_Map that are not in PCI_Master show null. This works for any number of load_id values in the subselect up to 53.

#2 Is the exact same query, except I am no longer limiting it to 53, when i get to 54 (or if I take away the top altogether) it returns rows as if it were a normal inner join, instead of a left outer join (i.e. it only shows rows that are a match between PCI_Master and PCI_Template_NR_Map).

Can anyone explain to me what is happening here, and how to get around this issue? I need to be able to filter this for as many load_ids as I need (usually aobut 200). Thanks in advance,

Brad

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2008-05-01 : 13:15:03
Is there a null load_id in PCI_Load? If so, you will need to handle it...

Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-01 : 13:17:42
You cant guarantee the order of loadid's that are returned from subquery unless you specify an order by clause inside subquery. I guess that may be a reason for query results to vary. Can you try giving an order by clause inside on both cases. ALso are you getting almost the same loadid records in both the cases?
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 13:19:52
Nope, just integers. The really really strange thing is I can completely remove the "and a.load_id in (select..." and just have the simple join clause of a.attribute_name=b.attribute_name and it still acts like an inner join instead of a left outer join?!?! The only thing I can see is that the two attribute_name fields are nvarchar(255), it's not the best thing to join two tables on (as opposed to something simpler like an integer) but I figure it should still work?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-01 : 13:29:26
quote:
Originally posted by bfrank1972

Nope, just integers. The really really strange thing is I can completely remove the "and a.load_id in (select..." and just have the simple join clause of a.attribute_name=b.attribute_name and it still acts like an inner join instead of a left outer join?!?! The only thing I can see is that the two attribute_name fields are nvarchar(255), it's not the best thing to join two tables on (as opposed to something simpler like an integer) but I figure it should still work?


Have you tried using some kind of order in subquery?
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 13:32:28
Wow that was a good suggestion on the order by - not that it completely fixes things :)

1) I originally had the sub query with no top, and the left join didn't work right

2) I noticed that just using one value worked, so i started limiting that subquery with "top", trial and error and noticed it stopped working at 54

3) I just tried an order by, and reversing it to desc changed the ceiling completely - that tells me it's not the number of values but a specific value??

4) Via trial and error, i took the last value in the subquery right at the number it would stop working, and used that number (3667) by itself in the "in" clause, and it failed by itself (and previous numbers worked)

So this tells me that at lease one load_id value causes the left outer join to fail - not throw an exception but act like an inner join.

I *still* don't understand why this is happening - that key value 3667 exists in PCI_Load and PCI_Master, but why is it causing the left outer join to work as an inner join? Why do the other numbers work?

Any wisdom would be greatly appreciated.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-01 : 13:39:07
quote:
Originally posted by bfrank1972

Wow that was a good suggestion on the order by - not that it completely fixes things :)

1) I originally had the sub query with no top, and the left join didn't work right

2) I noticed that just using one value worked, so i started limiting that subquery with "top", trial and error and noticed it stopped working at 54

3) I just tried an order by, and reversing it to desc changed the ceiling completely - that tells me it's not the number of values but a specific value??

4) Via trial and error, i took the last value in the subquery right at the number it would stop working, and used that number (3667) by itself in the "in" clause, and it failed by itself (and previous numbers worked)

So this tells me that at lease one load_id value causes the left outer join to fail - not throw an exception but act like an inner join.

I *still* don't understand why this is happening - that key value 3667 exists in PCI_Load and PCI_Master, but why is it causing the left outer join to work as an inner join? Why do the other numbers work?

Any wisdom would be greatly appreciated.


what is happening to record with key value 3667?
Does the attribute_name value for this record in PCI_Master table is one which exists in PCI_Template_NR_Map table?
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 13:52:14
First apologies for that last note, I typed it pretty quickly - I just re-read it, and it's pretty cryptic :)

Ok so, PCI_Template_NR_Map is my reference table, and has all possible attribute_names, and there is just one set. PCI_Master has many sets of attribute_names, grouped by load_id. So for a given single load_id, the PCI_Master table could have all attribute_names that are in the PCI_Template_NR_Map table, or a subset. I want to compare the two tables, for each load_id, and the left outer join will give me nulls for where the PCI_Master is missing the attribute_names that are in PCI_Template_NR_Map.

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3666)

Outer join works perfectly, I get results like an outer join should be giving me (nulls for PCI_Master attribute_names that do not exist in PCI_Template_NR_Map).

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3667)

Outer join doesn't work, I get results back as if it were an inner join.

Both these queries have a subset of attribute_names compared to PCI_Template_NR_Map, so I *should* get all attribute_names back from the PCI_Template_NR_Map table, with nulls for the ones that do not exist in PCI_Master for a given load_id. In the first query (3666) it works beautifully. In the 2nd query (3667), it returns only the rows that match in PCI_Master for load_id 3667 (inner join). No nulls in the PCI_Master dataset, no nulls in the PCI_Template_NR_Map dataset. What could be a possible reason this join is not giving me all rows from PCI_Template_NR_Map for that specific load_id?

Thanks again

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-01 : 14:03:55
quote:
Originally posted by bfrank1972

First apologies for that last note, I typed it pretty quickly - I just re-read it, and it's pretty cryptic :)

Ok so, PCI_Template_NR_Map is my reference table, and has all possible attribute_names, and there is just one set. PCI_Master has many sets of attribute_names, grouped by load_id. So for a given single load_id, the PCI_Master table could have all attribute_names that are in the PCI_Template_NR_Map table, or a subset. I want to compare the two tables, for each load_id, and the left outer join will give me nulls for where the PCI_Master is missing the attribute_names that are in PCI_Template_NR_Map.

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3666)

Outer join works perfectly, I get result

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3667)

Outer join doesn't work, I get results back as if it were an inner join.

Both these queries have a subset of attribute_names compared to PCI_Template_NR_Map, so I *should* get all attribute_names back from the PCI_Template_NR_Map table, with nulls for the ones that do not exist in PCI_Master for a given load_id. In the first query (3666) it works beautifully. In the 2nd query (3667), it returns only the rows that match in PCI_Master for load_id 3667 (inner join). No nulls in the PCI_Master dataset, no nulls in the PCI_Template_NR_Map dataset. What could be a possible reason this join is not giving me all rows from PCI_Template_NR_Map for that specific load_id?

Thanks again




This is the expected behaviour as only if the attribute_name exists in PCI_Template_NR_Map table will the values including loadid will be retrieved from PCI_Master table and it will be able to filter using the in (subquery).I think you probably are looking for this:-


select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
full outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
(a.load_id in (select distinct top 53 load_id from PCI_Load')
or a.load_id is null)
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 14:44:46
Hmmm not sure if I was clear in my previous note, all I want are all attribute_values from PCI_Template_NR_Map, and matching values in PCI_Master for a given load_id OR nulls. That's my understanding of the left outer join.

Ok so more detail, comparing 3666 and 3667.

3666 - 753 rows, all a subset of the 1593 in PCI_Template_NR_Map.

3667 - 1593 rows, has all attribute_names that are in PCI_Template_NR_Map.

Here is what happens:

1: This gives me 1593 rows, with nulls from the PCI_Master side. This is good, how I expect a left outer join to work.

select <columns>
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3666)

2: This gives me 1593 rows, with all matching columns, as the PCI_Master set has all of the same attribute_names that PCI_Template_NR_Map does. This is good, how I expect a left outer join to work, and I expect it to look like an inner join as there are no missing attribute names on the PCI_Master side.

select <columns>
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3667)

3: This gives me 2346 (753+1593) rows, with no nulls from the PCI_Master side for 3666, and all rows for 3667. I want to have 3186 rows (1593+1593) from this, but it seems when I query more than one load_id it gets rid of the nulls for 3666 and gives me only matching rows (inner join). I know I must be doing something wrong here....

select <columns>
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3666,3667)


Go to Top of Page

cat_jesus
Aged Yak Warrior

547 Posts

Posted - 2008-05-01 : 15:02:58
Just for clarity's sake I would put "and
a.load_id in (3666,3667)"

in the where clause. It also might performa little better.

I find your use of aliases just a tad confusing.

Is this the same as Number 3 ?



select <columns>
from PCI_Template_NR_Map A
left outer join PCI_Master B
on B.attribute_name = A.attribute_name

Where B.load_id in (3666,3667)


Do you care about duplicate values of Attribute_name in either table?



An infinite universe is the ultimate cartesian product.
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 15:24:23
Ok so I *think* I understand it a bit better:

I am querying two sets of data against the 'reference' table. Since I am querying more than one set of data (load_id using the 'in' clause), I am not getting the null rows for 3666, because I am getting a match for everything on 3667. It sees everything as one dataset as opposed to two seperate datasets. For instance if I did this:

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3666)

union

select
a.load_id,
b.attribute_name,
a.attribute_loc,
b.attribute_loc
from
PCI_Template_NR_Map b
left outer join
PCI_Master a
on a.attribute_name=b.attribute_name and
a.load_id in (3667) order by a.load_id

I get 3186 rows. I'm thinking I am going to have to approach this in another way....
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 15:42:49
cat_jesus, yes that query should be the same as #3, and I think you hit on what I think is the problem. There are duplicate attribute_names in the PCI_Master table. For instance, I want to check the single set of all attribute_names in PCI_Template_NR_Map against maybe 150 or so data sets in PCI_Master (150 different load_ids). Each load_id in PCI_Master can have 0 to 1593 different attributes... many of them have about 753, a few of them have all 1593 that are listed in PCI_Template_NR_Map. In the end, I want a query that returns any rows from PCI_Master that have different attribute_loc's (see the first post with the column detail) OR where the attribute exists in PCI_Template_NR_Map but not in PCI_Master. The trick here is I need resultsets for each load_id - the join is lumping them all into one. Is there a way to do this without a union of 150 select statements?! :) (or a sp with a cursor for that matter?)

BTW for those who are curious why I am doing this, we have a system that uses an excel template with named ranges. We map these named ranges to physical cell locations. We distribute them, users put data in them, and return them. Sometimes the corrupt the named ranges with cut and paste - so this is a system that loads all these excel documents and compares the named range values with the original template's to see if they match up. If not I want it to come out in the report, so we can fix things efficiently (and slap the person who messed it up).
Go to Top of Page

cat_jesus
Aged Yak Warrior

547 Posts

Posted - 2008-05-01 : 15:58:59
Stay away from cursors.

It seems to me that you should make use of common table expressions.

Have you ever thought of creating views and linking the excel files to the SQL server views? That way if they screw up the spreadsheet it breaks the link and they have to cry for help without corrupting the data.



An infinite universe is the ultimate cartesian product.
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-05-01 : 16:03:39
quote:
Originally posted by cat_jesus

Just for clarity's sake I would put "and
a.load_id in (3666,3667)"

select <columns>
from PCI_Template_NR_Map A
left outer join PCI_Master B
on B.attribute_name = A.attribute_name
Where B.load_id in (3666,3667)
Although it is hard to tell exactly what the OP wants, adding that condition to the WHERE clause is not the same thing. In essense that changes the query to an INNER JOIN. To help clarify, here is a link where I show some sample code (near the end of the thread) about the affect of placing a resriction on the join versus the where clause: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=98481
Go to Top of Page

cat_jesus
Aged Yak Warrior

547 Posts

Posted - 2008-05-01 : 16:14:52
Good point Lamprey. I was a bit confused there and was thinking about the where clause referencing the base table which doesn't make it an inner join. Nice catch.

An infinite universe is the ultimate cartesian product.
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-01 : 16:58:41
I'm trying to be as clear as I can, let's try this:

BEGIN

--Create our tables here

--reference table
declare @PCI_Template_MR_Map table
(
attribute_name varchar(255),
attribute_loc varchar(255)
)

--data from loaded forms
declare @PCI_Master table
(
load_id int,
attribute_name varchar(255),
attribute_loc varchar(255)
)

--insert our data, named range and absolute excel cell loacation
--this is our 'reference' table with all possible named ranges
--and the location they are supposed to be at
insert into @PCI_Template_MR_Map values ('name1','$C$1')
insert into @PCI_Template_MR_Map values ('name2','$C$2')
insert into @PCI_Template_MR_Map values ('name3','$C$3')
insert into @PCI_Template_MR_Map values ('name4','$C$4')
insert into @PCI_Template_MR_Map values ('name5','$C$5')

--insert our data, load_id (an individual sheet created from the template,
--we create many to be filled out from the original template), named range,
--and cell location. Notice some of the cell locations do not match the
--reference data, I want to catch this.
insert into @PCI_Master values (1,'name1','$C$1')
insert into @PCI_Master values (1,'name2','$C$1')
insert into @PCI_Master values (1,'name3','$C$3')
insert into @PCI_Master values (2,'name1','$C$1')
insert into @PCI_Master values (2,'name2','$C$2')
insert into @PCI_Master values (2,'name3','$C$3')
insert into @PCI_Master values (2,'name4','$C$5')
insert into @PCI_Master values (2,'name5','$C$5')

--For now let's just do a select to see what we get back. here we get back
--one row for every value in PCI_Master, 8 rows total. I want to get back
--10 rows. In other words, all attributes from the reference table for each
--load id 5 for load_id 1 and 5 for load_id 2.

select
pmast.load_id,
ptemp.attribute_name,
ptemp.attribute_loc,
pmast.attribute_loc
from
@PCI_Template_MR_Map ptemp
left outer join
@PCI_Master pmast
on ptemp.attribute_name=pmast.attribute_name and pmast.load_id in (1,2)


--this gives me what i want (sort of), except I need the load_id for the missing rows
select
pmast.load_id,
ptemp.attribute_name,
ptemp.attribute_loc,
pmast.attribute_loc
from
@PCI_Template_MR_Map ptemp
left outer join
@PCI_Master pmast
on ptemp.attribute_name=pmast.attribute_name and pmast.load_id in (1)

union

select
pmast.load_id,
ptemp.attribute_name,
ptemp.attribute_loc,
pmast.attribute_loc
from
@PCI_Template_MR_Map ptemp
left outer join
@PCI_Master pmast
on ptemp.attribute_name=pmast.attribute_name and pmast.load_id in (2)

End
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-05-01 : 18:01:09
I'm not sure I understand how the data actually relates versus what you want. Here is a query to get the correct number of rows:
SELECT 
*
FROM
@PCI_Template_MR_Map ptemp
CROSS JOIN
(
SELECT DISTINCT load_id
FROM @PCI_Master AS pmast
) AS Temp
But, that does not maintain the "attribute_name" association.
Go to Top of Page

cat_jesus
Aged Yak Warrior

547 Posts

Posted - 2008-05-02 : 08:49:56
quote:

--this gives me what i want (sort of), except I need the load_id for the missing rows



What do you mean by missing rows?





An infinite universe is the ultimate cartesian product.
Go to Top of Page

bfrank1972
Starting Member

11 Posts

Posted - 2008-05-02 : 09:32:24
Hmmm explaining things is not my strong point, bear with me. Here is as simple as I can put it -

Table 1:
attribute_name | value
------------------------
name1 | A1
name2 | B2
name3 | C2
name4 | C3
name5 | C5

Table 2:
load_id | attribute_name | value
----------------------------------
1 | name1 | A1
1 | name2 | B2
1 | name3 | C2
2 | name1 | A1
2 | name2 | B10
2 | name3 | C2
2 | name4 | C3
2 | name5 | C6
3 | name3 | C4
4 | name2 | B2
4 | name5 | C5

Ultimately, I am trying to get a result set that looks like:

load_id | attribute_name (table 1) | value (table 1) | value (table 2)
----------------------------------------------------------------------
1 | name4 | C3 | null
1 | name5 | C5 | null
2 | name2 | B2 | B10
2 | name5 | C5 | C6
3 | name1 | A1 | null
3 | name2 | B2 | null
3 | name3 | C2 | C4
3 | name4 | C3 | null
3 | name5 | C5 | null
4 | name1 | A1 | null
4 | name3 | C2 | null
4 | name4 | C3 | null

There are 4 sets of data in table 2 in this example, with various names. Table 1 hold the set of all possible names. My goal is to compare each set of data (1-4) in table 2 against the names/values in table 1. For each data set, compare the value for each name against the value in Table 1. IF the two values do not match, or IF table 2 is missing a name that exists in table 1, return those rows. Basically a comparison query. A simple left outer join won't work for this. I have it working in a very ham-fisted way of a SP getting all load_ids and then running a left outer join comparison query for each individual load_id (eh doesn't take *that* long).

Apologies for not being able to explain this better, I hope this helps.
Go to Top of Page

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2008-05-02 : 10:04:28
Try this...

declare @Table1 table (attribute_name varchar(10), value varchar(10))
insert @Table1
select 'name1', 'A1'
union all select 'name2', 'B2'
union all select 'name3', 'C2'
union all select 'name4', 'C3'
union all select 'name5', 'C5'

declare @Table2 table (load_id int, attribute_name varchar(10), value varchar(10))
insert @Table2
select 1, 'name1', 'A1'
union all select 1, 'name2', 'B2'
union all select 1, 'name3', 'C2'
union all select 2, 'name1', 'A1'
union all select 2, 'name2', 'B10'
union all select 2, 'name3', 'C2'
union all select 2, 'name4', 'C3'
union all select 2, 'name5', 'C6'
union all select 3, 'name3', 'C4'
union all select 4, 'name2', 'B2'
union all select 4, 'name5', 'C5'

select a.load_id, a.attribute_name, a.value, b.value
from
(select * from @Table1 cross join (select distinct load_id from @Table2) a) a
left outer join @Table2 b on a.load_id = b.load_id and a.attribute_name = b.attribute_name
where (a.value <> b.value or b.value is null)
order by a.load_id, a.attribute_name


Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
Go to Top of Page
    Next Page

- Advertisement -