Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 SQL join to show one row per unique record

Author  Topic 

zchase
Starting Member

8 Posts

Posted - 2015-04-23 : 09:28:42
Hi,

I am new to SQL but trying to do join a few tables to get result showing showing one row per unique record.

Tables include:-
1. REQ
2. RFQ
3. PO
4. DOCUMENT (contains LAST_DOCUMENT_STATUS, DOCUMENT_ID, DOCUMENT_NUMBER, for example, REQ_CANCELLED, REQ_ID, REQ_NO)
5. DOCUMENT_STATUS (contains status of document, REQ_CREATE)
6. DOCUMENT_TRAIL (contains link between documents, PARENT_DOCUMENT, CURRENT_DOCUMENT, for example, REQ_ID (PARENT_DOCUMENT), RFQ_ID (CURRENT_DOCUMENT)
7. PO_REVISION (contains PO REVISION, when link with DOCUMENT, PO_REV_NO)

Currently when i tried to join all the TABLES, i get multiple lines against REQ_NO.

I realised the multiple lines generated due to the following:-

One to many relationships:
A. RFQ - 1 or more PO
B. PO - 1 or more PO_REVISON

I was thinking how to MAX the records in PO to show only the last PO_REVISION. It seems that DOCUMENT_TRAIL will contain 1 base document PO and 1 or more PO_REVISION.

Would appreciate some assistance to teach me on some techniques.

Regards,

Kristen
Test

22859 Posts

Posted - 2015-04-23 : 12:29:06
Something like this perhaps?

SELECT Col1, Col2, ...
FROM
(
SELECT Col1, Col2, ...,
ROW_NUMBER OVER
(
PARTITION BY R.SomeID
ORDER BY R.SomeID, R.PO_REV_NO DESC
) AS T_RowNumber
FROM PO
JOIN PO_REVISION AS R
ON R.SomeID = PO.SomeID
WHERE ...
) AS X
WHERE T_RowNumber = 1 -- Include only the first row from PO_REVISION
ORDER BY Col1, ...


There might be smarter ways of doing this depending on:

What version of SQL are you using?
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-01 : 04:48:58
Microsoft SQL Server version 2008 R2.

What is the smarter ways of doing this.
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-05-01 : 08:39:42
In 2012, you could use the FIRSTVALUE function, but Kristen's solution is good for 2008R2
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-01 : 22:39:24
i was able to get the results with the following script:-
- Removed the PO revision but still retain multiple POs reference to each REQ/RFQ.


SELECT doc1.document_number AS REQ_NUM,
CONVERT(INT, doc2.document_number) AS RFQ_NUM,
CONVERT(INT, doc3.document_number) AS PO_NUM,
CONVERT(INT, doc4.document_number) AS PO_REV
FROM requisition req
LEFT JOIN document doc1
ON doc1.document_id = req.document_id
LEFT JOIN document_trail dt1
ON dt1.document_id = doc1.document_id
LEFT JOIN document_trail dt2
ON dt2.parent_document_id = doc1.document_id
AND dt2.document_type_id = 2
LEFT JOIN document doc2
ON doc2.document_id = dt2.document_id
LEFT JOIN rfq rf
ON rf.document_id = doc2.document_id
LEFT JOIN (SELECT *
FROM (SELECT dt.document_id,
dt.parent_document_id,
Row_number()
OVER (
partition BY dt.parent_document_id,
dt.document_id
ORDER BY dt.document_id ASC ) AS
T_RowNumber
FROM document_trail dt
INNER JOIN po p
ON p.document_id = dt.document_id
WHERE document_type_id = 3) AS temp
WHERE temp.t_rownumber = 1) dt3
ON ( dt3.parent_document_id = doc2.document_id
OR dt3.parent_document_id = doc1.document_id )
LEFT JOIN document doc3
ON doc3.document_id = dt3.document_id
LEFT JOIN po p
ON p.document_id = doc3.document_id
LEFT JOIN po_revision pov
ON pov.document_revision_id = p.document_id
AND pov.document_revision_id = p.current_revision_id
LEFT JOIN document doc4
ON doc4.document_id = p.current_revision_id
LEFT JOIN document_ship dship
ON dship.document_id = doc1.document_id
LEFT JOIN ship s
ON s.ship_id = dship.ship_idWHERE s.hidden = 0
ORDER BY doc1.document_number
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-02 : 02:45:28
i was able to get most of the results correct but i still have some records without PO_NUM.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2015-05-02 : 03:15:39
I haven't checked it carefully but I don't think you can do:

Row_number()
OVER (
partition BY dt.parent_document_id,
dt.document_id
ORDER BY dt.document_id ASC

but rather you must include the PARTITION columns in the ORDER BY, and then any additional "tie break" column(s)

Row_number()
OVER (
partition BY dt.parent_document_id,
dt.document_id
ORDER BY dt.parent_document_id, dt.document_id, dt.document_id ASC


Please put
[CODE]
...
[/CODE]

tags around your code to preserve the formatting.

By the by you can simplify

JOIN
(
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (...) AS T_RowNumber,
....
FROM ...
) AS TEMP
WHERE TEMP.T_RowNumber = 1
) AS DT3
ON ( dt3.parent_document_id = doc2.document_id
OR dt3.parent_document_id = doc1.document_id )

to

JOIN
(
SELECT ROW_NUMBER() OVER (...) AS T_RowNumber,
....
FROM ...
) AS DT3
ON ( dt3.parent_document_id = doc2.document_id
OR dt3.parent_document_id = doc1.document_id )
AND dt3.T_RowNumber = 1


Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-02 : 06:58:25
I suspect it may be due to data issue but unable to verify it.

[CODE]
SELECT doc1.document_number AS REQ_NUM
,CONVERT(INT, doc2.document_number) AS RFQ_NUM
,CONVERT(INT, doc3.document_number) AS PO_NUM
,CONVERT(INT, doc4.document_number) AS PO_REV
FROM requisition req
LEFT JOIN document doc1
ON doc1.document_id = req.document_id
LEFT JOIN document_trail dt1
ON dt1.document_id = doc1.document_id
LEFT JOIN document_trail dt2
ON dt2.parent_document_id = doc1.document_id
AND dt2.document_type_id = 2
LEFT JOIN document doc2
ON doc2.document_id = dt2.document_id
LEFT JOIN rfq rf
ON rf.document_id = doc2.document_id
LEFT JOIN (SELECT *
FROM (SELECT dt.document_id
,dt.parent_document_id
,Row_number()
OVER (
partition BY dt.parent_document_id
,dt.document_id
ORDER BY dt.parent_document_id, dt.document_id DESC) AS T_RowNumber
FROM document_trail dt
INNER JOIN po p
ON p.document_id = dt.document_id
WHERE dt.document_type_id = 3) AS temp
) AS dt3
ON ( dt3.parent_document_id = doc2.document_id
OR dt3.parent_document_id = doc1.document_id )
AND dt3.t_rownumber = 1
LEFT JOIN document doc3
ON doc3.document_id = dt3.document_id
LEFT JOIN po p
ON p.document_id = doc3.document_id
LEFT JOIN po_revision pov
ON pov.document_revision_id = p.document_id
AND pov.document_revision_id = p.current_revision_id
LEFT JOIN document doc4
ON doc4.document_id = p.current_revision_id
LEFT JOIN document_ship dship
ON dship.document_id = doc1.document_id
LEFT JOIN ship s
ON s.ship_id = dship.ship_id
WHERE s.hidden = 0
ORDER BY doc1.document_number
[/CODE]
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-02 : 07:04:45
sample data i got from the above script.
Basically everything in order except for first line.
I got about 300+ REQ_NUM without PO_NUM.

[CODE]
REQ_NUM RFQ_NUM PO_NUM PO_REV
ETASEN0000017 2007008 NULL NULL
KENADK0000590 NULL 1700122 0
KENAEN0000114 2004704 5007456 0
KENAEN0000114 2004704 5007457 2
KENAEN0000114 2004704 5007458 1
KENAEN0000114 2004704 5007459 1
KENAEN0000403 7101818 8102483 0
KENAEN0000403 7101818 8102484 0
KENAEN0000588 7104470 8104098 0
[/CODE]
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2015-05-02 : 13:55:54
Note that I said "and then any additional "tie break" column(s)" but you have left those out. It may not influence whether you get PO_NUM, but without it you will potentially get different results each time you run it. I gave you a full example, you've only copied part of it. It is important that you understand the code, otherwise you are not going to be able to make it work correctly, safely.

Same with moving "dt3.t_rownumber = 1" into the JOIN's "ON" clause - you've left part of the original code behind. It won't make any difference to the outcome but it suggests that you either don't understand the code and/or you are happy with sloppy code. I'm not!

I can't see how we can solve the rows without PO_NUM issue without some sample data which illustrates the problem. Given the number of tables you have I doubt that will be easy ...
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-03 : 06:31:11
Is there a way i can provide sample date for your to check my SQL script?

Perhaps maybe you can use this link to generate the data and test it.

http://sqlfiddle.com/#!6/d5203/1
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2015-05-03 : 08:57:12
SQL Fiddle is fine (personally I'd prefer that the Tables where #TEMP otherwise I have to create / use a temporary database etc. etc. so probably fewer people will bother to help you if you use permanent tables)

REQ_NUM = ETASEN0000017

has no corresponding row(s) in [PO]. There is one corresponding row in DOCUMENT_trail (DOCUMENT_ID=7400034180) which also has document_type_id=3, but no row in [PO] for DOCUMENT_ID=7400034180. Thus PO_NUM displays NULL for that row.

Note that DOCUMENT_trail aliased as "dt1" table is unused, so could be removed from the query
Go to Top of Page

zchase
Starting Member

8 Posts

Posted - 2015-05-03 : 10:11:44
i am pretty new at this since i just learn SQL in like a week and running through the data model and data dictionary which i am totally not familiar with.

I had to work with it since no one else in the IT department wants to help.
Even the company who developed the application wants to charge me for the SQL script.

For your information, i am suspecting it could be data issue but i can't rule out the incompetency of me not knowing the best way to write a script.

Nevertheless, i will submit my results to the company for more investigation work.

Thx for your help on the matter since i came on this forum to seek assistance.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2015-05-03 : 13:20:23
quote:
Originally posted by zchase

i am suspecting it could be data issue



I suppose you might need to ORDER the sub-query DT3 so that any row in [PO] that DOES have a corresponding [document_id] in [DOCUMENT] table was ordered first. It might be that the first ROW_NUMBER() ... OVER() that you are getting, ordered by:

dt.parent_document_id, dt.document_id, dt.document_id DESC

happens NOT to have a corresponding row in [DOCUMENT] table (i.e. but there are other joined rows that DO have a corresponding [DOCUMENT] row). (However, maybe you will tell me that THAT should never happen, and all rows should have a [DOCUMENT] record, and thus that in itself would be a "Data Issue"? )

What I would suggest you do is to query for missing data - i.e. data that SHOULD be there but is missing.

I had a quick look at your query but I don't have enough time to manipulate it into an "exceptions" report.

Basically what you are looking for is:

SELECT req.DOCUMENT_ID
FROM (Your Query)
WHERE NOT EXISTS (a row with doc3.document_id IS NOT NULL)

In this example that would be ETASEN0000017

Then feed that list of IDs back into the query and display ALL rows (not just the ROW_NUMBER() OVER() #1 first row) - so that you can see if there IS anything in doc3 that SHOULD have been included in your report.

If you find anything you need to improve your query (to include it) ...

... if you find NOTHING then I agree that it looks like a Data Issue - and your "Exception Report" can then be used to find any/all the other instances where the data is goofy.
Go to Top of Page
   

- Advertisement -