I need help designing a query - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

I need help designing a query

Author

Topic

medtech26
Posting Yak Master

169 Posts

Posted - 2009-08-31 : 18:10:24

Hello,

I have a "parent child" relations table, each parent may have many children and vice versa, also, each child may be a parent to other children as well. All I have as of now are two columns with parent ids and children ids.

I wonder if there's a way to query the table and bring the results ordered in something similar to a "tree browse". The other way that I can think of would be some sort of recursive function through the application, a solution I would like to avoid if possible.

Any direction, hint ... anything would be appreciated.

Thanks in advance.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-08-31 : 18:26:09

If you look in Books Online under "CTE" (Common Table Expressions) there is just such an example.

medtech26
Posting Yak Master

169 Posts

Posted - 2009-08-31 : 20:33:35

Thanks, it looks exactly like what I was looking for.

medtech26
Posting Yak Master

169 Posts

Posted - 2009-09-01 : 16:08:34

OK, I have found the following:

[url]http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-common-table-expressions/ (great to understand the concept of CTE)[/url]
[url]http://www.anujgakhar.com/2008/06/24/sql-server-2005-common-table-expressions/ (exactly what I was looking for, made my life easier)[/url]

which helped me a lot as well as saved some time, however, in my table there are about 1.4m records (relations) and it takes 4-7m to bring results for selected parent. Here's my query:

WITH  CatTree(p_id, p_term, c_id, c_term, level) 
AS 
	( 
		SELECT p_id, p_term, c_id, c_term, 0 As [level] 
		FROM relationships 
		WHERE p_id = 12345 /* seed- initial parent */
		AND p_term NOT LIKE '*%' 
		AND c_term NOT LIKE '*%' 

	UNION ALL 

		SELECT t1.p_id, t1.p_term, t1.c_id, t1.c_term, [level]+1 AS [level]
		FROM relationships t1 INNER JOIN CatTree ON CatTree.c_id=t1.p_id 
		AND t1.p_term NOT LIKE '*%' 
		AND t1.c_term NOT LIKE '*%' 
	)

SELECT TOP 50000 p_id, p_term, c_id, c_term, MIN([level]) AS [level] 
FROM CatTree 
GROUP BY p_id, p_term, c_id, c_term 
OPTION (maxrecursion 10000)

note: p_term and c_term are not indexed, however, even when removing "AND p_term NOT LIKE '*%' AND c_term NOT LIKE '*%' " from both queries it still takes long time to process and might reduce only few seconds ... not more.

Is there any way you might think of to speed things up? Do you think the query needs some improvements? BTW, I'm running SQL Server 2005.

Thanks in advance

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-09-01 : 17:24:29

Couple of suggestions:

1. Make sure p_id and c_id are indexed.
2. You should index p_term and c_term as well, especially if you'll never have a wildcard prefix like '%blah blah%'.
3. Check your maxrecursion level, are you really modelling 10,000 levels? And do you really want to return that many? Reducing this may permit an optimization that 10,000 would not.
4. Without more info on your data, I don't see the need for GROUP BY and MIN. Typically parent-child hierarchies would not have multiple levels per key, in fact I can't see how they could be stored correctly in this structure (unless they're dupes)
5. You may have circular references in your data, or excessive dupe parent-child entries.

medtech26
Posting Yak Master

169 Posts

Posted - 2009-09-01 : 17:48:05

1. They are.
2. Done (should have been done before ...).
3. I set it up high to avoid error messages, it's just for testing at this stage.
4. Without GROUP BY and MIN I get many duplicates (same p_id/c_id with different levels), maybe the query is wrong ... ? As of the data, both p_id and c_id (together) used as the record unique key and indexed separately. Unfortunately that's how data is provided from third party.
5. How can check it?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-09-01 : 18:11:43

I'd run this section and see how many rows you get:

SELECT p_id, p_term, c_id, c_term, 0 As [level]
FROM relationships
WHERE p_id = 12345 /* seed- initial parent */
AND p_term NOT LIKE '*%'
AND c_term NOT LIKE '*%'

If you have to, choose another p_id that only returns 2-3 rows at most. Change the CTE to remove the TOP, GROUP BY and MIN, change the maxrecursion to 3 and see how many total rows you get. Increase the maxrecursion until the rowcount does not increase. If you get to 10-12 and the rowcount keeps increasing each time, I suspect you have dupes in the table. You can find them with:

SELECT p_id, c_id, count(*) dupes FROM relationships GROUP BY p_id, c_id HAVING count(*)>1

If you don't find any dupes, then the CTE is probably written incorrectly.

I gotta get back to you on circular references, I don't have a solution off the top of my head.

medtech26
Posting Yak Master

169 Posts

Posted - 2009-09-01 : 19:39:37

I also suspect that the CTE might be written wrong as there are no duplicates in the relationships table (suggested query returned no results). The first query though returned 155 records so I switched the id to something more ... I mean less, with less relationships (2, as you suggested), now the results are coming in 0s and there are only 2-10 records for each id I tested with two relationships.

When taking off the GROUP BY and MIN I keep getting dupes with different levels in the results set.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources