Can't seem to find a way to optimize this...

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Can't seem to find a way to optimize this...

Author

Topic

Lohkay
Starting Member

5 Posts

Posted - 2007-11-14 : 10:52:16

Hi, we have surveys that gather data and we're trying to find the best way to query this data for reporting. It is not too late to change everything so we're open to any suggestions. Here is my current setup:

Results table from which each row represents a user that filled in the survey so it contains a unique identifier, a survey identifier and some other data which we use elsewhere.

Result_Answers table which holds the answers for each result. So it contains the result ID, the question ID and the answer ID

Questions and Answers table which hold the actual text for each identifier used in result_answers.

What we're trying to do is retrieve the count for each answers (thats easy) based on a set of conditional answers. For exemple:

Q1 - Are you male or female

Q2 - What is your age group

Q3 - What type of music do you listen to.

So lets say I want the count of each answers for each question but limited to the results that answered male on Q1 and 18-25 OR 25-35 on Q2. The only way I found to do this is create a cursor to loop trough each questions, select the result IDs that answered Q1 correctly and then remove from that temp table each result ID that didn't answer the subsequent questions correctly. This gives me a page long query with a cursor and multiple IN commands (for when there's more than one answer valid for a single question)

I'm not saying the results are abysmal, I'm still getting my results inside 2s with a bit over 1 million rows in those tables. But unfortunatly, this is basically only one survey... I can only imagine that it will get longer and longer as we add more surveys.

So my question is: Is there a better way to query this data or worst case, is my table schema not efficient enough? In both cases, how can I fix/optimize it?

I'm really hoping you guys can help me out!

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-11-14 : 11:24:37

You want help optimizing queries and schemas, neither of which you posted?

CODO ERGO SUM

Lohkay
Starting Member

5 Posts

Posted - 2007-11-14 : 11:37:07

Good point. But then again, I didn't wantto influence the answers. Anyway, here it is:

Results Schemas

CREATE TABLE [dbo].[Results](
[Result_ID] [int] IDENTITY(1,1) NOT NULL,
[ResultDate] [datetime] NULL,
[Jukebox_ID] [varchar](30) COLLATE Latin1_General_CI_AS NULL,
[Status] [varchar](50) COLLATE Latin1_General_CI_AS NULL,
[Survey_ID] [int] NOT NULL,
[Group_ID] [int] NULL,
CONSTRAINT [PK_Results_1] PRIMARY KEY CLUSTERED
(
[Result_ID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE UNIQUE NONCLUSTERED INDEX [Survey_ID] ON [dbo].[Results]
(
[Result_ID] ASC,
[Survey_ID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Result_Answers Schemas

CREATE TABLE [dbo].[Result_Answers](
[Result_ID] [int] NOT NULL,
[Question_ID] [int] NOT NULL,
[Answer_ID] [int] NULL,
[Answer_Text] [varchar](500) COLLATE Latin1_General_CI_AS NULL
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [Answer_ID] ON [dbo].[Result_Answers]
(
[Answer_ID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [Main Index] ON [dbo].[Result_Answers]
(
[Result_ID] ASC,
[Question_ID] ASC,
[Answer_ID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Stored Proc.

An AnswerList would look like: 455,456;467,468

ALTER PROCEDURE [dbo].[sp_GetAnswerResult]
@Question_ID as int = 0,
@AnswersList as varchar(600) = ''
AS
BEGIN
if ( @Question_ID != 0 )
BEGIN
SELECT 1 as Result

if ( @AnswersList != '' )
BEGIN
DECLARE @possibleAnswer as varchar(500)

DECLARE myCursor CURSOR FOR
SELECT value FROM dbo.ufn_Split( @AnswersList,';')

OPEN myCursor
FETCH NEXT FROM myCursor INTO @possibleAnswer

select Result_ID INTO #tmp FROM Result_Answers
WHERE Result_Answers.Answer_ID in (SELECT value FROM dbo.ufn_Split(@possibleAnswer, ','))

WHILE @@FETCH_STATUS = 0
BEGIN
DELETE FROM #tmp WHERE Result_ID NOT IN
( SELECT Result_ID FROM Result_Answers WHERE Answer_ID IN
( SELECT value FROM dbo.ufn_Split(@possibleAnswer, ',') ))

FETCH NEXT FROM myCursor INTO @possibleAnswer
END

CLOSE myCursor
DEALLOCATE myCursor

select Answers.Answer, count(distinct Result_Answers.Result_ID) as 'Value'
from #Tmp, Result_Answers LEFT OUTER JOIN Answers
ON Answers.Answer_ID = Result_Answers.Answer_ID
WHERE Result_Answers.Result_ID IN ( select Result_ID from #Tmp )
AND Result_Answers.Question_ID = @Question_ID
GROUP BY Answers.Answer
END
ELSE
BEGIN
select Answers.Answer, count(distinct Result_Answers.Result_ID) as 'Value'
from Result_Answers LEFT OUTER JOIN Answers
ON Answers.Answer_ID = Result_Answers.Answer_ID
WHERE Result_Answers.Question_ID = @Question_ID
GROUP BY Answers.Answer
END
END
ELSE
BEGIN
SELECT 0 as Result
END
END

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-11-14 : 11:52:32

What is this doing?
SELECT value FROM dbo.ufn_Split( @AnswersList,';')

Does this mean you are storing your answer in a delimited list, or other non-relational format?

CODO ERGO SUM

Lohkay
Starting Member

5 Posts

Posted - 2007-11-14 : 12:00:27

Yes, the Answerlist comes from a web form. They describe the filter the user selected. Its in the format XX,XX;XX,XX where each answer selected for one question are delimited by commas and each question by semi-colon. ufn_Split is a user function that splits the string into rows based on the delimiter you give it.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-11-14 : 12:06:29

quote:
Originally posted by Lohkay

Yes, the Answerlist comes from a web form. They describe the filter the user selected. Its in the format XX,XX;XX,XX where each answer selected for one question are delimited by commas and each question by semi-colon. ufn_Split is a user function that splits the string into rows based on the delimiter you give it.

It is much better to store those in a relational format, with one row for each answer. Just because the data comes from the web form that way is not a good reason to store it that way. Instead of splitting the data into rows when you need to use it, split the data into rows before you store it into into tables.

CODO ERGO SUM

Lohkay
Starting Member

5 Posts

Posted - 2007-11-14 : 12:15:43

We don't store that information anywhere. Its only a filter passed by the user. And anyway, the splitting part (into tables) is almost free (doesn't use any cpu/IO). Our main problem is the cursor and SELECT that loops through 1M+ rows to crunch the numbers.

evilDBA
Posting Yak Master

155 Posts

Posted - 2007-11-14 : 12:57:58

>Its only a filter passed by the user. And anyway, the splitting part (into tables) is almost free (doesn't use any cpu/IO). Our main problem is the cursor and SELECT that loops through 1M+ rows to crunch the numbers.

NO, IT IS NOT FREE
It is because of splitting you need to "loop thru 1M+ rows"
If you use a realtional design, you could get an answer in milliseconds without traversing all rows.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-11-14 : 13:13:17

The schema not withstanding, the issue you are facing is trying to create a web form where a user can select answers from different questions and they want to get a count of all the "Results" (users?) that provided all those same answers?

If this is what you are trying to do, I think this might be a case where you can take advantage of Dynamic SQL. Some other thigns you may want to add are a filter by Survey_ID or something to restrict the result set down to only that Survey that youa re looking at (assuming that is how it works). In general you would want to filter out as much as possible so if you can filter by Survey_ID and other criteria do it!! :)

Here is a sample I slapped together and did not test at all, but might give you an idea of how to use dynamic/dirty SQL to do what you want in one set based query rather than using a cursor.

DECLARE @AnswerByQuestion TABLE (AnswerList VARCHAR(500))

-- Load the Answers by Question
INSERT @AnswerByQuestion
SELECT value 
FROM dbo.ufn_Split(@AnswersList, ';') 

-- IF @@ROWCOUNT > 0

DECLARE @Sql VARCAHR(8000)
DECLARE @Answer VARCHAR(500)

-- Set up the "base" query
SET @Sql = '' +
	'SELECT 
		Answers.Answer, 
		count(DISTINCT Result_Answers.Result_ID) as [Value] 
	FROM 
		Result_Answers 
	LEFT OUTER JOIN 
		Answers 
	ON 
		Answers.Answer_ID = Result_Answers.Answer_ID 
	WHERE 
		Result_Answers.Question_ID = ' + CAST(@Question_ID  AS VARCHAR)

-- For each Question, add the Answerlist to the WHERE clause
SELECT TOP 1 @Answer = AnswerList
FROM @AnswerByQuestion

WHILE @Answer IS NOT NULL
BEGIN
	SET @Sql = @Sql + ' AND Result_Answers.Answer_ID IN(SELECT value FROM dbo.ufn_Split(''' + @Answer + ''', '',''))' 

	DELETE @AnswerByQuestion
	WHERE AnswerList = @Answer

	SELECT TOP 1 @Answer = AnswerList
	FROM @AnswerByQuestion

END --WHILE

SET @Sql = @Sql + ' GROUP BY Answers.Answer'

--PRINT @Sql
EXEC @Sql

EDIT - Couple of fix ups..

Lohkay
Starting Member

5 Posts

Posted - 2007-11-14 : 14:12:39

Thank you Lamprey, this is exactly what I was looking for. I didn't know you could build queries in a variable to be executed later.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-11-14 : 14:22:04

Sure thing.. I also forgot to add the obligatory: [url]http://www.sommarskog.se/dynamic_sql.html[/url] :)

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources