Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Lohkay
Starting Member
5 Posts |
Posted - 2007-11-14 : 10:52:16
|
| Hi, we have surveys that gather data and we're trying to find the best way to query this data for reporting. It is not too late to change everything so we're open to any suggestions. Here is my current setup: Results table from which each row represents a user that filled in the survey so it contains a unique identifier, a survey identifier and some other data which we use elsewhere.Result_Answers table which holds the answers for each result. So it contains the result ID, the question ID and the answer IDQuestions and Answers table which hold the actual text for each identifier used in result_answers. What we're trying to do is retrieve the count for each answers (thats easy) based on a set of conditional answers. For exemple: Q1 - Are you male or female Q2 - What is your age groupQ3 - What type of music do you listen to. So lets say I want the count of each answers for each question but limited to the results that answered male on Q1 and 18-25 OR 25-35 on Q2. The only way I found to do this is create a cursor to loop trough each questions, select the result IDs that answered Q1 correctly and then remove from that temp table each result ID that didn't answer the subsequent questions correctly. This gives me a page long query with a cursor and multiple IN commands (for when there's more than one answer valid for a single question) I'm not saying the results are abysmal, I'm still getting my results inside 2s with a bit over 1 million rows in those tables. But unfortunatly, this is basically only one survey... I can only imagine that it will get longer and longer as we add more surveys.So my question is: Is there a better way to query this data or worst case, is my table schema not efficient enough? In both cases, how can I fix/optimize it? I'm really hoping you guys can help me out! |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-11-14 : 11:24:37
|
| You want help optimizing queries and schemas, neither of which you posted?CODO ERGO SUM |
 |
|
|
Lohkay
Starting Member
5 Posts |
Posted - 2007-11-14 : 11:37:07
|
| Good point. But then again, I didn't wantto influence the answers. Anyway, here it is:Results SchemasCREATE TABLE [dbo].[Results]([Result_ID] [int] IDENTITY(1,1) NOT NULL,[ResultDate] [datetime] NULL,[Jukebox_ID] [varchar](30) COLLATE Latin1_General_CI_AS NULL,[Status] [varchar](50) COLLATE Latin1_General_CI_AS NULL,[Survey_ID] [int] NOT NULL,[Group_ID] [int] NULL,CONSTRAINT [PK_Results_1] PRIMARY KEY CLUSTERED ([Result_ID] ASC)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]CREATE UNIQUE NONCLUSTERED INDEX [Survey_ID] ON [dbo].[Results] ([Result_ID] ASC,[Survey_ID] ASC)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]Result_Answers SchemasCREATE TABLE [dbo].[Result_Answers]([Result_ID] [int] NOT NULL,[Question_ID] [int] NOT NULL,[Answer_ID] [int] NULL,[Answer_Text] [varchar](500) COLLATE Latin1_General_CI_AS NULL) ON [PRIMARY]CREATE NONCLUSTERED INDEX [Answer_ID] ON [dbo].[Result_Answers] ([Answer_ID] ASC)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]CREATE UNIQUE CLUSTERED INDEX [Main Index] ON [dbo].[Result_Answers] ([Result_ID] ASC,[Question_ID] ASC,[Answer_ID] ASC)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]Stored Proc.An AnswerList would look like: 455,456;467,468ALTER PROCEDURE [dbo].[sp_GetAnswerResult]@Question_ID as int = 0,@AnswersList as varchar(600) = ''ASBEGINif ( @Question_ID != 0 ) BEGINSELECT 1 as Result if ( @AnswersList != '' )BEGINDECLARE @possibleAnswer as varchar(500)DECLARE myCursor CURSOR FOR SELECT value FROM dbo.ufn_Split( @AnswersList,';')OPEN myCursorFETCH NEXT FROM myCursor INTO @possibleAnswerselect Result_ID INTO #tmp FROM Result_Answers WHERE Result_Answers.Answer_ID in (SELECT value FROM dbo.ufn_Split(@possibleAnswer, ','))WHILE @@FETCH_STATUS = 0BEGINDELETE FROM #tmp WHERE Result_ID NOT IN ( SELECT Result_ID FROM Result_Answers WHERE Answer_ID IN ( SELECT value FROM dbo.ufn_Split(@possibleAnswer, ',') ))FETCH NEXT FROM myCursor INTO @possibleAnswerENDCLOSE myCursorDEALLOCATE myCursorselect Answers.Answer, count(distinct Result_Answers.Result_ID) as 'Value' from #Tmp, Result_Answers LEFT OUTER JOIN Answers ON Answers.Answer_ID = Result_Answers.Answer_ID WHERE Result_Answers.Result_ID IN ( select Result_ID from #Tmp )AND Result_Answers.Question_ID = @Question_IDGROUP BY Answers.AnswerENDELSEBEGINselect Answers.Answer, count(distinct Result_Answers.Result_ID) as 'Value' from Result_Answers LEFT OUTER JOIN Answers ON Answers.Answer_ID = Result_Answers.Answer_ID WHERE Result_Answers.Question_ID = @Question_IDGROUP BY Answers.AnswerEND ENDELSEBEGINSELECT 0 as ResultENDEND |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-11-14 : 11:52:32
|
| What is this doing?SELECT value FROM dbo.ufn_Split( @AnswersList,';')Does this mean you are storing your answer in a delimited list, or other non-relational format?CODO ERGO SUM |
 |
|
|
Lohkay
Starting Member
5 Posts |
Posted - 2007-11-14 : 12:00:27
|
| Yes, the Answerlist comes from a web form. They describe the filter the user selected. Its in the format XX,XX;XX,XX where each answer selected for one question are delimited by commas and each question by semi-colon. ufn_Split is a user function that splits the string into rows based on the delimiter you give it. |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-11-14 : 12:06:29
|
quote: Originally posted by Lohkay Yes, the Answerlist comes from a web form. They describe the filter the user selected. Its in the format XX,XX;XX,XX where each answer selected for one question are delimited by commas and each question by semi-colon. ufn_Split is a user function that splits the string into rows based on the delimiter you give it.
It is much better to store those in a relational format, with one row for each answer. Just because the data comes from the web form that way is not a good reason to store it that way. Instead of splitting the data into rows when you need to use it, split the data into rows before you store it into into tables.CODO ERGO SUM |
 |
|
|
Lohkay
Starting Member
5 Posts |
Posted - 2007-11-14 : 12:15:43
|
| We don't store that information anywhere. Its only a filter passed by the user. And anyway, the splitting part (into tables) is almost free (doesn't use any cpu/IO). Our main problem is the cursor and SELECT that loops through 1M+ rows to crunch the numbers. |
 |
|
|
evilDBA
Posting Yak Master
155 Posts |
Posted - 2007-11-14 : 12:57:58
|
| >Its only a filter passed by the user. And anyway, the splitting part (into tables) is almost free (doesn't use any cpu/IO). Our main problem is the cursor and SELECT that loops through 1M+ rows to crunch the numbers.NO, IT IS NOT FREEIt is because of splitting you need to "loop thru 1M+ rows"If you use a realtional design, you could get an answer in milliseconds without traversing all rows. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-11-14 : 13:13:17
|
The schema not withstanding, the issue you are facing is trying to create a web form where a user can select answers from different questions and they want to get a count of all the "Results" (users?) that provided all those same answers?If this is what you are trying to do, I think this might be a case where you can take advantage of Dynamic SQL. Some other thigns you may want to add are a filter by Survey_ID or something to restrict the result set down to only that Survey that youa re looking at (assuming that is how it works). In general you would want to filter out as much as possible so if you can filter by Survey_ID and other criteria do it!! :)Here is a sample I slapped together and did not test at all, but might give you an idea of how to use dynamic/dirty SQL to do what you want in one set based query rather than using a cursor. DECLARE @AnswerByQuestion TABLE (AnswerList VARCHAR(500))-- Load the Answers by QuestionINSERT @AnswerByQuestionSELECT value FROM dbo.ufn_Split(@AnswersList, ';') -- IF @@ROWCOUNT > 0DECLARE @Sql VARCAHR(8000)DECLARE @Answer VARCHAR(500)-- Set up the "base" querySET @Sql = '' + 'SELECT Answers.Answer, count(DISTINCT Result_Answers.Result_ID) as [Value] FROM Result_Answers LEFT OUTER JOIN Answers ON Answers.Answer_ID = Result_Answers.Answer_ID WHERE Result_Answers.Question_ID = ' + CAST(@Question_ID AS VARCHAR)-- For each Question, add the Answerlist to the WHERE clauseSELECT TOP 1 @Answer = AnswerListFROM @AnswerByQuestionWHILE @Answer IS NOT NULLBEGIN SET @Sql = @Sql + ' AND Result_Answers.Answer_ID IN(SELECT value FROM dbo.ufn_Split(''' + @Answer + ''', '',''))' DELETE @AnswerByQuestion WHERE AnswerList = @Answer SELECT TOP 1 @Answer = AnswerList FROM @AnswerByQuestionEND --WHILESET @Sql = @Sql + ' GROUP BY Answers.Answer'--PRINT @SqlEXEC @SqlEDIT - Couple of fix ups.. |
 |
|
|
Lohkay
Starting Member
5 Posts |
Posted - 2007-11-14 : 14:12:39
|
| Thank you Lamprey, this is exactly what I was looking for. I didn't know you could build queries in a variable to be executed later. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-11-14 : 14:22:04
|
| Sure thing.. I also forgot to add the obligatory: [url]http://www.sommarskog.se/dynamic_sql.html[/url] :) |
 |
|
|
|
|
|
|
|