
 crosstab with 1000 columns


jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-11 : 20:54:07
Hi folks,

I have a rather complicated TVF that returns a result set conforming to this schema:


create table Speeds
(
RoadID int not null
,BinID smallint not null
,Speed tinyint not null
,primary key(RoadID asc, BinID asc)
)


BinID varies from 0 to 999. There are about 1 million distinct RoadID values, so the total number of rows returned by the TVF with no WHERE clause applied is about 1 billion.

I can't post the code of the TVF. Suffice it to say it's a 5-way join, indexed such that this query starts returning rows instantly:


select RoadID, BinID, Speed
from dbo.ComplicatedTVF(@someArgs)
order by RoadID, BinID




My requirement is to create a crosstab report over these 1 billion rows that has these columns:

RoadID, SpeedAtBin0, SpeedAtBin1, ..., SpeedAtBin999



I tried both the PIVOT syntax available in 2005 and the old-school CASE method:


-- using PIVOT:
select RoadID, [0], [1], ... [999]
from (select RoadID, Speed, BinID from dbo.ComplicatedTVF(@someArgs)) a
pivot (min(Speed) for BinID in ([0], [1], ... [999])) b

-- using CASE:
select
RoadID
,min(CASE BinID when 0 then Speed else NULL end) AS Bin0
,min(CASE BinID when 1 then Speed else NULL end) AS Bin1
...
,min(CASE BinID when 999 then Speed else NULL end) AS Bin999
from dbo.ComplicatedTVF(@someArgs)
group BY RoadID



Unfortunately, I get this error with either method:


Msg 8632, Level 17, State 2, Line 61
Internal error: An expression services limit has been reached.
Please look for potentially complex expressions in your query, and try to simplify them


If I limit the number of columns in the crosstab to 100 or so, the PIVOT method works. It falls over at around 200.

I'm starting to think I might need to just loop over all the rows (gasp!) in compiled code and create the crosstab by hand...

One other thing I thought of was to dump the raw output of the TVF into a temp table and try the crosstab on that, but I tend to shy away from a temp table with 1b rows. Yuck.

I don't care if this takes an hour or so to run btw. It's part of an offline process.
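For the record, the staging idea would look something like this (just a sketch; @someArgs stands in for the real TVF arguments, and this wouldn't get around the Msg 8632 limit by itself - it only avoids re-running the 5-way join for every crosstab attempt):

```sql
-- Materialize the TVF output once, clustered the same way as the PK,
-- so later crosstab queries scan the temp table instead of the TVF.
select RoadID, BinID, Speed
into #Speeds
from dbo.ComplicatedTVF(@someArgs)

create unique clustered index IX_Speeds on #Speeds (RoadID, BinID)
```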



elsasoft.org

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2008-02-11 : 21:00:47
Oh, I didn't know there was a limit in PIVOT.

Maybe you can split it into 10 PIVOTs of 100 columns each and join them back?
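Something like this, maybe (just a sketch with two 2-column blocks to show the shape; you would extend it to 10 blocks of 100 bins each, and @someArgs stands in for the real TVF arguments):

```sql
-- Sketch: narrow PIVOT blocks joined back together on RoadID.
select p0.RoadID, p0.[0], p0.[1], p1.[2], p1.[3]
from
(
    select RoadID, [0], [1]
    from (select RoadID, Speed, BinID
          from dbo.ComplicatedTVF(@someArgs)
          where BinID between 0 and 1) a
    pivot (min(Speed) for BinID in ([0], [1])) p
) p0
inner join
(
    select RoadID, [2], [3]
    from (select RoadID, Speed, BinID
          from dbo.ComplicatedTVF(@someArgs)
          where BinID between 2 and 3) a
    pivot (min(Speed) for BinID in ([2], [3])) p
) p1 on p1.RoadID = p0.RoadID
```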


KH
[spoiler]Time is always against us[/spoiler]


jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-11 : 21:12:14
I don't think the limit is in PIVOT per se; the CASE method fails with the same error. I think the query is simply too "complicated" for the optimizer. Perhaps the parse tree is too deep or something.

I'll try your suggestion.


elsasoft.org

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-11 : 21:15:33
Just found this KB article about it: http://support.microsoft.com/kb/913050

Apparently this huge crosstab uses more than 65,535 identifiers internally.

The workaround is to "rewrite your query".




elsasoft.org

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2008-02-11 : 21:34:02
Yes.


KH
[spoiler]Time is always against us[/spoiler]


jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2008-02-11 : 22:38:33
Where are you ultimately outputting your results? Why is it necessary to generate a pivot of 1,000 columns?

- Jeff
http://weblogs.sqlteam.com/JeffS

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-12 : 00:18:17
Output would go to a CSV file. It's necessary because that's the requirement: apparently some of our customers would rather work with 100k rows of 1001 columns each than with 1b rows of 3 columns each. I myself would prefer the latter, but I don't buy the product - I just build it.


elsasoft.org

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2008-02-12 : 07:23:38
Can you generate this CSV file with a custom app, say one written in .NET? It would be only a few simple lines of code to do the work there; I'd be happy to help if you can provide your specific requirements. It will be not only easier and much less code, but also much, much faster and more flexible.

However, I still have to ask: What, then, will the customer do with this file? They won't be able to make sense of it viewing it in Notepad, and they won't be able to import that many columns into a table or Excel file...what is the final goal here? This reminds me of the client who insisted that their primary reporting need was a report that would be 300,000 printed pages because they "always needed to see everything."

- Jeff
http://weblogs.sqlteam.com/JeffS

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-12 : 09:43:33
quote:
Originally posted by jsmith8858

Can you generate this CSV file with a custom app, say one written in .NET? It would be only a few simple lines of code to do the work there; I'd be happy to help if you can provide your specific requirements. It will be not only easier and much less code, but also much, much faster and more flexible.



I have already done so. I first tried khtan's suggestion of joining 10 crosstabs in T-SQL, but that was very slow: I killed it after an hour, and no rows had yet been returned. My compiled app generates the CSV in under an hour, which is acceptable.

quote:
Originally posted by jsmith8858
However, I still have to ask: What, then, will the customer do with this file? They won't be able to make sense of it viewing it in Notepad, and they won't be able to import that many columns into a table or Excel file...what is the final goal here? This reminds me of the client who insisted that their primary reporting need was a report that would be 300,000 printed pages because they "always needed to see everything."



The PM for this project says the customer wants to load it up in Excel to "get a feel for the data". Apparently Excel 2007 can handle files that large, or so they tell me. I objected that this was pointless, but was overruled.




elsasoft.org

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-02-12 : 10:23:02
You could do this with dynamic SQL.

I wrote an article about that here:
http://www.sqlservercentral.com/articles/Advanced+Querying/pivottableformicrosoftsqlserver/2434/

It does one update per column. Reasonably fast. Please let me know if you have access to the article; otherwise email me and I can send it to you as a PDF.



E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-02-12 : 10:39:49
Here is the code without explanations. I trust you are experienced enough to dissect it.
CREATE PROCEDURE uspSalesOfficeDepartmentPivot
(
    @FromDate DATETIME,
    @ToDate DATETIME
)
AS

SET NOCOUNT ON

CREATE TABLE #Aggregates
(
    RowText VARCHAR(50),
    ColumnText VARCHAR(50),
    CellData INT
)

INSERT #Aggregates
(
    RowText,
    ColumnText,
    CellData
)
SELECT      Sales.Office,
            Sales.Department,
            SUM(Invoices.Amount)
FROM        Sales
INNER JOIN  Invoices ON Sales.OfficeID = Invoices.OfficeID
WHERE       Invoices.OrderDate BETWEEN @FromDate AND @ToDate
GROUP BY    Sales.Office,
            Sales.Department
HAVING      SUM(Invoices.Amount) <> 0

CREATE UNIQUE INDEX IX_Aggregates ON #Aggregates (RowText, ColumnText, CellData)

CREATE TABLE #Columns
(
    ColumnIndex INT IDENTITY (0, 1),
    ColumnText VARCHAR(50)
)

INSERT #Columns
(
    ColumnText
)
SELECT DISTINCT ColumnText
FROM #Aggregates WITH (NOLOCK, INDEX(IX_Aggregates))
ORDER BY ColumnText

CREATE UNIQUE INDEX IX_Columns ON #Columns (ColumnIndex, ColumnText)

CREATE TABLE #Rows
(
    RowText VARCHAR(50)
)

INSERT #Rows
(
    RowText
)
SELECT DISTINCT RowText
FROM #Aggregates WITH (NOLOCK, INDEX(IX_Aggregates))

CREATE UNIQUE INDEX IX_Rows ON #Rows (RowText)

DECLARE @ColumnIndex INT,
        @MaxColumnIndex INT,
        @ColumnText VARCHAR(50),
        @SQL VARCHAR(1000)

SELECT  @ColumnIndex = 0,
        @MaxColumnIndex = MAX(ColumnIndex)
FROM    #Columns

WHILE @ColumnIndex <= @MaxColumnIndex
BEGIN
    SELECT @ColumnText = ColumnText
    FROM #Columns
    WHERE ColumnIndex = @ColumnIndex

    SELECT @SQL = 'ALTER TABLE #Rows ADD ' + QUOTENAME(@ColumnText) + ' INT NULL DEFAULT 0'
    EXEC (@SQL)

    SET @SQL = 'UPDATE r
    SET r.' + QUOTENAME(@ColumnText) + ' = a.CellData
    FROM #Rows AS r
    INNER JOIN #Aggregates AS a WITH (NOLOCK, INDEX(IX_Aggregates)) ON a.RowText = r.RowText
    INNER JOIN #Columns AS c WITH (NOLOCK, INDEX(IX_Columns)) ON c.ColumnText = a.ColumnText
    WHERE c.ColumnIndex = ' + CAST(@ColumnIndex AS VARCHAR(12))
    EXEC (@SQL)

    SET @ColumnIndex = @ColumnIndex + 1
END

DROP TABLE #Columns,
           #Aggregates

SELECT *
FROM #Rows
ORDER BY RowText

DROP TABLE #Rows



E 12°55'05.25"
N 56°04'39.16"

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-02-12 : 10:40:51
I don't see how dynamic SQL would help.

I am not having any trouble composing the query. The problem is that the query, once composed, cannot be processed by SQL Server.

It's moot at this point anyway, as I have a solution using compiled code.


elsasoft.org

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-02-12 : 10:45:19
It will create a 1000-column crosstab report reasonably fast.
My example above builds one column at a time and populates that column for all rows before moving on to the next column.



E 12°55'05.25"
N 56°04'39.16"
   
