Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Recursive CTEs and full-text search

Author  Topic 

ron2112
Starting Member

44 Posts

Posted - 2010-01-15 : 10:00:22
I'm pretty sure what I'm trying to do is impossible in 2005, but I want to lay out the idea and see if anyone can help me figure out how to make this work.

We've got a table that populates what looks more or less like an Excel spreadsheet. Each cell has a reference consisting of a column letter and row number... so the cell in column A row 1 would be cell A1. Each cell may contain a formula (basic +,-,/,* syntax) that may or may not refer to other cells. For example, a cell (let's call it cell A1) that contains the formula B2+C3+D4 contains references to cells B2, C3, and D4.

For this illustration, let's define the table as:

CREATE TABLE CellFormulas
( ID int IDENTITY PRIMARY KEY,
CellReference varchar(5),
Formula varchar(100))

Obviously, we need to calculate the values of cells B2, C3, and D4 before we can calculate cell A1, because those three cells won't have values yet if we calculate A1 first. A1's formula makes it dependent on the other three cells. So some sort of pecking order needs to be established.

My concept in a nutshell is to use a recursive CTE to collect the cell IDs more or less as follows:

WITH cte_Cells(Order,ID,Level)
AS (
SELECT [all cells from CellFormulas with no cell references in their formula]

UNION ALL

SELECT [all cells from CellFormulas with references to the anchor member in their formulas]
)

So I calculate all non-dependent cells first, then those cells that are dependent on the first set of cells, then those cells that are dependent on the second set... and so on until I reach the set of cells upon which no other cells are dependent.

I can generate the anchor member easily enough by selecting cells with NOT Formula LIKE '%[B-M][0-9]%'. Only formulas that contain cell references will match that pattern, so this excludes them nicely. (There are additional exclusions, but they're not relevant to this topic.)

As for the recursive member, I can make that happen by joining CellFormulas to cte_Cells on CellFormulas.Formula LIKE '%' + cte_Cells.CellReference + '[^0-9]%'.

WITH cte_Cells(Order,ID,Level)
AS (
SELECT [column list], 0 AS Level
FROM CellFormulas
WHERE NOT Formula LIKE '%[B-M][1-9]%'

UNION ALL

SELECT [column list], cte_Cells.Level + 1 AS Level
FROM CellFormulas
INNER JOIN cte_Cells
ON CellFormulas.Formula LIKE '%' + cte_Cells.CellReference + '[^0-9]%'
)

I need to include that [^0-9] at the end to ensure a reference to cell B5 doesn't match a reference to cell B55.

This works very well -- I get exactly the result set I'm looking for. But it's slow, taking about 20 seconds to return a result set of about 300 rows. That's not gonna fly, and I'm sure it's the LIKE comparison that's causing the slowdown.

[I know this is long-winded, sorry... go take a break if you like.]

Okay, so I have my recursive CTE working but it's too slow. It occurred to me that maybe full-text search might help. I have never used this feature before, so I'm flying blind and hitting walls. This is where I'm hoping someone can guide me a bit.

I set up a full-text index on CellFormulas.Formula. It's my understanding that LIKE does not make use of the full-text index, and I have to use CONTAINS(). This is where I'm hitting problems.

You can't do this, apparently:

WITH cte_Cells(Order,ID)
AS (
SELECT [column list], 0 AS Level
FROM CellFormulas
WHERE NOT Formula LIKE '%[B-M][1-9]%'

UNION ALL

SELECT [column list], cte_Cells.Level + 1 AS Level
FROM CellFormulas
INNER JOIN cte_Cells
ON CONTAINS(CellFormulas.Formula, cte_Cells.CellReference)
)

CONTAINS can only be used in WHERE clauses apparently, so I can't use it in my join. I would need the ability to concatenate the [^0-9] exclusion on to CellReference as well, which I don't believe is an option either. And this is pretty much where I hit the wall.

So... am I barking up the wrong tree? Is it time to bail on this idea and find other ways of speeding up this query? Or is there some ray of hope that I might be able to leverage full-text search here?
   

- Advertisement -