I'm getting very strange query plans, with very badly estimated costs, just finding the keys a column with a foreign key constraint isn't using. In fact, I'm getting a plan in 2005sp2 that takes about 40 times as long to run as the plan for the same query in 2000sp4. It looks like the plan on 2005sp2 is getting chosen on the basis of a ludicrously small cost estimate.I'd be interested to know whether it's an sp2 regression: I'd be surprised if I hadn't noticed it on 2005 before that.Here's a worked example.First we'll create two tables, a small one (A) with 1000 rows, and a large one (B) with 6 million rows. B.akey is an (unindexed) foreign key referencing A.akey, but only 853 of the rows in A are referenced. For a bit of verisilimitude, I've spaced out the ones that aren't referenced, and put some skew in there (that's the RAND() * RAND() bit) so some get referenced more than others.-- PKs named just to make the SHOWPLANs less messyCREATE TABLE A ( akey smallint NOT NULL, adesc varchar(100) NOT NULL, CONSTRAINT pkA PRIMARY KEY (akey))CREATE TABLE B ( bkey int NOT NULL, akey smallint NOT NULL, CONSTRAINT pkB PRIMARY KEY (bkey))SET NOCOUNT ONBEGIN TRANSACTIONDECLARE @a AS smallintSET @a = 0WHILE @a < 1000 BEGIN INSERT INTO A SELECT @a, 'akey ' + CAST(@a AS varchar(100)) SET @a = @a + 1ENDCOMMIT TRANSACTIONSET NOCOUNT OFFSET NOCOUNT ONBEGIN TRANSACTIONDECLARE @b AS intSET @b = 0WHILE @b < 6000000 BEGIN INSERT INTO B SELECT @b, CAST(RAND() * RAND() * 853 AS int) * 1000 / 853 SET @b = @b + 1ENDCOMMIT TRANSACTIONSET NOCOUNT OFFALTER TABLE B ADD CONSTRAINT B_akey FOREIGN KEY (akey) REFERENCES A
Let's just check what we've got:-- check there are 853 values used (i.e. 147 not used)SELECT COUNT(*), COUNT(DISTINCT akey)FROM B-- distribution skewed toward zero, with gaps:SELECT akey, COUNT(*) AS ctFROM BGROUP BY akeyORDER BY ct DESC
Now we'll find the A.akey values missing from B.akey.First with an outer-join and NULL test.SELECT A.*FROM ALEFT JOIN B ON A.akey = B.akeyWHERE bkey IS NULL
The execution plans and stats look like this:2000sp4: |--Filter(WHERE:([ B].[bkey]=NULL)) |--Hash Match(Left Outer Join, HASH:([A].[akey])=([ b].[akey])) |--Clustered Index Scan(OBJECT:([test].[dbo].[A].[pkA])) |--Clustered Index Scan(OBJECT:([test].[dbo].[ B].[pkB]))(147 row(s) affected)Table 'B'. Scan count 1, logical reads 11134, physical reads 0, read-ahead reads 0.Table 'A'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0.SQL Server Execution Times: CPU time = 1812 ms, elapsed time = 1816 ms.2005sp2: |--Parallelism(Gather Streams) |--Filter(WHERE:([test].[dbo].[ B].[bkey] IS NULL)) |--Hash Match(Left Outer Join, HASH:([test].[dbo].[A].[akey])=([test].[dbo].[ B].[akey])) |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[dbo].[A].[akey])) | |--Clustered Index Scan(OBJECT:([test].[dbo].[A].[pkA])) |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[dbo].[ B].[akey])) |--Clustered Index Scan(OBJECT:([test].[dbo].[ B].[pkB]))(147 row(s) affected)Table 'A'. Scan count 3, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'B'. Scan count 3, logical reads 12264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times: CPU time = 3000 ms, elapsed time = 1556 ms.2005sp2, OPTION (MAXDOP 1): |--Filter(WHERE:([test].[dbo].[ B].[bkey] IS NULL)) |--Hash Match(Left Outer Join, HASH:([test].[dbo].[A].[akey])=([test].[dbo].[ B].[akey])) |--Clustered Index Scan(OBJECT:([test].[dbo].[A].[pkA])) |--Clustered Index Scan(OBJECT:([test].[dbo].[ B].[pkB]))(147 row(s) affected)Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'B'. Scan count 1, logical reads 11175, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'A'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times: CPU time = 2125 ms, elapsed time = 2129 ms.
All pretty much what you'd expect: an outer join and filter.Now let's try a subquery:SELECT A.*FROM AWHERE NOT EXISTS ( SELECT * FROM B WHERE A.akey = B.akey )
We'd get the same plan for a NOT IN and it doesn't matter if there's a DISTINCT in the subquery.The execution plans and stats look like this:2000sp4: |--Hash Match(Right Anti Semi Join, HASH:([ B].[akey])=([A].[akey])) |--Hash Match(Aggregate, HASH:([ B].[akey])) | |--Clustered Index Scan(OBJECT:([test].[dbo].[ B].[pkB])) |--Clustered Index Scan(OBJECT:([test].[dbo].[A].[pkA]))(147 row(s) affected)Table 'B'. Scan count 1, logical reads 11134, physical reads 0, read-ahead reads 0.Table 'A'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0.SQL Server Execution Times: CPU time = 1187 ms, elapsed time = 1189 ms.2005sp2: |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([test].[dbo].[A].[akey])) |--Clustered Index Scan(OBJECT:([test].[dbo].[A].[pkA])) |--Top(TOP EXPRESSION:((1))) |--Clustered Index Scan(OBJECT:([test].[dbo].[ B].[pkB]), WHERE:([test].[dbo].[A].[akey]=[test].[dbo].[ B].[akey]))(147 row(s) affected)Table 'B'. Scan count 1, logical reads 1657583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'A'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times: CPU time = 51203 ms, elapsed time = 51268 ms.
And here's our problem: it's just taken over 40 times as long to run the same simple query on SQL Server 2005sp2 as it took on 2000sp4.What's going on? Well, here are the estimated costs:40.03636 LEFT JOIN ... IS NULL4.589875 NOT EXISTSThat's estimating that the NOT EXISTS query has about a tenth the cost of the LEFT JOIN query. Why would it do that?Here's the estimated XML query plan:<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.0" Build="9.00.3054.00"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementText="SELECT A.* FROM A WHERE NOT EXISTS ( SELECT * FROM B WHERE A.akey = B.akey )" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="4.58987" StatementEstRows="126.62" StatementOptmLevel="FULL"> <StatementSetOptions QUOTED_IDENTIFIER="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" NUMERIC_ROUNDABORT="false" /> <QueryPlan CachedPlanSize="9" CompileTime="28" CompileCPU="28" CompileMemory="256"> <MissingIndexes> <MissingIndexGroup Impact="99.5474"> <MissingIndex Database="[test]" Schema="[dbo]" Table="[ B]"> <ColumnGroup Usage="EQUALITY"> <Column Name="[akey]" ColumnId="2" /> </ColumnGroup> </MissingIndex> </MissingIndexGroup> </MissingIndexes> <RelOp NodeId="0" PhysicalOp="Nested Loops" LogicalOp="Left Anti Semi Join" EstimateRows="126.62" EstimateIO="0" EstimateCPU="0.00418" AvgRowSize="63" EstimatedTotalSubtreeCost="4.58987" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"> <OutputList> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="akey" /> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="adesc" /> </OutputList> <NestedLoops Optimized="0"> <OuterReferences> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="akey" /> </OuterReferences> <RelOp NodeId="1" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="1000" EstimateIO="0.00460648" EstimateCPU="0.001257" AvgRowSize="63" EstimatedTotalSubtreeCost="0.00586348" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"> <OutputList> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="akey" /> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="adesc" /> </OutputList> <IndexScan Ordered="0" ForcedIndex="0" NoExpandHint="0"> <DefinedValues> <DefinedValue> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="akey" /> </DefinedValue> <DefinedValue> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="adesc" /> </DefinedValue> </DefinedValues> <Object Database="[test]" Schema="[dbo]" Table="[A]" Index="[pkA]" /> </IndexScan> </RelOp> <RelOp NodeId="3" PhysicalOp="Top" LogicalOp="Top" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="9" EstimatedTotalSubtreeCost="4.57973" Parallel="0" EstimateRebinds="999" EstimateRewinds="0"> <OutputList /> <Top RowCount="0" IsPercent="0" WithTies="0"> <TopExpression> <ScalarOperator ScalarString="(1)"> <Const ConstValue="(1)" /> </ScalarOperator> </TopExpression> <RelOp NodeId="4" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="1" EstimateIO="8.24839" EstimateCPU="6.60008" AvgRowSize="9" EstimatedTotalSubtreeCost="4.17019" Parallel="0" EstimateRebinds="0" EstimateRewinds="999"> <OutputList /> <IndexScan Ordered="0" ForcedIndex="0" NoExpandHint="0"> <DefinedValues /> <Object Database="[test]" Schema="[dbo]" Table="[ B]" Index="[pkB]" /> <Predicate> <ScalarOperator ScalarString="[test].[dbo].[A].[akey]=[test].[dbo].[ B].[akey]"> <Compare CompareOp="EQ"> <ScalarOperator> <Identifier> <ColumnReference Database="[test]" Schema="[dbo]" Table="[A]" Column="akey" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[test]" Schema="[dbo]" Table="[ B]" Column="akey" /> </Identifier> </ScalarOperator> </Compare> </ScalarOperator> </Predicate> </IndexScan> </RelOp> </Top> </RelOp> </NestedLoops> </RelOp> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence></ShowPlanXML>It's estimating about the right number of rows in the result (126.62 versus the actual 147). And the number of rewinds in the clustered index scan on B is right at 999. But the cost of doing 1000 partial scans, 126.62 of which are expected to completely scan 6000000 rows, is utterly ludicrous.And indeed, with SET STATISTICS XML ON, this appears in the clustered index scan of B:<RunTimeInformation> <RunTimeCountersPerThread Thread="0" ActualRows="853" ActualEndOfScans="147" ActualExecutions="1000" /></RunTimeInformation>