Spatial Data Types index - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Spatial Data Types index

Author

Topic

michael.appleton
Posting Yak Master

160 Posts

Posted - 2009-10-14 : 12:03:47

Hello,
We've just upgrades to SQL Server 2008 and I've been testing out some of the new features but have run into a problem with the geography data type indexing. For some reason, the optimizer ignores the index when ordering by distance between two points:



--checks spatial index usage

--create table to store points
CREATE TABLE #SomePoints (
	Id int Primary Key Identity(1,1),
	Point Geography
)

--create spatial index
CREATE SPATIAL INDEX tempindex
   ON #SomePoints(Point);

--Populate points table
WITH Numbers AS(
	SELECT i=1
	UNION ALL
	SELECT i+1 FROM Numbers WHERE i<30000	--condition defines amount of points
)
 
     
INSERT INTO #SomePoints(Point)
--rand(CAST(NewID() as varbinary) is used to create random numbers (rand(seed) does not by itself create real randoms) that are changed to latitude and longitude limits
SELECT geography::Point(90*(rand(CAST(NewID() as varbinary) )-0.5),180*(rand(CAST(NewID() as varbinary) )-0.5),4326)
FROM Numbers
OPTION(MAXRECURSION 32767)	--set maxrecursion to highest


--declare point to compare
DECLARE @Point geography=geography::Point(50,50,4326);

--Uses spatial index
SELECT 
	S.*,
	@Point.STDistance (S.Point)*1e-3 --distance in KM
FROM 
	#SomePoints S 
WHERE @Point.STDistance (S.Point)<7000e3


--does not use spatial index (why not?)
SELECT 
	S.*,
	@Point.STDistance (S.Point)*1e-3--distance in KM
FROM 
	#SomePoints S	--WITH(INDEX(tempindex)) --throws up error with hint
ORDER BY @Point.STDistance (S.Point) 
	
	
--clean up
DROP TABLE #SomePoints

This is pretty disappointing as it means the old method of calculating closest neighbours using the GreaterCircle formula with latitude and longitude stored as floats works a lot quicker than using the dedicated geography data type and functions with indexes. I know that the formula used in the new spatial co-ordinates is more accurate, but expected the use spatial indexes to mean they would still be faster.

Can anyone explain why the index is ignored in the last query and not in the one before?

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-10-14 : 13:14:45

Without seeing the ececution plan, SQL probably ignores the index on the last query because there is no WHERE clause, so it'll probably just do a TABLE or CULSTERED INDEX scan.

michael.appleton
Posting Yak Master

160 Posts

Posted - 2009-10-15 : 05:29:00

Thanks for the reply. I realise the last query does not use the index, I don't understand why though.

Here's the execution plan for the first select query:


  |--Compute Scalar(DEFINE:([Expr1002]=[@Point].STDistance([tempdb].[dbo].[#SomePoints].[Point] as [S].[Point])*(1.000000000000000e-003)))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([S].[Point], [Expr1004], [Expr1005], [Expr1006]))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[sys].[extended_index_1557580587_384000].[pk0], [Expr1064]) WITH UNORDERED PREFETCH)
            |    |--Hash Match(Aggregate, HASH:([tempdb].[sys].[extended_index_1557580587_384000].[pk0]) DEFINE:([Expr1004]=MAX([Expr1057]), [Expr1005]=ANY([tempdb].[sys].[extended_index_1557580587_384000].[SRID]), [Expr1006]=ANY([Expr1058])))
            |    |    |--Compute Scalar(DEFINE:([Expr1057]=[tempdb].[sys].[extended_index_1557580587_384000].[Cell_Attributes]+[Expr1052], [Expr1058]=[Expr1053]))
            |    |         |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1054], [Expr1055], [Expr1063]) WITH UNORDERED PREFETCH)
            |    |              |--Hash Match(Aggregate, HASH:([Expr1054], [Expr1055]), RESIDUAL:([Expr1054] = [Expr1054] AND [Expr1055] = [Expr1055]) DEFINE:([Expr1052]=MAX([Expr1056]), [Expr1053]=ANY([SRID])))
            |    |              |    |--Nested Loops(Inner Join, OUTER REFERENCES:([Cell_Id], [Cell_Attributes]))
            |    |              |         |--Table-valued function
            |    |              |         |--Constant Scan(VALUES:(([Cell_Id],getdescendantlimitforspatialbinary([Cell_Id]),CASE WHEN (2)=[Cell_Attributes] THEN (2) ELSE (0) END),([Cell_Id],[Cell_Id],[Cell_Attributes]),(getancestorforspatialbinary([Cell_Id
            |    |              |--Clustered Index Seek(OBJECT:([tempdb].[sys].[extended_index_1557580587_384000].[tempindex]), SEEK:([tempdb].[sys].[extended_index_1557580587_384000].[Cell_Id] >= [Expr1054] AND [tempdb].[sys].[extended_index_1557580587_38
            |    |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#SomePoints] AS [S]), SEEK:([S].[Id]=[tempdb].[sys].[extended_index_1557580587_384000].[pk0]) ORDERED FORWARD)
            |--Filter(WHERE:(STARTUP EXPR(CASE WHEN [Expr1004]>(2) THEN CASE WHEN [Expr1005]=[Expr1006] THEN (1) ELSE (0) END ELSE [tempdb].[dbo].[#SomePoints].[Point] as [S].[Point].STDistance([@Point])<(7.000000000000000e+006) END=(1))))
                 |--Constant Scan

And here's for the last, showing no use of the spatial index:


  |--Compute Scalar(DEFINE:([Expr1003]=[@Point].STDistance([tempdb].[dbo].[#SomePoints].[Point] as [S].[Point])*(1.000000000000000e-003)))
       |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#SomePoints] AS [S]))

This query shows what I would like to do, a nearest neighbour calculation for many points:


--does not use spatial index (why not?)
SELECT
	S1.Id,
	ClosestDistance=min(S2.Point.STDistance (S1.Point))*1e-3 --distance in KM
FROM 
	#SomePoints S1	--WITH(INDEX(tempindex)) --throws up error with hint
		CROSS JOIN
			#SomePoints S2
GROUP BY S1.Id

It also does not use the indexes like the ORDER BY query shown earlier. Here's the execution plan:


  |--Compute Scalar(DEFINE:([Expr1005]=[Expr1004]*(1.000000000000000e-003)))
       |--Parallelism(Gather Streams)
            |--Stream Aggregate(GROUP BY:([S1].[Id]) DEFINE:([Expr1004]=MIN([partialagg1006])))
                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([S1].[Id]), ORDER BY:([S1].[Id] ASC))
                      |--Stream Aggregate(GROUP BY:([S1].[Id]) DEFINE:([partialagg1006]=MIN([tempdb].[dbo].[#SomePoints].[Point] as [S2].[Point].STDistance([tempdb].[dbo].[#SomePoints].[Point] as [S1].[Point]))))
                           |--Nested Loops(Inner Join)
                                |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#SomePoints] AS [S1]), ORDERED FORWARD)
                                |--Table Spool
                                     |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#SomePoints] AS [S2]))

Is this a bug in SQL Server 2008? Does anyone know if Microsoft plan to address this? I've looked into how spatial indexes in SQL Server work (see [url]http://technet.microsoft.com/en-us/library/bb964712.aspx[/url]) and it seems they should definitely help with this kind of query. Maybe there's some complexity here I don't understand. If anyone can explain this to me I'd be extremely grateful.

Thanks

michael.appleton
Posting Yak Master

160 Posts

Posted - 2009-10-15 : 05:41:58

Maybe there's a better way to write this query. Just explain my thinking, if an index helps when finding if p1.Point.STDistance(p2.point)<[some number] then why doesn't it help when finding the smallest p1.Point.STDistance(p2.point) with a group of points?

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-10-15 : 12:26:20

Perhaps I was not clear or your understanding of what indexes do is not correct or my understanding od spatial indexes in not correct. Lets say we have a table User that has 3 columns: ID, UserName and RegistrationDate. Also, lerts assume there is an index on RegistrationDate.

When you do a select like:

SELECT *
FROM User
WHERE RegistrationDate > GETDATE() - 90

Sql will probably use an index because it can use the precdicate and make the query more effiecnet.

If you so a query like:

SELECT ID, DATEADD(DAY, 30, RegistratioonDate) AS MyDate
FROM User

Sql will most likly table scan because there is no predicate and it'll return all the rows, so no amount of indexing is going to help.

I'm a little loopy today, so I hoping that make sense.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources