Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 bulk loading large amount of spatial data

Author  Topic 

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2013-03-17 : 00:31:02
hi folks! been a while since I asked or answered a question! glad to be back! :)

I need advice on loading a largish amount of spatial data in to SQL 2008 R2.

I have 13605 individual tab-delimited text files with data like this:

43.00000 -115.00000 1082
43.00000 -114.99917 1062
43.00000 -114.99833 1054
43.00000 -114.99750 1065
43.00000 -114.99667 1083
43.00000 -114.99584 1092
43.00000 -114.99500 1090
43.00000 -114.99417 1089
43.00000 -114.99334 1085
43.00000 -114.99251 1081
43.00000 -114.99167 1067
43.00000 -114.99084 1067
43.00000 -114.99001 1073
43.00000 -114.98918 1077
43.00000 -114.98834 1077
43.00000 -114.98751 1056


each file has 1442401 rows in it. the columns are: Latitude, Longitude, ElevationMeters. Basically this is elevation data for the whole planet, at about 90m resolution in x-y.

So the total number of rows is 19.6 billion. btw if you want this data it's freely available from USGS. It came from a satellite radar survey called SRTM.
http://srtm.usgs.gov/index.php
http://www2.jpl.nasa.gov/srtm

I want to load this all into a table using a geography column so I can index it and do spatial queries against it.

My question is about how to load this data most efficiently. My plan was to do this:

1. create two tables:

-- for importing the text files
create table Elevation_stage(Latitude float not null, Longitude float not null, ElevationMeters smallint not null)

-- for storing the final result
create table Elevation(Point geography not null)


2. using a batch script, import files one at a time using bulk insert, then import from stage to the "real" table using insert/select. the body of the loop over files would be something like this:

truncate table Elevation_stage

bulk insert Elevation_stage from '%file%' with (BATCHSIZE=100000,TABLOCK)

insert Elevation
select geography::STPointFromText('POINT('+convert(varchar,Longitude)+' '+convert(varchar,Latitude)+' '+convert(varchar,ElevationMeters)+')', 4326)
from Elevation_stage


The problem is the insert/select takes a relatively long time, about 50 sec per file. It's 2x faster if I use geography::Point() instead of geography::STPointFromText(), but then I can't bake in the elevation value into the point because the Point() constructor does not take Z as a param. It only takes X,Y.

It's extremely stupid that Point ctor doesn't take Z as an arg but that's the way it is. STPointFromText() takes it, Point() does not. Lame.

here's Point, only takes lat/lon: http://technet.microsoft.com/en-us/library/bb933811.aspx (side note: extra lame also that Point takes args in lat,lon order, which is opposite the order STPointFromText wants...)

So if I go the Point() route, I'd have to add an ElevationMeters column to my Elevation table, like this:

create table Elevation(Point geography not null, ElevationMeters smallint not null)


then I could do the insert/select like this which is 2x faster.

insert Elevation
select geography::Point(Latitude, Longitude, 4326), ElevationMeters
from Elevation_stage


If you are loading 10^10 rows, factors of 2 matter. :)

Right now I am leaning towards the Point() route. The only real downside is the table will be about 20% bigger on disk when it's fully loaded, due to having the extra ElevationMeters column.

Any advice much appreciated! Is there a better way to load this data?

Jesse


elsasoft.org

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2013-03-17 : 01:54:18
I created a bug on connect for the deficiency of the Point() ctor. vote for it if you care :)

https://connect.microsoft.com/SQLServer/feedback/details/781512/geography-point-should-accept-elevation-and-measure



elsasoft.org
Go to Top of Page
   

- Advertisement -