hi folks! been a while since I asked or answered a question! glad to be back! :)
I need advice on loading a largish amount of spatial data in to SQL 2008 R2.
I have 13605 individual tab-delimited text files with data like this:
43.00000 -115.00000 1082
43.00000 -114.99917 1062
43.00000 -114.99833 1054
43.00000 -114.99750 1065
43.00000 -114.99667 1083
43.00000 -114.99584 1092
43.00000 -114.99500 1090
43.00000 -114.99417 1089
43.00000 -114.99334 1085
43.00000 -114.99251 1081
43.00000 -114.99167 1067
43.00000 -114.99084 1067
43.00000 -114.99001 1073
43.00000 -114.98918 1077
43.00000 -114.98834 1077
43.00000 -114.98751 1056
each file has 1442401 rows in it. the columns are: Latitude, Longitude, ElevationMeters. Basically this is elevation data for the whole planet, at about 90m resolution in x-y.
So the total number of rows is 19.6 billion. btw if you want this data it's freely available from USGS. It came from a satellite radar survey called SRTM.
http://srtm.usgs.gov/index.php
http://www2.jpl.nasa.gov/srtm
I want to load this all into a table using a geography column so I can index it and do spatial queries against it.
My question is about how to load this data most efficiently. My plan was to do this:
1. create two tables:
-- for importing the text files
create table Elevation_stage(Latitude float not null, Longitude float not null, ElevationMeters smallint not null)
-- for storing the final result
create table Elevation(Point geography not null)
2. using a batch script, import files one at a time using bulk insert, then import from stage to the "real" table using insert/select. the body of the loop over files would be something like this:
truncate table Elevation_stage
bulk insert Elevation_stage from '%file%' with (BATCHSIZE=100000,TABLOCK)
insert Elevation
select geography::STPointFromText('POINT('+convert(varchar,Longitude)+' '+convert(varchar,Latitude)+' '+convert(varchar,ElevationMeters)+')', 4326)
from Elevation_stage
The problem is the insert/select takes a relatively long time, about 50 sec per file. It's 2x faster if I use geography::Point() instead of geography::STPointFromText(), but then I can't bake in the elevation value into the point because the Point() constructor does not take Z as a param. It only takes X,Y.
It's extremely stupid that Point ctor doesn't take Z as an arg but that's the way it is. STPointFromText() takes it, Point() does not. Lame.
here's Point, only takes lat/lon: http://technet.microsoft.com/en-us/library/bb933811.aspx (side note: extra lame also that Point takes args in lat,lon order, which is opposite the order STPointFromText wants...)
So if I go the Point() route, I'd have to add an ElevationMeters column to my Elevation table, like this:
create table Elevation(Point geography not null, ElevationMeters smallint not null)
then I could do the insert/select like this which is 2x faster.
insert Elevation
select geography::Point(Latitude, Longitude, 4326), ElevationMeters
from Elevation_stage
If you are loading 10^10 rows, factors of 2 matter. :)
Right now I am leaning towards the Point() route. The only real downside is the table will be about 20% bigger on disk when it's fully loaded, due to having the extra ElevationMeters column.
Any advice much appreciated! Is there a better way to load this data?
Jesse
elsasoft.org