hi folks! been a while since I asked or answered a question! glad to be back! :)I need advice on loading a largish amount of spatial data in to SQL 2008 R2.I have 13605 individual tab-delimited text files with data like this:43.00000 -115.00000 108243.00000 -114.99917 106243.00000 -114.99833 105443.00000 -114.99750 106543.00000 -114.99667 108343.00000 -114.99584 109243.00000 -114.99500 109043.00000 -114.99417 108943.00000 -114.99334 108543.00000 -114.99251 108143.00000 -114.99167 106743.00000 -114.99084 106743.00000 -114.99001 107343.00000 -114.98918 107743.00000 -114.98834 107743.00000 -114.98751 1056
each file has 1442401 rows in it. the columns are: Latitude, Longitude, ElevationMeters. Basically this is elevation data for the whole planet, at about 90m resolution in x-y.So the total number of rows is 19.6 billion. btw if you want this data it's freely available from USGS. It came from a satellite radar survey called SRTM.http://srtm.usgs.gov/index.phphttp://www2.jpl.nasa.gov/srtmI want to load this all into a table using a geography column so I can index it and do spatial queries against it.My question is about how to load this data most efficiently. My plan was to do this:1. create two tables:-- for importing the text filescreate table Elevation_stage(Latitude float not null, Longitude float not null, ElevationMeters smallint not null)-- for storing the final resultcreate table Elevation(Point geography not null)
2. using a batch script, import files one at a time using bulk insert, then import from stage to the "real" table using insert/select. the body of the loop over files would be something like this:truncate table Elevation_stagebulk insert Elevation_stage from '%file%' with (BATCHSIZE=100000,TABLOCK)insert Elevationselect geography::STPointFromText('POINT('+convert(varchar,Longitude)+' '+convert(varchar,Latitude)+' '+convert(varchar,ElevationMeters)+')', 4326)from Elevation_stage
The problem is the insert/select takes a relatively long time, about 50 sec per file. It's 2x faster if I use geography::Point() instead of geography::STPointFromText(), but then I can't bake in the elevation value into the point because the Point() constructor does not take Z as a param. It only takes X,Y.It's extremely stupid that Point ctor doesn't take Z as an arg but that's the way it is. STPointFromText() takes it, Point() does not. Lame.here's Point, only takes lat/lon: http://technet.microsoft.com/en-us/library/bb933811.aspx (side note: extra lame also that Point takes args in lat,lon order, which is opposite the order STPointFromText wants...)So if I go the Point() route, I'd have to add an ElevationMeters column to my Elevation table, like this:create table Elevation(Point geography not null, ElevationMeters smallint not null)
then I could do the insert/select like this which is 2x faster.insert Elevationselect geography::Point(Latitude, Longitude, 4326), ElevationMetersfrom Elevation_stage
If you are loading 10^10 rows, factors of 2 matter. :)Right now I am leaning towards the Point() route. The only real downside is the table will be about 20% bigger on disk when it's fully loaded, due to having the extra ElevationMeters column.Any advice much appreciated! Is there a better way to load this data?Jesse
elsasoft.org