Author |
Topic |
sql777
Constraint Violating Yak Guru
314 Posts |
Posted - 2007-04-23 : 22:57:35
|
My table has two fields:ID IPAddress1 255.232.12.112 23.212.1.2213 55.92.241.9My database will have over 1 million rows, I have put an index on the IPAddress column.My queries will look like this:select * from Table1 where ipaddress like '255.232.%'select * from Table1 where ipaddress like '10.%'select * from Table1 where ipaddress like '55.92.12.%'Comments on potential performance issues? |
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2007-04-23 : 23:13:41
|
Should be ok, but obviously performance will degrade the further to the left the % goes.Depending on what you are doing there may be an argument for splitting out the 4 parts into separate columns with an index on each. Best to try it & see. It's hardly a difficult test. |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-23 : 23:49:16
|
Make sure you don't enter the IP address or the search string with leading zeros if you want to search that way. Or else always enter them with leading zeros.As an alternative, you could use 4 tinyint columns. It would use only 4 bytes to store the data, instead of 7 to 15 bytes. If you make an index that includes all four columms, searches should be fast:where ip1 = 255 and ip2 = 232where ip1 = 10where ip1 = 55 and ip2 = 92 and ip3 = 12The tinyint approach would also make it easier to do network mask searches, if that is important to you.CODO ERGO SUM |
|
|
jshepler
Yak Posting Veteran
60 Posts |
Posted - 2007-04-24 : 00:05:12
|
I don't have any evidence to back this up, but it just seems that LIKE will not use indexes because it's an operation that has to be performed on every row, not a value that can be searched/scanned/indexed.Kinda like when restricting a datetime column to a month (ie. WHERE month(someDateTimeColumn) < month(getdate())) would ignore any indexes on that datetime column.Could you work your app/proc to use IPs in decimal form (int) instead of octet form? If so, you could do straight math operations/comparisons which would be much much faster than LIKE and would use indexes. If you are interested in going this route, I have a couple ip conversion functions I wrote I could paste here for ya for going from dotted (octet) to decimal and from decimal to dotted. Matches how .net stores IPs if that's any use to ya.If not, then LoztInSpace's suggestion of splitting the IP into 4 columns (as tinyint, not varchar: 1 byte for tinyint vs. 3 bytes for the varchar saves you 66% space, and numerical comparisons are faster than string comparisons) would be the way to go.Either way, you don't need the seperate IP table. I'm assuming your ID column in an int - 4 bytes. The same 4 bytes if you were to use decimal form or 4 tinyint columns. Not having that JOIN will help performance./jeff |
|
|
jshepler
Yak Posting Veteran
60 Posts |
Posted - 2007-04-24 : 00:28:00
|
quote: The tinyint approach would also make it easier to do network mask searches, if that is important to you.
I don't believe this is true. A mask of 255.255.255.250 does not mean any ip from 0.0.0.0 up through 255.255.255.250./jeff |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-24 : 00:57:06
|
-- mask 255.255.255.250SELECT * FROM Table1WHERE aclass = @aclassand bclass = @bclassand cclass = @cclassand dclass >= 250Peter LarssonHelsingborg, Sweden |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-04-24 : 08:51:03
|
you should absolutely use 4 tinyint columns. If the logical model maps DIRECTLY to the physical database model perfectly, then take advantage of that rare and great situation and model your data exactly.An IP address is 4 bytes. 1 byte = 1 tiny int. Therefore, an IP Address works beautifully as 4 TinyInt columns, especially when you consider that you will be searching the database as indicated.Definitely do NOT use any type of non-numeric datatype to store these values, with or without leading zeros and such.- Jeffhttp://weblogs.sqlteam.com/JeffS |
|
|
jshepler
Yak Posting Veteran
60 Posts |
Posted - 2007-04-24 : 08:58:30
|
255.255.255.250 was supposed to be 255.255.255.240 (fat-fingered, .250 is not a valid mask).Even still, your example is wrong. You would still need to know that last octet (what you call dclass) passed and it does not imply the range in the way you suggest.For example, given 192.168.0.1 with the mask 255.255.255.240, that gives a range of 192.168.0.0 - 192.168.0.15. An ip of 192.168.0.26 with the same mask would give a range of 192.168.0.16 - 192.168.0.31./jeff |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-24 : 09:34:10
|
quote: Originally posted by jshepler
quote: The tinyint approach would also make it easier to do network mask searches, if that is important to you.
I don't believe this is true. A mask of 255.255.255.250 does not mean any ip from 0.0.0.0 up through 255.255.255.250./jeff
It's true that a mask of 255.255.255.250 does not mean any IP from 0.0.0.0 up through 255.255.255.250. It's also true that I never said it was. CODO ERGO SUM |
|
|
jshepler
Yak Posting Veteran
60 Posts |
Posted - 2007-04-24 : 22:48:55
|
I'm sorry, you're right. You never said it was. I assumed the implication.I still believe it is not easier to do subnet searches with the IP in octet form. As a single 4-byte integer, the math is pretty easy without having to figure out how each octet is affected. Given a common IP and a common network mask:192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040 Calculate the range by finding the lowest (network address) and highest (broadcast address):Network address = passed IP & subnet mask192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620&255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040=192.168.0.0 = 11000000.10101000.00000000.00000000 = 3232235520 Broadcast address = passed IP | ~(subnet mask)192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620|0.0.0.255 = 00000000.00000000.00000000.11111111 = 255=192.168.0.255 = 11000000.10101000.00000000.11111111 = 3232235775 Now, is it easier to query against 4 columns of 192.168.0.0 to 192.168.0.255 or BETWEEN 3232235520 AND 3232235775?/jeff |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-24 : 23:22:20
|
quote: Originally posted by jshepler I'm sorry, you're right. You never said it was. I assumed the implication.I still believe it is not easier to do subnet searches with the IP in octet form. As a single 4-byte integer, the math is pretty easy without having to figure out how each octet is affected. Given a common IP and a common network mask:192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040 Calculate the range by finding the lowest (network address) and highest (broadcast address):Network address = passed IP & subnet mask192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620&255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040=192.168.0.0 = 11000000.10101000.00000000.00000000 = 3232235520 Broadcast address = passed IP | ~(subnet mask)192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620|0.0.0.255 = 00000000.00000000.00000000.11111111 = 255=192.168.0.255 = 11000000.10101000.00000000.11111111 = 3232235775 Now, is it easier to query against 4 columns of 192.168.0.0 to 192.168.0.255 or BETWEEN 3232235520 AND 3232235775?/jeff
It is obviously easier to query against the 4 tinyint columns, since 3232235520 and 3232235775 are not valid values for the SQL Server integer data type.declare @ip integerset @ip = 3232235520Results:Server: Msg 8115, Level 16, State 2, Line 3Arithmetic overflow error converting expression to data type int. CODO ERGO SUM |
|
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2007-04-24 : 23:52:08
|
the problem is there's no unsigned int type in sql server. if there were, what jshelper is describing could work. unless you want to waste the space on a bigint column.In my mind, the other drawback of using a single integer column is transparency. how clear is it what the ip address 3232235520 is? 4 tinyints would be much clearer.MVJ is right: just use 4 byte columns. www.elsasoft.org |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-25 : 10:07:21
|
You could store the IP address in a binary(4) datatype, and do your network mask query like this:where IP between 0xC0A80064 and 0xC0A800FF Any true network geek will be able to translate 192 to 0xC0, 168 to 0xA8, 0 to 0x00, 100 to 0x64, and 255 to 0xFF and back in their head, so readability shouldn't be an issue. CODO ERGO SUM |
|
|
jshepler
Yak Posting Veteran
60 Posts |
Posted - 2007-04-25 : 12:22:05
|
The 4 extra bytes of bigint is nothing. At 1 million rows, that's less than 4meg. I was going to say space is cheap; but 4meg is so insignificant, it's not even a concern.Using binary(4) sounded like a great idea to me. However, trying to do bit operations on 2 binary values...declare @ip binary(4), @sm binary(4)set @ip = 0xC0A80064 -- 192.168.0.100set @sm = 0xFFFFFF00 -- 255.255.255.0select @ip & @sm ... resulted in "The data types binary and binary are incompatible in the boolean AND operator." - which seems odd to me as it's not the boolean AND operator being used. BOL states "In a bitwise operation, only one expression can be of either binary or varbinary data type." Well that sucks.jezemine, storing the IP as single number is only for internal use, you would never display it that way. The UI code could do the transform easily enough, or have SQL do it:CREATE FUNCTION ip_Decimal2Dotted (@decimal bigint, @isReverseByteOrder bit = 1)RETURNS varchar(15)ASBEGIN-- .net stores IPs as an unsigned int in reverse byte-orderDECLARE @dotted varchar(15)IF @isReverseByteOrder = 0 SET @dotted = convert(varchar, @decimal / 16777216) + '.' + convert(varchar, (@decimal & 16777215) / 65536) + '.' + convert(varchar, (@decimal & 65535) / 256) + '.' + convert(varchar, @decimal & 255)ELSE SET @dotted = convert(varchar, @decimal & 255) + '.' + convert(varchar, (@decimal & 65535) / 256) + '.' + convert(varchar, (@decimal & 16777215) / 65536) + '.' + convert(varchar, @decimal / 16777216)RETURN @dottedEND it's fast enough to not have a much impact on performance.I woke up in the middle of the night with an idea on how to do netmask queries against 4 columns (yes, I dream this stuff, I'm lame)WHERE ip1 BETWEEN (@ip1 & @sm1) AND (@ip1 | ~@sm1) AND ip2 BETWEEN (@ip2 & @sm2) AND (@ip2 | ~@sm2) AND ip3 BETWEEN (@ip3 & @sm3) AND (@ip3 | ~@sm3) AND ip4 BETWEEN (@ip4 & @sm4) AND (@ip4 | ~@sm4) Admittedly, this isn't as bad as I was expecting. It's untested, but looks like it should work just fine. I don't know how much slower it would perform vs. a single bigint (12 bit ops vs. 3, 4 BETWEEN comparisons vs. 1), but it should be fast enough to not be an issue.I was envisioning something much uglier. For example, given the range 192.168.0.128 - 192.168.2.128, you couldn't restrict the 4th octet to just "<= 128" because it depends on what the 3rd octet is. If the 3rd was 0, the 4th would be >= 128; if the 3rd was 1, then the 4th would be between 0 and 255; and if the 3rd was 2, the 4th would be <= 128. I envisioned some ugly query using CASEs to account for how each octet was affected.So, sql777, you can either use 4 tinyint columns, or 1 bigint column and your searches will be fast and easy. If you decide to go the bigint route, here's a quick function to convet an IP from dotted to decimalCREATE FUNCTION ip_Dotted2Decimal (@dotted varchar(15), @isReverseByteOrder bit = 1)RETURNS bigintASBEGIN-- .net stores IPs as an unsigned int in reverse byte-orderDECLARE @decimal bigint , @oct1 bigint , @oct2 bigint , @oct3 bigint , @oct4 bigint , @lastIndex int , @nextIndex intSET @nextIndex = charindex('.', @dotted)SET @oct1 = convert(bigint, substring(@dotted, 0, @nextIndex))SET @lastIndex = @nextIndexSET @nextIndex = charindex('.', @dotted, @lastIndex + 1)SET @oct2 = convert(bigint, substring(@dotted, @lastIndex + 1, @nextIndex - @lastIndex - 1))SET @lastIndex = @nextIndexSET @nextIndex = charindex('.', @dotted, @lastIndex + 1)SET @oct3 = convert(bigint, substring(@dotted, @lastIndex + 1, @nextIndex - @lastIndex - 1))SET @oct4 = convert(bigint, substring(@dotted, @nextIndex + 1, len(@dotted) - @nextIndex))IF @isReverseByteOrder = 0 SET @decimal = (@oct1 * 16777216) + (@oct2 * 65536) + (@oct3 * 256) + @oct4ELSE SET @decimal = (@oct4 * 16777216) + (@oct3 * 65536) + (@oct2 * 256) + @oct1RETURN @decimalEND and netmask query would useWHERE ip BETWEEN (@ip & @sm) AND (@ip | ~@sm) /jeff |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2007-04-25 : 15:52:43
|
For on the edge performance reasons, and maybe specialized network analysis... ummh ok.I often find that ip's are entered as strings and extracted as strings varchar(15)If the ip is indexed, then a search like you proposed is very fast (sample results on real data appx 7.6 mill rows):select count(*) from tbl where ip like '82.%'takes 499 ms (count > 134000)select count(*) from tbl where ip like '69.%'takes 28 ms (count appx 800)I'll put a vote in for the "simple" approach and just use varchar(15) unless you have very specialized needs.Of course a search:where ip like '%.56.%' would be killing... but as unless you are analyzing the subnets.. string approach is fine.rockmoose |
|
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2007-04-25 : 16:28:03
|
quote: Originally posted by jshepler jezemine, storing the IP as single number is only for internal use, you would never display it that way.
I didn't mean that some UI would display ips as ints. All I meant was, it would be something of a PITA for me during development to have to deal with them as ints. "select top 10 * from ipaddress" wouldn't return anything that I would be able to quickly recognize as an ip address. that's all. I know you could always get it in a readable form with a function, but then you'd have to call the function all the time: PITA. EDIT: fixed typo www.elsasoft.org |
|
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2007-04-25 : 19:37:18
|
Or use a calculated column or a view. |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-04-25 : 20:32:39
|
quote: Originally posted by rockmoose For on the edge performance reasons, and maybe specialized network analysis... ummh ok.I often find that ip's are entered as strings and extracted as strings varchar(15)If the ip is indexed, then a search like you proposed is very fast (sample results on real data appx 7.6 mill rows):select count(*) from tbl where ip like '82.%'takes 499 ms (count > 134000)select count(*) from tbl where ip like '69.%'takes 28 ms (count appx 800)I'll put a vote in for the "simple" approach and just use varchar(15) unless you have very specialized needs.Of course a search:where ip like '%.56.%' would be killing... but as unless you are analyzing the subnets.. string approach is fine.rockmoose
unless someone enters "192.168.01.01" or, worse, "abc.def.ghi.xxx" .... don't forget data consistency and integrity.What's the advantage of any idea other than 4 tiny ints again? Am I missing something? Even SQL server can very easily "format" 4 tiny ints as "192.168.1.1" in a view or computed column if it is too "difficult" to do elsewhere ... And now we have all the benefits of indexes, consistency and data integrity. What's the downside? - Jeffhttp://weblogs.sqlteam.com/JeffS |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2007-04-25 : 23:22:37
|
quote: Originally posted by byrmol Another option is to just change DBMS and use PostgreSQL :-)
heretic! www.elsasoft.org |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2007-04-25 : 23:31:08
|
It is the one area where SQL Server has disappointed me after each upgrade.The lack of new in-built data types.DavidMProduction is just another testing cycle |
|
|
Next Page
|