Is this just down right bad design? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Is this just down right bad design?

Author

Topic

sql777
Constraint Violating Yak Guru

314 Posts

Posted - 2007-04-23 : 22:57:35

My table has two fields:

ID IPAddress
1 255.232.12.11
2 23.212.1.221
3 55.92.241.9

My database will have over 1 million rows, I have put an index on the IPAddress column.

My queries will look like this:

select * from Table1 where ipaddress like '255.232.%'

select * from Table1 where ipaddress like '10.%'

select * from Table1 where ipaddress like '55.92.12.%'

Comments on potential performance issues?

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2007-04-23 : 23:13:41

Should be ok, but obviously performance will degrade the further to the left the % goes.
Depending on what you are doing there may be an argument for splitting out the 4 parts into separate columns with an index on each. Best to try it & see. It's hardly a difficult test.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-23 : 23:49:16

Make sure you don't enter the IP address or the search string with leading zeros if you want to search that way. Or else always enter them with leading zeros.

As an alternative, you could use 4 tinyint columns. It would use only 4 bytes to store the data, instead of 7 to 15 bytes. If you make an index that includes all four columms, searches should be fast:

where ip1 = 255 and ip2 = 232

where ip1 = 10

where ip1 = 55 and ip2 = 92 and ip3 = 12

The tinyint approach would also make it easier to do network mask searches, if that is important to you.

CODO ERGO SUM

jshepler
Yak Posting Veteran

60 Posts

Posted - 2007-04-24 : 00:05:12

I don't have any evidence to back this up, but it just seems that LIKE will not use indexes because it's an operation that has to be performed on every row, not a value that can be searched/scanned/indexed.

Kinda like when restricting a datetime column to a month (ie. WHERE month(someDateTimeColumn) < month(getdate())) would ignore any indexes on that datetime column.

Could you work your app/proc to use IPs in decimal form (int) instead of octet form? If so, you could do straight math operations/comparisons which would be much much faster than LIKE and would use indexes. If you are interested in going this route, I have a couple ip conversion functions I wrote I could paste here for ya for going from dotted (octet) to decimal and from decimal to dotted. Matches how .net stores IPs if that's any use to ya.

If not, then LoztInSpace's suggestion of splitting the IP into 4 columns (as tinyint, not varchar: 1 byte for tinyint vs. 3 bytes for the varchar saves you 66% space, and numerical comparisons are faster than string comparisons) would be the way to go.

Either way, you don't need the seperate IP table. I'm assuming your ID column in an int - 4 bytes. The same 4 bytes if you were to use decimal form or 4 tinyint columns. Not having that JOIN will help performance.

/jeff

jshepler
Yak Posting Veteran

60 Posts

Posted - 2007-04-24 : 00:28:00

quote:
The tinyint approach would also make it easier to do network mask searches, if that is important to you.

I don't believe this is true. A mask of 255.255.255.250 does not mean any ip from 0.0.0.0 up through 255.255.255.250.

/jeff

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-24 : 00:57:06

-- mask 255.255.255.250

SELECT * FROM Table1
WHERE aclass = @aclass
and bclass = @bclass
and cclass = @cclass
and dclass >= 250

Peter Larsson
Helsingborg, Sweden

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-24 : 08:51:03

you should absolutely use 4 tinyint columns. If the logical model maps DIRECTLY to the physical database model perfectly, then take advantage of that rare and great situation and model your data exactly.

An IP address is 4 bytes. 1 byte = 1 tiny int. Therefore, an IP Address works beautifully as 4 TinyInt columns, especially when you consider that you will be searching the database as indicated.

Definitely do NOT use any type of non-numeric datatype to store these values, with or without leading zeros and such.

- Jeff
http://weblogs.sqlteam.com/JeffS

jshepler
Yak Posting Veteran

60 Posts

Posted - 2007-04-24 : 08:58:30

255.255.255.250 was supposed to be 255.255.255.240 (fat-fingered, .250 is not a valid mask).

Even still, your example is wrong. You would still need to know that last octet (what you call dclass) passed and it does not imply the range in the way you suggest.

For example, given 192.168.0.1 with the mask 255.255.255.240, that gives a range of 192.168.0.0 - 192.168.0.15. An ip of 192.168.0.26 with the same mask would give a range of 192.168.0.16 - 192.168.0.31.

/jeff

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-24 : 09:34:10

quote:
Originally posted by jshepler

quote:
The tinyint approach would also make it easier to do network mask searches, if that is important to you.

I don't believe this is true. A mask of 255.255.255.250 does not mean any ip from 0.0.0.0 up through 255.255.255.250.

/jeff

It's true that a mask of 255.255.255.250 does not mean any IP from 0.0.0.0 up through 255.255.255.250. It's also true that I never said it was.

CODO ERGO SUM

jshepler
Yak Posting Veteran

60 Posts

Posted - 2007-04-24 : 22:48:55

I'm sorry, you're right. You never said it was. I assumed the implication.

I still believe it is not easier to do subnet searches with the IP in octet form. As a single 4-byte integer, the math is pretty easy without having to figure out how each octet is affected. Given a common IP and a common network mask:


192.168.0.100  = 11000000.10101000.00000000.01100100 = 3232235620
255.255.255.0  = 11111111.11111111.11111111.00000000 = 4294967040

Calculate the range by finding the lowest (network address) and highest (broadcast address):

Network address = passed IP & subnet mask


192.168.0.100  = 11000000.10101000.00000000.01100100 = 3232235620
&
255.255.255.0  = 11111111.11111111.11111111.00000000 = 4294967040
=
192.168.0.0    = 11000000.10101000.00000000.00000000 = 3232235520

Broadcast address = passed IP | ~(subnet mask)


192.168.0.100  = 11000000.10101000.00000000.01100100 = 3232235620
|
0.0.0.255      = 00000000.00000000.00000000.11111111 = 255
=
192.168.0.255  = 11000000.10101000.00000000.11111111 = 3232235775

Now, is it easier to query against 4 columns of 192.168.0.0 to 192.168.0.255 or BETWEEN 3232235520 AND 3232235775?

/jeff

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-24 : 23:22:20

quote:
Originally posted by jshepler

I'm sorry, you're right. You never said it was. I assumed the implication.

I still believe it is not easier to do subnet searches with the IP in octet form. As a single 4-byte integer, the math is pretty easy without having to figure out how each octet is affected. Given a common IP and a common network mask:

192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620
255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040

Calculate the range by finding the lowest (network address) and highest (broadcast address):

Network address = passed IP & subnet mask

192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620
&
255.255.255.0 = 11111111.11111111.11111111.00000000 = 4294967040
=
192.168.0.0 = 11000000.10101000.00000000.00000000 = 3232235520

Broadcast address = passed IP | ~(subnet mask)

192.168.0.100 = 11000000.10101000.00000000.01100100 = 3232235620
|
0.0.0.255 = 00000000.00000000.00000000.11111111 = 255
=
192.168.0.255 = 11000000.10101000.00000000.11111111 = 3232235775

Now, is it easier to query against 4 columns of 192.168.0.0 to 192.168.0.255 or BETWEEN 3232235520 AND 3232235775?

/jeff

It is obviously easier to query against the 4 tinyint columns, since 3232235520 and 3232235775 are not valid values for the SQL Server integer data type.


declare @ip integer
set @ip = 3232235520

Results:

Server: Msg 8115, Level 16, State 2, Line 3
Arithmetic overflow error converting expression to data type int.

CODO ERGO SUM

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-04-24 : 23:52:08

the problem is there's no unsigned int type in sql server. if there were, what jshelper is describing could work. unless you want to waste the space on a bigint column.

In my mind, the other drawback of using a single integer column is transparency. how clear is it what the ip address 3232235520 is? 4 tinyints would be much clearer.

MVJ is right: just use 4 byte columns.

www.elsasoft.org

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-25 : 10:07:21

You could store the IP address in a binary(4) datatype, and do your network mask query like this:

where IP between 0xC0A80064 and 0xC0A800FF

Any true network geek will be able to translate 192 to 0xC0, 168 to 0xA8, 0 to 0x00, 100 to 0x64, and 255 to 0xFF and back in their head, so readability shouldn't be an issue.

CODO ERGO SUM

jshepler
Yak Posting Veteran

60 Posts

Posted - 2007-04-25 : 12:22:05

The 4 extra bytes of bigint is nothing. At 1 million rows, that's less than 4meg. I was going to say space is cheap; but 4meg is so insignificant, it's not even a concern.

Using binary(4) sounded like a great idea to me. However, trying to do bit operations on 2 binary values...


declare @ip binary(4), @sm binary(4)
set @ip = 0xC0A80064 -- 192.168.0.100
set @sm = 0xFFFFFF00 -- 255.255.255.0

select @ip & @sm

... resulted in "The data types binary and binary are incompatible in the boolean AND operator." - which seems odd to me as it's not the boolean AND operator being used. BOL states "In a bitwise operation, only one expression can be of either binary or varbinary data type." Well that sucks.

jezemine, storing the IP as single number is only for internal use, you would never display it that way. The UI code could do the transform easily enough, or have SQL do it:


CREATE FUNCTION ip_Decimal2Dotted (@decimal bigint, @isReverseByteOrder bit = 1)
RETURNS varchar(15)
AS
BEGIN

-- .net stores IPs as an unsigned int in reverse byte-order

DECLARE @dotted varchar(15)

IF @isReverseByteOrder = 0
    SET @dotted = convert(varchar, @decimal / 16777216)
                + '.' + convert(varchar, (@decimal & 16777215) / 65536)
                + '.' + convert(varchar, (@decimal & 65535) / 256)
                + '.' + convert(varchar, @decimal & 255)
ELSE
    SET @dotted = convert(varchar, @decimal & 255)
                + '.' + convert(varchar, (@decimal & 65535) / 256)
                + '.' + convert(varchar, (@decimal & 16777215) / 65536)
                + '.' + convert(varchar, @decimal / 16777216)

RETURN @dotted
END

it's fast enough to not have a much impact on performance.

I woke up in the middle of the night with an idea on how to do netmask queries against 4 columns (yes, I dream this stuff, I'm lame)


WHERE   ip1 BETWEEN (@ip1 & @sm1) AND (@ip1 | ~@sm1)
        AND ip2 BETWEEN (@ip2 & @sm2) AND (@ip2 | ~@sm2)
        AND ip3 BETWEEN (@ip3 & @sm3) AND (@ip3 | ~@sm3)
        AND ip4 BETWEEN (@ip4 & @sm4) AND (@ip4 | ~@sm4)

Admittedly, this isn't as bad as I was expecting. It's untested, but looks like it should work just fine. I don't know how much slower it would perform vs. a single bigint (12 bit ops vs. 3, 4 BETWEEN comparisons vs. 1), but it should be fast enough to not be an issue.

I was envisioning something much uglier. For example, given the range 192.168.0.128 - 192.168.2.128, you couldn't restrict the 4th octet to just "<= 128" because it depends on what the 3rd octet is. If the 3rd was 0, the 4th would be >= 128; if the 3rd was 1, then the 4th would be between 0 and 255; and if the 3rd was 2, the 4th would be <= 128. I envisioned some ugly query using CASEs to account for how each octet was affected.

So, sql777, you can either use 4 tinyint columns, or 1 bigint column and your searches will be fast and easy. If you decide to go the bigint route, here's a quick function to convet an IP from dotted to decimal


CREATE FUNCTION ip_Dotted2Decimal (@dotted varchar(15), @isReverseByteOrder bit = 1)
RETURNS bigint
AS
BEGIN

-- .net stores IPs as an unsigned int in reverse byte-order

DECLARE @decimal bigint
        , @oct1 bigint
        , @oct2 bigint
        , @oct3 bigint
        , @oct4 bigint
        , @lastIndex int
        , @nextIndex int

SET @nextIndex = charindex('.', @dotted)
SET @oct1 = convert(bigint, substring(@dotted, 0, @nextIndex))

SET @lastIndex = @nextIndex
SET @nextIndex = charindex('.', @dotted, @lastIndex + 1)
SET @oct2 = convert(bigint, substring(@dotted, @lastIndex + 1, @nextIndex - @lastIndex - 1))

SET @lastIndex = @nextIndex
SET @nextIndex = charindex('.', @dotted, @lastIndex + 1)
SET @oct3 = convert(bigint, substring(@dotted, @lastIndex + 1, @nextIndex - @lastIndex - 1))

SET @oct4 = convert(bigint, substring(@dotted, @nextIndex + 1, len(@dotted) - @nextIndex))


IF @isReverseByteOrder = 0
    SET @decimal = (@oct1 * 16777216) + (@oct2 * 65536) + (@oct3 * 256) + @oct4
ELSE
    SET @decimal = (@oct4 * 16777216) + (@oct3 * 65536) + (@oct2 * 256) + @oct1

RETURN @decimal
END

and netmask query would use


WHERE ip BETWEEN (@ip & @sm) AND (@ip | ~@sm)

/jeff

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2007-04-25 : 15:52:43

For on the edge performance reasons, and maybe specialized network analysis... ummh ok.

I often find that ip's are entered as strings and extracted as strings varchar(15)

If the ip is indexed, then a search like you proposed is very fast (sample results on real data appx 7.6 mill rows):

select count(*) from tbl where ip like '82.%'
takes 499 ms (count > 134000)

select count(*) from tbl where ip like '69.%'
takes 28 ms (count appx 800)

I'll put a vote in for the "simple" approach and just use varchar(15) unless you have very specialized needs.

Of course a search:
where ip like '%.56.%' would be killing... but as unless you are analyzing the subnets.. string approach is fine.

rockmoose

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-04-25 : 16:28:03

quote:
Originally posted by jshepler

jezemine, storing the IP as single number is only for internal use, you would never display it that way.

I didn't mean that some UI would display ips as ints.

All I meant was, it would be something of a PITA for me during development to have to deal with them as ints. "select top 10 * from ipaddress" wouldn't return anything that I would be able to quickly recognize as an ip address. that's all.

I know you could always get it in a readable form with a function, but then you'd have to call the function all the time: PITA.

EDIT: fixed typo

www.elsasoft.org

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2007-04-25 : 19:37:18

Or use a calculated column or a view.

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-25 : 20:32:39

quote:
Originally posted by rockmoose

For on the edge performance reasons, and maybe specialized network analysis... ummh ok.

I often find that ip's are entered as strings and extracted as strings varchar(15)

If the ip is indexed, then a search like you proposed is very fast (sample results on real data appx 7.6 mill rows):

select count(*) from tbl where ip like '82.%'
takes 499 ms (count > 134000)

select count(*) from tbl where ip like '69.%'
takes 28 ms (count appx 800)

I'll put a vote in for the "simple" approach and just use varchar(15) unless you have very specialized needs.

Of course a search:
where ip like '%.56.%' would be killing... but as unless you are analyzing the subnets.. string approach is fine.

rockmoose

unless someone enters "192.168.01.01" or, worse, "abc.def.ghi.xxx" .... don't forget data consistency and integrity.

What's the advantage of any idea other than 4 tiny ints again? Am I missing something?

Even SQL server can very easily "format" 4 tiny ints as "192.168.1.1" in a view or computed column if it is too "difficult" to do elsewhere ... And now we have all the benefits of indexes, consistency and data integrity. What's the downside?

- Jeff
http://weblogs.sqlteam.com/JeffS

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2007-04-25 : 21:16:25

Another option is to just change DBMS and use PostgreSQL :-)

http://www.postgresql.org/docs/7.3/interactive/datatype-net-types.html

DavidM

Production is just another testing cycle

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-04-25 : 23:22:37

quote:
Originally posted by byrmol

Another option is to just change DBMS and use PostgreSQL :-)

heretic!

www.elsasoft.org

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2007-04-25 : 23:31:08

It is the one area where SQL Server has disappointed me after each upgrade.

The lack of new in-built data types.

DavidM

Production is just another testing cycle

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources