Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Some query optimalisation

Author  Topic 

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 09:58:35
Hi,

I've build this query to convert a table with normal IP-addresses (213.122.0.12) to an long/int value. (-1107296256) so the total table size will be smaller.

It looks like this:

declare @ip varchar(20)
declare @long bigint
declare @blok1 bigint
declare @blok2 bigint
declare @blok3 bigint
declare @blok4 bigint
declare @temp varchar(20)
declare @counter int

set @counter = 1

While @counter <= (select count(*) from ripencc_20021101)
BEGIN
set @ip = (Select col004 from ripencc_20021101 where id = @counter)

set @blok1 = str(left(@ip, PATINDEX('%.%', @ip)))
set @temp = right(@ip, len(@ip) - PATINDEX('%.%', @ip))

set @blok2 = str(left(@temp, PATINDEX('%.%', @temp)))
set @temp = right(@temp, len(@temp) - PATINDEX('%.%', @temp))

set @blok3 = str(left(@temp, PATINDEX('%.%', @temp)))
set @blok4 = right(@temp, len(@temp) - PATINDEX('%.%', @temp))


set @long = (@blok1 * 16777216)
set @long = @long + (@blok2 * 65536)
set @long = @long + (@blok3 * 256)
set @long = @long + @blok4
set @long = @long - 2147483648

update ripencc_20021101 set col004 = @long where id = @counter
set @counter = @counter + 1
end


Now I know this takes a lot resources, but I'am not that formiliar with SQL so I would like to know how I can eliminate that loop.

Thanks.

Bjorn

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-02 : 10:13:13
Well, you're gonna have a ton of problems making this fit into an int (if it can be done at all, I don't think it can) and the overhead of converting to and from will kill your performance. Not to mention that you'll have no ability to query for a particular subnet or IP range if you convert to an int.

If you really can't store it as a varchar (c'mon, storage space isn't THAT expensive), you are far better off splitting the IP into 4 tinyint columns:

CREATE TABLE IPAddresses (
octet1 tinyint NOT NULL,
octet2 tinyint NOT NULL,
octet3 tinyint NOT NULL,
octet4 tinyint NOT NULL )


Then you can do this:

INSERT INTO IPAddresses (octet1, octet2, octect3, octet4)
SELECT Convert(tinyint, PARSENAME(col004,4)),
Convert(tinyint, PARSENAME(col004,3)),
Convert(tinyint, PARSENAME(col004,2)),
Convert(tinyint, PARSENAME(col004,1))
from ripencc_20021101


That will convert all of the IP's into the 4 tinyint columns in one shot. You can reconstruct them using the following:

SELECT CAST(octet1 as varchar) + '.' +
CAST(octet2 as varchar) + '.' +
CAST(octet3 as varchar) + '.' +
CAST(octet4 as varchar) AS IPAddress
FROM IPAddresses


Edited by - robvolk on 12/02/2002 10:17:31
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 10:42:32
ow, I'll keep that in mind, but than comes my second problem.

I need to uses stats files from arin.net (ftp://ftp.arin.net/pub/stats/) and other registrars to put into my table, but these are in this format:

arin|US|ipv4|24.30.224.0|8192|0|allocated
arin|US|ipv4|24.31.0.0|8192|0|allocated
arin|US|ipv4|24.31.32.0|57344|0|allocated

as you can see, It brings the start address, but it does not brings the end address, it only brings the number of ip's.

so, I'll still need to convert the ip address into long, and as you sugested, I maybe can convert it back into varchar.

maybe this cleares it up.

ps. And it can fit into an int, if tested it with ip address 0.0.0.0 (minimum) and 255.255.255.255 (maximum) :x



Edited by - bjornH on 12/02/2002 10:46:02
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-12-02 : 10:47:57
bigint not int.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 10:52:12
nope, int

255.255.255.255 brings back 2147483647 and what is the maximum of a int?

Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.




Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-12-02 : 11:06:56
Oops missed the subtraction at the end
You can store the bit map in an int - 4 bytes * 255 per byte (i.e. tinyint) which I guess is what this is doing.


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 11:11:22
jup, but i'll still could need some optimalisation... :D

Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-02 : 11:20:50
Your biggest problem, despite how you choose to store the data, is looping through the table the way you are to calc the int.

Encapsulate your logic into a user-defined function. Call it, for example, ConvertToInt(), and then you can run:

UPDATE ripencc_20021101 set col004 = ConverToInt(columToConvert);

This will update the entire table (or you can use a WHERE to choose only certain rows) at once, without looping through the table the way you are currently doing it.

See books on-line for info on user-defined functions.


- Jeff
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 11:26:09
aah yes, i thought of that, but i couldn't find much about it, so i thought that it didn't excisted in SQL, but Apparently it does. I'll continue my quest for user-defined functions... thank you.

Bjorn

Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-12-02 : 11:32:12
well you can do what Rob says probably

update ripencc_20021101
set col004 =
Convert(bigint, PARSENAME(col004,4)) * 16777216
+ Convert(bigint, PARSENAME(col004,3)) * 65536
+ Convert(bigint, PARSENAME(col004,2)) * 256
+ Convert(bigint, PARSENAME(col004,1))
- 2147483648
from ripencc_20021101


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-02 : 11:35:21
It only exists in SQL 2000 -- if you are running 7 or below, it doesn't have the user-defined functions.

By the way, I also think 4 tiny int fields would be maybe a nicer way to store this data, so you could parse it easily. And it requires the same amount of space - 4 bytes.

By putting it all into 1 integer, as you are trying to do, any querying you do of the data will have to run a reverse version of your ConvertToInt() function, which won't be able to use indexes or anything and will be very slow.

Good luck!


- Jeff

Edited by - jsmith8858 on 12/02/2002 11:36:44
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-02 : 11:39:07
I didn't notice the use of PARSENAME -- very clever! a nice side effect of IP address and database objects both using the "." as a delimiter!

That would be the way to go. very nice.

- Jeff
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 11:54:59
I've used the user defined function and it works, very well i can say.

btw, I think i still can use my Int ip-addresses because i use ASP to convert them back, (and when the normal user is on the website, it also uses ASP to convert into Int, i only created this script for the mass-input from Arin.net) so this doesn't affect indexes and preformance, or does it?

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-02 : 12:02:59
Sure it will affect performance. Every IP address that is accessed in the query needs to be converted from an int back to an IP. The overhead can add up if you are returning thousands of rows. Doing it as an ASP function will only slow it down further. Even if you aren't returning a lot of rows, it will take much longer to process because of the conversion.

Using a SQL Server UDF isn't much better. The important parts of an IP address--the octets--are completely lost in an int value. You'd have to do some fancy bit-flipping or bitwise operations to parse out a subnet. It certainly isn't any more efficient than leaving them in a varchar column; you can still parse things out using PARSENAME, but it won't work on an int. And you'll get no help from an index on an int column.

It really isn't worth converting to an int, you'll only get lesser performance. Either keep it as varchar, or parse it into 4 tinyint columns.

Edited by - robvolk on 12/02/2002 12:04:47
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-12-02 : 12:04:51
Depends what you want to do with it.
If you just want to return the IP address or find a specific IP address then fine (except that you are leaving an odd database structure which will probably cause a maintenance overhead in the future).
If you want to search on part of an IP address then you have big problems.

I wouldn't code the convertion in ASP - leave it in the database (as udf's or just in the SPs) then it is available to anything that wants to access the data - you won't have to re-code it for every application that needs the data.
It will also be available when you want to look at the data in query analyser.


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 13:27:25
ow, i've forgot something that's quite important I think.
The ip-addresses are in a range. So i have 123.233.43.0 to 123.233.45 and with this range comes a id for a country (let's say Holland, Holland is a nice country ). And a visitor that has the ip 123.233.44.234, the ip is in the range, so the it will give back Holland.

now aint it easier to use a query like this:

select countryID from iprange where @visitorID between ipfrom AND ipto

When I use my way, the Int way, this doesn't give any probs, but with a varchar way this isn't possible (or am i wrong).

so, is it still bad to use my way?
(I think this is my last counter... :D, if you come back good, i'll do it your way and shut my big mouth.. )

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-02 : 13:54:38
Yes, you can do it with a varchar:

SELECT * FROM myTable
WHERE PARSENAME(IP, 4) = '123'
AND PARSENAME(IP, 3) = '233'
AND CAST(PARSENAME(IP, 2) AS int) BETWEEN 43 AND 44


Now, if you parse the IP addresses into 4 columns:

SELECT * FROM myTable
WHERE octet1 = 123
AND octet2 = 233
AND octet3 BETWEEN 43 AND 44


The second version, however, allows you to use indexes for each octet, and it can be much faster because the parsing is done ahead of time. Using PARSENAME allows you to keep it as a varchar, but you can't index on a subnet.

Depending on the values you have in your data, if you create a clustered index on the proper column(s), you can get lightning fast response even with millions of rows.

Go to Top of Page

bjornh
Yak Posting Veteran

87 Posts

Posted - 2002-12-02 : 14:12:46
ok, your the guru :), i'll check it out morning, thanks for all the help

Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-12-02 : 16:20:02
quote:

arin|US|ipv4|24.30.224.0|8192|0|allocated
arin|US|ipv4|24.31.0.0|8192|0|allocated
arin|US|ipv4|24.31.32.0|57344|0|allocated


Stupid question, but can that third field contain 'ipv6'? Pretty difficult to fit 128 bits in an int!



Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-12-03 : 05:43:17
quote:

If you want to search on part of an IP address then you have big problems.



As you say you have a lot of IP addresses then splitting up the octets into different fields gives more indexing options - and if you store IP addresses this is usually valuable.
You can store as a varchar and still index but this probably won't be as flexible as the separate fields.
That's why you rarely see IP addresses stored in the sort of format you were suggesting.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-12-03 : 09:04:42
I'd be interested to see how you "4 tinyint columns" folk would write a query that found which subnet (if any) a given address was in. I'd use binary(4) myself.

To load the table:

DROP TABLE IP_Staging
CREATE TABLE IP_Staging (
auth char(4),
country char(2),
type varchar(255),
ipstart varchar(255),
netsize int,
somedate varchar(255),
status varchar(255)
)

BULK INSERT IP_Staging
FROM 'r:\temp\arin.20021201.txt'
WITH ( FIELDTERMINATOR = '|', FIRSTROW = 2)

DROP TABLE IP_Subnets
CREATE TABLE IP_Subnets (
ipstart binary(4) PRIMARY KEY,
netmask binary(4) NOT NULL,
country char(2) NOT NULL
)

INSERT INTO IP_Subnets (ipstart, netmask, country)
SELECT
CAST(CAST(PARSENAME(ipstart, 4) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(ipstart, 3) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(ipstart, 2) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(ipstart, 1) AS tinyint) AS binary(1)) AS ipstart,
CAST(netsize - 1 AS binary(4)) AS netmask,
country
FROM IP_Staging
WHERE type = 'ipv4' AND ipstart <> '*'

 
To find the subnet:

DECLARE @ip varchar(16), @ipbin binary(4)
SET @ip = '66.129.68.65' // www.sqlteam.com, of course

SET @ipbin =
CAST(CAST(PARSENAME(@ip, 4) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(@ip, 3) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(@ip, 2) AS tinyint) AS binary(1)) +
CAST(CAST(PARSENAME(@ip, 1) AS tinyint) AS binary(1))

SELECT @ipbin, ipstart, netmask, country,
CAST(CAST(SUBSTRING(ipstart, 1, 1) AS tinyint) AS varchar(3)) + '.' +
CAST(CAST(SUBSTRING(ipstart, 2, 1) AS tinyint) AS varchar(3)) + '.' +
CAST(CAST(SUBSTRING(ipstart, 3, 1) AS tinyint) AS varchar(3)) + '.' +
CAST(CAST(SUBSTRING(ipstart, 4, 1) AS tinyint) AS varchar(3))
FROM IP_Subnets
WHERE ipstart = (
SELECT MAX(ipstart)
FROM IP_Subnets
WHERE @ipbin >= ipstart
)
AND @ipbin < CAST(CAST(ipstart AS int) | CAST(netmask AS int) AS binary(4)) -- *

 
* how silly! The bitmap operators don't work on binary datatypes.

Er, my terminology is a bit wrong: "netmask" column is really the complement of the netmask.


Edited by - Arnold Fribble on 12/03/2002 09:08:17
Go to Top of Page
  Previous Page&nsp;  Next Page

- Advertisement -