Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Removing unwanted characters efficiently

Author  Topic 

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 06:24:28
Hi,

I'm hoping somebody can help me. I currently have a stored proc that trims out unwanted text from an application name, it works fine but it is slow, I'm sure the way I have wrote this is not the most efficient.

Here is the original list of distinct application names in my database. What I want to do is strip out the web service path (/LM/W3SVC/1/Root/) at the beginning and the randomly generated number at the end of the application names for web services (The first six rows are web services). One of our applications also sets the application name as a random GUID (don't ask me why) which complicates things more, I want to strip all of these out too.

AppDomainName
--------------
/LM/W3SVC/1/Root/DerivativePricerService-1-128377118199915131
/LM/W3SVC/1/Root/DerivativePricerService-1-128377961195421010
/LM/W3SVC/1/Root/DerivativePricerService-2-128377815630282463
/LM/W3SVC/1/Root/FundMaintenance-1-128454071437980607
/LM/W3SVC/1/Root/FundMaintenance-3-128364053980129318
/LM/W3SVC/1/Root/FundMaintenance-33-128400455429618694
6dfb21c-6-128377063596406250
6dfb21c-7-128377067554687500
DerivativeBatchPricer.exe
e63c5484-1-128438415595731219
e63c5484-2-128438426669112356
ed75b4da-2-128401328239646984
ed75b4da-3-128401349735859632
ed75b4da-4-128401352234172860
EncryptData.exe
HDFWinShell.exe
ImagineUploadTestWindow.exe
ImgMirrorDownloadService.exe
TestUsersLogBuild.exe

Essentially, I want to end up with this...

AppDomainName
--------------
DerivativePricerService
FundMaintenance
DerivativeBatchPricer.exe
EncryptData.exe
HDFWinShell.exe
ImagineUploadTestWindow.exe
ImgMirrorDownloadService.exe
TestUsersLogBuild.exe

Here is the query I'm using in my stored procedure, as you can see I'm using a subquery to strip out the text in two steps. The inner select strips out the random numbers at the end of the web service application name and the GUIDs. The outer select strips out the web service path prefix.

SELECT DISTINCT 
-- Remove the web service path prefix '/LM/W3SVC/1/Root/'
CASE PATINDEX('/LM/W3SVC/1/Root/%', TrimmedAppDomainName)
WHEN 0 THEN TrimmedAppDomainName
ELSE SUBSTRING(TrimmedAppDomainName, 18, LEN(TrimmedAppDomainName) -17)
END AS ApplicationName
FROM
(
SELECT DISTINCT
-- Remove the randomly generated numbers at the end of a web service name
CASE
WHEN PATINDEX('%[-]%[-]%', AppDomainName) = 0 THEN AppDomainName
WHEN PATINDEX('%[-]%[-]%', AppDomainName) <= 9 THEN NULL
ELSE SUBSTRING(AppDomainName, 0, PATINDEX('%[-]%[-]%', AppDomainName))
END AS TrimmedAppDomainName
FROM [HDFCommon].[dbo].[Log]
)
AS Trimmed


What do you think on the above code? Feel free to insult if required as I do suspect it may be pretty poor but I just cannot figure out a better way to do this...

Any help would be greatly appreciated

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2008-02-08 : 07:02:09
Refer this
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/12/18/extract-only-numbers-from-a-string.aspx

Change [0-9] to [a-z]

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-02-08 : 07:46:01
Taking into consideration that the the data will be in the same format
the way you have given here is my solution.


declare @tbl table (AppDomainName varchar(500))
insert into @tbl

select'/LM/W3SVC/1/Root/DerivativePricerService-1-128377118199915131'union all
select'/LM/W3SVC/1/Root/DerivativePricerService-1-128377961195421010'union all
select'/LM/W3SVC/1/Root/DerivativePricerService-2-128377815630282463'union all
select'/LM/W3SVC/1/Root/FundMaintenance-1-128454071437980607'union all
select'/LM/W3SVC/1/Root/FundMaintenance-3-128364053980129318'union all
select'/LM/W3SVC/1/Root/FundMaintenance-33-128400455429618694'union all
select'6dfb21c-6-128377063596406250'union all
select'6dfb21c-7-128377067554687500'union all
select'DerivativeBatchPricer.exe'union all
select'e63c5484-1-128438415595731219'union all
select'e63c5484-2-128438426669112356'union all
select'ed75b4da-2-128401328239646984'union all
select'ed75b4da-3-128401349735859632'union all
select'ed75b4da-4-128401352234172860'union all
select'EncryptData.exe'union all
select'HDFWinShell.exe'union all
select'ImagineUploadTestWindow.exe'union all
select'ImgMirrorDownloadService.exe'union all
select'TestUsersLogBuild.exe'

select distinct AppDomainName from
(
select
case when len(AppDomainName)-len(replace(replace(AppDomainName,'-',''),'/',''))=7 then
replace(substring(AppDomainName,1,charindex('-',AppDomainName)-1),'/LM/W3SVC/1/Root/','')
when len(AppDomainName)-len(replace(replace(AppDomainName,'-',''),'/',''))=2 then ''
when len(AppDomainName)-len(replace(replace(AppDomainName,'-',''),'/',''))=''then AppDomainName
end as AppDomainName from @tbl
)
T where AppDomainName <>''
Go to Top of Page

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 07:47:21
Hi Madhivanan,

Thanks for the reply.

I don't think that is going to work because the application name could have a number in it. Also, the GUIDs have numbers and letters so I would be left with something like this...

DerivativePricerService
FundMaintenance
dfbc
DerivativeBatchPricer.exe
ec
edbda
EncryptData.exe
HDFWinShell.exe
ImagineUploadTestWindow.exe
ImgMirrorDownloadService.exe
TestUsersLogBuild.exe
Go to Top of Page

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 07:54:20
Hi ayamas,

Thanks for your reply.

That works but it is slower than the original query.

The hash match on the original query has a CPU cost of 0.84142 and the CPU cost on the new query is 1.7626.
Go to Top of Page

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-02-08 : 07:56:30
For how many records?
Go to Top of Page

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 08:01:24
129,310 at the moment. But that number will grow by about 10,000 per day as it's a global error log.
Go to Top of Page

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-02-08 : 08:06:39
I guess it is because of that distinct keyword in the outer query.
Try the query without distinct.
Go to Top of Page

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 08:13:59
That does get rid of the hash match and improves the speed. The only problem is I need the distinct records...

I'll have a play and see what I can come up with.
Go to Top of Page

Qualis
Posting Yak Master

145 Posts

Posted - 2008-02-08 : 10:38:02
I have a good test for you. :D

First, the setup. I fill a temp table with 5mil rows for speed testing.

Declare @cnt int
Set @cnt = 1
Drop Table #Log
Select @cnt LogID, '/LM/W3SVC/1/Root/DerivativePricerService-1-128377118199915131' AppDomainName
Into #Log

While @cnt < 5000000 Begin
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/DerivativePricerService-1-128377118199915131'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/DerivativePricerService-1-128377961195421010'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/DerivativePricerService-2-128377815630282463'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/FundMaintenance-1-128454071437980607'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/FundMaintenance-3-128364053980129318'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '/LM/W3SVC/1/Root/FundMaintenance-33-128400455429618694'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '6dfb21c-6-128377063596406250'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, '6dfb21c-7-128377067554687500'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'DerivativeBatchPricer.exe'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'e63c5484-1-128438415595731219'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'e63c5484-2-128438426669112356'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'ed75b4da-2-128401328239646984'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'ed75b4da-3-128401349735859632'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'ed75b4da-4-128401352234172860'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'EncryptData.exe'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'HDFWinShell.exe'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'ImagineUploadTestWindow.exe'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'ImgMirrorDownloadService.exe'
Set @cnt = @cnt + 1
Insert Into #Log Select @cnt, 'TestUsersLogBuild.exe'
End

Next, I query the results:

--5,000,000 Log Records
--11 Unique Clean AppDomainName Records
--2 seconds Execution Time (on a nice server)
Select Distinct
Case
--Prefix Removal
When Left(AppDomainName, 1) = '/' Then
SubString(Left(AppDomainName, CharIndex('-', AppDomainName) - 1), 18, 500)
--Dash Removal (No Prefix)
When CharIndex('-', AppDomainName) > 0 Then
Left(AppDomainName, CharIndex('-', AppDomainName) - 1)
Else AppDomainName
End As AppDomainName
From #Log
Order By AppDomainName
Go to Top of Page

virtuesoft
Starting Member

8 Posts

Posted - 2008-02-08 : 10:49:52
Hi Qualis,

Thanks for your reply.

I've tested your method and it works well, it is indeed quick. It leaves the first part of the GUIDs (the part before the '-') so I get the following extra records...

6dfb21c
e63c5484
ed75b4da

However, I'm sure I can figure out a way to overcome that issue.

Thanks very much!
Go to Top of Page

Qualis
Posting Yak Master

145 Posts

Posted - 2008-02-08 : 11:21:16
Try

Select AppDomainName
From
(
Select Distinct
Case
--Prefix Removal
When Left(AppDomainName, 1) = '/' Then
SubString(Left(AppDomainName, CharIndex('-', AppDomainName) - 1), 18, 500)
--Dash Removal (No Prefix)
When CharIndex('-', AppDomainName) > 0 Then
Left(AppDomainName, CharIndex('-', AppDomainName) - 1)
Else AppDomainName
End As AppDomainName
From #Log
) AppDomainNames
Where
Not (Len(AppDomainName) In (7, 8) And AppDomainName NOT LIKE '%[^0-9A-Fa-f-]%')
Order By AppDomainName
Go to Top of Page
   

- Advertisement -