Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Site Related Forums
 Article Discussion
 Article: Regular Expressions in T-SQL

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2003-06-24 : 11:20:24
This article comes to us from Cory Koski. Cory writes "I recently had the problem of trying to search for a regular expression in a database field. There is no version of SQL Server that supports regular expressions natively, but I discovered a way to add all sorts of regular expression goodness to your T-SQL applications. In order to use regular expressions easily, we can use a User Defined Function to help us out and abstract the work for us."

Article Link.

Blastrix
Posting Yak Master

208 Posts

Posted - 2003-06-24 : 15:05:32
I believe there is a problem with this in that if one of your steps fails along the way, the function will exit without properly destroying the regex object you have created. It's a simple enough fix, just figured I'd bring it up.

Go to Top of Page

plucky
Starting Member

1 Post

Posted - 2003-07-04 : 13:23:04
Aren't you opening yourself up to Injection attacks when you use dynamic SQL?

see post: [url]http://www.asp.net/Forums/ShowPost.aspx?tabindex=1&PostID=254355[/url]

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-07-05 : 08:48:30
There's no dynamic SQL used in the function or the example, so there's no opportunity for the function to be injected.

Go to Top of Page

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2003-07-09 : 21:25:46
I feel really stupid asking this, but what's a "regular expression"? Versus an irregular expression?

Sarah Berger MCSD
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-07-09 : 21:29:47
Probably the best place to start:

http://www.4guysfromrolla.com/webtech/regularexpressions.shtml

Basically they are more powerful versions of the features available with LIKE pattern-matching in SQL. The nice thing is that in addition to matching patterns, you can also use them to replace text.

I have no idea where the term "regular expression" came from, but I'm sure you can find a history of it through Google.

Go to Top of Page

Blastrix
Posting Yak Master

208 Posts

Posted - 2003-07-10 : 12:02:00
I was thinking about this again, and while it is indeed very useful, it also seems like an incredible waste of resources. Generally when working with a Reg Ex in code, you declare it once, set the pattern, and then try matching and so on through a loop or whatever.

I may be wrong about how SQL Server handles COM object creation, but it seems that if you were to use this as part of a WHERE clause in a querylike this would create a new RegEx object for each row of the query. Thus is you had a million rows, that would be a million instantiations. Again, please correct me if I'm wrong.

If I am correct about the above behavior, it might be better to instantiate the COM object in the SP you run, and then pass in the object's reference to the function. This way there is only one instantiation. It may be a little more from the maintenance perspective, but it seems that it would be a lot more efficient.

Steve

Go to Top of Page

yoinky
Starting Member

4 Posts

Posted - 2004-03-23 : 21:11:25
I'm sort of new to MSSQL and I just found this article and was thrilled to find a way to use regular expressions in T-SQL!

However I was more interested in using the regular expression replace functionality. So after a little bit of tweaking to the original UDF and using the Microsoft reference for VBScript.regex I was able to get it to work!

I then read the comments by Blastrix and thought he had a very good point about execution speed because of instantiating SO many COM objects. So I modified the regexFind and regexReplace UDFs as he had pointed out to create a second more efficient set of UDFs that would accept the regular expression object as a parameter.

In a basic test on my laptop I ran 30,000 records through a regexReplace, which took 27 seconds. I ran the same thing with the new regexObjReplace and it took 7 seconds. Its by no means an accurate benchmark of its performance, but it does show how dramatic the difference can be.

I did change the functionality a bit. If an error occurs the COM object is first destroyed, and then a NULL value is returned. I removed the unused parameters @objMatches, @objMatch, and @count.

Well I hope someone finds this code useful. As I said, I am pretty new to this, so if anyone has any improvements for this please let me know!

Here is an example of how to use it, this removes any characters other then a-z from the field 'firstname'.




The simple method is:

SELECT dbo.regexReplace( firstname, '[^a-z]', '', 1, 1 ) FROM account;




The more efficient method is:

DECLARE @regex integer;
SET @regex = dbo.regexObj( '[^a-z]', 1, 1 );
SELECT dbo.regexObjReplace( @regex, firstname, '' ) FROM account;




CREATE FUNCTION dbo.regexObj
(
@regexp varchar(1000),
@globalReplace bit = 0,
@ignoreCase bit = 0
)
RETURNS integer AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @objRegExp
END
GO


CREATE FUNCTION dbo.regexObjFind
(
@objRegExp integer,
@source varchar(5000)
)
RETURNS bit AS
BEGIN
DECLARE @hr integer
DECLARE @results bit

EXECUTE @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @results
END
GO


CREATE FUNCTION dbo.regexObjReplace
(
@objRegExp integer,
@source varchar(5000),
@replace varchar(1000)
)
RETURNS varchar(1000) AS
BEGIN
DECLARE @hr integer
DECLARE @result varchar(5000)

EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @result
END
GO


CREATE FUNCTION dbo.regexFind
(
@source varchar(5000),
@regexp varchar(1000),
@ignoreCase bit = 0
)
RETURNS bit AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer
DECLARE @results bit

SET @results = 0

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', false
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OADestroy @objRegExp
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @results
END
GO


CREATE FUNCTION dbo.regexReplace
(
@source varchar(5000),
@regexp varchar(1000),
@replace varchar(1000),
@globalReplace bit = 0,
@ignoreCase bit = 0
)
RETURNS varchar(1000) AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer
DECLARE @result varchar(5000)

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OADestroy @objRegExp
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @result
END
GO
Go to Top of Page

kapcreations
Starting Member

1 Post

Posted - 2004-06-02 : 18:48:56
can you give an example for all the functions you provided?
Go to Top of Page

yoinky
Starting Member

4 Posts

Posted - 2004-06-02 : 19:10:30
Sure, here are some simple examples.


-- remove anything other then a-z from firstname field
SELECT dbo.regexReplace( firstname, '[^a-z]', '', 1, 1 ) FROM account;


-- return only account records which have any characters other then a-z
SELECT * FROM account WHERE dbo.regexFind( firstname, '[^a-z]', 1 ) = 1;


--------------------------------------------------------------------------------


The longer, and more efficient version of the 2 statements above are:


-- remove anything other then a-z from firstname field
DECLARE @regex integer;
SET @regex = dbo.regexObj( '[^a-z]', 1, 1 );
SELECT dbo.regexObjReplace( @regex, firstname, '' ) FROM account;


-- return only account records which have any characters other then a-z
DECLARE @regex integer;
SET @regex = dbo.regexObj( '[^a-z]', 0, 1 );
SELECT * FROM account WHERE dbo.regexObjFind( @regex, firstname ) = 1;


Of course the power of regex can do WAY more then I've illustrated above, being able to update values in the database with regex, do very complex regex search filtering, and more.


quote:
Originally posted by kapcreations

can you give an example for all the functions you provided?

Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-06-02 : 21:36:05
very interesting ideas ... nice post.

- Jeff
Go to Top of Page

yoinky
Starting Member

4 Posts

Posted - 2004-06-02 : 23:07:58
Yeah it's really powerful, now if I could only get the dba on my shared mssql server to allow me use of the sp_OA* sprocs! Doh!
Go to Top of Page

pittsburghsteelersfan
Starting Member

4 Posts

Posted - 2004-06-11 : 21:58:13
hello...I was happy to se this, and hoped that it would solve my problem...i have a table with about 50,000 replacement regex's that need to be run against 2 fields in a different table...vb.net seems to have a memory issue because running these causes the memory usage to skyrocket to the point where the standalone app crashes. I was hoping to make it work directly in SQL server and I hope someone can help!!! Any help greatly appreciated. Thanks. Patrick.


quote:
Originally posted by yoinky

I'm sort of new to MSSQL and I just found this article and was thrilled to find a way to use regular expressions in T-SQL!

However I was more interested in using the regular expression replace functionality. So after a little bit of tweaking to the original UDF and using the Microsoft reference for VBScript.regex I was able to get it to work!

I then read the comments by Blastrix and thought he had a very good point about execution speed because of instantiating SO many COM objects. So I modified the regexFind and regexReplace UDFs as he had pointed out to create a second more efficient set of UDFs that would accept the regular expression object as a parameter.

In a basic test on my laptop I ran 30,000 records through a regexReplace, which took 27 seconds. I ran the same thing with the new regexObjReplace and it took 7 seconds. Its by no means an accurate benchmark of its performance, but it does show how dramatic the difference can be.

I did change the functionality a bit. If an error occurs the COM object is first destroyed, and then a NULL value is returned. I removed the unused parameters @objMatches, @objMatch, and @count.

Well I hope someone finds this code useful. As I said, I am pretty new to this, so if anyone has any improvements for this please let me know!

Here is an example of how to use it, this removes any characters other then a-z from the field 'firstname'.




The simple method is:

SELECT dbo.regexReplace( firstname, '[^a-z]', '', 1, 1 ) FROM account;




The more efficient method is:

DECLARE @regex integer;
SET @regex = dbo.regexObj( '[^a-z]', 1, 1 );
SELECT dbo.regexObjReplace( @regex, firstname, '' ) FROM account;




CREATE FUNCTION dbo.regexObj
(
@regexp varchar(1000),
@globalReplace bit = 0,
@ignoreCase bit = 0
)
RETURNS integer AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
IF @hr <> 0 BEGIN
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @objRegExp
END
GO


CREATE FUNCTION dbo.regexObjFind
(
@objRegExp integer,
@source varchar(5000)
)
RETURNS bit AS
BEGIN
DECLARE @hr integer
DECLARE @results bit

EXECUTE @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @results
END
GO


CREATE FUNCTION dbo.regexObjReplace
(
@objRegExp integer,
@source varchar(5000),
@replace varchar(1000)
)
RETURNS varchar(1000) AS
BEGIN
DECLARE @hr integer
DECLARE @result varchar(5000)

EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @result
END
GO


CREATE FUNCTION dbo.regexFind
(
@source varchar(5000),
@regexp varchar(1000),
@ignoreCase bit = 0
)
RETURNS bit AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer
DECLARE @results bit

SET @results = 0

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', false
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OADestroy @objRegExp
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @results
END
GO


CREATE FUNCTION dbo.regexReplace
(
@source varchar(5000),
@regexp varchar(1000),
@replace varchar(1000),
@globalReplace bit = 0,
@ignoreCase bit = 0
)
RETURNS varchar(1000) AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer
DECLARE @result varchar(5000)

EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OADestroy @objRegExp
IF @hr <> 0 BEGIN
RETURN NULL
END

RETURN @result
END
GO


Go to Top of Page

yoinky
Starting Member

4 Posts

Posted - 2004-06-11 : 22:45:28
Wow 50,000 regex replaces, that's an incredible concept to get your head around. What would you be doing with so many regex changes? Are they even necessary to do regex?

I love regex because with some cleaver thinking a single regex can completely transform a string, it's power is incredible.

Perhaps there is redundency in those 50,000 statements that could be eliminated into fewer generic 'preprocess' regex's to be run before the others. Then if you are doing like a word match/replace afterwards with just normal replace instead of regex?

I'm not sure about the scaleability of running this regex com object as a UDF on SQL Server, but its worth a shot.

I *think* this might work, but I can't test it myself. Basicly just joining the 2 tables together in a cross join and using the regex table as the parameters of the replace.

But as stated earlier in this thread its seamingly very inefficient to recreate the regex object each and every time. Alternatively you might have to use a cursor to loop over the regex table then use the regexObj() and regexObjReplace() methods for the replace action.

Let me know how it goes.




UPDATE updateTable
SET ut.field1 = dbo.regexReplace( ut.field1, rt.regex, rt.replace, 1, 1 )
,ut.field2 = dbo.regexReplace( ut.field2, rt.regex, rt.replace, 1, 1 )
FROM updateTable AS ut
CROSS JOIN regexTable AS rt



quote:
Originally posted by pittsburghsteelersfan

hello...I was happy to se this, and hoped that it would solve my problem...i have a table with about 50,000 replacement regex's that need to be run against 2 fields in a different table...vb.net seems to have a memory issue because running these causes the memory usage to skyrocket to the point where the standalone app crashes. I was hoping to make it work directly in SQL server and I hope someone can help!!! Any help greatly appreciated. Thanks. Patrick.

Go to Top of Page

cgray@careyweb.com
Starting Member

4 Posts

Posted - 2004-07-14 : 14:04:42
I am new to T-SQL and i am wondering if anyone can help me with this question.

I would like to use RE in a trigger. What i want to do is look at the column "PATH" that is being inserted. Use RE to grab a 5 digit number from the field and insert it into a field called "JOBNUM".

For example the data being inserted for the PATH field would be:
\\servername\foldername99999\filename.ext

I want to use the RE [0-9 ]{5} to grab the 99999 and insert that number into the field "JOBNUM".

All the RE functions i have see in this thread returns a 1 or 0.

How would i modify the Functions to return the found result of the RE?

Thanks!
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-07-14 : 14:13:13
i would not use external activeX controls in a trigger -- if anything goes wrong, you could have some serious issues.

why not just use PATINDEX(), SUBSTRING() and/or CHARINDEX() ? Those are built-in SQL functions that you can use with no performance/stability penalties.


- Jeff
Go to Top of Page

cgray@careyweb.com
Starting Member

4 Posts

Posted - 2004-07-14 : 14:23:53
Thanks for the reply. I will look at those functions and see how i can use them.

I am use to regex with other programing i have done in the past so it was the first place i looked for this solution.
Go to Top of Page

cgray@careyweb.com
Starting Member

4 Posts

Posted - 2004-07-14 : 14:49:22
Ya... i dont see how i am going to be able to use PATINDEX(), SUBSTRING(), or CHARINDEX(). They appear to use string literal text as the search method. PATINDEX() will use the % wildcard, but that wont help me.

The number 99999 was just an example. The number will always be 5 digits. The digits will never be the same. That is why regex seemed to be the way to go since i could use somthing like [0-9]{5} to find the number in the field being inserted.

Thanks
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-07-14 : 14:59:58
In a PATINDEX, or LIKE, you can use "%" for 0 or more characters, "_" for exactly one any-character, and [0-9] and [^0-9] as you would for a RegEx. But you cannot use "[0-9]*" or "[0-9]+", the use of "[0-9]" matches exactly one character.

So you can use "[0-9][0-9][0-9][0-9][0-9]" to find the location of a 5 digit number, but you will struggle to NOT get mismatches on EARLIER 5 digit numbers within the string.

DECLARE @strTest varchar(8000)

SELECT @strTest = '\\servername\foldername99999\filename.ext'

SELECT [Position]=PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest),
[Before]=SUBSTRING(@strTest, 1, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest)-1),
[After]=SUBSTRING(@strTest, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest), LEN(@strTest))

Kristen
Go to Top of Page

cgray@careyweb.com
Starting Member

4 Posts

Posted - 2004-07-14 : 15:08:27
Thanks Kristen! This will work perfect!

Guess i need to buy a better SQL book. :) I had no idea you could represent digits with 0-9 in a LIKE. Very cool!
Go to Top of Page

Razer
Starting Member

1 Post

Posted - 2004-08-06 : 17:14:02
This stuff looks pretty powerfull, but perhaps a bit over kill for my needs. Basically i want to make a query that gets all strings of any length that do not contain a comma in them. so for example if the rows of a table have an element called colors with values like: red blue green blue,green yellow,red

i would want only the rows with red blue and green not the blue,green or yellow,red

tried something like %[^,]% but id doesnt work as the wildcard % allows commas... any tips?

thanks
rich
Go to Top of Page
    Next Page

- Advertisement -