Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 T-SQL Statement to strip out HTML characters

Author  Topic 

ljp099
Yak Posting Veteran

79 Posts

Posted - 2007-07-31 : 16:16:48
I have a column which has some data stored with HTML markup. I want to strip out the HTML tags and peform a query like this:

SELECT D.Title from Documents D WHERE Or D.Title Like 'foo'"

So the what I need to do is use a sql function on the column D.Title so that it removes all HTML characters to perform the Like clause on foo.

Thanks for any help

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-07-31 : 16:18:41
you can use this and remove the tags with a regex
http://www.codeproject.com/cs/library/SqlRegEx.asp


or you can try to put your html in an xml type column and search it's values

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
Go to Top of Page

ljp099
Yak Posting Veteran

79 Posts

Posted - 2007-07-31 : 16:28:11
Oops. Maybe I posted to the wrong forum. This is using a SQL 2000 db. Can I use similar SQL RegEx in sql server 2000?

thanks for your help
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-07-31 : 16:30:47
http://www.sqlteam.com/article/regular-expressions-in-t-sql

note that this is slow compared to any kind of T-SQL

also moved from Transact-SQL (2005) forum. Please post in the correct forum

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
Go to Top of Page

ljp099
Yak Posting Veteran

79 Posts

Posted - 2007-07-31 : 17:22:34
The function below answers my original question (is there a t-sql function to strip out html)


CREATE FUNCTION [dbo].[udf_StripHTML]
(@HTMLText VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT

SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1

WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END

GO Test above function like this :

SELECT dbo.udf_StripHTML('<b>UDF at sqlteam.com </b><br><br><a href="http://www.sqlteam.com">sqlteam.com</a>'
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-07-31 : 17:27:15
or you can do it like that
nice.

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
Go to Top of Page
   

- Advertisement -