Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 Removing html tags from long text fields

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2006-05-17 : 09:48:23
MGD writes "We have taken a straight copy of an OLTP database which has a web front end and we use the copy database for reporting. We have noticed that some of the fields of type long text include a load of html tags ( all manner of tags eg <>). What is the most efficient way of removing the html tags out of a long text field.

Usually reporting is summary info but in this particular case it is necessary to report on the detail - dull as dishwater reports but essential for operations. How on earth do you remove all the unwanted characters which might appear anywhere in the text which might be over 10000 characters long?!"

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2006-05-18 : 02:28:16
http://www.nigelrivett.net/SQLTsql/ReplaceText.html

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

RichT
Starting Member

1 Post

Posted - 2006-06-08 : 11:31:10
How about a function:

/*
R.Tolliver 6/7/2006
Function to strip HTML from Text
*/

ALTER FUNCTION [dbo].[fn_StripHTML]
(@strHTML varchar(300))
RETURNS VarChar(300) AS

/*
STRIP OUT HTML TAGS
TRY AND LEAVE GREATER THAN > and LESS THEN tags alone
assums Greater than and less than will be preceeded by
and followed by a space, and that HTML tags will not
have a space immediatly following the <
*/

BEGIN
DECLARE @iTagStart INT
DECLARE @iTagEnd INT
DECLARE @iLT INT
SET @iTagStart = 0

WHILE (Charindex('<',@strHTML)) > 0 AND (Charindex('>',@strHTML)) > 0
BEGIN

SET @iTagStart = Charindex('<',@strHTML,@iTagStart)
IF @iTagStart = 0
BREAK --No tags
SET @iLT = Charindex(' < ',@strHTML,@iTagStart)

If (@iLT + 1 = @iTagStart) AND (@iLT > 0) --It's a < symbol, not the start of a tag
BEGIN
SET @iTagStart = @iTagStart + 1
CONTINUE
END

SET @iTagEnd = Charindex('>',@strHTML,@iTagStart + 1 )
IF @iTagEnd = 0
BREAK --No end tage, get out

IF SUBSTRING(@strHTML,@iTagStart + 1, 1) <> ' ' AND (Charindex('>',@strHTML)) > 0
BEGIN
SET @strHTML = LEFT(@strHTML,@iTagStart -1) + SUBSTRING ( @strHTML,@iTagEnd + 1, 300 )
END
ELSE
BEGIN
SET @iTagStart = @iTagStart + 1
SET @iTagStart = Charindex('<',@strHTML,@iTagStart) - 1
END

IF Charindex('<',@strHTML) = 0
BREAK
ELSE
CONTINUE
END
RETURN @strHTML
END
--test
--SELECT dbo.fn_StripHTML('<Font color="Red"><B> Test if 5 > 6 or 3 < 2 or 7 > 8 > And <SPAN> annother tag</Span></B></Font>')
Go to Top of Page
   

- Advertisement -