Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Removing an ascii char

Author  Topic 

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-26 : 10:30:25
Hi folks!

This is likely an easy one, bordering on stupid. I have some Name, address, city, etc... columns in a DB. For some unknown reason, as users type in their data, the ASCII char 0 sometimes finds itself in the data. ASCII 0 is an invisible character which can create havoc on some operations like XML recognition.

1. Is there a Transact-SQL operation that I can perform to filter out ASCII 0 in the columns?
2. Does anyone know how a user can generate ASCII 0 through normal lay-user typing?

Thanks!


Mike

Zoroaster
Aged Yak Warrior

702 Posts

Posted - 2007-10-26 : 10:36:03
Are you sure it is ASCII 0 and not a square symbol that represents a carriage return? This can be entered into an application when an end user does a copy and paste and pastes in the next line. In the past I had to write a VBScript to remove these but there may be a better way.



Future guru in the making.
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2007-10-26 : 10:37:14
Select columns from table
where col like '%'+char(0)+'%'

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

Zoroaster
Aged Yak Warrior

702 Posts

Posted - 2007-10-26 : 10:41:32
And char(13) would be for carriage return, if that is it.



Future guru in the making.
Go to Top of Page

Van
Constraint Violating Yak Guru

462 Posts

Posted - 2007-10-26 : 10:50:17
I've also saw the "..." copied from emails and pasted into a web app and and entered into a table. The email turns the "..." into some special character and it really messed with web app once the data got stored in the database. They couldn't retrieve the record through the web app or something like that...or not update the record maybe. I would have to update it in SQL and take out the "..." character.
Go to Top of Page

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-26 : 11:18:19
Sorry, I used the term "column" when I meant fields. Basically, I'm looking for something like:

Update MyTable Set MyAddress = (MyAddressExcludingAscii0) where I need to define the operation which is MyAddressExcludingAscii0
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2007-10-26 : 12:12:36
Update MyTable Set MyAddress Replace(MyAddress, char(0), '')
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2007-10-26 : 12:18:04
whoops, left out the =

Update MyTable Set MyAddress = Replace(MyAddress, char(0), '')
Go to Top of Page

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-26 : 13:00:56
Well, I tried something like this before consulting the forum and bothering you. The query makes sense but, instead, it zaps the address into nothingness... odd.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-26 : 13:17:26
"The query makes sense but, instead, it zaps the address into nothingness... odd"

Suggest you post the EXACT statement you used - in case we can spot some unfortunate side effect!

Kristen
Go to Top of Page

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-26 : 13:34:16
Kristen,

The exact statement is as per Russell's: Update MyTable Set MyAddress = Replace(MyAddress, char(0), ''). It only zaps the data if char(0) is used. For instance, if I say Update MyTable Set MyAddress = Replace(MyAddress, char(5), ''), that works fine and only filters out ASCII 5.
Go to Top of Page

Zoroaster
Aged Yak Warrior

702 Posts

Posted - 2007-10-26 : 15:04:48

That's why I don't think it's CHAR(0), what leads you to believe it is?





Future guru in the making.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-26 : 17:20:04
Works for me:


DECLARE @TestData TABLE
(
MyAddress VARCHAR(50)
)

INSERT @TestData
SELECT 'foo bar' UNION ALL
SELECT 'bar' + CHAR(0) + 'foo'

SELECT LEN(MyAddress), MyAddress
FROM @TestData

UPDATE @TestData
SET MyAddress = REPLACE(MyAddress, char(0), '')

SELECT LEN(MyAddress), MyAddress
FROM @TestData

Kristen
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-26 : 17:20:44
P.S. Note that 'bar' + CHAR(0) + 'foo' does not display properly in the first resultset, but the LEN is displayed correctly, and decrements by one in the second resultset
Go to Top of Page

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-27 : 12:37:48
I know that the ASCII 0 characters are there because I can detect them by creating some VB code which pulls a recordset and does a character-by-character sweep.

I did a test as follows:

Update MyTable set MyAddress = 'bar' + CHAR(0) + 'foo' which results in the expected alteration to the data. Then,

Update MyTable set MyAddress = REPLACE(MyAddress, char(0), '') which definitely zaps the data, nothing left, nada, zilch, niente, zippo. I was using SQL2000 SP3 and updated to SP4 but it did'nt change the behavior.

Just as a further test, I instead used Update MyTable set MyAddress = REPLACE(MyAddress, char(0), 'H') and it resulted in zapping everything and leaving a single H, meaning that it did find and replace the ASCII 0. So basically, the operation is doing:

If ASC(Character) = 0 then
ASC(Character) = 72
else
Character = NULL
endif

Best of all, I just tried the test in SQL Server 2005 and, well, it didn't zap the data but it didn't filter out anything either and just left the ASCII 0 in between bar and foo....


Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2007-10-27 : 12:41:15
you have more than just char(0) in there sounds like to me.

can you post a few smaple records of actual data and your exact update statement?
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2007-10-27 : 12:48:12
by the way, i tested what i posted, and what kristen posted in sql 2000 SE, sql2k5 SE, sql2k5 EE

works on all 3.

also, how are these getting in the database in the 1st place? note that ascii 0 is a NULL character, and there are many other non-printable characters (that will whack the display) that could be getting in there if u aren't doing validation in the application executing the inserts
Go to Top of Page

WindChaser
Posting Yak Master

225 Posts

Posted - 2007-10-27 : 13:18:51
This is the step by step of what I'm doing in EM:
1. I created a test table called MyTable which contains an ID keyed field and a MyAddress field which is a nvarchar 100 that allows nulls.
2. I executed Insert Into MyTable (MyAddress) Values ('blablabla')
3. I executed Update MyTable set MyAddress = 'bar' + CHAR(0) + 'foo' which changed 'blablabla' accordingly.
4. I executed Update MyTable set MyAddress = REPLACE(MyAddress, char(0), '')

I suspect that the ASCII 0 is imported from other data sources when we perform conversions. Many of our clients had products from competitors and just copied their old data. That's why I want to add a final filtering process to remove all unwanted characters.

And in writing this I realized something: what if I put the field as char instead of nvarchar? The operation works on a char. Good thing because I was about to send my SQL Server to rehab. So the question now becomes: why does it work with Char and not with nvarchar?
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-27 : 14:41:44
I still don't understand it. You are getting a different behaviour to the example I posted above. What happens if you use the example I posted above?

Are you doing this in a SQL Tool - such as Query Analyser or SSMS? or in some application language?

Many application languages will take CHAR(0) to be a string terminator, and thus interpret the data accordingly.

Indeed, in the example I posted above the text beyond the CHAR(0) is NOT displayed, although it IS there - as evidenced by the LENgth display.

Kristen
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2007-10-27 : 15:59:33
ok, i got it. unicode behaves differently as it takes 2bytes storage, not just one. you'll need to convert it to varchar, or manipulate in binary. hopefully you dont have any characters that cant be represented as varchar in there

select replace(convert(varchar, MyAddress), char(0), '') from myTable
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-27 : 16:36:28
Different behaviour in SQL 2000 and SQL 2005 for Nvarchar:

DECLARE @TestData TABLE
(
MyAddress Nvarchar(50)
)

INSERT @TestData
SELECT '1foo bar' UNION ALL
SELECT '2foo' + NCHAR(0) + 'bar'

SELECT LEN(MyAddress), MyAddress
FROM @TestData

UPDATE @TestData
SET MyAddress = REPLACE(MyAddress, NCHAR(0), N'')

SELECT LEN(MyAddress), MyAddress
FROM @TestData

SQL 2000 REPLACE wrongly replaces (presumably) the CHAR(0) byte making up the first part of the double-byte character.

SQL 2005 fails to replace the character, but does no damage to the remaining string - so it remains unchanged

CHARINDEX and STUFF both behave incorrectly, so we can't use those either.

CHARINDEX (NCHAR(0), MyAddress) thinks the NCHAR is found at position 0
PATINDEX ('%' + NCHAR(0) + '%', MyAddress) thinks it is at position 1

Kristen
Go to Top of Page
    Next Page

- Advertisement -