Author |
Topic |
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-26 : 10:30:25
|
Hi folks!This is likely an easy one, bordering on stupid. I have some Name, address, city, etc... columns in a DB. For some unknown reason, as users type in their data, the ASCII char 0 sometimes finds itself in the data. ASCII 0 is an invisible character which can create havoc on some operations like XML recognition. 1. Is there a Transact-SQL operation that I can perform to filter out ASCII 0 in the columns?2. Does anyone know how a user can generate ASCII 0 through normal lay-user typing?Thanks!Mike |
|
Zoroaster
Aged Yak Warrior
702 Posts |
Posted - 2007-10-26 : 10:36:03
|
Are you sure it is ASCII 0 and not a square symbol that represents a carriage return? This can be entered into an application when an end user does a copy and paste and pastes in the next line. In the past I had to write a VBScript to remove these but there may be a better way. Future guru in the making. |
 |
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2007-10-26 : 10:37:14
|
Select columns from tablewhere col like '%'+char(0)+'%'MadhivananFailing to plan is Planning to fail |
 |
|
Zoroaster
Aged Yak Warrior
702 Posts |
Posted - 2007-10-26 : 10:41:32
|
And char(13) would be for carriage return, if that is it. Future guru in the making. |
 |
|
Van
Constraint Violating Yak Guru
462 Posts |
Posted - 2007-10-26 : 10:50:17
|
I've also saw the "..." copied from emails and pasted into a web app and and entered into a table. The email turns the "..." into some special character and it really messed with web app once the data got stored in the database. They couldn't retrieve the record through the web app or something like that...or not update the record maybe. I would have to update it in SQL and take out the "..." character. |
 |
|
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-26 : 11:18:19
|
Sorry, I used the term "column" when I meant fields. Basically, I'm looking for something like:Update MyTable Set MyAddress = (MyAddressExcludingAscii0) where I need to define the operation which is MyAddressExcludingAscii0 |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2007-10-26 : 12:12:36
|
Update MyTable Set MyAddress Replace(MyAddress, char(0), '') |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2007-10-26 : 12:18:04
|
whoops, left out the =Update MyTable Set MyAddress = Replace(MyAddress, char(0), '') |
 |
|
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-26 : 13:00:56
|
Well, I tried something like this before consulting the forum and bothering you. The query makes sense but, instead, it zaps the address into nothingness... odd. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 13:17:26
|
"The query makes sense but, instead, it zaps the address into nothingness... odd"Suggest you post the EXACT statement you used - in case we can spot some unfortunate side effect!Kristen |
 |
|
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-26 : 13:34:16
|
Kristen,The exact statement is as per Russell's: Update MyTable Set MyAddress = Replace(MyAddress, char(0), ''). It only zaps the data if char(0) is used. For instance, if I say Update MyTable Set MyAddress = Replace(MyAddress, char(5), ''), that works fine and only filters out ASCII 5. |
 |
|
Zoroaster
Aged Yak Warrior
702 Posts |
Posted - 2007-10-26 : 15:04:48
|
That's why I don't think it's CHAR(0), what leads you to believe it is? Future guru in the making. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 17:20:04
|
Works for me:DECLARE @TestData TABLE( MyAddress VARCHAR(50))INSERT @TestDataSELECT 'foo bar' UNION ALLSELECT 'bar' + CHAR(0) + 'foo'SELECT LEN(MyAddress), MyAddressFROM @TestDataUPDATE @TestDataSET MyAddress = REPLACE(MyAddress, char(0), '')SELECT LEN(MyAddress), MyAddressFROM @TestData Kristen |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 17:20:44
|
P.S. Note that 'bar' + CHAR(0) + 'foo' does not display properly in the first resultset, but the LEN is displayed correctly, and decrements by one in the second resultset |
 |
|
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-27 : 12:37:48
|
I know that the ASCII 0 characters are there because I can detect them by creating some VB code which pulls a recordset and does a character-by-character sweep.I did a test as follows:Update MyTable set MyAddress = 'bar' + CHAR(0) + 'foo' which results in the expected alteration to the data. Then,Update MyTable set MyAddress = REPLACE(MyAddress, char(0), '') which definitely zaps the data, nothing left, nada, zilch, niente, zippo. I was using SQL2000 SP3 and updated to SP4 but it did'nt change the behavior. Just as a further test, I instead used Update MyTable set MyAddress = REPLACE(MyAddress, char(0), 'H') and it resulted in zapping everything and leaving a single H, meaning that it did find and replace the ASCII 0. So basically, the operation is doing:If ASC(Character) = 0 then ASC(Character) = 72else Character = NULLendifBest of all, I just tried the test in SQL Server 2005 and, well, it didn't zap the data but it didn't filter out anything either and just left the ASCII 0 in between bar and foo.... |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2007-10-27 : 12:41:15
|
you have more than just char(0) in there sounds like to me.can you post a few smaple records of actual data and your exact update statement? |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2007-10-27 : 12:48:12
|
by the way, i tested what i posted, and what kristen posted in sql 2000 SE, sql2k5 SE, sql2k5 EEworks on all 3.also, how are these getting in the database in the 1st place? note that ascii 0 is a NULL character, and there are many other non-printable characters (that will whack the display) that could be getting in there if u aren't doing validation in the application executing the inserts |
 |
|
WindChaser
Posting Yak Master
225 Posts |
Posted - 2007-10-27 : 13:18:51
|
This is the step by step of what I'm doing in EM:1. I created a test table called MyTable which contains an ID keyed field and a MyAddress field which is a nvarchar 100 that allows nulls. 2. I executed Insert Into MyTable (MyAddress) Values ('blablabla')3. I executed Update MyTable set MyAddress = 'bar' + CHAR(0) + 'foo' which changed 'blablabla' accordingly.4. I executed Update MyTable set MyAddress = REPLACE(MyAddress, char(0), '')I suspect that the ASCII 0 is imported from other data sources when we perform conversions. Many of our clients had products from competitors and just copied their old data. That's why I want to add a final filtering process to remove all unwanted characters.And in writing this I realized something: what if I put the field as char instead of nvarchar? The operation works on a char. Good thing because I was about to send my SQL Server to rehab. So the question now becomes: why does it work with Char and not with nvarchar? |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-27 : 14:41:44
|
I still don't understand it. You are getting a different behaviour to the example I posted above. What happens if you use the example I posted above?Are you doing this in a SQL Tool - such as Query Analyser or SSMS? or in some application language?Many application languages will take CHAR(0) to be a string terminator, and thus interpret the data accordingly.Indeed, in the example I posted above the text beyond the CHAR(0) is NOT displayed, although it IS there - as evidenced by the LENgth display.Kristen |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2007-10-27 : 15:59:33
|
ok, i got it. unicode behaves differently as it takes 2bytes storage, not just one. you'll need to convert it to varchar, or manipulate in binary. hopefully you dont have any characters that cant be represented as varchar in thereselect replace(convert(varchar, MyAddress), char(0), '') from myTable |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-10-27 : 16:36:28
|
Different behaviour in SQL 2000 and SQL 2005 for Nvarchar:DECLARE @TestData TABLE( MyAddress Nvarchar(50))INSERT @TestDataSELECT '1foo bar' UNION ALLSELECT '2foo' + NCHAR(0) + 'bar'SELECT LEN(MyAddress), MyAddressFROM @TestDataUPDATE @TestDataSET MyAddress = REPLACE(MyAddress, NCHAR(0), N'')SELECT LEN(MyAddress), MyAddressFROM @TestData SQL 2000 REPLACE wrongly replaces (presumably) the CHAR(0) byte making up the first part of the double-byte character.SQL 2005 fails to replace the character, but does no damage to the remaining string - so it remains unchangedCHARINDEX and STUFF both behave incorrectly, so we can't use those either.CHARINDEX (NCHAR(0), MyAddress) thinks the NCHAR is found at position 0PATINDEX ('%' + NCHAR(0) + '%', MyAddress) thinks it is at position 1Kristen |
 |
|
Next Page
|