Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 Implicit TRIM-function in equals operand?

Author  Topic 

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-09 : 05:40:31
Hello!

I guess this is a well-known problem in these forums, but I keep getting time-outs while searching the forums, so I give it a try with a new thread.

Anyway, consider the following:

CREATE TABLE
#tmpTable
(Id int, Name varchar(10))

INSERT INTO #tmpTable VALUES(1,'test')
INSERT INTO #tmpTable VALUES(2,'test ')

SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name = 'test'

DROP TABLE #tmpTable

The result-set displayed is:

Id NewName
----------- ------------
1 <test>
2 <test >

Why is that? Is this behaviour implicit in T-SQL or caused by collation-settings or what? How come Name for Id 2 is stored with a trailing space when it is considered as equal to Name for Id 1?

Stupid as it seems, this affects our current database implementation, causing unique rows to be considered as having duplicates.

Any ideas would be appreciated.


With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."

dsdeming

479 Posts

Posted - 2004-12-09 : 08:15:29
Try this:

SELECT Id, '<'+Name+'>' AS NewName FROM tmpTable WHERE CAST( name AS varbinary(16)) = CAST( 'test' AS varbinary(16))

Dennis
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-12-09 : 08:22:55
you could also check the length of both...

Corey
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-12-09 : 08:27:32
I take that back...

BUT you can do:


CREATE TABLE
#tmpTable
(Id int, Name nvarchar(10))

INSERT INTO #tmpTable VALUES(1,'test')
INSERT INTO #tmpTable VALUES(2,'test ')

Declare @name nvarchar(10)
Set @name = 'test'

SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name+'|' = @name+'|'

DROP TABLE #tmpTable


Corey
Go to Top of Page

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-09 : 09:03:37
Hmm...

I guess I wasn't too clear. I wanted to know why this is the "standard" behaviour, and whether you can change this behaviour. I can think of several work-arounds. But the easiest way would be to adjust some setting so that:

'test' = 'test '

...is considered false, not true. Is there no easier way to turn this "TRIM-feature" off than to rewrite all code?



With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-12-09 : 09:06:58
It does appear that in general, SQL "rtrim's" values in these cases:

equality checks
group by's
unions
select distinct
count(distinct )
unique constraints
..basically everything I checked! ...

So it's pretty consistent accross the board. I think a lot of this is due to comparing CHAR columns, since they are always padded with spaces, and lots of times you wish this padding to be ignored.

Is there a setting you can use to turn this off? or is a collation setting? not sure. I'm sure someone knows.

Overall, though, if your data relies on columns having trailing spaces, you may need to alter this approach. Can you give an example of what you are storing, and why values with trailing spaces need to be considered unique?

- Jeff
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-12-09 : 09:12:35
by the way -- preceeding spaces are not trim'ed automatically, so if you use a solution like Corey's, you can just add a trailing character: where a.ID + '|' = b.ID + '|'

best might be:

where a.ID = b.ID and a.ID + '|' = b.ID + '|'

looks redundant, but this way a good index can be used on the first part of the clause to return matching rows, and then the second part will further filter those so only the ones you really want are returned.

- Jeff
Go to Top of Page

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-09 : 09:41:27
I agree. The approach could use some alterations. But as in too many cases this is a quite big database system that has been in use for years, and there is neither time nor resources available to rethink the design all over because of this problem.

We have an application that is parsing all sorts of incoming documents and then inserting them into the database. Each record gets a checksum that is also stored with the record for easy identification later on (yes, I know that a checksum is not guaranteed to be unique, since it is just a hash-value).

The problem is that when two records are identical, except that one varchar-column has a trailing space character, you get two checksums, thus two records that are considered equal according to the equality operand. But they are not, according to their checksums. This is what I want to avoid, with the least possible effort :-).

I also thought it was connected to the treatment of CHAR-columns, but wouldn't it be more consistent that since SQL Server ADDS trailing spaces to CHAR-columns before saving in the table, SQL Server would REMOVE trailing spaces from VARCHAR-cokumns before saving in the table?



With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."
Go to Top of Page

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-09 : 10:13:05
Hello again!

If you run this:

SET ANSI_PADDING OFF

CREATE TABLE
#tmpTable
(Id int, Name varchar(10))

INSERT INTO #tmpTable VALUES(1,'test')
INSERT INTO #tmpTable VALUES(2,'test ')

SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name = 'test'

DROP TABLE #tmpTable

SET ANSI_PADDING ON

...then the behaviour is more consistent. But it does not, however, prevent 'test' = 'test ' from being considered true.

It seems I have some rewriting of code to do.



With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2004-12-09 : 14:51:03
quote:
[i]The problem is that when two records are identical, except that one varchar-column has a trailing space character, you get two checksums, thus two records that are considered equal according to the equality operand. But they are not, according to their checksums. This is what I want to avoid, with the least possible effort :-)


How are you handling the checksums ?
It seems to me that one does not get different checksums with trailing spaces.

CREATE TABLE #tmpTable(Id int, Name varchar(10))

INSERT INTO #tmpTable VALUES(1,'test')
INSERT INTO #tmpTable VALUES(2,'test ')
INSERT INTO #tmpTable VALUES(2,'test ')

SELECT
Id
,checksum(name)
,checksum(name,name)
,checksum(name+name)
,'<'+Name+'>' AS NewName
FROM
#tmpTable WHERE name = 'test'


DROP TABLE #tmpTable

/*
Id NewName
----------- ----------- ----------- ----------- ------------
1 690337 10617521 -2002024277 <test>
2 690337 10617521 -1975875575 <test >
2 690337 10617521 -1557495255 <test >
*/


rockmoose
Go to Top of Page

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-12 : 17:40:56
quote:
How are you handling the checksums ?
It seems to me that one does not get different checksums with trailing spaces.


The application was built for SQL Server 6.5. It has evolved since then, but it still uses its own algorithm for checksums, not the CHECKSUM function which, as far as I know, was introduced with SQL Server 2000. Even so, the checksums are typically based on data for a complete record (except the Id- and Checksum-columns), i.e. a checksum of concatenated columns (converted to strings) with a ";" as delimiter.


With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."
Go to Top of Page

Bustaz Kool
Master Smack Fu Yak Hacker

1834 Posts

Posted - 2004-12-12 : 18:38:34
Here is what I think is happening:

1) The literal 'test' is of type CHAR(x). Implicit in CHAR is right padding with spaces.
2) When you compare your column (Name), which is of type VARchar(10), the literal gets implicitly set to a size of 10 and the varchar(10) gets implicitly cast to a type of CHAR(10).

At this point 'test' is actually 'test '. Your two Name values are 'test' with six blanks padded and 'test ' with five blanks padded. The ressult is that are three are exactly equal. When you format your output, the actual base values are used for 'test' and 'test '.

One solution would be to cast your literal to a varchar(x).

HTH

=================================================================

Happy Holidays!
Go to Top of Page

Henrik Svensson
Starting Member

25 Posts

Posted - 2004-12-13 : 03:46:54
quote:
One solution would be to cast your literal to a varchar(x).


I'm afraid you're wrong. The same result is achieved without using literal CHAR-values. If you try a join on two varchar(x)-columns using the =-operand you will find that varchar(x) with trailing blanks is equal to varchar(x) without trailing blanks.

If you try this:

CREATE TABLE
#tmpTable
(Id int, Name varchar(10))

INSERT INTO #tmpTable VALUES(1,'test')
INSERT INTO #tmpTable VALUES(2,'test ')

SELECT T1.Id, '<'+T1.Name+'>' AS NewName FROM #tmpTable T1
INNER JOIN #tmpTable T2
ON
T1.Name = T2.Name
AND
T1.Id <> T2.Id

DROP TABLE #tmpTable

...you get this:

Id NewName
----------- ------------
1 <test>
2 <test >



With best regards,
Henrik Svensson

-----
"Rhubarb, that is Barbapapa backwards. It's called a palindrome."
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2004-12-13 : 15:21:17
You could rtrim(col1)+rtrim(col2)+...+rtrim(coln)
when passing the concatenated string to the checksum function.
Not very nice, and you are probably more than aware of it

rockmoose
Go to Top of Page
   

- Advertisement -