Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-09 : 05:40:31
|
Hello!I guess this is a well-known problem in these forums, but I keep getting time-outs while searching the forums, so I give it a try with a new thread.Anyway, consider the following:CREATE TABLE #tmpTable(Id int, Name varchar(10))INSERT INTO #tmpTable VALUES(1,'test')INSERT INTO #tmpTable VALUES(2,'test ')SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name = 'test'DROP TABLE #tmpTableThe result-set displayed is:Id NewName ----------- ------------ 1 <test>2 <test >Why is that? Is this behaviour implicit in T-SQL or caused by collation-settings or what? How come Name for Id 2 is stored with a trailing space when it is considered as equal to Name for Id 1?Stupid as it seems, this affects our current database implementation, causing unique rows to be considered as having duplicates.Any ideas would be appreciated.With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
dsdeming
479 Posts |
Posted - 2004-12-09 : 08:15:29
|
Try this:SELECT Id, '<'+Name+'>' AS NewName FROM tmpTable WHERE CAST( name AS varbinary(16)) = CAST( 'test' AS varbinary(16))Dennis |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-12-09 : 08:22:55
|
you could also check the length of both...Corey |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-12-09 : 08:27:32
|
I take that back...BUT you can do:CREATE TABLE #tmpTable(Id int, Name nvarchar(10))INSERT INTO #tmpTable VALUES(1,'test')INSERT INTO #tmpTable VALUES(2,'test ')Declare @name nvarchar(10)Set @name = 'test'SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name+'|' = @name+'|'DROP TABLE #tmpTable Corey |
|
|
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-09 : 09:03:37
|
Hmm... I guess I wasn't too clear. I wanted to know why this is the "standard" behaviour, and whether you can change this behaviour. I can think of several work-arounds. But the easiest way would be to adjust some setting so that:'test' = 'test ' ...is considered false, not true. Is there no easier way to turn this "TRIM-feature" off than to rewrite all code?With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-12-09 : 09:06:58
|
It does appear that in general, SQL "rtrim's" values in these cases:equality checksgroup by'sunionsselect distinctcount(distinct )unique constraints..basically everything I checked! ...So it's pretty consistent accross the board. I think a lot of this is due to comparing CHAR columns, since they are always padded with spaces, and lots of times you wish this padding to be ignored.Is there a setting you can use to turn this off? or is a collation setting? not sure. I'm sure someone knows.Overall, though, if your data relies on columns having trailing spaces, you may need to alter this approach. Can you give an example of what you are storing, and why values with trailing spaces need to be considered unique?- Jeff |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-12-09 : 09:12:35
|
by the way -- preceeding spaces are not trim'ed automatically, so if you use a solution like Corey's, you can just add a trailing character: where a.ID + '|' = b.ID + '|'best might be:where a.ID = b.ID and a.ID + '|' = b.ID + '|'looks redundant, but this way a good index can be used on the first part of the clause to return matching rows, and then the second part will further filter those so only the ones you really want are returned.- Jeff |
|
|
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-09 : 09:41:27
|
I agree. The approach could use some alterations. But as in too many cases this is a quite big database system that has been in use for years, and there is neither time nor resources available to rethink the design all over because of this problem.We have an application that is parsing all sorts of incoming documents and then inserting them into the database. Each record gets a checksum that is also stored with the record for easy identification later on (yes, I know that a checksum is not guaranteed to be unique, since it is just a hash-value). The problem is that when two records are identical, except that one varchar-column has a trailing space character, you get two checksums, thus two records that are considered equal according to the equality operand. But they are not, according to their checksums. This is what I want to avoid, with the least possible effort :-).I also thought it was connected to the treatment of CHAR-columns, but wouldn't it be more consistent that since SQL Server ADDS trailing spaces to CHAR-columns before saving in the table, SQL Server would REMOVE trailing spaces from VARCHAR-cokumns before saving in the table?With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
|
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-09 : 10:13:05
|
Hello again!If you run this:SET ANSI_PADDING OFFCREATE TABLE #tmpTable(Id int, Name varchar(10))INSERT INTO #tmpTable VALUES(1,'test')INSERT INTO #tmpTable VALUES(2,'test ')SELECT Id, '<'+Name+'>' AS NewName FROM #tmpTable WHERE name = 'test'DROP TABLE #tmpTableSET ANSI_PADDING ON...then the behaviour is more consistent. But it does not, however, prevent 'test' = 'test ' from being considered true.It seems I have some rewriting of code to do.With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2004-12-09 : 14:51:03
|
quote: [i]The problem is that when two records are identical, except that one varchar-column has a trailing space character, you get two checksums, thus two records that are considered equal according to the equality operand. But they are not, according to their checksums. This is what I want to avoid, with the least possible effort :-)
How are you handling the checksums ?It seems to me that one does not get different checksums with trailing spaces.CREATE TABLE #tmpTable(Id int, Name varchar(10))INSERT INTO #tmpTable VALUES(1,'test')INSERT INTO #tmpTable VALUES(2,'test ')INSERT INTO #tmpTable VALUES(2,'test ')SELECT Id ,checksum(name) ,checksum(name,name) ,checksum(name+name) ,'<'+Name+'>' AS NewNameFROM #tmpTable WHERE name = 'test'DROP TABLE #tmpTable/*Id NewName ----------- ----------- ----------- ----------- ------------ 1 690337 10617521 -2002024277 <test>2 690337 10617521 -1975875575 <test >2 690337 10617521 -1557495255 <test >*/ rockmoose |
|
|
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-12 : 17:40:56
|
quote: How are you handling the checksums ?It seems to me that one does not get different checksums with trailing spaces.
The application was built for SQL Server 6.5. It has evolved since then, but it still uses its own algorithm for checksums, not the CHECKSUM function which, as far as I know, was introduced with SQL Server 2000. Even so, the checksums are typically based on data for a complete record (except the Id- and Checksum-columns), i.e. a checksum of concatenated columns (converted to strings) with a ";" as delimiter.With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
|
Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts |
Posted - 2004-12-12 : 18:38:34
|
Here is what I think is happening:1) The literal 'test' is of type CHAR(x). Implicit in CHAR is right padding with spaces.2) When you compare your column (Name), which is of type VARchar(10), the literal gets implicitly set to a size of 10 and the varchar(10) gets implicitly cast to a type of CHAR(10).At this point 'test' is actually 'test '. Your two Name values are 'test' with six blanks padded and 'test ' with five blanks padded. The ressult is that are three are exactly equal. When you format your output, the actual base values are used for 'test' and 'test '.One solution would be to cast your literal to a varchar(x).HTH=================================================================Happy Holidays! |
|
|
Henrik Svensson
Starting Member
25 Posts |
Posted - 2004-12-13 : 03:46:54
|
quote: One solution would be to cast your literal to a varchar(x).
I'm afraid you're wrong. The same result is achieved without using literal CHAR-values. If you try a join on two varchar(x)-columns using the =-operand you will find that varchar(x) with trailing blanks is equal to varchar(x) without trailing blanks.If you try this:CREATE TABLE #tmpTable(Id int, Name varchar(10))INSERT INTO #tmpTable VALUES(1,'test')INSERT INTO #tmpTable VALUES(2,'test ')SELECT T1.Id, '<'+T1.Name+'>' AS NewName FROM #tmpTable T1INNER JOIN #tmpTable T2ONT1.Name = T2.NameANDT1.Id <> T2.IdDROP TABLE #tmpTable...you get this:Id NewName ----------- ------------ 1 <test>2 <test >With best regards,Henrik Svensson-----"Rhubarb, that is Barbapapa backwards. It's called a palindrome." |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2004-12-13 : 15:21:17
|
You could rtrim(col1)+rtrim(col2)+...+rtrim(coln)when passing the concatenated string to the checksum function.Not very nice, and you are probably more than aware of it rockmoose |
|
|
|
|
|
|
|