Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-12-12 : 13:30:55
|
Hello,The issue I'm dealing with is the default way SQL handles string comparsion. So, I'm getting a lot of strings that are "equal" to SQL but really are different string. The only way I know how to make comparison "correct" is to CAST the string to VARBINARY. So, I was wondering if anyone might have any other ways to compare unicode strings?Here is a sample of two string that I'm dealing with and some comparisons:DECLARE @T TABLE (ID INT IDENTITY(1,1), Val NVARCHAR(50))INSERT @T (Val) VALUES(NCHAR(108) + NCHAR(32) + NCHAR(1282) + NCHAR(782))INSERT @T (Val) VALUES(NCHAR(108) + NCHAR(32) + NCHAR(1282) + NCHAR(782) + NCHAR(1303))IF (SELECT Val FROM @T WHERE ID = 1) = (SELECT Val FROM @T WHERE ID = 2) PRINT 'EQUAL'ELSE PRINT 'NOT EQUAL'IF (SELECT Val FROM @T WHERE ID = 1) LIKE (SELECT Val FROM @T WHERE ID = 2) PRINT 'LIKE'ELSE PRINT 'NOT LIKE'IF (SELECT DATALENGTH(Val) FROM @T WHERE ID = 1) = (SELECT DATALENGTH(Val) FROM @T WHERE ID = 2) PRINT 'LENGTH EQUAL'ELSE PRINT 'LENGTH NOT EQUAL'IF (SELECT CAST(Val AS VARBINARY) FROM @T WHERE ID = 1) = (SELECT CAST(Val AS VARBINARY) FROM @T WHERE ID = 2) PRINT 'EQUAL'ELSE PRINT 'NOT EQUAL' EDIT: Forgot to mention that I have tried several different COLLATIONs and that didn't seem to work either. |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2007-12-12 : 13:49:59
|
| try with Latin1_General_BIN collation_______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com <- new version out |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-12-12 : 14:06:17
|
| Thanks Spirit, That does work. Unfortunatly, it has the same affect as casting to VARBINARY, cannot use the index. The table is about 270 million rows. Does anyone have any suggestions for how to get Latin1_General_BIN collation in there? Some possbilities off the top of my head are: add a new column with Latin1_General_BIN collation, add a derived (persisted?) column with Latin1_General_BIN collation or change over the offending column to be Latin1_General_BIN collation.Cheers! |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2007-12-12 : 14:21:07
|
| why can't you change the column datatype to different collation?_______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com <- new version out |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-12-12 : 14:48:53
|
| I might be able to do that. I'll have to some impact analysis. |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2007-12-12 : 18:02:33
|
| That's a pretty bizarre example: codepoint U+0517 (= 1303 decimal) isn't even assigned to a character until Unicode 5.1.0, and that's still in beta.0517;CYRILLIC SMALL LETTER RHA;Ll;0;L;;;;;N;;;0516;;0516 |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-12-12 : 18:40:43
|
| Unfortunately, I'm not incredibly versed on Unicode. We seem to have a lot of high numbered characters that come in from other languages (Japanese and Chinese being the most prevalent). In order to parse out the individual characters I looped through the string and cast the character to UNICODE: UNICODE(SUBSTRING(@String, @Pos, 1))). That is where I got the 1303 from. There are plenty of others that are 12500+ codes. |
 |
|
|
|
|
|
|
|