Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 proper use of COLLATE in UPDATE statement?

Author  Topic 

CorpDirect
Yak Posting Veteran

92 Posts

Posted - 2006-10-06 : 16:18:58
Hello all,

We have to synchronize data between databases with different collations, and I'm not sure what is the preferred syntax for the update statement. Could you give me your opinions?

First, a sample environment:
-- create tables for testing
CREATE TABLE #temp1 (
col1 NVARCHAR(10) COLLATE Latin1_General_BIN
, col2 NVARCHAR(30) COLLATE Latin1_General_BIN
)
GO

CREATE TABLE #temp2 (
col1 NVARCHAR(10) COLLATE Latin1_General_CI_AS
, col2 NVARCHAR(30) COLLATE Latin1_General_CI_AS
)
GO

-- insert sample data
INSERT INTO #temp1 (
col1
, col2
)
SELECT 'test1', 'This is test row 1'
UNION ALL
SELECT 'test2', 'This is test row 2'
UNION ALL
SELECT 'test3', 'This is test row 3'
GO

INSERT INTO #temp2 (
col1
, col2
)
SELECT 'test1', 'sample data item 1'
UNION ALL
SELECT 'test2', 'sample data item 2'
UNION ALL
SELECT 'test3', 'sample data item 3'
GO

Now, the update statement:
-- update
UPDATE
t2
SET
t2.col2 = t1.col2
FROM
#temp1 t1
, #temp2 t2
WHERE
t1.col1 = t2.col1

This of course fails, saying "Cannot resolve collation conflict for equal to operation." So I need to use the collation name in my WHERE clause to compare columns. The question is, which collation name should I use?
WHERE	t1.col1	=	t2.col1	COLLATE Latin1_General_BIN
OR
WHERE t1.col1 = t2.col1 COLLATE Latin1_General_CI_AS

...and WHY should I use one over the other? Or does it matter at all?

Since this will be a "synchronization" of data, I have to do this in both directions (with of course some other criteria to identify which records to sync which way). If the source, or the target, collation should be specified when doing an update, I would like to know which is preferred and why.

Thanks for your help!

Regards,

Daniel


-- return data to verify the update
SELECT
t1.col1
, t1.col2
, t2.col2
FROM
#temp1 t1
, #temp2 t2
WHERE
t1.col1 = t2.col1 COLLATE Latin1_General_BIN

-- clean up
DROP TABLE #temp1
GO
DROP TABLE #temp2
GO

EugeneZ
Starting Member

26 Posts

Posted - 2006-10-06 : 17:30:09
try:

UPDATE
t2
SET
t2.col2 = t1.col2
FROM
#temp1 t1
, #temp2 t2
WHERE
t1.col1 = t2.col1 COLLATE Latin1_General_CI_AS
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-10-07 : 01:37:02
"and WHY should I use one over the other? Or does it matter at all?"

Arnold is your man for this question, but I reckon if you basically have the same "language" collation and the only difference is BIN v. CI_AS then your choice is based on:

If you use the BIN collation you will be forcing a binary collation. So a difference of only an upper/lowercase letter will be treated as significant. Whereas the CI_AS collation will be case insensitive in that regard - you will know your own data as to which will perform correctly for you.

You also need to consider what happens to accented characters (maybe you haven't got any) which will be treated as ordinary letters - E-acute and E for example.

One approach might be to run some queries to see what slips through the net:

SELECT TOP 1000 t1.col1 , t2.col1
FROM #temp1 t1
, #temp2 t2
WHERE t1.col1 = t2.col1 COLLATE Latin1_General_CI_AS
AND t1.col1 <> t2.col1 COLLATE Latin1_General_BIN

SELECT TOP 1000 t1.col1 , t2.col1
FROM #temp1 t1
, #temp2 t2
WHERE t1.col1 = t2.col1 COLLATE Latin1_General_BIN
AND t1.col1 <> t2.col1 COLLATE Latin1_General_CI_AS

I don't think the second one is capable of finding any differences

(I think you need to change your test data so you have a range of different capitalisations and some accents for t2.col1)

You may also want to check the tables for duplicates in that column, forcing the collation you will use - they may be unique at Latin1_General_BIN but not using Latin1_General_CI_AS!

Kristen
Go to Top of Page

CorpDirect
Yak Posting Veteran

92 Posts

Posted - 2006-10-09 : 11:16:46
Hi Kristen,

That makes perfect sense, now that you put it that way! We do use some accented characters, and having used binary collation for most of our apps, uniqueness may extend to case for some values.

I'll use your clever test (col1 = col1 COLLATE _BIN AND col1 != col1 COLLATE _CI_AS) to check our data for inconsistencies and potential problems. Thanks for the help!

Regards,

Daniel
Go to Top of Page
   

- Advertisement -