Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Unicode (utf-8) in SQL

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2006-09-19 : 07:40:19
smart writes "hi
please help me to insert unicode characters in sql
more information:
* SQL server 2000
* filed data type: varchar or nchar or nvarchar
but it does not work properly and all characters converted to Question Mark(?)

please help me
with many thanks
Regards,
SMART"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-09-19 : 07:59:09
nvarchar should work. But only if source information is stored in this to start with.
I fyou have done an INSERT to a regular VARCHAR first, and then changed the VARCHAR column to a NVARCHAR, the data is not converted to unicode.

How does the source information look like?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-09-19 : 08:29:02
quote:
Originally posted by Peso
I fyou have done an INSERT to a regular VARCHAR first, and then changed the VARCHAR column to a NVARCHAR, the data is not converted to unicode.



Yes it is!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-09-19 : 08:31:56
So if I insert NVARCHAR data to a VARCHAR column and later do a ALTER TABLE, the original data is converted back to NVARCHAR?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-09-19 : 08:36:23
SQL Server 2000 doesn't have any knowledge of the utf-8 character encoding. So you won't be able to directly load utf-8 encoded files into a table using BULK INSERT, BCP or DTS.
However, if you were to use any of these, and lied about the encoding (e.g. said it was Window-1252) then I wouldn't expect the result to have question marks in it -- that's usually indicative of down-conversion before it's got to the table.

So I guess the main question is, how are you inserting the data into the tables?
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-09-19 : 08:41:45
quote:
Originally posted by Peso

So if I insert NVARCHAR data to a VARCHAR column and later do a ALTER TABLE, the original data is converted back to NVARCHAR?



If you try to insert characters into a varchar column that can't be represented in the character set (as determined from the collation) of that column, the result will be that the characters are down-converted to something that can be represented. Possibly by losing diacritics, but at worst turning into the question mark character.
There are no characters (AFAIK) that can be represented in a varchar column that cannot be represented in an nvarchar column, so there will never be any down-conversion in that direction.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-09-19 : 08:46:31
So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2006-09-19 : 09:15:29
quote:
Originally posted by Peso

So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?


Peter Larsson
Helsingborg, Sweden



No !!

Only conversion from VARCHAR to NVARCHAR is lossless
Conversion from NVARCHAR to VARCHAR is lossy since varchar takes 1 byte to store each character while NVARCHAR takes 2 bytes per character.

Of course, this loss will only be noticed in case of characters which can not be represented in varchar data type (like Chinese, Japanese and other Far East alphabets)

Harsh Athalye
India.
"Nothing is Impossible"
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-09-19 : 09:16:26
quote:
Originally posted by Peso

So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?



That would depend on the repertoire of the character set. Presumably, codepages 936, 950, 932, 949 all have some capability in that respect, but it's unlikely to be 'all' of the CJK character represented in nvarchar. Obviously CP-1252 as used by the Latin1_General_ collations, etc. can't represent those characters at all.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-09-19 : 09:21:36
quote:
Originally posted by harsh_athalye

No !!
Sorry Harch, I forgot Indian alphabets, of which I believe there are a few.

I know converting VARCHAR/NVARCHAR is not possible without loss. I just wanted to be sure what Arnold meant.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-09-19 : 09:33:34
Since you asked, here's some examples of what round-trips successfully from unicode to varchar and back for the CJK collations:

-- columns named by codepage number:
CREATE TABLE CJKTest (
u_codepoint int NOT NULL,
cp936 varchar(10) COLLATE Chinese_PRC_BIN NOT NULL,
cp950 varchar(10) COLLATE Chinese_Taiwan_Stroke_BIN NOT NULL,
cp932 varchar(10) COLLATE Japanese_BIN NOT NULL,
cp949 varchar(10) COLLATE Korean_Wansung_BIN NOT NULL,
cp1252 varchar(10) COLLATE Latin1_General_BIN NOT NULL
)

INSERT INTO CJKTest
SELECT n, c, c, c, c, c
FROM (
SELECT n, NCHAR(n) AS c
FROM Numbers
WHERE n BETWEEN 32 AND 65535
) AS A

-- what roundtrips ok?
SELECT u_codepoint
FROM CJKTest
WHERE u_codepoint = UNICODE(cp949)
ORDER BY u_codepoint

-- characters in CJK aren't all one byte:
SELECT DATALENGTH(cp936) AS l, COUNT(*)
FROM CJKTest
WHERE u_codepoint = UNICODE(cp936)
GROUP BY DATALENGTH(cp936)

-- the length of a varchar type is in bytes, irrespective of
-- how the codepage encodes characters, so this returns 2:
SELECT LEN(CAST(REPLICATE(NCHAR(25611) COLLATE Chinese_PRC_BIN, 5) AS varchar(5)))

Go to Top of Page

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2006-09-19 : 10:03:11
quote:
Originally posted by Peso

quote:
Originally posted by harsh_athalye

No !!
Sorry Harch, I forgot Indian alphabets, of which I believe there are a few.

I know converting VARCHAR/NVARCHAR is not possible without loss. I just wanted to be sure what Arnold meant.


Peter Larsson
Helsingborg, Sweden



Hey Peter,

Nothing to be sorry about!!

I just posted what I knew...I am sorry if I was poking my head in between two experts' talk

BTW, my name is Harsh, not Harch !

Harsh Athalye
India.
"Nothing is Impossible"
Go to Top of Page
   

- Advertisement -