Author |
Topic |
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2006-09-19 : 07:40:19
|
smart writes "hiplease help me to insert unicode characters in sqlmore information:* SQL server 2000* filed data type: varchar or nchar or nvarcharbut it does not work properly and all characters converted to Question Mark(?)please help mewith many thanksRegards,SMART" |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-09-19 : 07:59:09
|
nvarchar should work. But only if source information is stored in this to start with.I fyou have done an INSERT to a regular VARCHAR first, and then changed the VARCHAR column to a NVARCHAR, the data is not converted to unicode.How does the source information look like?Peter LarssonHelsingborg, Sweden |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-09-19 : 08:29:02
|
quote: Originally posted by PesoI fyou have done an INSERT to a regular VARCHAR first, and then changed the VARCHAR column to a NVARCHAR, the data is not converted to unicode.
Yes it is! |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-09-19 : 08:31:56
|
So if I insert NVARCHAR data to a VARCHAR column and later do a ALTER TABLE, the original data is converted back to NVARCHAR?Peter LarssonHelsingborg, Sweden |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-09-19 : 08:36:23
|
SQL Server 2000 doesn't have any knowledge of the utf-8 character encoding. So you won't be able to directly load utf-8 encoded files into a table using BULK INSERT, BCP or DTS.However, if you were to use any of these, and lied about the encoding (e.g. said it was Window-1252) then I wouldn't expect the result to have question marks in it -- that's usually indicative of down-conversion before it's got to the table.So I guess the main question is, how are you inserting the data into the tables? |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-09-19 : 08:41:45
|
quote: Originally posted by Peso So if I insert NVARCHAR data to a VARCHAR column and later do a ALTER TABLE, the original data is converted back to NVARCHAR?
If you try to insert characters into a varchar column that can't be represented in the character set (as determined from the collation) of that column, the result will be that the characters are down-converted to something that can be represented. Possibly by losing diacritics, but at worst turning into the question mark character.There are no characters (AFAIK) that can be represented in a varchar column that cannot be represented in an nvarchar column, so there will never be any down-conversion in that direction. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-09-19 : 08:46:31
|
So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?Peter LarssonHelsingborg, Sweden |
|
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2006-09-19 : 09:15:29
|
quote: Originally posted by Peso So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?Peter LarssonHelsingborg, Sweden
No !!Only conversion from VARCHAR to NVARCHAR is losslessConversion from NVARCHAR to VARCHAR is lossy since varchar takes 1 byte to store each character while NVARCHAR takes 2 bytes per character.Of course, this loss will only be noticed in case of characters which can not be represented in varchar data type (like Chinese, Japanese and other Far East alphabets)Harsh AthalyeIndia."Nothing is Impossible" |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-09-19 : 09:16:26
|
quote: Originally posted by Peso So all letters in Chinese, Japanese and other Far East alphabets can be converted from and back between VARCHAR and NVARCHAR without loss?
That would depend on the repertoire of the character set. Presumably, codepages 936, 950, 932, 949 all have some capability in that respect, but it's unlikely to be 'all' of the CJK character represented in nvarchar. Obviously CP-1252 as used by the Latin1_General_ collations, etc. can't represent those characters at all. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-09-19 : 09:21:36
|
quote: Originally posted by harsh_athalye No !!
Sorry Harch, I forgot Indian alphabets, of which I believe there are a few.I know converting VARCHAR/NVARCHAR is not possible without loss. I just wanted to be sure what Arnold meant.Peter LarssonHelsingborg, Sweden |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-09-19 : 09:33:34
|
Since you asked, here's some examples of what round-trips successfully from unicode to varchar and back for the CJK collations:-- columns named by codepage number:CREATE TABLE CJKTest ( u_codepoint int NOT NULL, cp936 varchar(10) COLLATE Chinese_PRC_BIN NOT NULL, cp950 varchar(10) COLLATE Chinese_Taiwan_Stroke_BIN NOT NULL, cp932 varchar(10) COLLATE Japanese_BIN NOT NULL, cp949 varchar(10) COLLATE Korean_Wansung_BIN NOT NULL, cp1252 varchar(10) COLLATE Latin1_General_BIN NOT NULL)INSERT INTO CJKTestSELECT n, c, c, c, c, cFROM ( SELECT n, NCHAR(n) AS c FROM Numbers WHERE n BETWEEN 32 AND 65535 ) AS A-- what roundtrips ok?SELECT u_codepointFROM CJKTestWHERE u_codepoint = UNICODE(cp949)ORDER BY u_codepoint-- characters in CJK aren't all one byte:SELECT DATALENGTH(cp936) AS l, COUNT(*)FROM CJKTestWHERE u_codepoint = UNICODE(cp936)GROUP BY DATALENGTH(cp936)-- the length of a varchar type is in bytes, irrespective of-- how the codepage encodes characters, so this returns 2:SELECT LEN(CAST(REPLICATE(NCHAR(25611) COLLATE Chinese_PRC_BIN, 5) AS varchar(5))) |
|
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2006-09-19 : 10:03:11
|
quote: Originally posted by Peso
quote: Originally posted by harsh_athalye No !!
Sorry Harch, I forgot Indian alphabets, of which I believe there are a few.I know converting VARCHAR/NVARCHAR is not possible without loss. I just wanted to be sure what Arnold meant.Peter LarssonHelsingborg, Sweden
Hey Peter,Nothing to be sorry about!!I just posted what I knew...I am sorry if I was poking my head in between two experts' talk BTW, my name is Harsh, not Harch !Harsh AthalyeIndia."Nothing is Impossible" |
|
|
|