Selecting UTF-8 characters from the dB table

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Selecting UTF-8 characters from the dB table

Author

Topic

mbuono
Starting Member

6 Posts

Posted - 2008-07-29 : 15:21:08

Dear All,
Althgough my web application properly converts and displays characters to & from the dB, I'm having and issue with running a query directly from the dB table(s).
It appears that (correct me if I'm wrong) SQL Server 2000 stores a 'special' UTF-8 character as a two-character set.

For example, ö is stored as Ã¶ for columns of type [varchar].

So how do I correctly sort and query a column that possibly contains these characters?

I believe this to have nothing to do with encoding. That is to say, "Ã¶" is NOT the HTML or UCS-2 encoded version of the "ö" UTF-8 character. Is that right? Is this just something special that SQL does?

I don't view changing the datatypes to [nvarchar] or [ntext] as a solution becuase not only is the data already in the tables (and it won't automatically converted), but also SQL Server has a ROW SIZE LIMIT of 8060 when using these types (and that's not long enough).
I don't know how to find the collation of my database, just the table columns (see below). Also, I don't know if it's possible or how to collate a [varchar] column using UTF8 in stead of Latin1_General.

So, my question is: "How do I view the correct characters from my database via SQL Query Analyzer (just as the front-end of my web application does)?"

Example: 'medicaldesc' is a [varchar] or [text]

SELECT medicaldesc FROM table_tbl WHERE actid = '1'

returns... "JÃ¶rg fell down the stairs." but I want it to return "Jörg fell down the stairs.".

Thanks for any help on this.

_MBuono

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2008-07-29 : 22:33:16

>> SQL Server has a ROW SIZE LIMIT of 8060

Not include blob, text column can store 2g characters.

mbuono
Starting Member

6 Posts

Posted - 2008-07-30 : 09:35:04

Agreed, that is why 'text' is ok for me to use, but 'ntext' is NOT, as it contributes to this 8060 limit.

This is an HR system, so lots of text goes into one column. it is also a GLOBALLY-used HR system, which is why so many different characters may be saved to the database.

Any insight as to HOW to correctly read these multi-lingual characters directly from the database?

_MBuono

mbuono
Starting Member

6 Posts

Posted - 2008-08-04 : 17:18:41

The only "work-around" i've found for this solution is to input each character into the front-end of the system and then pair it up (in an Excel file) with its SQL counter-part. Has anyone else encountered this? Is there a better/proper way?

_MBuono

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2008-08-05 : 08:06:04

From your initial question, it looks like you are inserting into varchar columns strings of bytes that represent UTF-8 encoded characters but telling SQL Server that they are Windows-1252 encoded characters.

The front-end of your web application is not "getting it right", it's getting it wrong in exactly the opposite direction: treating strings of Windows-1252 encoded characters as if they were bytes representing UTF-8 encoded characters.

If you only need to store characters in the repertoire of a code page used by a varchar (or char or text) column with a given collation then go ahead and use a varchar column. So if all the character you want to store are represented in Windows code page 1252 then you can any varchar columns with a collation that uses Windows code page 1252 (for example, Latin1_General_CI_AS).

If you need to be able store any character in the Unicode BMP (code points U+0000 - U+FFFF) then you'll have to use an nvarchar (or nchar or ntext) column.

The phrase "UTF-8 character" is a category error.

mbuono
Starting Member

6 Posts

Posted - 2008-08-05 : 18:04:18

Many thanks for the response, Arnold.

If I understand correctly, you are saying my SQL [varchar] columns store 1 UTF-8 encoded character as 2 Windows-1252 encoded characters? (that is, ö is stored as Ã¶)?

I can find Ã and ¶ on the 1252 code page: Ã is the byte 00C3, and ¶ is 00B6. But is there a method to pull these and do this "translation" other than what is built into my web application?

_MBuono

harish_sv
Starting Member

1 Post

Posted - 2009-10-26 : 06:54:39

Hi Mike,

I am facing and need the exact requirement as yours retrive "Ã¶" as ö. Did you able to get this? can you please send me if yes...

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources