Invalid non-ASCII character conversion over JDBC

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Invalid non-ASCII character conversion over JDBC

Author

Topic

alistair.black
Starting Member

2 Posts

Posted - 2006-06-19 : 11:01:27

Hi,

I'm working on a database conversion from Sybase to SQL Server 2005 and have hit a wall with a character conversion problem when reading non-ASCII characters (encrypted password string) via JDBC.

My application runs on Solaris and accesses a SQL Server 2005 database via the Microsoft JDBC driver. The server was unfortunately specified as having a SQL_Latin1_General_CP1_CI_AS collation at installation time, and the database being accessed has taken this default. After creation the data was migrated across via DTS.

The invalid character is a dagger '†'. When read over JDBC it is converted to a question mark '?'.

In my original environment a Sybase database was accessed via JDBC driver from Solaris and the correct value was returned. The Sybase database used Latin1_General_BIN as it's collation. By way of experimentation I have modified the default collation sequence within the SQL Server 2005 database, and created a new table to hold the password. I am then able to correctly return strings containing this character from within SQL Server Management Studio, but the same problem still exists when accessing it via JDBC.

I am not sure where to focus my investigation and would be grateful for any useful pointers/advice. To me it looks like it's a JDBC driver issue as with the change in collation it works from a non-JDBC client.

Many thanks

Alistair

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-06-20 : 06:24:05

A few things spring to mind:

DAGGER is encoded as 0x86 in CP1252. ISO Latin 1 doesn't have any non-control characters in the range 0x80-0x9F. If the driver thinks it's trying to convert from ISO Latin 1 it might lose that character.

The characters encoded in CP1252 between 0x00-0x7F and 0xA0-0xFF have the 'same' Unicode code points, i.e. U+00 - U+7F and U+A0 - U+FF.
The non-control characters in CP1252 encoded in the range 0x80-0x9F don't. Specifically, DAGGER has Unicode code point U+2020. Anything correctly translating from CP1252 to Unicode absolutely will not convert 0x86 to code point U+86.

Are the things in the encrypted password really characters? If they're really bytes, store them in varbinary, not varchar!

alistair.black
Starting Member

2 Posts

Posted - 2006-06-22 : 03:42:44

Arnold,

Thank you for your response and suggestion. I can confirm that utilising the varbinary data type has resolved the problem and my headache!

Best regards,

Alistair

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources