Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 utf-8 xml ingest into table

Author  Topic 

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2011-04-22 : 18:33:13
Greetings

I just ran into this strange anomaly in SQL server 2008 R2. We get xml files from an outside vendor. All was working fine until we ran into an xml file that had text such as à. Collation on the server us SQL_Latin2_General_CP1_CI_AS, but database is SQL_Latin2_General_CP1_CI_AI. Could the latter be the issue that I cannot ingest this xml file via SQLXMLBulkLoad 4.0. Seems like any BULK insert does not sit well with sql 2008. Yikes!



If you don't have the passion to help people, you have no passion

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-04-23 : 07:48:02
bulk insert and bcp don't support utf-8 (they used to - it was dropped and causes a lot of problems, ssis does though, part of MS forcing this to be the import application).
I don't know about sqlxmlbulkLoad but sounds as though it has the same issue. For ascii chanracters utf-8 encoding is the same as ascii so won't be a problem - it's only when the multi byte encoding crops up (utf-8 is up to 5 bytes per character). Note that sql server is a fixed 2 bytes per character so there will always be data loss impoting utf-8.

I've had to use a unix based database (DB2) to import before and then import to sql server so that the original data was available if needed - also the import was a lot faster as the database used utf-8 as a native encoding. Given that a lot of internet data arrives as utf-8 it seems a mistake for MS not to support it.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-04-23 : 07:48:17
It's UTF-8 that is the problem. Ask your vendor to use UTF-16 instead.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2011-04-25 : 11:22:53
Thanks for the bad new folks. :(
What the heck was MS thinking!

If you don't have the passion to help people, you have no passion
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-04-25 : 17:20:35
If you are familiar with Connect, make a suggestion at connect.microsoft.com that SQL Server need to support UTF-8 too.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2011-04-25 : 19:20:17
ok I just did and I have an ID yippy! 664737

If you don't have the passion to help people, you have no passion
Go to Top of Page
   

- Advertisement -