Author |
Topic |
jesus4u
Posting Yak Master
204 Posts |
Posted - 2003-02-27 : 13:51:43
|
We have 500 Meg logfiles that need to be parsed but the CRAPPY thing about it is that I first have to eliminate these unnecessary empty spaces at the end of the files for our SQL DTS to work correctly.What would you suggest?Thanks |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-02-27 : 14:16:03
|
I usually don't bother with DTS if there's even the slightest goofiness with the source file. I'd import it straight into a staging table and then process it from there. This will let you simply delete blank rows or any others that don't meet the format you want to transfer. I know this sounds like a dumb approach, but the alternative is to write an ActiveX transformation to read each line and parse it, and with a 500 MB file it will take hours to do.BTW, why are your log files so big? You'd be better off cycling them so that an individual file does not get so large. |
|
|
ValterBorges
Master Smack Fu Yak Hacker
1429 Posts |
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-02-27 : 14:50:11
|
I believe he's refering to:http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=24111Anyway, I've seen DTS do some very quirky things, and we tend to NOT rely on it. BCP and Bulk Insert do quite well thank you very much.Is there a general concensus about what peoples feelings are toward it? I don't think I'd releease any DTS package to production anyway.Any thoughts appreciated.Brett8-) |
|
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2003-02-27 : 15:02:16
|
I agree with Mr. X - BCP and BULK INSERT all the way. I hate DTS, but only because I'm a code-centric type of person. As a personal note, WYSIWYG is more than an acronym, it really means what it stands for ;). |
|
|
jesus4u
Posting Yak Master
204 Posts |
Posted - 2003-02-27 : 15:10:02
|
quote: I believe he's refering to:http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=24111Anyway, I've seen DTS do some very quirky things, and we tend to NOT rely on it. BCP and Bulk Insert do quite well thank you very much.Is there a general concensus about what peoples feelings are toward it? I don't think I'd releease any DTS package to production anyway.Any thoughts appreciated.Brett8-)
yep |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-02-27 : 15:33:46
|
DTS is perfectly fine in production, as long as you have control over the format of the files coming in, or the people providing them are diligent in making them clean, structured, and consistent. You can't entirely dismiss DTS though. It is perfect for where you can access the actual database server sources, or have non-text data formats. BULK INSERT and bcp are only good for text files. However, I would use them over DTS if I was importing text files for the same reasons everyone else does: they're easier to set up and use and are less likely to have problems than DTS. |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-02-27 : 16:07:30
|
If you have access to the datasources, why not just creat linked servers? |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-02-27 : 16:15:03
|
DTS is faster, for straight imports, and doesn't require changing settings to query it. Don't know why exactly, but I've had tons of linked queries take 3-4 times longer than a DTS job. And if I have to work on the data repeatedly, I would NOT like to query the linked server multiple times if I can grab a copy locally and work with it instead. There may be other circumstances that favor linked servers, but not in my case. |
|
|
jesus4u
Posting Yak Master
204 Posts |
Posted - 2003-02-27 : 16:20:14
|
Does bulk inserting allow inserting the data into tables of different data types?From the text file "12-02-2003" to the DB? |
|
|
jesus4u
Posting Yak Master
204 Posts |
Posted - 2003-02-27 : 16:21:37
|
Hey everyone! Look at this Note from BOL. Sounds like my original problem will freak Bulk Insert out as well.BOL:Note A hidden character in an ASCII data file can cause problems when trying to bulk copy data into an instance of SQL Server, resulting in an "unexpected null found" error message. Many utilities and text editors display hidden characters which can usually be found at the bottom of the data file. Finding and removing these characters should resolve the problem. |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-02-27 : 16:31:53
|
Thanks for the advice Rob.And No jesus4u, you can not insert data (unless of course it's a character datatype) into non like datatypes, with the exception of inserting a number in to a datetime column. I was pretty shocked at this, and still doesn't make sense to me:Declare @x table (col1 datetime)Insert Into @x (col1) Values(0)Select * from @xBrett8-) |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-02-27 : 16:56:12
|
Dates are internally stored as float values. The integer portion stores the number of days from Jan. 1, 1900, while the decimal portion stores the seconds/milliseconds values from midnight, where 1.0=(1 day)=(86400 seconds). A 12-hour time (12 noon) would be stored as 0.5.Try this:SELECT convert(int, getdate()) orSELECT convert(float, getdate())So, inserting a 0 value into a datetime column will give you a date of Jan. 1, 1900 at midnight. This kind of storage system is what makes the date arithmetic so easy.As far as hidden characters in the log files, as the old saying goes, garbage in, garbage out. If you're talking about a web server log file, there is nothing in there that should cause a problem unless you're using a non-standard logging format, or there's something wrong with the machine logging to that file. It isn't any different from cleaning up any other kind of corrupted data file.As far as using BULK INSERT, you can dump the data straight into a table with a big varchar column (1000 characters or more, depending on the line length) and DO NOT specify any kind of column delimiter. Once the data is imported, you would run a procedure to delete bad rows and then parse out the values, cleaning up any things that might not fit. Once it's in the staging table you use typical SQL techniques, SubString(), Like etc. to identify and extract the data to the proper formats. The beauty of this is that it is very flexible and only requires some very basic SQL commands, certainly far less complicated than an equivalent VB or VBScript/ActiveX transformation code would require. |
|
|
jesus4u
Posting Yak Master
204 Posts |
Posted - 2003-02-28 : 08:45:13
|
thanks for all the suggestions |
|
|
|