 500 Meg logfiles


jesus4u
Posting Yak Master

204 Posts

Posted - 2003-02-27 : 13:51:43
We have 500 MB log files that need to be parsed, but the CRAPPY thing about it is that I first have to eliminate the unnecessary empty space at the end of the files for our SQL DTS package to work correctly.

What would you suggest?

Thanks

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-02-27 : 14:16:03
I usually don't bother with DTS if there's even the slightest goofiness with the source file. I'd import it straight into a staging table and then process it from there. This will let you simply delete blank rows or any others that don't meet the format you want to transfer. I know this sounds like a dumb approach, but the alternative is to write an ActiveX transformation to read each line and parse it, and with a 500 MB file it will take hours to do.

BTW, why are your log files so big? You'd be better off cycling them so that an individual file does not get so large.
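
Something along these lines, just as a rough sketch (the table name and file path are made up for illustration):

-- Staging table: one wide varchar column, no parsing yet
CREATE TABLE LogStaging (RawLine varchar(2000) NULL)

-- With a single-column table the whole line ends at the row terminator,
-- so there's no column delimiter to worry about
BULK INSERT LogStaging
FROM 'C:\logs\access.log'
WITH (ROWTERMINATOR = '\n')

-- Now the blank junk at the end of the file is easy to throw away
DELETE LogStaging
WHERE RawLine IS NULL OR LTRIM(RTRIM(RawLine)) = ''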


ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2003-02-27 : 14:19:05
Write a simple VB script that opens the file using FileSystemObject and removes what you want.

Here is an example:
http://groups.google.com/groups?q=filesystemobject+moving+to+end+of+file&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=eJIK8UQm%24GA.264%40cppssbbsa04&rnum=9

I agree with rob that due to the large size of the files this process will be slow.



Edited by - ValterBorges on 02/27/2003 14:20:35

X002548
Not Just a Number

15586 Posts

Posted - 2003-02-27 : 14:50:11
I believe he's referring to:

http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=24111

Anyway, I've seen DTS do some very quirky things, and we tend NOT to rely on it. BCP and Bulk Insert do quite well, thank you very much.

Is there a general consensus about how people feel toward it? I don't think I'd release a DTS package to production anyway.

Any thoughts appreciated.

Brett

8-)


tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2003-02-27 : 15:02:16
I agree with Mr. X - BCP and BULK INSERT all the way. I hate DTS, but only because I'm a code-centric type of person. As a personal note, WYSIWYG is more than an acronym, it really means what it stands for ;).


jesus4u
Posting Yak Master

204 Posts

Posted - 2003-02-27 : 15:10:02
quote:

I believe he's referring to:

http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=24111

Anyway, I've seen DTS do some very quirky things, and we tend NOT to rely on it. BCP and Bulk Insert do quite well, thank you very much.

Is there a general consensus about how people feel toward it? I don't think I'd release a DTS package to production anyway.

Any thoughts appreciated.

Brett

8-)

yep


robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-02-27 : 15:33:46
DTS is perfectly fine in production, as long as you have control over the format of the files coming in, or the people providing them are diligent in making them clean, structured, and consistent.

You can't entirely dismiss DTS though. It is perfect for cases where you can access the actual database server sources, or have non-text data formats; BULK INSERT and bcp are only good for text files. However, I would use them over DTS if I were importing text files, for the same reasons everyone else does: they're easier to set up and use, and are less likely to have problems than DTS.


X002548
Not Just a Number

15586 Posts

Posted - 2003-02-27 : 16:07:30
If you have access to the data sources, why not just create linked servers?


robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-02-27 : 16:15:03
DTS is faster for straight imports, and doesn't require changing settings to query it. I don't know why exactly, but I've had tons of linked queries take 3-4 times longer than a DTS job. And if I have to work on the data repeatedly, I would NOT want to query the linked server multiple times when I can grab a copy locally and work with it instead. There may be other circumstances that favor linked servers, but not in my case.


jesus4u
Posting Yak Master

204 Posts

Posted - 2003-02-27 : 16:20:14
Does bulk inserting allow loading data into columns of different data types?

For example, taking "12-02-2003" from the text file into a datetime column in the DB?


jesus4u
Posting Yak Master

204 Posts

Posted - 2003-02-27 : 16:21:37
Hey everyone! Look at this Note from BOL. Sounds like my original problem will freak Bulk Insert out as well.

BOL:

Note A hidden character in an ASCII data file can cause problems when trying to bulk copy data into an instance of SQL Server, resulting in an "unexpected null found" error message. Many utilities and text editors display hidden characters which can usually be found at the bottom of the data file. Finding and removing these characters should resolve the problem.


X002548
Not Just a Number

15586 Posts

Posted - 2003-02-27 : 16:31:53
Thanks for the advice Rob.

And no, jesus4u, you cannot insert data into a column of a different datatype (unless of course the target is a character datatype), with the exception of inserting a number into a datetime column.

I was pretty shocked by this, and it still doesn't make sense to me:

Declare @x table (col1 datetime)

Insert Into @x (col1) Values(0)

Select * from @x
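-- Returns: 1900-01-01 00:00:00.000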


Brett

8-)


robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-02-27 : 16:56:12
Dates are stored internally as numeric values that you can treat like floats. The integer portion is the number of days since Jan. 1, 1900, while the decimal portion is the fraction of a day elapsed since midnight, where 1.0 = (1 day) = (86,400 seconds). A 12-hour time (12 noon) would be stored as 0.5.

Try this:

SELECT convert(int, getdate()) or
SELECT convert(float, getdate())

So, inserting a 0 value into a datetime column will give you a date of Jan. 1, 1900 at midnight. This kind of storage system is what makes the date arithmetic so easy.
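
A couple of quick examples going the other way (results shown as comments):

SELECT CONVERT(datetime, 0)      -- 1900-01-01 00:00:00.000
SELECT CONVERT(datetime, 0.5)    -- 1900-01-01 12:00:00.000
SELECT GETDATE() + 1             -- same time tomorrow, since adding 1 adds a day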

As for hidden characters in the log files: as the old saying goes, garbage in, garbage out. If you're talking about a web server log file, there is nothing in there that should cause a problem unless you're using a non-standard logging format, or there's something wrong with the machine logging to that file. It isn't any different from cleaning up any other kind of corrupted data file.

As far as using BULK INSERT, you can dump the data straight into a table with a single big varchar column (1,000 characters or more, depending on the line length) and NOT specify any kind of column delimiter. Once the data is imported, run a procedure to delete bad rows and then parse out the values, cleaning up anything that doesn't fit. Once it's in the staging table you use typical SQL techniques (SUBSTRING(), LIKE, etc.) to identify and extract the data into the proper formats. The beauty of this is that it is very flexible and only requires some very basic SQL commands, certainly far less complicated than the equivalent VB or VBScript/ActiveX transformation code.
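
For what it's worth, a rough sketch of that cleanup-and-parse step, assuming the single-column LogStaging table from earlier and a made-up layout of a date followed by the rest of the line (the real layout and widths would obviously differ):

-- Destination table with proper datatypes
CREATE TABLE LogParsed (LogDate datetime NOT NULL, Detail varchar(1000) NULL)

-- Scrub hidden characters (CHAR(26) is the DOS end-of-file marker,
-- CHAR(13) a stray carriage return), then delete rows that don't look like data
UPDATE LogStaging
SET RawLine = REPLACE(REPLACE(RawLine, CHAR(26), ''), CHAR(13), '')

DELETE LogStaging
WHERE RawLine IS NULL
   OR RawLine NOT LIKE '[0-1][0-9]-[0-3][0-9]-[0-9][0-9][0-9][0-9] %'

-- Parse with SUBSTRING/CONVERT; the first 10 characters are the date
INSERT INTO LogParsed (LogDate, Detail)
SELECT CONVERT(datetime, SUBSTRING(RawLine, 1, 10)),
       LTRIM(SUBSTRING(RawLine, 11, 1000))
FROM LogStaging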


jesus4u
Posting Yak Master

204 Posts

Posted - 2003-02-28 : 08:45:13
thanks for all the suggestions
