Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 help with query / db structure for query

Author  Topic 

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2011-02-14 : 15:57:23
Hello,

I am creating a Windows application that loops through 50,000 items. It stores all the properties it retrieves for these items in the database. The loop runs repeatedly, so once all 50,000 items are done it starts over and grabs updated data for each of them again.

Occasionally the app gets restarted (or crashes) and I am trying to think of the best way for it to restart where it left off.

This is easy if we are only doing one pass, as I can determine which records don't exist yet.

Could anyone lend a hand with suggestions on how to do this once we have completed a few passes? (Eventually I want to have 100+ historical rows of data for each 'item'.)


I hope this makes sense.

Please let me know if you have any questions or concerns!! :)

Thanks!
Mike



dataguru1971
Master Smack Fu Yak Hacker

1464 Posts

Posted - 2011-02-14 : 16:09:22
If you want performance to be better, don't do it in loops.

You can update the entire applicable set with one statement and it won't require a loop.
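To illustrate the set-based approach: a sketch using hypothetical table names (`ItemProperties` for the stored item data, `StagedResults` for a staging table the app bulk-loads into first). One statement replaces the whole 50,000-iteration loop:

```sql
-- Hypothetical schema: ItemProperties(ItemID, PropertyValue, LastUpdated)
-- and StagedResults(ItemID, PropertyValue), bulk-loaded by the app.
-- One set-based UPDATE instead of looping over 50,000 items:
UPDATE ip
SET    ip.PropertyValue = s.PropertyValue,
       ip.LastUpdated   = GETDATE()
FROM   dbo.ItemProperties AS ip
JOIN   dbo.StagedResults  AS s
       ON s.ItemID = ip.ItemID;
```

The table and column names above are assumptions for illustration; the point is that SQL Server processes the join as one operation instead of 50,000 round trips.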



Poor planning on your part does not constitute an emergency on my part.

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2011-02-14 : 16:21:33
I think performance is fine, but to give you a better idea, it's a similar process to Googlebot. I am crawling results and putting them into a database; after each domain that's crawled, we insert into the DB. I think there's a limit to how much data I can crawl and keep in memory before I insert? This is even more of an issue if the application crashes.

Another thing I didn't mention is that I will eventually be running multiple Windows apps from different machines, so they need to update the database fairly regularly so that the other clients don't crawl the same data.

Thoughts ? :)

Any input is greatly appreciated!

Thanks!
Mike123


dataguru1971
Master Smack Fu Yak Hacker

1464 Posts

Posted - 2011-02-14 : 16:40:23
So is your concern the database side of things?

Am I reading this correctly: you are grabbing domains to be crawled from one table, and after the crawl you insert the info into another table?

And likewise, your concern is that two instances may pull the same records to crawl? I'm not sure of the fastest/best way to handle that in the program. When it crashes, is it a SQL error that causes it to fail (perhaps a query timeout)?



Poor planning on your part does not constitute an emergency on my part.

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2011-02-14 : 19:46:21
Hello again...

The Windows app has been much more stable lately, but I'm always experimenting with new builds. It's multi-threaded, and I don't have a huge amount of experience writing highly optimized multi-threaded applications, so I encounter the odd problem. For now I have changed it to run sequentially so it's more stable, and to be honest crashing isn't a huge issue anymore. Restarting is, however, so I still need the application to know where to start over from. The errors are not SQL related; they are more related to multi-threading, and like I said, I am avoiding the crashes now. The bigger issue is that it's just such a massive loop (it takes a day to run through it) that restarting is unavoidable with the amount of tweaking we are doing on the app.

You are correct: I grab the list of domains from a "master" table and then insert all the info into another table.
My other concern is that two instances might grab the same domain, but I think I'll just select 1000 records at a time and mark them 'in process' until they are updated, so I think that's a pretty good solution for that.
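For the 'in process' idea, here is a sketch of how to claim a batch atomically in T-SQL, assuming a hypothetical `Domains` table with a `Status` column. `UPDATE TOP ... OUTPUT` marks and returns the rows in one statement, and the `READPAST` hint makes a second instance skip rows the first one has locked, so two crawlers never claim the same batch:

```sql
-- Hypothetical table: Domains(DomainID, DomainName, Status, ClaimedAt)
-- with Status values 'pending' / 'in process' / 'done'.
-- Claim up to 1000 pending domains atomically; READPAST skips rows
-- currently locked by another crawler instance.
UPDATE TOP (1000) d
SET    d.Status    = 'in process',
       d.ClaimedAt = GETDATE()
OUTPUT inserted.DomainID, inserted.DomainName   -- the batch this instance should crawl
FROM   dbo.Domains AS d WITH (ROWLOCK, READPAST)
WHERE  d.Status = 'pending';
```

The `ClaimedAt` timestamp is an assumption worth keeping: if an instance crashes, a cleanup job can reset rows stuck 'in process' for too long back to 'pending'.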

My main concern is how to identify records that haven't been crawled the maximum number of times yet...

Maybe what I need to do is determine the domain with the maximum number of rows and then select all domains with fewer? I'm not sure how I would go about this.
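That "max rows, then select all with less" idea could look like this, assuming a hypothetical `CrawlResults` table with one row per domain per crawl pass:

```sql
-- Hypothetical table: CrawlResults(DomainID, CrawledAt, ...), one row per crawl.
-- Count crawls per domain, then pick the domains that are behind
-- the most-crawled domain, i.e. not yet done for the current pass:
;WITH CrawlCounts AS (
    SELECT DomainID, COUNT(*) AS CrawlCount
    FROM   dbo.CrawlResults
    GROUP BY DomainID
)
SELECT c.DomainID, c.CrawlCount
FROM   CrawlCounts AS c
WHERE  c.CrawlCount < (SELECT MAX(CrawlCount) FROM CrawlCounts);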

again any help is greatly appreciated ! :)

thanks!
mike123

   
