
 Efficiency of data access with ADO


jharris3434
Starting Member

2 Posts

Posted - 2008-12-09 : 14:15:37
I'm updating some VB6 code to move from using Access to using SQL Server 2008 (Express) as a backend. The code used DAO for data access, and I'll be changing it to ADO. In addition to this, the application is being prepared to run over a VPN, so efficiency of data access is going to become even more important. As this is my first experience with SQL Server, I would really appreciate some general advice on the best strategy for this particular situation:

The client code downloads a bunch of email messages from an IMAP server and then loops through each one, reading the headers, scraping the body, etc. Each message will become a 'lead.' This data will later establish
1) the identity of the sender
2) the group that the lead belongs to
IDENTITIES --> GROUPS --> LEADS

The code first checks whether the UID of the message already exists in the DB (i.e., whether it has already been written); if not, it adds the message to the database.

So far, this means running the Exists query n times, once per message.
But I'm mostly worried about the AddMessageToDatabase routine, which also runs for each new message.
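For the Exists checks in particular, I'm wondering whether I could batch the work: load all the downloaded UIDs into a staging table in one round trip, then insert only the new messages server-side. A rough sketch of what I have in mind (the staging table and column names here are placeholders, not my real schema):

```sql
-- Hypothetical staging table holding the batch of downloaded messages.
CREATE TABLE #StagingMessages (
    UID          varchar(100)  NOT NULL,
    Subject      varchar(500)  NULL,
    Body         varchar(max)  NULL,
    DateReceived datetime      NOT NULL
);

-- ...bulk-load the downloaded batch into #StagingMessages from the client...

-- Insert only messages whose UID is not already in Leads; this one
-- set-based statement replaces n separate Exists round trips.
INSERT INTO Leads (UID, Subject, Body, DateReceived)
SELECT s.UID, s.Subject, s.Body, s.DateReceived
FROM #StagingMessages AS s
WHERE NOT EXISTS (SELECT 1 FROM Leads AS l WHERE l.UID = s.UID);
```

Is that kind of set-based approach the right direction for a VPN connection, or is there a better pattern?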

AddMessageToDatabase
To establish the identity of the person submitting the lead, this routine builds an identifying query based on whatever information is available from the message. Usually this is an email address. In that case it selects all leads from the Leads table with a matching email address and orders them by DateReceived. If it doesn't find any, it creates a new Identity in the Identities table, a new Group in the Groups table, and a new Lead in the Leads table. If it does find a matching identity, it moves to the next step.
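Right now that lookup pulls back a whole recordset just to inspect the most recent lead. I'm guessing it could be a single parameterized query instead, something like this sketch (column names are taken from my description above, so treat them as assumptions):

```sql
-- Most recent lead for a given email address, with its Group and Identity.
-- If this returns no rows, a new Identity/Group/Lead gets created instead.
SELECT TOP (1)
       l.LeadID,
       l.GroupID,
       g.IdentityID,
       l.DateReceived
FROM Leads AS l
JOIN Groups AS g
    ON g.GroupID = l.GroupID
WHERE l.EmailAddress = @EmailAddress
ORDER BY l.DateReceived DESC;
```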

The next step
If a matching identity is found, the next rule it tests is whether the latest lead in that Identity's recordset is within 30 days of the new lead. If so, the new lead is added to that Group; if not, a new Group is created and the lead is given that GroupID, etc.
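I suspect this whole decision could run server-side in a stored procedure, so each new message costs one round trip instead of several. A very rough sketch of just the 30-day rule (again, the variable and column names are placeholders):

```sql
-- @NewDate is the DateReceived of the incoming lead; @LatestDate is the
-- DateReceived of the newest existing lead for the matched identity.
IF @LatestDate >= DATEADD(DAY, -30, @NewDate)
    -- Within 30 days of the latest lead: reuse the existing group.
    SET @GroupID = @ExistingGroupID;
ELSE
BEGIN
    -- Older than 30 days: start a new group for this identity.
    INSERT INTO Groups (IdentityID) VALUES (@IdentityID);
    SET @GroupID = SCOPE_IDENTITY();
END
```

Is pushing this logic into a stored procedure the usual way to cut down chattiness over a slow link?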

I hope that explanation suffices. I can certainly share the code if it will help. The structure of the database is a little unorthodox in the sense that the lowest level of data, Leads, contains all identifying information, and Groups and Identities merely serve as organizing 'containers'. This is an intentional violation of normalization for the purpose of modeling the data on a business-logic level.

Basically, I want to know how to update this code in the most intelligent way possible. Right now it runs the Identity query n times, it loops through each recordset (although an Identity has at most 10 leads), and it updates everything in this fashion:

rsGroups.AddNew
rsGroups.Fields("IdentityID") = IdentityID
rsGroups.Update                          ' commit first so SQL Server generates the IDENTITY value
NewGroupID = rsGroups.Fields("GroupID")  ' read the new GroupID only after Update
rsLeads.Fields("GroupID") = NewGroupID
rsLeads.Update
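From what I've read, it might be better to drop the updatable recordsets entirely and send a parameterized batch, capturing the new GroupID with SCOPE_IDENTITY() on the server. Something like this sketch (assuming GroupID is an IDENTITY column and LeadID is the Leads key, which matches my schema):

```sql
DECLARE @NewGroupID int;

-- Create the group and capture its IDENTITY value in the same batch,
-- avoiding the read-before-Update problem with ADO recordsets.
INSERT INTO Groups (IdentityID) VALUES (@IdentityID);
SET @NewGroupID = SCOPE_IDENTITY();

-- Point the new lead at the group it now belongs to.
UPDATE Leads
SET GroupID = @NewGroupID
WHERE LeadID = @LeadID;
```

Would an ADO Command with parameters calling something like this be the recommended replacement for AddNew/Update?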

I appreciate any and all pointers:
- Batching work
- Best practices
- etc.
   
