nathans
Aged Yak Warrior
938 Posts |
Posted - 2005-02-22 : 14:22:07
|
Guys,

I have a wide table [customer] that holds info such as phone, fax, social_security_number, etc. When creating a new customer, I want to cross-reference all existing customers on certain fields, say phone, fax, and ssn, and insert any matches into a separate table [data_match] with the employee_id, matched_column, and matched_value.

I have written the procedure to search the [customer] table for these matches, but with 1 million+ rows in the [customer] table this query takes a while, and with 1,000 new customers per day this solution is not feasible. So I have moved the cross-referenced data out into a name/value table that stores the column name and column value. Account 1234 would have the following rows for the 3 searched columns:

1234  phone  1234567890
1234  fax    0987654321
1234  ssn    123456789

I indexed the value column, and of course this makes the query lightning fast compared to the other solution... but I don't feel great about duplicating the data. Am I going about this wrong? I could index the other columns in the primary table, but that would really be 8 columns I would be indexing (I have included only 3 in this example). Thoughts? |
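A minimal sketch of the name/value lookup table described above. All table and column names here are illustrative, not taken from the actual schema:

```sql
-- Hypothetical name/value lookup table for the cross-reference search.
-- One row per searched column per customer; the value column is indexed.
CREATE TABLE customer_lookup (
    customer_id   INT          NOT NULL,
    column_name   VARCHAR(30)  NOT NULL,
    column_value  VARCHAR(100) NOT NULL,
    CONSTRAINT pk_customer_lookup PRIMARY KEY (customer_id, column_name)
);

CREATE INDEX ix_customer_lookup_value ON customer_lookup (column_value);

-- The three example rows for account 1234:
INSERT INTO customer_lookup VALUES (1234, 'phone', '1234567890');
INSERT INTO customer_lookup VALUES (1234, 'fax',   '0987654321');
INSERT INTO customer_lookup VALUES (1234, 'ssn',   '123456789');

-- Finding matches for a new customer is then a single indexed seek:
SELECT customer_id, column_name, column_value
FROM customer_lookup
WHERE column_value IN (@phone, @fax, @ssn);
```

The trade-off, as discussed below, is that this duplicates data and must be kept in sync with the main table.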
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-02-22 : 17:13:33
|
I think it would be simpler to have the indexes on the customer table directly. The other approach doesn't save anything in terms of disk space, and you have a lot of additional work to maintain the lookup table.
Codo Ergo Sum |
 |
|
|
nathans
Aged Yak Warrior
938 Posts |
Posted - 2005-02-22 : 17:56:04
|
| I am somewhat hesitant to index those 8 columns and perform the search on the primary transactional table rather than the lookup table. The primary table holds all the data for all customers... I am only interested in searching the last 45 days of data. The lookup table would only hold this 45 day subset of data (< 1 mil rows) whereas the transactional table is ~10 million rows. |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-02-22 : 17:59:03
|
OK, but in your original post you said you wanted to search all existing customers.

If you are going to maintain another table, you can use your approach or use a flat structure. With only 45,000 rows (1,000/day * 45 days), it should be fast either way.

The trick will be maintaining your lookup data with current information so that you don't miss any additions or updates. That may be easier with a flat structure.
Codo Ergo Sum |
 |
|
|
nathans
Aged Yak Warrior
938 Posts |
Posted - 2005-02-22 : 18:04:11
|
| Thanks Michael, I should have reviewed my orig post :) Yes, I am working with a subset of data here, but it's still a rather large chunk. I'll be searching email_address, phone, fax, and ssn for both customer and co-customer, so (8) total columns. Any new ideas, now that you have the full story?

- Nathan |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-02-22 : 18:27:15
|
One thing you might try is storing CHECKSUM columns for each of the eight items, and putting indexes on them. Then you are doing really fast index lookups on INTEGER columns. This scales a lot better than doing searches on wide VARCHAR columns.

SELECT CUSTOMER_ID
FROM CUSTOMER_LOOKUP
WHERE SSN_CHECKSUM = CHECKSUM(@SSN) AND SSN = @SSN
UNION
SELECT CUSTOMER_ID
FROM CUSTOMER_LOOKUP
WHERE EMAIL_CHECKSUM = CHECKSUM(@EMAIL) AND EMAIL = @EMAIL
...and so on...
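One way to set up the checksum columns Michael describes is as computed columns, so SQL Server maintains them automatically on every insert and update. This is a sketch under the assumption that the base columns live in a CUSTOMER_LOOKUP table; the index and column names are illustrative:

```sql
-- Hypothetical: add computed CHECKSUM columns and index them.
-- SQL Server keeps computed columns in sync with the base columns,
-- and deterministic computed columns such as these can be indexed.
ALTER TABLE CUSTOMER_LOOKUP ADD SSN_CHECKSUM   AS CHECKSUM(SSN);
ALTER TABLE CUSTOMER_LOOKUP ADD EMAIL_CHECKSUM AS CHECKSUM(EMAIL);

CREATE INDEX ix_lookup_ssn_checksum   ON CUSTOMER_LOOKUP (SSN_CHECKSUM);
CREATE INDEX ix_lookup_email_checksum ON CUSTOMER_LOOKUP (EMAIL_CHECKSUM);
```

Note that CHECKSUM values can collide, which is why the query above re-checks the actual column (`AND SSN = @SSN`) after the fast integer seek.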
Codo Ergo Sum |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2005-02-23 : 08:13:08
|
| One thing to keep in mind if using CHECKSUM(): it would not work if you wanted to do LIKE matches or other partial matches.

How about using an indexed view? This would reduce the amount of duplicated data, as it would exist only in the index, not as a table unto itself. There would be no work in keeping it up to date either. |
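A rough sketch of a flat indexed view along the lines Rob suggests, assuming a dbo.customer base table with the searched columns (names are illustrative). T-SQL requires SCHEMABINDING, two-part table names, and a unique clustered index before any other indexes can be added:

```sql
-- Hypothetical indexed view exposing only the searched columns.
CREATE VIEW dbo.customer_search
WITH SCHEMABINDING
AS
SELECT customer_id, phone, fax, ssn, email_address
FROM dbo.customer;
GO

-- The unique clustered index materializes the view's data.
CREATE UNIQUE CLUSTERED INDEX ix_customer_search
ON dbo.customer_search (customer_id);

-- Nonclustered indexes on the view then support the match lookups:
CREATE INDEX ix_customer_search_phone ON dbo.customer_search (phone);
CREATE INDEX ix_customer_search_ssn   ON dbo.customer_search (ssn);
```

Indexed views carry restrictions (no UNION, no nondeterministic functions such as GETDATE() for a 45-day filter), which limits how far this approach can go here.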
 |
|
|
nathans
Aged Yak Warrior
938 Posts |
Posted - 2005-02-23 : 12:49:28
|
| Thanks Michael. I will definitely look into that solution.

Rob, will using indexed views have any performance impact on the table itself? I need to read up on those, but it seems that the view would be recompiled every time a row is inserted or modified in the transactional table?

Thanks again guys. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2005-02-23 : 14:04:20
|
I don't see anything in Books Online that says it would recompile whenever something is added (if it did, it would make no sense to use them). This seems to nail the details:

quote: Creating a clustered index on a view stores the data as it exists at the time the index is created. An indexed view also automatically reflects modifications made to the data in the base tables after the index is created, the same way an index created on a base table does. As modifications are made to the data in the base tables, the data modifications are also reflected in the data stored in the indexed view. The requirement that the clustered index of the view be unique improves the efficiency with which SQL Server can find the rows in the index that are affected by any data modification.

Indexed views can be more complex to maintain than indexes on base tables. You should create indexes only on views where the improved speed in retrieving results outweighs the increased overhead of making modifications. This usually occurs for views that are mapped over relatively static data, process many rows, and are referenced by many queries.
If you're going to be making lots of updates to existing data, on top of inserting the new stuff, then it might not make sense to use an indexed view for this function.

I'd also envisioned the view using a UNION operator to build it, which is not allowed in an indexed view. If there's no other way of transforming the data, then indexed views are a dead end. |
 |
|
|
nathans
Aged Yak Warrior
938 Posts |
Posted - 2005-02-23 : 14:47:11
|
| The existing data is not modified often, though new rows are added fairly frequently (1,000 rows/day). But like you said, how can we transform the data without the use of the UNION? Are indexed views out?

I'm thinking of going with the original solution. I can just create a job to run during off-peak hours to populate the table with 45 days' worth of data, then cross-reference against that during new customer inserts. At least this solution will allow me to add/remove searched columns and increase/decrease the day interval by just modifying the job... but I still feel like there is a better way.

Thanks again! |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-02-23 : 18:16:23
|
You could use a trigger to populate your lookup table with adds, changes, and deletes, and just schedule a job once per day to remove old data. There is some overhead with this, but it would keep your lookup table completely up to date.

You could also maintain the lookup table within the stored procedures you are using to maintain the main table, if you are using stored procedures. If the main table is being maintained by many different procedures, the trigger may be easier to implement.
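A rough sketch of the trigger-based maintenance Michael describes, assuming the name/value lookup table discussed earlier in the thread (all table, column, and trigger names are illustrative):

```sql
-- Hypothetical trigger keeping customer_lookup in sync with dbo.customer.
-- Covers inserts and updates; a delete trigger would mirror the DELETE step.
CREATE TRIGGER trg_customer_sync ON dbo.customer
FOR INSERT, UPDATE
AS
BEGIN
    -- Remove any stale lookup rows for the affected customers...
    DELETE cl
    FROM customer_lookup cl
    JOIN inserted i ON i.customer_id = cl.customer_id;

    -- ...and repopulate them from the new values.
    INSERT INTO customer_lookup (customer_id, column_name, column_value)
    SELECT customer_id, 'phone', phone FROM inserted
    UNION ALL
    SELECT customer_id, 'fax', fax FROM inserted
    UNION ALL
    SELECT customer_id, 'ssn', ssn FROM inserted;
END
```

The once-per-day cleanup job would then delete lookup rows for customers older than the 45-day window.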
Codo Ergo Sum |
 |
|
|