
Solution to store millions of records in a table.

Author  Topic 

stan9186
Starting Member

7 Posts

Posted - 2007-06-27 : 02:55:00
Dear all,

I need to design a database table which will store suppliers' demand information. One supplier will probably have 10,000 records, and there could be as many as 10,000 suppliers. So, in total, the number of records could be 10,000 x 10,000 = 100,000,000, which is a very large number of records to insert into one table. How can I design a table and structure to cater for this scenario? Thanks.

Hope to hear from you..

nr
SQLTeam MVY

12543 Posts

Posted - 2007-06-27 : 04:46:46
Depends on what you need to store and how you need to access it.
I would suspect that you will have a few large suppliers and many small ones rather than the 100,000,000 rows you are suggesting.
Anyway, normalise it to reduce the storage space, and think about partitioning.

In pure data terms it's probably not a problem but you need to think about how it needs to be accessed and secured.
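As a very rough sketch of the partitioning side (the function, scheme, table and boundary values below are all made-up examples, not a recommendation):

CREATE PARTITION FUNCTION pfSupplierRange (int)
    AS RANGE RIGHT FOR VALUES (2500, 5000, 7500);   -- 4 partitions split by SupplierID

CREATE PARTITION SCHEME psSupplierRange
    AS PARTITION pfSupplierRange ALL TO ([PRIMARY]);

CREATE TABLE dbo.SupplierDemand    -- hypothetical demand table
(
    SupplierID int      NOT NULL,
    DemandID   int      NOT NULL,
    DemandDate datetime NOT NULL,
    Quantity   int      NOT NULL,
    CONSTRAINT PK_SupplierDemand PRIMARY KEY (SupplierID, DemandID)
) ON psSupplierRange (SupplierID);   -- rows are placed in partitions by SupplierID range

Note that table partitioning on SQL 2005 needs Enterprise Edition, so check your edition before going down that route.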

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

stan9186
Starting Member

7 Posts

Posted - 2007-06-27 : 05:34:50
Thanks nr..

But I'm now having a hard time normalizing the data. The case is like this:
1. There are 10,000 suppliers.
2. Each supplier has 10,000 demands.

How do I create tables to store these records so that one table won't end up with 10,000 x 10,000 rows?


Kristen
Test

22859 Posts

Posted - 2007-06-27 : 05:36:32
If you will only need to access the Supplier Demand table by Supplier ID plus date range, or somesuch, then appropriate indexes on the table should mean that performance is acceptable.

If you want to do more "wildcard" type searches that resort to table scans then it will be slow, or you will need some heavy engineering to work out better solutions that will provide acceptable speed.
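For example, an index matching that access path might look something like this (the table and column names are just placeholders):

CREATE NONCLUSTERED INDEX IX_SupplierDemand_Supplier_Date
    ON dbo.SupplierDemand (SupplierID, DemandDate);

-- Seeks straight to one supplier's date range instead of scanning the whole table
SELECT DemandID, Quantity
FROM dbo.SupplierDemand
WHERE SupplierID = 42                               -- example supplier
  AND DemandDate BETWEEN '20070101' AND '20070630'; -- example date range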

Kristen

Kristen
Test

22859 Posts

Posted - 2007-06-27 : 06:13:08
Table1 : Suppliers (10,000 rows)
Table2: SupplierID plus DemandID (10,000 x 10,000 rows)

If the DemandID is unique (1...N overall, rather than 1...N per Supplier) then that can be the PK, and you need an index on SupplierID. The index on SupplierID won't be selective, so I reckon SupplierID needs to be the clustered index.

If you want to query all 10,000 rows for Supplier-X I think the performance will be OK with that solution. It will take a little while to retrieve 10,000 rows, but it would take just as long if those rows happened to be, say, in their own private table. No real difference performance-wise.

If you want to do "All rows for Supplier-X where Date between This and That OR Part-Number-Y OR Foo & Bar" I don't see that as being too bad either. It's a sequential scan of all 10,000 rows for that supplier, using the clustered index. (So it is NOT a scan of 10,000 x 10,000 rows).

Any query where you do NOT know the Supplier is, however, likely to be a scan of 10,000x10,000 rows unless you have an appropriate index that IS selective.
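A minimal sketch of that layout, assuming DemandID is unique across all suppliers (the column names are only illustrative):

CREATE TABLE dbo.Suppliers
(
    SupplierID   int           NOT NULL PRIMARY KEY,
    SupplierName nvarchar(100) NOT NULL
);

CREATE TABLE dbo.SupplierDemand
(
    DemandID   int         NOT NULL,
    SupplierID int         NOT NULL REFERENCES dbo.Suppliers (SupplierID),
    DemandDate datetime    NOT NULL,
    PartNumber varchar(50) NULL,
    Quantity   int         NOT NULL,
    CONSTRAINT PK_SupplierDemand PRIMARY KEY NONCLUSTERED (DemandID)
);

-- Cluster on SupplierID so all of one supplier's rows are stored together
CREATE CLUSTERED INDEX IX_SupplierDemand_SupplierID
    ON dbo.SupplierDemand (SupplierID);

With that in place, "all of Supplier-X's rows" is a single range read within that supplier's cluster rather than a scan of the whole table.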

Kristen

stan9186
Starting Member

7 Posts

Posted - 2007-06-27 : 06:30:16
Hello Kristen,

Thanks very much for your answer. It gives me a new idea. Anyway, is there a way to keep the tables normalized so that we do not need 10,000 x 10,000 records in one table? The records are passed through a web service, so 10,000 x 10,000 records might also make the connection fail... :(

Kristen
Test

22859 Posts

Posted - 2007-06-27 : 06:46:41
"is there a way to keep the tables normalized so that we do not need 10000 X 10000 of records in 1 table"

Normalisation doesn't have to do with the number of rows.

If you store the Supplier Details on every row that would be bad ... if you store the Supplier Details in a separate table, and just the SupplierID in every row of the Demand Table, then that will be normalised.

"records are passed thought web service, so, 10000 X 10000 record might make the connection failed also"

Why would you ever pass all the rows through a web service? I would expect a web service to Target the rows that need to be passed - so "All rows that have changed since Last Time [or Date-X]" or "All rows for Supplier-X between Date-Y and Date-Z"
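For instance, rather than pulling the whole table, the service's data-access query might look something like this (the ModifiedDate "last changed" column and the example values are assumptions, not something from your schema):

DECLARE @SupplierID int, @LastSyncDate datetime;
SET @SupplierID = 42;            -- example supplier
SET @LastSyncDate = '20070601';  -- when this supplier was last synced

-- Only the rows for one supplier that changed since the last sync
SELECT DemandID, SupplierID, DemandDate, Quantity
FROM dbo.SupplierDemand
WHERE SupplierID = @SupplierID
  AND ModifiedDate > @LastSyncDate;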

Transferring 10,000 rows is reasonably painful (e.g. a simple SELECT query that happens to return 10,000 rows to the application). Transferring 10,000x10,000 is going to require VERY specialist programming to make it work at all!

Kristen

stan9186
Starting Member

7 Posts

Posted - 2007-06-27 : 07:04:29
Hi..

I don't know the web services in detail; I was just told about that by someone. Anyway, I am in charge of the database section.

Thanks for your advice.


Kristen
Test

22859 Posts

Posted - 2007-06-27 : 07:05:47
"I just told by someone about that"

OK, well in that case my advice would be to assume they know diddly-squat about it!

nr
SQLTeam MVY

12543 Posts

Posted - 2007-06-27 : 07:22:13
I think you need an architect (or someone who calls themselves by that name).
At least it will give you someone to blame - and it looks like that's going to be needed at some point.

Does anyone involved have any experience with dealing with large amounts of data?


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

dbwilson4
Yak Posting Veteran

50 Posts

Posted - 2007-06-27 : 15:20:54
Here's my opinion. Everyone keeps saying "millions of rows!" or that SELECT COUNT queries are slow.

In my application, I have 5 million rows...it takes 0.25 seconds for a query to finish and that is on a medium server.

Indexing should speed up queries, BUT there are certain queries where even indexes won't help. For those kinds of queries, you run them through the Database Engine Tuning Advisor (comes with SQL 2005). The advisor will create "statistics" and "partitioning" on indexed IDs. That can take a query from 1 minute of execution time down to 0.5 seconds. Yes, it will take more hard drive space, but hard drives are cheap these days.
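As a rough illustration, the kind of statements the advisor spits out look like this (the object and column names here are invented):

-- Multi-column statistics to help the optimizer estimate row counts
CREATE STATISTICS st_SupplierDemand_Date_Part
    ON dbo.SupplierDemand (DemandDate, PartNumber);

-- A covering index suggestion for a frequent query shape
CREATE NONCLUSTERED INDEX IX_SupplierDemand_Date
    ON dbo.SupplierDemand (DemandDate)
    INCLUDE (SupplierID, Quantity);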

I tested this by filling 5 tables with 11 million records. Then I ran page-long complex queries. Without statistics and partitioning, the query took 2 minutes. With statistics and partitioning, it took 0.8 seconds and 10% CPU (quad-core).

A lot of people are designing based on industry standards and what others have done... what if those industry standards are wrong? You'd see a "domino effect" of architectural problems, starting with databases.
   
