Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Best way to get data across many tables?

Author  Topic 

darc
Starting Member

7 Posts

Posted - 2008-12-03 : 23:49:58
I have a database of stock price data. There are about 500 individual stocks in individual tables, each with an int index as the primary key.

I would like to access the data such that I can query the price data for each stock at a perticular date.

With my test database of only 5 tables i was able to do this quite easilly with a programatically generated join query that looks something like this:

SELECT table1.index table1.price table2.price table3.price ...
FROM table1
JOIN table2 ON (table1.date=table2.date)
JOIN table3 ON (table1.date=table3.date)
...
WHERE table1.index = 100;

So when i went to run the same query on the full database of 500 tables i got an error that i could only include a maximum of 256 tables in a query.

So I'm thinking there must be a better way to go about doing this, anyone have any ideas? I'm up to anything including stored procedures, aliases, restructuring the database. I guess i could split the query into mutliple subqueries but I was hopeing there might be a more elegant solution.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2008-12-04 : 00:17:34
Store the data for all the stock prices in a single table. Then your query will be easy.


CODO ERGO SUM
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-04 : 00:46:41
Well its a little more complicated then I made it out to be since each stock has seven columns, I guess it could be cut down but it would be atleast 5 x 500 columns. I read somewhere that there is a 1024 cap on columns? Dunno, could put each column into its own table but it all seems conceptually messy. Anything that might be more elegant?
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2008-12-04 : 01:06:00
You need to store all 500 stocks in one table and have another table for all the stock prices, and use the same price columns for all stocks.

Each row in the prices table will identify the stock and the date and then have the various prices for that stock on that date. So you can have as many sets of prices for as many stocks on as many dates as you like.

You need to read up on normalized database design, get a book or Google that, or look at sites like
www.sql-tutorial.net
www.firstsql.com/tutor.htm
www.w3schools.com/sql/default.asp
http://databases.about.com/od/databasetraining/a/databasesbegin.htm
http://databases.about.com/od/specificproducts/a/normalization.htm
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-04 : 01:39:47
So i would do something like this to get all price data of a certain date from all stocks in the database?

SELECT Symbol, Open, High, Low, Close, Volume
FROM Prices
WHERE Date = '12/1/08'

for a range of dates:

SELECT Symbol, Open, High, Low, Close, Volume
FROM Prices
WHERE Date BETWEEN '11/1/08' AND '12/1/08'
ORDER BY Date;

range for a single stock:

SELECT Symbol, Open, High, Low, Close, Volume
FROM Prices
WHERE Date BETWEEN '11/1/08' AND '12/1/08' AND Symbol = 'MSFT'
ORDER BY Date;


And the other table would contain meta data on each stock?
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-04 : 10:52:52
I looked through some of those links and I really had difficulty finding information relating to the kind of database design that fits my needs. Every tutorial focuses on the same relational model with orders, customer info blah blah blah. I checked out the wiki on normalized design and it read like chinese.

Anyways I think I understand what ya'll are suggesting, I just like some confirmation before i go about coding it all. I wasted a significant amount of time building around the original db design and i'd like to avoid doing that again if at all possible.

Baiscally you are suggesting the data be arranged in two tables:

table #1 - containing meta data for each stock. columns like symbol, name, start date, end date. each row is a unique stock.

table #2 - contiaining price data for all stocks. columns like symbol, date, price. each row is an observation of price for a perticular date and stock. the prices of all stocks in the db are in this one table.

So if i got it right so far I have a few other questions:

1) at any point is advantageous to split the table into smaller tables since the number of rows could potenitaly be very high? (500 stocks x 5000 prices)

2) is it all advantageous to sort the data prior to populating the db? and if so is there a way to do it with SMSE or must it be done in the population tool?

oh yeah one othe rquestion thats really annoying me:

3) whats the difference between [] and '', as i understand it '' denotes literals, can this be used for naming columns with reserved words or does it need to be done with []?

Thanks
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2008-12-04 : 20:46:47
You're right on track with your design and NO you do not need to partition that table. 500 X 5000 = 2 and a half million rows - most of the people here would regard that as a small to medium size table. As long as you index the key columns in each table and the columns you're using in your WHERE clauses (the date for example) - you'll struggle to write queries on a table of that size that will be slow (that's not a challenge to write bad queries!).

As for [] vs " - they are used for delimiting identifiers that have special characters in them (such as spaces). [] are the default delimiters and personally I always use them rather than quotes - because quotes are not the default and they are easy to confuse with string delimiters. To use quotes for delimiting identifiers you have to turn on the SET QUOTED_IDENTIFIER option.
See a full description in Books Online
http://msdn.microsoft.com/en-us/library/ms176027.aspx
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-05 : 14:14:04
Cool, I'm relieved to hear that i'm on the right track. I went ahead and coded the tools and I really couldn't be happier. The invaluable thing that you mentioned was the indexing which i read up on and think i have a pretty good understanding of. For my Prices table i would be searching by stock and date or both with ordering by date so i went and indexed the table as such:

IX_DateSymbol = (unique, clustered, composite: date -> symbol)
IX_SymbolDate = (unique, composite: symbol -> date)
IX_Date = (date)
IX_Symbol = (symbol)

Pulling data with the queries i had posted earlier was extremely fast, in fact it only took a few seconds to get the entire data set on 100 stocks which really blew my mind lol.

My only real issue at this point is my price table has no primary key. Everything I have read to this point extolls them, but again the are always based on the northwind model so its difficult for me to apply my dbs design with my limited understanding.

Columns in my price table are: date(datetime), symbol(vchar(10)), price(numeric(12,6)).

I guess I could create a unique key based off symbol and date but i'm not sure how i would go about doing that.

So basically, should i have a primary key for this table and if so how do i get the db to create the pk values for me based on the date and symbol?

Thanks a million for all the help by the way, you've really awakened me to the power of databases :)
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2008-12-05 : 19:07:28
There are a couple of different trains of thought on PKs (not on whether or not to have one, you really must create one, rather on how to create one)
There's a good discussion here
http://www.sqlteam.com/forums/topic.asp?Topic_ID=24776

I'd say for what you need, create an identity column and make that the primary key. Also put the clustered index on that column only, you're better off having a smaller primary key in SQL Server because it is used for lookups from all the other indexes.
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-07 : 03:50:09
So this is what i came up with for my table definition.

CREATE TABLE [Prices]
(
[Date] datetime NOT NULL,
[Symbol] varchar(10) NOT NULL,
[Open] numeric(12,6),
[High] numeric(12,6),
[Low] numeric(12,6),
[Close] numeric(12,6) NOT NULL,
[Volume] int,
CONSTRAINT PK_Prices PRIMARY KEY ([Date],[Symbol])
)
CREATE UNIQUE INDEX IX_SymbolDate ON Prices ([Symbol],[Date]);
CREATE INDEX IX_Date ON Prices ([Date]);
CREATE INDEX IX_Symbol ON Prices ([Symbol]);

i think that implements all the features you recommended except keeping the pk small, i guess for that i'd have to figure out how to code a bigint that is yyyymmdd & some sort of char conversion on symbol to keep the records clustered properly.
Go to Top of Page

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2008-12-07 : 04:32:08
You do not need ix_symbol because it is already indexed on your symbol/date index. Same with the ix_date - your pk takes care of that.
Your PK is on date so you'll probably be ok with that as your cluster. Don't dick around with the date types converting them to char. Date is fine and anything else will introduce crap into your DB. Your volumes are not big enough to warrant trying anything clever for performance reasons. Get your model right and you'll do a better job and understand more.
Go to Top of Page

darc
Starting Member

7 Posts

Posted - 2008-12-07 : 14:02:38
Yeah figured some of those indexes were redundant but was unsure about which were the most important with my primary query which is this:

SELECT * FROM Prices
WHERE Date BETWEEN '1/1/2007' AND '1/1/2008'
AND Symbol IN ('MSFT','INTC','CSCO')
ORDER BY Date;

i totaly hear yah on sticking with the unoptimized but robust pk.
Go to Top of Page
   

- Advertisement -