SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2012 Forums
 Transact-SQL (2012)
 Automatic column normalization and dummy variables
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

analyst79
Starting Member

1 Posts

Posted - 02/07/2013 :  15:21:50  Show Profile  Reply with Quote
A friend recommended I post to this community, saying it was both fast an excellent. So here I am.

I am struggling with a business process of transforming data for analysis that I would like to handle in SQL. I have a CSV file that contains numerical and categorical information. The categorical columns would ideally be identified automatically by containing non-numeric data, with an optional list of columns that would be interpreted as categorical even if a numeric interpretation was possible (such as a list of accounting codes). Before loading the CSV, I will not know how many unique values exist in a column. I may have to list out those columns manually, however. Once the columns are identified as categorical, a new table is created for each column, containing that column's distinct values and an ID column. The categorical column in the original data set is then replaced by a series of columns containing Boolean values for each original value in the column. I will also end up scaling numerical columns so that the largest absolute value is 1. Any help as to code samples or a strategy for solving this would be greatly appreciated. Thanks!

Example:
data.csv:
Category,Balance
A,1000
B,2400
C,800

Raw Data table:
Category,Balance
A,1000
B,-2400
C,800

Category table:
Category_id,Category_value
1,A
2,B
3,C

Normalization table:
Column,Factor
Category,2400

Processed Data table:
Category__A,Category__B,Category__C,Sales__norm
1,0,0,0.4166666666666667
0,1,0,-1
0,0,1,0.3333333333333333

LoztInSpace
Aged Yak Warrior

939 Posts

Posted - 02/07/2013 :  20:22:45  Show Profile  Reply with Quote
You typically don't want to be doing that. You already have all the data you need to work it all out. The rest is a presentation issue, not a data issue.
If you're using SSRS, there are matrix reports that pretty much cover what you are tring to do. A relational database is not the tool for this job. It's just not how they work.
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.05 seconds. Powered By: Snitz Forums 2000