|
analyst79
Starting Member
1 Posts |
Posted - 02/07/2013 : 15:21:50
|
A friend recommended I post to this community, saying it was both fast an excellent. So here I am.
I am struggling with a business process of transforming data for analysis that I would like to handle in SQL. I have a CSV file that contains numerical and categorical information. The categorical columns would ideally be identified automatically by containing non-numeric data, with an optional list of columns that would be interpreted as categorical even if a numeric interpretation was possible (such as a list of accounting codes). Before loading the CSV, I will not know how many unique values exist in a column. I may have to list out those columns manually, however. Once the columns are identified as categorical, a new table is created for each column, containing that column's distinct values and an ID column. The categorical column in the original data set is then replaced by a series of columns containing Boolean values for each original value in the column. I will also end up scaling numerical columns so that the largest absolute value is 1. Any help as to code samples or a strategy for solving this would be greatly appreciated. Thanks!
Example: data.csv: Category,Balance A,1000 B,2400 C,800
Raw Data table: Category,Balance A,1000 B,-2400 C,800
Category table: Category_id,Category_value 1,A 2,B 3,C
Normalization table: Column,Factor Category,2400
Processed Data table: Category__A,Category__B,Category__C,Sales__norm 1,0,0,0.4166666666666667 0,1,0,-1 0,0,1,0.3333333333333333 |
|
|
LoztInSpace
Aged Yak Warrior
878 Posts |
Posted - 02/07/2013 : 20:22:45
|
You typically don't want to be doing that. You already have all the data you need to work it all out. The rest is a presentation issue, not a data issue. If you're using SSRS, there are matrix reports that pretty much cover what you are tring to do. A relational database is not the tool for this job. It's just not how they work. |
 |
|