Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SQL Server Administration (2005)
 Exporting very large datasets

Author  Topic 

sunliners81
Starting Member

4 Posts

Posted - 2008-10-30 : 13:38:33
I'm having some problems with exporting data from a view to text files. Here are the issues;

1. The view returns about 1.2 billion rows.
2. I've partitioned the primary table in this view by date range, but only one processor of four is max at 100% while running this operation.
3. I'm using the following script to output by date.

DECLARE @YEAR CHAR(4), @MONTH VARCHAR(2), @FILENAME VARCHAR(50), @BCP VARCHAR(1000), @SQL VARCHAR(200);
SET @YEAR = '1999';
SET @MONTH = '1';
WHILE (@YEAR <> '2007')
BEGIN
WHILE (@MONTH <> '13')
BEGIN
SET @FILENAME = 'Output_' + @YEAR + '_' + @MONTH + '.TXT';
SET @SQL = 'SELECT * FROM Master_DMI.DMI.ALL_DMI_Comp_View WHERE YEAR(Original_Date) = ' + @YEAR + ' AND MONTH(Original_Date) = ' + @MONTH
SET @BCP = 'bcp "' + @SQL + '" queryout \\StorageServer\OutData\' + @FILENAME + ' -T -w -t^| -r\n'
EXEC xp_cmdshell @BCP;
SET @MONTH = @MONTH + 1;
END
SET @YEAR = @YEAR + 1;
SET @MONTH = 1;

4. Each file takes from 3.5 to 4 hours to write and each file is about 3.5 to 4GB. (So that means that it will take almost two weeks to write!)


Can anyone suggest a better way to write out 1.2 billion rows to flat files?
Should I partition by some other key other than data is my query selects by date?
Other this I might consider, like modifying my current indexes?
I've changed the Recovery Model to Bulk Logged. Is there other Server settings that may help?
One issue is bottle neck issues with our direct attached storage. We have four arrays, and with my partition schema, I may not be spreading the data well enough.


CREATE Partition FUNCTION pfALL_DMI (Datetime)
AS RANGE RIGHT FOR VALUES
('20011231', '20040630', '20061231')
GO

CREATE PARTITION SCHEME psALL_DMI
AS PARTITION pfALL_DMI
TO ( [PRIMARY], [ALL_DMI_01_20011231], [ALL_DMI_20020101_20040630], [ALL_DMI_20040701_20061231]);

CREATE TABLE [DMIALL_DMI
[Record_Code] [varchar](4) NULL,
[Original_Date] [datetime] NOT NULL,
[Contract_Date] [datetime] NULL,
[NDC] [char](11) NULL,
[Contract_CD] [varchar](15) NULL,
[ItemNo] [varchar](20) NULL,
[Material_Group_Code] [varchar](1) NULL,
[Customer_Number] [varchar](10) NULL,
[Code_Unknown] [varchar](10) NULL,
[Vendor_CD] [varchar](10) NULL,
[Unit_Of_Measure] [char](2) NULL,
[Quantity] [int] NULL,
[Total_Cost] [money] NULL,
[ChargeBack] [money] NULL,
[Manufacturer] [varchar](50) NULL,
[BW_SOURCE] [numeric](4, 0) NULL,
[BW_ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_ALL_DMI] PRIMARY KEY CLUSTERED
(
[Original_Date] ASC,
[BW_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [psALL_DMI]([Original_Date])
)




Thanks,


Clay

cvraghu
Posting Yak Master

187 Posts

Posted - 2008-10-31 : 02:20:04
I don't have much experience with this. But here are my thoughts -

1. Instead of a single script, why can't multiple scripts( or jobs) with different date ranges run parallely?

2. Instead of Year and Month funtion in where clause, use
'Original_Date between ' + @Month + '/01/' + @YEAR + ' AND ' @Month + '/31/' + @YEAR

3. NOLOCK used in view definition?
Go to Top of Page
   

- Advertisement -