SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 SQL bulk insert from mySQL filter out duplicates
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

2revup
Posting Yak Master

102 Posts

Posted - 06/11/2013 :  19:38:30  Show Profile  Reply with Quote
Guys I am doing a bulk insert from mysql, but I wondered if I could do a bulk insert and put a where around it. I want to do this so I don't insert duplicates.

Here is an example of the bulk insert

-- Create table first (Shows the structure in MSSQL)
CREATE TABLE [dbo].[UserRatings_all](
[CaseID] [int] NULL,
[Correspondence_ID] [bigint] NULL,
[UserID] [varchar](max) NULL,
[Rating] [int] NULL,
[Creation_date] [datetime] NULL
)

-- Ok lets insert data
insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')

As you can see I am doing it from a date range, but there is room for overlap here, and there appears to be some, I need to ensure that the duplicates are killed, and I am only inserting fresh records.

I could do this on 'CORRESPONDENCE_ID' as this is a unique number. any ways that I could do this without bulk inserting into a temp table and then doing a insert from that temp table?

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 06/12/2013 :  00:51:00  Show Profile  Reply with Quote
what according to you are duplicates? can you illustrate with an example?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

102 Posts

Posted - 06/12/2013 :  01:35:57  Show Profile  Reply with Quote
sure

Lets say I run the insert as above I would get the below from the date range

CaseID Correspondence_ID UserID Rating Creation_date
12911199333 5570802313 User1 4 6/10/2013
12911163573 5570245363 User2 5 6/10/2013
12911163573 5570515953 User3 5 6/10/2013
12898391011 25016198145 User4 2 6/10/2013
12910760153 5564211803 User1 5 6/10/2013
12898959621 25014914845 User9 2 6/10/2013
12899340751 25015936585 User1 5 6/10/2013
12911177063 5570523243 User6 5 6/10/2013

But my data I might have: (NOTE I have added the 'match' text to the right of a field that will match to the top, essentially a duplicate, I need these matches filtered out before inserting)

12911199333 5570802313 User1 4 6/10/2013 Match
12911229593 5571174303 User6 5 6/10/2013
12911228413 5571214293 asais 5 6/10/2013
12911163573 5570245363 User2 5 6/10/2013 Match
12911163573 5570515953 User3 5 6/10/2013 Match
12999894701 25016321555 User3 5 6/10/2013
12911213413 5571028223 User9 3 6/10/2013
12911213413 5571058303 User8 3 6/10/2013



Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 06/12/2013 :  01:45:33  Show Profile  Reply with Quote

insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (select 1
                  from [UserRatings_all]
                  where CaseID = t.CaseID
                  and Correspondence_ID = t. Correspondence_ID
                 )


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

102 Posts

Posted - 06/12/2013 :  01:51:20  Show Profile  Reply with Quote
When this is run I am getting 'Invalid column name 'CaseID''
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 06/12/2013 :  01:55:41  Show Profile  Reply with Quote

insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (
                  select 1
                  from [UserRatings_all]
                  where CaseID = t.Case_ID
                  and Correspondence_ID = t. Correspondence_ID
                  )


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

Edited by - visakh16 on 06/12/2013 01:56:56
Go to Top of Page

2revup
Posting Yak Master

102 Posts

Posted - 06/12/2013 :  01:58:21  Show Profile  Reply with Quote
The table I am pulling the data from
= cr.CASE_ID, cr.CORRESPONDENCE_ID

the table I am dumping the data to
= Case_ID, CORRESPONDENCE_ID

So it looks like based on what you have its correct.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 06/12/2013 :  02:00:46  Show Profile  Reply with Quote
quote:
Originally posted by 2revup

The table I am pulling the data from
= cr.CASE_ID, cr.CORRESPONDENCE_ID

the table I am dumping the data to
= Case_ID, CORRESPONDENCE_ID

So it looks like based on what you have its correct.


then it should be


insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (
                  select 1
                  from [UserRatings_all]
                  where Case_ID = t.Case_ID
                  and Correspondence_ID = t. Correspondence_ID
                  )



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

102 Posts

Posted - 06/12/2013 :  02:01:30  Show Profile  Reply with Quote
Dang your right I also missed that completely, and it worked like a charm... I didnt think I could do this with MySQL thank you, thank you , thank you!!!!!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 06/12/2013 :  02:03:19  Show Profile  Reply with Quote
welcome...
so far as its ANSI based it should work!

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.06 seconds. Powered By: Snitz Forums 2000