Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Glynner
Starting Member
5 Posts
Posted - 2008-10-16 : 19:16:54
I tried running a T-SQL INSERT query that inserts several records into a table. It raised an error and did not execute the statement because one of the records to be appended would have violated a primary key constraint, so NONE of the records got appended.

In MS Access, you get warned that X records cannot be added due to a primary key violation, but it will go ahead and append the remaining records if you OK that.

Is there a way to say in T-SQL "OK, ignore violating records and just insert the legitimate records then!"?
EugeneLim11
Posting Yak Master
167 Posts
Posted - 2008-10-17 : 00:41:44
Well Glynner, if your query is not in a transaction, try this:

insert into tablename (columnName, columnName) values (value1, value2)
GO

The GO will insert the records if it is OK, and will not insert if it is not OK. If they are in a transaction, it will go ahead as a batch or roll back as a batch. So you need to check if it is in a transaction, and if so, remove the transaction. ;) That's all.
Glynner
Starting Member
5 Posts
Posted - 2008-10-17 : 06:45:35
Hi EugeneLim11, thanks for your reply.

The query is not in a transaction: it is the only item in a stored procedure, like this:

CREATE PROCEDURE AppendRecords AS
INSERT INTO DestinationTable (Column1, Column2)
SELECT ColumnA, ColumnB FROM SourceTable
GO

In DestinationTable, Column1 is the primary key. (Changing this is not an option; it must stay the primary key.) In SourceTable, the values in ColumnA should be (and nearly always are) unique. But the data actually comes from a different database that I have no control over, and occasionally a duplicated value creeps into ColumnA, which of course would not be allowed in a primary key column.

I find it quite a severe standard of data integrity that if even ONE of the records to be added would violate the primary key, then SQL Server will raise an error and terminate without appending ANY of the records. I was just wondering if there was some way to get SQL Server to relax this strict requirement and just insert all the records that don't violate the primary key constraint.

quote: Originally posted by EugeneLim11

Well Glynner, if your query is not in a transaction, try this: insert into tablename (columnName, columnName) values (value1, value2) GO. The GO will insert the records if it is OK, and will not insert if it is not OK. If they are in a transaction, it will go ahead as a batch or roll back as a batch. So you need to check if it is in a transaction, and if so, remove the transaction. ;) That's all.
tosscrosby
Aged Yak Warrior
676 Posts
Posted - 2008-10-17 : 10:12:34
quote: Originally posted by EugeneLim11

Well Glynner, if your query is not in a transaction, try this: insert into tablename (columnName, columnName) values (value1, value2) GO. The GO will insert the records if it is OK, and will not insert if it is not OK. If they are in a transaction, it will go ahead as a batch or roll back as a batch. So you need to check if it is in a transaction, and if so, remove the transaction. ;) That's all.
I don't believe this is true. There is no difference between GO and Ctrl-E or F5 (all S2K execution options): the result will still be that nothing is committed to the database. And violating primary key constraints should never be allowed. If you write accurate code you won't be in violation and wanting to "relax the strict requirement". Ever heard of "data integrity"? Post your code and I'm sure someone can come up with a viable solution. Something in the WHERE clause, along these lines:

insert into tablex
select something
where PK_field not in (select PK_field from tablex where blah, blah, blah)

Really not a difficult concept! - IMHO

Terry
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts
Posted - 2008-10-17 : 11:39:48
tosscrosby is absolutely right (and he has a strangely cool handle). The INSERT statement is atomic: if part of it fails, it all fails and it does nothing.

He is also absolutely right in saying that you should never want to be able to insert some rows and ignore the others that fail. You should filter them out. Please post code and data and we can help you (probably very quickly).

If you *must* do what you describe, you could perform a loop and insert the records one at a time. Each INSERT statement would then be atomic for that row only, not for the set.

Have a google on SQL ACID (probably not what you think it is!)

-------------
Charlie
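For reference, a minimal sketch of that row-at-a-time loop, reusing the table and column names from Glynner's stored procedure; the variable types and the TRY/CATCH error handling (available in SQL Server 2005 and later) are assumptions:

```sql
-- Row-at-a-time insert: each INSERT is its own atomic statement,
-- so a PK violation skips only that row instead of the whole set.
DECLARE @a INT, @b VARCHAR(50);   -- assumed column types

DECLARE src CURSOR LOCAL FAST_FORWARD FOR
    SELECT ColumnA, ColumnB FROM SourceTable;

OPEN src;
FETCH NEXT FROM src INTO @a, @b;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        INSERT INTO DestinationTable (Column1, Column2) VALUES (@a, @b);
    END TRY
    BEGIN CATCH
        -- This row violated the PK (or hit some other error): log and move on
        PRINT 'Skipped ColumnA = ' + CAST(@a AS VARCHAR(20));
    END CATCH;
    FETCH NEXT FROM src INTO @a, @b;
END
CLOSE src;
DEALLOCATE src;
```

Note this trades the set-based filter for n separate statements, so it will be much slower on large tables; the filtering approach is usually preferable.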
Glynner
Starting Member
5 Posts
Posted - 2008-10-17 : 17:20:18
OK, I surrender. Thanks guys, that was very helpful. I think MS Access has given me naughty ideas and bad habits.

I created an MS Access database, linked in the two tables in my SQL database, created an append query in Access identical to the stored procedure in SQL, and tried to run it.

1st warning: "You are about to run an append query that will modify data in your table."
2nd warning: "You are about to append n rows."
3rd warning: "Microsoft Access can't append all the records in the append query ... can't append 1 record(s) due to key violations ... To ignore the error(s) and run the query, click Yes. (yep, that's what it says!) Do you want to run the action query anyway?" (YES/NO/HELP)

So, from my new understanding, what MS Access must really be offering to do is not to run my query (as this can't be done without violating atomicity or primary key constraints), but rather a modified append query (just like what tosscrosby suggested), or to loop through the records and run n-1 append queries that each append only one of the records (as Transact Charlie suggested).

Come to think of it, as a data validator and cleaner, if I am given two records and one says PersonID: 321, Name: Jack, and the other says PersonID: 321, Name: Jill, I tend to throw them both out as equally unreliable! In the above scenario, MS Access would append the first record but not the second.

To my great shame, I have learnt to co-operate with this. I could tell (using other columns I have not mentioned) which record must be the valid one if a PersonID was duplicated, and could easily order the records in my append query so that the valid one would be the first one! Oh dear.
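As an aside, the "order the records so the valid one comes first" trick can also be done set-based: with the ROW_NUMBER() window function (SQL Server 2005 and later) you can keep exactly one row per key before inserting. PersonID, Name and the Reliability tiebreak column here are illustrative names, not from the thread:

```sql
-- Keep one row per PersonID, preferring the most reliable one;
-- the ORDER BY inside OVER() decides which duplicate wins.
INSERT INTO DestinationTable (PersonID, Name)
SELECT PersonID, Name
FROM (
    SELECT PersonID, Name,
           ROW_NUMBER() OVER (PARTITION BY PersonID
                              ORDER BY Reliability DESC) AS rn
    FROM SourceTable
) AS ranked
WHERE rn = 1;
```

Unlike relying on Access's insertion order, this makes the "which duplicate is valid" rule explicit in the query itself.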
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts
Posted - 2008-10-18 : 13:09:37
quote: Originally posted by Glynner

OK, I surrender. Thanks guys, that was very helpful. I think MS Access has given me naughty ideas and bad habits. [...] To my great shame, I have learnt to co-operate with this. I could tell (using other columns I have not mentioned) which record must be the valid one if a PersonID was duplicated, and could easily order the records in my append query so that the valid one would be the first one! Oh dear.
You're welcome. Thanks for reading, digesting and understanding the advice. Do you need any more help?

-------------
Charlie
jwcolby
Starting Member
1 Post
Posted - 2008-11-21 : 09:56:25
While I understand the "in compliance with the primary key" thing, there are other reasons for wanting the good records to just go in.

I have a table of 97 million records, imported from CSV files. The CSV files were occasionally corrupt. The data imported as a bunch of varchar(50) fields, so almost ANYTHING could make it into the tables. When you have corruption in a CSV file, the net effect is that fields start getting slewed, i.e. field 1 ends up in field 2. For some reason it seems to straighten back out after a record or two; however, out of 97 million records I have about 30 or so with bad data. I am NEVER going to get good data for those records; I need to trash them. How do you find this kind of "bad data" in 97 million records?

Accompanying the CSV files was a "flat file" kind of document that told me what the fields were SUPPOSED to be (field widths). I went in and set up a "template" table with the correct field widths, additionally setting up integer fields for the numeric fields etc. I then merrily started trying to append the data from the initial varchar(50) fields into their matching fields in the "real" table. It works just fine UNTIL I hit a bad-data record where data was slewed; then THAT record won't go. However, due to the transactional nature of SQL Server, it rolls back the entire append.

This is a good example of where an "ignore errors" option makes sense and is precisely the right thing to do: the bad data is never going to be corrected, and I just need those records dropped and the good records appended. So in fact it isn't quite so cut and dried as "you shouldn't ever do this kind of thing". I am in fact faced with manually tracking down 20 or 30 bad-data records and manually deleting them from 97 million records. If you think that is not a PITA, think again.

JWC
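One way to hunt down those 30-odd slewed rows without doing it by eye is to let the flat-file spec drive the search: any staging row whose supposedly-numeric field won't convert, or whose field is wider than the documented width, is a candidate. This is only a sketch; StagingTable, AmountField, CodeField and the width 10 are made-up names for illustration:

```sql
-- Flag rows in the varchar(50) staging table that would fail the
-- typed append: non-numeric text in a numeric field, or a value
-- wider than the flat-file spec allows.
SELECT *
FROM StagingTable
WHERE ISNUMERIC(AmountField) = 0
   OR LEN(CodeField) > 10;

-- Once reviewed, delete them so the INSERT...SELECT into the
-- typed "real" table can succeed as a single set.
DELETE FROM StagingTable
WHERE ISNUMERIC(AmountField) = 0
   OR LEN(CodeField) > 10;
```

Note that ISNUMERIC accepts some surprising strings (currency symbols, exponents), so it is a coarse filter for finding suspects rather than a strict validator.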
robvolk
Most Valuable Yak
15732 Posts
Posted - 2008-11-21 : 10:42:05
To answer the original question, this should do what you want:

CREATE PROCEDURE AppendRecords AS
INSERT INTO DestinationTable (Column1, Column2)
SELECT ColumnA, ColumnB
FROM SourceTable A
WHERE NOT EXISTS (SELECT * FROM DestinationTable WHERE Column1 = A.ColumnA)
GO
robvolk
Most Valuable Yak
15732 Posts
Posted - 2008-11-21 : 10:56:33
jwcolby: I think Integration Services is a better solution for your problem. It can insert the good rows into a table and either ignore or redirect the bad data to another file or table so you can continue working on it. Although you could do this in T-SQL (you've already got a good start with your staging table), it's probably easier to let SSIS do the work.
webfred
Master Smack Fu Yak Hacker
8781 Posts
Posted - 2008-11-21 : 11:14:56
The solution of avoiding duplicates with WHERE NOT EXISTS fails when the source itself contains two or more rows with the same value for the unique destination field: none of them exist in the destination yet, so they all pass the filter and the insert still violates the key. However, imho it is a good idea to take only the good data and examine the "bad" data before inserting one of them.

Not tested, but something like this:

insert dest_table
select col1_PK, col2
from source_table s
where not exists (
    select id from source_table
    where id = s.col1_PK
    group by id
    having count(*) > 1
)

Greetings
webfred

No, you're never too old to Yak'n'Roll if you're too young to die.