Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Update with random values

Author  Topic 

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-06 : 10:30:35
Hi there,

I want to update table B with values randomly chosen from table A, for example

create table A(
id int primary key,
value int
)

create table B(
id int primary key,
value int
)

insert into A
select 1, 4 union all
select 2, 4 union all
select 3, 4 union all
select 4, 4 union all
select 5, 2 union all
select 6, 2

insert into B
select 1, NULL union all
select 2, NULL union all
select 3, NULL union all
select 4, NULL union all
select 5, NULL union all
select 6, NULL

update B
set value = (
select top 1 value
from A
where B.id > 0
order by newid()
)
select * from B

The where clause is necessary because an uncorrelated subquery is evaluated only once. My solution is not only ugly, it also performs badly (my real tables have like 500000 rows). Two questions:

(1) The execution plan shows a sort. Does SQL Server really sort the whole table just to find the minimum (top 1)?

(2) Is there a better or more elegant way to update table B the way I want to do it?

Thanks,
Bjoern


(54224 row(s) affected)

The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2008-08-06 : 10:42:49
There is no correlation between A and B so it is not doing whatever you think it is (i.e. choosing from A). It's not clear what you want.
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-06 : 10:59:51
Try it with and without where-clause, that's why I posted the code.

With where-clause, it does exactly what I want, namely populate B with random values, but with a similar distribution as in table A.

The question is why SQL Server does a sort to find top 1 value, i.e. the minimum, and how can I avoid that to improve performance.

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-06 : 13:03:47
quote:
Originally posted by bjoerns

Try it with and without where-clause, that's why I posted the code.

With where-clause, it does exactly what I want, namely populate B with random values, but with a similar distribution as in table A.

The question is why SQL Server does a sort to find top 1 value, i.e. the minimum, and how can I avoid that to improve performance.




you've an order by specified. so sql server will order the results out for you in the inner query. i didnt understand that purpose of where clause though. what difference will it make?
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-08-06 : 13:16:00
quote:
Originally posted by bjoerns

Try it with and without where-clause, that's why I posted the code.

With where-clause, it does exactly what I want, namely populate B with random values, but with a similar distribution as in table A.

The question is why SQL Server does a sort to find top 1 value, i.e. the minimum, and how can I avoid that to improve performance.

Because it is assigning a NEWID to every row then ordering the whole thing then taking the TOP rowid.

One thing you might try is SELECTing the DISTINCT values into a table variable with a CLUSTERED INDEX on the value. Then you could use that sub set to order by NEWID instead of the intire source table.
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-06 : 13:21:46
Hi Visakh,

it's TOP 1 not TOP 1000 so I expected the analyzer to be smart enough to scan the table for the minimum value instead of sorting the whole thing.

The difference is: without the where clause the query is evaluated once and the whole table B is assigned the same value. With the where clause, it's evaluated for each row of table B, leading to different values.

Bjoern
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-06 : 13:25:19
quote:
Originally posted by bjoerns

Hi Visakh,

it's TOP 1 not TOP 1000 so I expected the analyzer to be smart enough to scan the table for the minimum value instead of sorting the whole thing.

The difference is: without the where clause the query is evaluated once and the whole table B is assigned the same value. With the where clause, it's evaluated for each row of table B, leading to different values.

Bjoern



be it 1 or 1000 it has to first sort based on id generated by newid() to determine the top record(s).
Ok. understood the purpose. thanks.
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-06 : 13:30:58
quote:
Because it is assigning a NEWID to every row then ordering the whole thing then taking the TOP rowid.

I know, I just wonder why. Since it's TOP 1 it could simply scan for the minimum value.
quote:
One thing you might try is SELECTing the DISTINCT values into a table

The distinct would lead to evenly distributed values in table B. I want to keep the distribution from table A, in the example 2/3 value 4 and 1/3 value 2.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-06 : 13:52:18
With "random" do you mean that you can get 6 4's?
Or should all numbers "shuffle" and be used only once?
Do the A and B tables hold exactly the same primary key numbers?



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-07 : 04:30:34
No shuffle and 6 4's is okay, yes. The real tables are large.
And no, primary key numbers can be disjoint even.

Thanks,
Bjoern
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 05:14:09
Here is a "shuffle" method. Feel free to modifiy it to deal with your needs.
DECLARE @a TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @a
SELECT 1, 4 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 3, 4 UNION ALL
SELECT 4, 4 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 2

DECLARE @b TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @b
SELECT 1, NULL UNION ALL
SELECT 2, NULL UNION ALL
SELECT 3, NULL UNION ALL
SELECT 4, NULL UNION ALL
SELECT 5, NULL UNION ALL
SELECT 6, NULL

UPDATE t
SET t.Value = a.Value
FROM @b AS t
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID
FROM @b
) AS x ON x.ID = t.ID
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID,
Value
FROM @a
) AS a ON a.JoinID = x.JoinID

SELECT *
FROM @b



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 06:08:05
[code]-- Prepare sample data
DECLARE @a TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @a
SELECT 1, 4 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 3, 4 UNION ALL
SELECT 4, 4 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 2

DECLARE @b TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @b
SELECT 1, NULL UNION ALL
SELECT 2, NULL UNION ALL
SELECT 3, NULL UNION ALL
SELECT 4, NULL UNION ALL
SELECT 5, NULL UNION ALL
SELECT 6, NULL

-- Prepare staging
DECLARE @Stage TABLE (Value INT, Items INT, Running INT)

INSERT @Stage
SELECT Value,
COUNT(*),
0
FROM @a
GROUP BY Value

-- Do a running total
DECLARE @Running INT

UPDATE @Stage
SET @Running = Running = Items + COALESCE(@Running, -1)

-- Do the actual UPDATE
UPDATE b
SET b.Value = s.Value
FROM @b AS b
INNER JOIN @Stage AS s ON s.Running >= ABS(CHECKSUM(NEWID())) % (@Running + 1)

-- Show the result
SELECT *
FROM @b[/code]


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-07 : 07:45:55
Thanks a lot, Peso! The second solution does exactly what I want, and I learned a lot.

As for the shuffle, it is sufficient to randomise once:

UPDATE		t
SET t.Value = a.Value
FROM @b AS t
/*
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID
FROM @b
) AS x ON x.joinID = t.ID
*/

INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID,
Value
FROM @a
) AS a ON a.JoinID = t.ID


But if I run your code, I don't get a shuffle, which boggles me. Aren't two shuffles combined supposed to result in another shuffle?

And the still remaining question is... why doesn't the optimiser simplify a TOP 1 to a scan for the minimum?

Bjoern
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 09:03:42
quote:
Originally posted by bjoerns

And the still remaining question is... why doesn't the optimiser simplify a TOP 1 to a scan for the minimum?
Beacuse of the ORDER BY clause.


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 09:05:18
quote:
Originally posted by bjoerns

As for the shuffle, it is sufficient to randomise once
I don't think so, because neither @a table nor @b table need to have their ID's in sequential order.
What I do is create a sequential order for each set of data, regardless of sequence in @a or @b.

DECLARE @a TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @a
SELECT 101, 4 UNION ALL
SELECT 202, 4 UNION ALL
SELECT 303, 4 UNION ALL
SELECT 404, 4 UNION ALL
SELECT 505, 2 UNION ALL
SELECT 606, 2

DECLARE @b TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @b
SELECT 1001, NULL UNION ALL
SELECT 2002, NULL UNION ALL
SELECT 3003, NULL UNION ALL
SELECT 4004, NULL UNION ALL
SELECT 5005, NULL UNION ALL
SELECT 6006, NULL

UPDATE t
SET t.Value = a.Value
FROM @b AS t
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID
FROM @b
) AS x ON x.ID = t.ID
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID,
Value
FROM @a
) AS a ON a.JoinID = x.JoinID

SELECT *
FROM @b


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-07 : 10:09:22
But when I run your code I still get results with five or six 4s... that scares me since it's not what I expect :-) I expect an one-to-one matching.

Bjoern


(54224 row(s) affected)

The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 10:21:12
It's because there is a distribution built-in.
Since there are only 2 2's the likelyness to get a 2 is 1/3 (33.33%) because there are also 4 4's and the likelyness to get a 4 is 66.67%. That is what the running total is for.

It's very easy to rewrite so you get a 50/50 chance instead.


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-07 : 10:28:52
Sorry if I don't get it but you assign both tables a unique joinID between 1 and 6, join them 1:1 so I expect the result to have always four 4s and two 2s?!

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 10:31:25
Oh, yo meant the SHUFFLE routine?



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-08-07 : 10:37:32
Here is a work-around with a staging table for shuffle routine.
DECLARE @a TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @a
SELECT 101, 4 UNION ALL
SELECT 202, 4 UNION ALL
SELECT 303, 4 UNION ALL
SELECT 404, 4 UNION ALL
SELECT 505, 2 UNION ALL
SELECT 606, 2

DECLARE @b TABLE (ID INT PRIMARY KEY, Value INT)

INSERT @b
SELECT 1001, NULL UNION ALL
SELECT 2002, NULL UNION ALL
SELECT 3003, NULL UNION ALL
SELECT 4004, NULL UNION ALL
SELECT 5005, NULL UNION ALL
SELECT 6006, NULL

DECLARE @Stage TABLE (ID INT, Value INT)

INSERT @Stage
SELECT t.ID,
a.Value
FROM @b AS t
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID
FROM @b
) AS x ON x.ID = t.ID
INNER JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS JoinID,
Value
FROM @a
) AS a ON a.JoinID = x.JoinID

UPDATE b
SET b.Value = s.Value
FROM @b AS b
INNER JOIN @Stage AS s ON s.ID = b.ID

SELECT *
FROM @b



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bjoerns
Posting Yak Master

154 Posts

Posted - 2008-08-07 : 11:05:08
Yes, I'm referring to your post from 09:05:18.


b.ID x.joinID a.joinID a.ID
6006 1 1 303
2002 2 2 101
1001 3 3 505
4004 4 4 606
3003 5 5 202
5005 6 6 404


This implies a 1:1 relation between A and B so there should be always four 4s in table B. But there aren't?!

Go to Top of Page
    Next Page

- Advertisement -