MIN/MAX Across Multiple Columns

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Script Library

MIN/MAX Across Multiple Columns

Author

Topic

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-07-25 : 17:55:42

I have seen questions posted a number of times where someone wants to find the maximum or minimum value from a set of columns in a single row of a table.

This script demonstrates two methods for finding the maximum value in a row across a set of columns in the row when any or all of the columns are allowed to be null or equal.

Method 1 uses a UNION ALL sub query for all the columns with a MAX. It is much simpler to code and test, especially when you get much past 4 columns. Adding another column is as simple as adding one more SELECT to the subquery.

Method 2 uses a CASE statement to determine the MAX. It is much more complex to code (and test), and gets exponentially harder to code as the number of columns goes up. I think 5 or 6 columns may be about the limit of complexity of coding that you would want to take on. One advantage of this script is that you can use the simpler to code Method 1 to test the more complex code for the Method 2 if you choose to implement it as a CASE statement.

If you have another method you would like to contribute, feel free. Also, if anyone wants to post performance test results, that would be nice.



print 'Create table to hold test data'
create table #t (
	number int not null primary key clustered,
	Val1 int,
	Val2 int,
	Val3 int,
	Val4 int
)
GO
print 'Load test data'
insert into #t
select
	number,
	-- Generate random numbers
	-- with about 1/7th null
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end
from
	-- Load one million rows of test data.
	-- Number table function here
	-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
	dbo.F_TABLE_NUMBER_RANGE(1,1000000)
go
print 'Find rows that do not match for Method 1 and Method 2'
select
	out1.*,
	out2.*
from
(
-- Method 1, using a subquery with a max
select
	a.number,
 	a.Val1,
 	a.Val2,
 	a.Val3,
 	a.Val4,
 	[Max_of_Val1_to_Val4] =     
	  	(
	  	select
	   		X1= max(bb.xx)
	  	from
	   		(
	   		select xx = a.Val1 where a.Val1 is not null union all
	   		select xx = a.Val2 where a.Val2 is not null union all
	   		select xx = a.Val3 where a.Val3 is not null union all
	   		select xx = a.Val4 where a.Val4 is not null
	   		) bb
	  	)

from
 	#t a
) out1
join
(
-- Method 2, using a case
select
	a.number,
 	a.Val1,
 	a.Val2,
 	a.Val3,
 	a.Val4,
   	[Max_of_Val1_to_Val4] =  
	 	case
	 	when  a.Val1 is not null   and
	  		(a.Val1 >= a.Val2 or a.Val2 is null) and
	  		(a.Val1 >= a.Val3 or a.Val3 is null) and
	  		(a.Val1 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val1
	 	when  	a.Val2 is not null   and
	  		(a.Val2 >= a.Val1 or a.Val1 is null) and
	  		(a.Val2 >= a.Val3 or a.Val3 is null) and
	  		(a.Val2 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val2
	 	when  	a.Val3 is not null   and
	  		(a.Val3 >= a.Val1 or a.Val1 is null) and
	  		(a.Val3 >= a.Val2 or a.Val2 is null) and
	  		(a.Val3 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val3
	 	when  	a.Val4 is not null   and
	  		(a.Val4 >= a.Val1 or a.Val1 is null) and
	  		(a.Val4 >= a.Val2 or a.Val2 is null) and
	  		(a.Val4 >= a.Val3 or a.Val3 is null)
	 	then a.Val4
	 	else null
	 	end
from
 	#t a
) out2
on out1.number = out2.number
where
	-- Look for results that do not match
	(out1.[Max_of_Val1_to_Val4] is null and  out2.[Max_of_Val1_to_Val4] is not null) or
	(out1.[Max_of_Val1_to_Val4] is not null and out2.[Max_of_Val1_to_Val4] is null) or
	out1.[Max_of_Val1_to_Val4] <> out2.[Max_of_Val1_to_Val4]
go
print 'Find count of rows with different columns null'
print 'Should have a rowcount of 16 to test all conditions'
select
	Null_Column_Conditions =
	case when Val1 is null then 0 else 1000 end+ 
	case when Val2 is null then 0 else 0100 end+ 
	case when Val3 is null then 0 else 0010 end+ 
	case when Val4 is null then 0 else 0001 end,
	count(*)
from
	#t
group by
	case when Val1 is null then 0 else 1000 end+ 
	case when Val2 is null then 0 else 0100 end+ 
	case when Val3 is null then 0 else 0010 end+ 
	case when Val4 is null then 0 else 0001 end
order by
	1
go
drop table #t

Results:



Create table to hold test data
Load test data

(1000000 row(s) affected)

Find rows that do not match for Method 1 and Method 2

(0 row(s) affected)

Find count of rows with different columns null
Should have a rowcount of 16 to test all conditions
Null_Column_Conditions             
---------------------- ----------- 
0                      395
1                      2444
10                     2560
11                     14760
100                    2400
101                    14955
110                    14843
111                    90206
1000                   2518
1001                   14857
1010                   14989
1011                   90256
1100                   15100
1101                   89659
1110                   89783
1111                   540275

(16 row(s) affected)

CODO ERGO SUM

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-25 : 18:31:26

RMHCOMMON?

E 12°55'05.76"
N 56°04'39.42"

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-07-26 : 18:08:26

I ran the following script with 10,000, 100,000, and 1,000,000 rows of test data to test the relative performance of the Subquery method vs. the CASE method.

It looks like there is a large difference, 5 to 1, in run time with 1,000,000 rows.

With 10,000 rows, and 100,000 rows the Subquery method sometimes is faster than the CASE method. This may be because it takes longer to compile a query plan; it is slower on the first run, but faster on the second run.


drop  table #t
go
print 'Create table to hold test data'
create table #t (
	number int not null primary key clustered,
	Val1 int,
	Val2 int,
	Val3 int,
	Val4 int,
	MaxVal int
)
GO
print 'Load test data'
declare @t table (
	number int not null primary key clustered,
	Val1 int,
	Val2 int,
	Val3 int,
	Val4 int
)

insert into @t
select	top 100 percent
	number,
	-- Generate random numbers
	-- with about 1/7th null
	Val1 =
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	Val2 =
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	Val3 =
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end, 
	Val4 =
	case
	when abs(checksum(newid()))%7 = 0
	then null 
	else checksum(newid())%1000000
	end
from
	-- Load one million rows of test data.
	-- Number table function here
	-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
	dbo.F_TABLE_NUMBER_RANGE(1,1000000) a
order by
	a.number

insert into #t
select	top 100 percent
	aa.number,
 	aa.Val1,
 	aa.Val2,
 	aa.Val3,
 	aa.Val4,
 	[MaxVal] =     
	  	(
	  	select
	   		X1= max(bb.xx)
	  	from
	   		(
	   		select xx = aa.Val1 where aa.Val1 is not null union all
	   		select xx = aa.Val2 where aa.Val2 is not null union all
	   		select xx = aa.Val3 where aa.Val3 is not null union all
	   		select xx = aa.Val4 where aa.Val4 is not null
	   		) bb
	  	)
from
	@t aa
order by
	aa.number
GO
print 'Scan table to cache data'
declare @st datetime
declare @count int
set @st = getdate()

select
	@count =
	max(isnull(aa.MaxVal,0))+
	max(isnull(aa.Val1,0))+
	max(isnull(aa.Val2,0))+
	max(isnull(aa.Val3,0))+
	max(isnull(aa.Val4,0))
from
	#t aa


select Elapsed_Time_SCAN = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 1, using Subquery'
declare @st datetime
declare @count int
set @st = getdate()

select
	@count = count(*)
from
	(
	select
		a.*,
	 	[Max_of_Val1_to_Val4] =     
		  	(
		  	select
		   		X1= max(bb.xx)
		  	from
		   		(
		   		select xx = a.Val1 where a.Val1 is not null union all
		   		select xx = a.Val2 where a.Val2 is not null union all
		   		select xx = a.Val3 where a.Val3 is not null union all
		   		select xx = a.Val4 where a.Val4 is not null
		   		) bb
		  	)
	
	from
	 	#t a
	) aa
where
	aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 2, using CASE'
declare @st datetime
declare @count int
set @st = getdate()

select
	@count = count(*)
from
	(
	select
		a.*,
   		[Max_of_Val1_to_Val4] =  
	 	case
	 	when  a.Val1 is not null   and
	  		(a.Val1 >= a.Val2 or a.Val2 is null) and
	  		(a.Val1 >= a.Val3 or a.Val3 is null) and
	  		(a.Val1 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val1
	 	when  	a.Val2 is not null   and
	  		(a.Val2 >= a.Val1 or a.Val1 is null) and
	  		(a.Val2 >= a.Val3 or a.Val3 is null) and
	  		(a.Val2 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val2
	 	when  	a.Val3 is not null   and
	  		(a.Val3 >= a.Val1 or a.Val1 is null) and
	  		(a.Val3 >= a.Val2 or a.Val2 is null) and
	  		(a.Val3 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val3
	 	when  	a.Val4 is not null   and
	  		(a.Val4 >= a.Val1 or a.Val1 is null) and
	  		(a.Val4 >= a.Val2 or a.Val2 is null) and
	  		(a.Val4 >= a.Val3 or a.Val3 is null)
	 	then a.Val4
	 	else null
	 	end
	from
	 	#t a
	) aa
where
	aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)
go
print 'Test Elapsed time for Method 1, using Subquery'
declare @st datetime
declare @count int
set @st = getdate()

select
	@count = count(*)
from
	(
	select
		a.*,
	 	[Max_of_Val1_to_Val4] =     
		  	(
		  	select
		   		X1= max(bb.xx)
		  	from
		   		(
		   		select xx = a.Val1 where a.Val1 is not null union all
		   		select xx = a.Val2 where a.Val2 is not null union all
		   		select xx = a.Val3 where a.Val3 is not null union all
		   		select xx = a.Val4 where a.Val4 is not null
		   		) bb
		  	)
	
	from
	 	#t a
	) aa
where
	aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 2, using CASE'
declare @st datetime
declare @count int
set @st = getdate()

select
	@count = count(*)
from
	(
	select
		a.*,
   		[Max_of_Val1_to_Val4] =  
	 	case
	 	when  a.Val1 is not null   and
	  		(a.Val1 >= a.Val2 or a.Val2 is null) and
	  		(a.Val1 >= a.Val3 or a.Val3 is null) and
	  		(a.Val1 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val1
	 	when  	a.Val2 is not null   and
	  		(a.Val2 >= a.Val1 or a.Val1 is null) and
	  		(a.Val2 >= a.Val3 or a.Val3 is null) and
	  		(a.Val2 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val2
	 	when  	a.Val3 is not null   and
	  		(a.Val3 >= a.Val1 or a.Val1 is null) and
	  		(a.Val3 >= a.Val2 or a.Val2 is null) and
	  		(a.Val3 >= a.Val4 or a.Val4 is null)
	 	then 	a.Val3
	 	when  	a.Val4 is not null   and
	  		(a.Val4 >= a.Val1 or a.Val1 is null) and
	  		(a.Val4 >= a.Val2 or a.Val2 is null) and
	  		(a.Val4 >= a.Val3 or a.Val3 is null)
	 	then a.Val4
	 	else null
	 	end
	from
	 	#t a
	) aa
where
	aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)
go

Results:


--------------------------------------------------------------------
-- Test with 10,000 rows of data
--------------------------------------------------------------------
Create table to hold test data
Load test data

(10000 row(s) affected)


(10000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN 
----------------- 
00:00:00.017

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:00.080

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:00.110

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:00.063

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:00.017

(1 row(s) affected)



--------------------------------------------------------------------
-- Test with 100,000 rows of data
--------------------------------------------------------------------
Create table to hold test data
Load test data

(100000 row(s) affected)


(100000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN 
----------------- 
00:00:00.093

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:00.843

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:00.907

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:00.657

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:00.140

(1 row(s) affected)



--------------------------------------------------------------------
-- Test with 1,000,000 rows of data
--------------------------------------------------------------------

Create table to hold test data
Load test data

(1000000 row(s) affected)


(1000000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN 
----------------- 
00:00:00.923

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:06.607

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:02.327

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery 
--------------------- 
00:00:06.640

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE 
----------------- 
00:00:01.373

(1 row(s) affected)

CODO ERGO SUM

DLTaylor
Posting Yak Master

136 Posts

Posted - 2009-03-27 : 10:07:26

Thanks to both of you.
I will read into replies and let you know how i get on.
Thanks

deviji
Starting Member

7 Posts

Posted - 2009-08-07 : 05:33:28

Nice Post.
say if i want to insert / update the result of the above query how can i do it?
that is calculating max / min or a row and update the same into the same row.

create table #t(v1 int,
v2 int,
v3 int,
max1 int,
min1 int,
tot int,
avg int)

in the above table calculate the max of the 3 values and store in the same table.
and i want to know which column either v1 or v2 or v3 score the maximum...

Thanks,
Regards
Viji

deviji
Starting Member

7 Posts

Posted - 2009-08-07 : 05:33:38

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources