Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 How to flag top and bottom records?

Author  Topic 

Umar Strong
Starting Member

8 Posts

Posted - 2013-08-05 : 13:37:44
I have a need to write an SQL that will flag top and bottom N% records; the value for N% is stored in a table.

For an example, my data is:

DATA
------
20.30
10.50
2.64
10.50
17.52
13.64
21.32
6.32
4.21
14.23


and the desired output needs to be:

DATA  FLAG
----  ----
2.64   1
4.21   1
6.32
10.5
10.5
13.64
14.23
17.52
20.3   1
21.32  1



Following are the DDL & DMLs for this dataset - The two tables are joined by the column MY_KEY.

Any help will be highly appreciated.

CREATE TABLE DBO.TEMP_PCT (MY_KEY INTEGER, PCT_NR INTEGER)
INSERT INTO DBO.TEMP_PCT VALUES (1, 20)

CREATE TABLE DBO.TEMP_DATA (MY_KEY INTEGER, DATA DECIMAL(5,2))
INSERT INTO DBO.TEMP_DATA VALUES (1, 20.3)
INSERT INTO DBO.TEMP_DATA VALUES (1, 10.5)
INSERT INTO DBO.TEMP_DATA VALUES (1, 2.64)
INSERT INTO DBO.TEMP_DATA VALUES (1, 10.5)
INSERT INTO DBO.TEMP_DATA VALUES (1, 17.52)
INSERT INTO DBO.TEMP_DATA VALUES (1, 13.64)
INSERT INTO DBO.TEMP_DATA VALUES (1, 21.32)
INSERT INTO DBO.TEMP_DATA VALUES (1, 6.32)
INSERT INTO DBO.TEMP_DATA VALUES (1, 4.21)
INSERT INTO DBO.TEMP_DATA VALUES (1, 14.23)


Thanks!

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-08-05 : 13:50:19
Like this:
SELECT 
DATA,
CASE WHEN 100.*NASC/N <= PCT_NR OR 100.*NDESC/N <= PCT_NR THEN 1 ELSE NULL END AS FLAG
FROM
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY DATA ASC) AS NASC,
ROW_NUMBER() OVER (ORDER BY DATA DESC) AS NDESC,
COUNT(*) OVER() AS N
FROM
DBO.TEMP_DATA
)s
CROSS JOIN DBO.TEMP_PCT p
can you have more than one row in the DBO.TEMP_PCT table?
Go to Top of Page

Umar Strong
Starting Member

8 Posts

Posted - 2013-08-05 : 13:59:27
Yes, it can have. I just extended the SQL you provided and am getting the result I wanted.

Below is the extended SQL.

Thanks a lot for your help !


SELECT
  DATA,
  CASE WHEN 100.*NASC/N <= PCT_NR OR 100.*NDESC/N <= PCT_NR THEN 1 ELSE NULL END AS FLAG
FROM
(
SELECT *,
  ROW_NUMBER() OVER (ORDER BY DATA ASC) AS NASC,
  ROW_NUMBER() OVER (ORDER BY DATA DESC) AS NDESC,
  COUNT(*) OVER() AS N
FROM
  DBO.TEMP_DATA
)s
INNER JOIN DBO.TEMP_PCT p
  ON p.MY_KEY = s.MY_KEY
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-08-05 : 14:07:10
Great, glad you were able to fix it.

If you want to check out some nifty stuff, check out the NTILE function, which does something similar but a bit more elegantly. http://msdn.microsoft.com/en-us/library/ms175126(v=sql.105).aspx The only thing to keep in mind is that it may tile them into slightly unequal groups if the result of the division is not an integer.
Go to Top of Page

Umar Strong
Starting Member

8 Posts

Posted - 2013-08-05 : 14:43:08
Hello James,

This SQL will become a view in my code and will require some filter clause on the table DBO.TEMP_DATA during run time.

So I am wondering if the desired result can be achieved without the sub-query?


Thanks!
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-08-05 : 15:19:01
Nothing that I am going to say below is actionable information - these are just thoughts:

If it were a readonly table, you could have pre-computed the flags and stored, in which case, you wouldn't even need a view.

Another thought is to have a computed column or persisted view, but computed columns don't work with row_number() functions, so that is a non-starter. If I am not mistaken, in a persisted view you cannot create a unique clustered index on a column that has a ranking function. So that is out as well.

You might be able to use some clever combination of TOP 20 PERCENT ... ORDER BY DATA ASC and TOP 20 PERCENT ... ORDER BY DATA DESC, but not clear to me what. Even then, the server will have to sort the data - which means it will need to read all the data.

An index on DATA column is a possibility that might help. If you have an index, it has statistics on the index, so SQL Server knows the distribution of the data; so theoretically that should help. But I haven't experimented with it to say whether or not it will.

To start off, I would keep the query as it as it is (or use a cte instead of a subquery, whichever you feel comfortable with) and make it into a view and see how the performance is.

The bottom line is, when you want to identify the top N percent based on a value, unless SQL Server has some statistical information to determine where to cut it off, it would have to examine every row of data.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-08-06 : 01:25:02
This SQL will become a view in my code and will require some filter clause on the table DBO.TEMP_DATA during run time.

if you want a dynamic filter why not make it a procedure then? or if you want to use this resultset in further join operations with other tables make it a table valued UDF.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

Umar Strong
Starting Member

8 Posts

Posted - 2013-08-06 : 12:04:56
James and Visakh16 thanks for throwing ideas. I need to use the data coming out from this code, to be joined with another data.

So I started creating UDF than realized that dynamic SQLs are not supported there. Stored Procedure is not an option for me;

I am now left with creating view.

Thanks again!
Go to Top of Page
   

- Advertisement -