Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Filtering with several criteria

Author  Topic 

umutos
Starting Member

12 Posts

Posted - 2010-05-11 : 12:35:27
Hello,

I have a problem with filtering my data in MSSQL2008. It would be really great if someone could help me

Let’s consider the following table “Data” (here just some lines of it)


ID Age City Birth date Eyes Sex
12321 11 Paris 01.02.2008 Green Man
12311 12 Berlin 02.09.1999 Blue Man
I2343 34 NY 01.03.2001 Brown Woman
… … … … … …



Now, I need only the lines in „Data“ where there is for a given “Age”, “City”, “Birth date”, “Eyes” a “Man” AND a “woman”.
e.g. when in the whole dataset the 12 year old persons from Berlin with a birth date of 2.9.1999 and blue eyes are all man, then these lines can be filtered away. When there is at least 1 man and one woman among these persons, all of these lines can stay. (Sorry for my English, hope you can understand what I try to explain)

Originally I managed to solve this problem in Matlab but the dataset is too large for matlab (22 tables with more than 1 million rows) so that I had to work in MSSQL.

It’s the first time I am working with SQL and I am really greatfull for any help!

Bernd

pk_bohra
Master Smack Fu Yak Hacker

1182 Posts

Posted - 2010-05-11 : 13:52:29
Try this:

Declare @DataTable Table
(Id int,
Age int,
City varchar(50),
BirthDate Datetime ,
Eyes Varchar(10),
Sex Varchar(5)
)

Insert into @DataTable
Select 12321, 11, 'Paris' ,'2008-02-01' ,'Green' ,'Man' union all
Select 12311 ,12 ,'Berlin', '1999-09-02', 'Blue', 'Man' union all
Select 12343 ,34, 'NY', '2001-03-01', 'Brown' ,'Woman' union all
Select 12345 ,34, 'NY', '2001-03-01', 'Brown' ,'Man' union all
Select 12311 ,12 ,'Berlin', '1999-09-02', 'Blue', 'Man'


Select * from @DataTable DT Join
(
Select qman.Age,
QMan.Eyes,
QMan.City,
QMan.BirthDate
from @DataTable QMan
join @DataTable QWoman
on Qman.Age = QWoman.Age
and QMan.city = QWoman.city
and QMan.Eyes = QWoman.Eyes
and QMan.Sex = 'Man'
and QWoman.Sex = 'Woman'
) ST on DT.Age = ST.Age
and DT.BirthDate = ST.BirthDate
and DT.Eyes = ST.Eyes
and ST.City = DT.City


Regards,
Bohra

I am here to learn from Masters and help new bees in learning.
Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-12 : 09:32:46
Thank you Bora!

Now I am a little bit confused because I asked the same question in a german speaking forum and as a response i got the folowing code:

select * from Data d1 where exists (
select 1 from data d2 where
d2.City = d1.City and
d2.BirthDate = d1.BirthDate and
d2.Eyes = d1.Eyes and
d2.Sex = (case when d1.Sex = 'Man' then 'Woman' else 'Man' end)
)


When I apply this code on my dataset i get a different table as with the one you posted to me

As I am not familiar with SQL I cant really see what you have done. YOu created some imaginary duplicates of the same table and joined them in a way that these criterias are fullfiled?

Thanks a lot!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-12 : 10:16:31
can you post what would be your output for sample data above?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-12 : 13:01:56
Ok. As i gave just 3 lines of the tabel its not easy to show you the results.

In order to explain it better let’s consider another table. This is very similar to the one which I have. The table shows characteristics of option contracts.
(this are bets on a gain or a fall of equity(stock) titles. A “Call”-Option is a bet on a gain and a “Put”-Option is a bet on the fall of the stock. Here is the sample table:



Row_nr Security_ID Date Strike Expiration Call_Put
1 13214 01.01.2007 75 03.08.2012 Call
2 12321 02.04.2007 40 03.07.2011 Put
3 123213 03.04.1999 30 03.03.2010 Put
4 12312 03.04.1999 30 03.03.2010 Put
5 75876 03.04.1999 30 03.08.2012 Put
6 23422 04.09.1997 67 03.08.2012 Put
7 13214 01.01.2007 75 03.08.2012 Call
8 45645 03.08.2000 98 03.08.2012 Call
9 123213 03.04.1999 30 03.03.2010 Put
10 242432 01.01.2007 98 03.08.2012 Call
11 13214 01.01.2007 75 03.08.2012 Put
12 75876 03.04.1999 30 03.08.2012 Call
13 2343223 01.01.2007 75 03.07.2011 Put


Explanation of the data:

The first row shows the characteristic of one specific bet. It is a bet on the stock with the Security ID 13214 (“Security ID”). The bet starts on 1.1.2007 (“Date”) and ends on 3.8.2012 (“Expiration”).It is a Call-option (“Call_Put”), and the contract has a strike of 75 USD (“Strike”). (I think its not relevant to explain Strike price here.its just a characteristic of one specific option.

What I need:

Only positions on the same underlying equity (“Security ID”) and with the same bet starting date (“Date”), the same strike price (“Strike”) and the same bet ending day (“expiration”) which have at least 1 Call and 1 Put position.
When there is at least 1 call and 1 put we take ALL the rows no matter whether there are 99 “identical” (same strike, expiration, date and Security ID) puts and just 1 “identical” call or there is only 1 “identical” call and 99 “identical” puts.

For the sample table this would be the result I need:


Row_nr Security_ID Date Strike Expiration Call_Put
1 13214 01.01.2007 750 03.08.2012 Call
7 13214 01.01.2007 750 03.08.2012 Call
11 13214 01.01.2007 750 03.08.2012 Put
5 75876 03.04.1999 30 03.08.2012 Put
12 75876 03.04.1999 30 03.08.2012 Call



Explanation of the result:

-The rows 1,7 and the 11th row are ok, because the contracts are “identical” (same strike, security id, date, expiration) AND there is at least 1 call and 1 put.

-Rows 3 and 9 for example can’t be taken as the contracts are “identical” BUT there is no identical Call.

-Rows 5 and 12 fullfill the criteria as they have the same strike, same date, same expiration, same security_ID AND one of them is a Call and one of them is a Put.

-all the other rows have no other "identical" doubles, so they can be deleted.

When this example with the options data is not easy to understand just tell me and I will apply it to the birthdate_data

Thanks a lot!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-12 : 13:10:22
[code]
SELECT t.Row_nr,t.Security_ID, t.Date, t.Strike, t.Expiration,t.Call_Put
FROM Table t
JOIN (SELECT Security_ID, Date, Strike, Expiration
FROM Table
GROUP BY Security_ID, Date, Strike, Expiration
HAVING COUNT(DISTINCT CASE Call_Put WHEN 'Call' THEN 1 WHEN 'Put' THEN 2 ELSE NULL END) =2
)t1
ON t1.Security_ID =t.Security_ID
AND t1.Date = t.Date
AND t1.Strike = t.Strike
AND t1.Expiration = t.Expiration
[/code]




------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-13 : 09:42:35
Thank you Visakh16!

Your code gives the solution I need but its a little bit too slow. I found another code which gives the same result (only the solution is not ordered as in your code, but thats ok) but is much faster:


SELECT Row, Date,SecurityID,Strike, Expiration,Call_Put
FROM @DataTable d1
WHERE EXISTS(
SELECT 1
FROM @DataTable d2
WHERE d2.Date=d1.Date AND
d2.Strike=d1.Strike AND
d2.Expiration=d1.Expiration AND
d2.SecurityID=d1.SecurityID AND
d2.Call_Put=(CASE WHEN d1.Call_Put='Call' THEN 'Put'
ELSE 'Call' END)
)


Thank you all very much for the help!
Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-14 : 09:22:46
Hello,

The Code discussed above(which is identical to the one of Visakh16) works great but my data is still too large and I have to summarize the rows. I think the summary can be done by 2 steps and will work with the GROUP BY function but it is quite hard for me to apply that and I was not able to sort it correctly . Maybe one of you can help me once again.

Basis of the further calculation is the table resulting from this code (which gives identical results as the code of Visakh16):


SELECT d1.Row_nr, d1.SecurityID, d1.Date,d1.Strike, d1.Expiration,d1.Call_Put, d1.Volume FROM table_01 d1
WHERE EXISTS(
SELECT 1 FROM table_01 d2
WHERE d2.Date=d1.Date AND d2.Strike=d1.Strike AND
d2.Expiration=d1.Expiration AND d2.SecurityID=d1.SecurityID
AND d2.Call_Put=(CASE WHEN d1.Call_Put='Call' THEN 'Put'ELSE 'Call'
END)
)



To get the new table I have to find a code which implements this 2 steps:

1. In a first step I want to summarize the data for every “identical call/ put pairs”
(I found out that in my dataset there exists just 1 call and 1 put with the same strike, expiration, securityID, date; this I call “identical call/put pair”).
For all identical call and put pairs (positions with the same date, strike, expiration, securityID) a new column (lets call it “AVG_Volume”) has to be created which calculates the average Volume of every “identical call/put pair”. So every Call/put pair (2 rows) is replaced by “AVG_Volume” (1 row). So with this step the dataset is halved. I don’t know how this can be done but maybe one could write the solution in “AVG_Volume” into the Call rows and delete all the Put rows in the datatable or the other way around.

2. In a second step all the data for a given security on a given date has to be summarized:
For a given Date and SecurityID all the numbers in “AVG_Volume” have to summed up (lets call this colum “SUM_AVG_Volume”). This number represents all the option pairs for a given security on a given date.

When I am able to do this 2step filtration the dataset can be reduced and I can continue with my calculations. To give an example: when for a given securityID and date there a 5 call/put option pairs (10 rows), the first step is reducing the dataset for that given date and SecurityID to 5 rows and the second step is reducing this 5 rows to 1 row showing the sum of “the average volumes”.

Maybe my explanation was not so good. Therefore here a small numerical example:

Lets say that we applied the code I gave above and we get the following table:



Row_nr Security_ID Date Strike Expiration Call_Put Volume
23 12311 01.01.2007 70 07.09.2020 Call 455
5567 12311 01.01.2007 70 07.09.2020 Put 1234
12 12311 01.01.2007 90 07.09.2020 Call 233
4332 12311 01.01.2007 90 07.09.2020 Put 124
56 12311 01.01.2006 67 09.06.2022 Call 990
997 12311 01.01.2006 67 09.06.2022 Put 123
12 2211 03.04.2008 87 08.08.2011 Call 8687
29 2211 03.04.2008 87 08.08.2011 Put 564
33 1232 02.02.1989 66 07.07.2020 Call 121
78 1232 02.02.1989 66 07.07.2020 Put 234
43 1232 02.02.1989 90 09.06.2022 Call 634
899 1232 02.02.1989 90 09.06.2022 Put 32



After Applying fist step we get the following table:


Security_ID Date Strike Expiration AVG_Volume
12311 01.01.2007 70 07.09.2020 844,5
12311 01.01.2007 90 07.09.2020 178,5
12311 01.01.2006 67 09.06.2022 556,5
2211 03.04.2008 87 08.08.2011 4625,5
1232 02.02.1989 66 07.07.2020 177,5
1232 02.02.1989 90 09.06.2022 333

And after the second step we get the final solution I need:


Security_ID Date SUM_AVG_Volume
12311 01.01.2007 1023
12311 01.01.2006 556,5
2211 03.04.2008 4625,5
1232 02.02.1989 510,5



Maybe all of that can be done in 1 step? I Just need the last table. Thank you very much for any help!

Bernd
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-14 : 13:13:21
[code]SELECT SecurityID,Date,SUM(Avg_Volume)
FROM
(
SELECT d1.SecurityID, d1.Date,d1.Strike, d1.Expiration,d1.Call_Put, AVG(d1.Volume *1.0) AS Avg_Volume
FROM table_01 d1
WHERE EXISTS(
SELECT 1 FROM table_01 d2
WHERE d2.Date=d1.Date AND d2.Strike=d1.Strike AND
d2.Expiration=d1.Expiration AND d2.SecurityID=d1.SecurityID
AND d2.Call_Put=(CASE WHEN d1.Call_Put='Call' THEN 'Put'ELSE 'Call'
END)
)
GROUP BY d1.SecurityID, d1.Date,d1.Strike, d1.Expiration,d1.Call_Put
)t
GROUP BY SecurityID,Date
[/code]

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

sageora
Starting Member

4 Posts

Posted - 2010-05-14 : 15:10:36
Umutos, let me ask you a question.
If we have a set of call/put options where there are let's say 2 calls with 10 and 10 volume value and one put with 20 volume value same strike price each. What would be the average volume?
I would say the avg value is 20 but according to the last solution it is 13.33

Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-15 : 07:45:08

Hey Visakh16, you re great man! Thank you very very much.And it looks very easy what you have done!

Hello Sageora,very good question.

Well in my last post I wrote: "there exists just 1 call and 1 put with the same strike, expiration, securityID, date". According to that then there is no case where we have 2 calls and 1 put.

BUT: After I read your question I went through my database and found out that there are some positions with that constellation (2 calls and 1 put or other pairs consisting of more than 2 positions).Man this looks like more work .

So therefore in that case (a pair consisting of more than 2 positions) taking just the average of 10,10 and 20 would be the false answer. In the volume case you're right that 20 would be the right answer (AVG(sum(10,10),10). But volume was just an example I gave. In my case I have to calculate the average volume weighted price difference.

Therefore I think one need 1 additional step before implementing the code visakh16 posted. The code should summarize "identical pairs" consisting of more than 2 positions like that:

The code has to average the price of all the calls in one "identical pair", average the price of all the puts in one "identical pair", sum up the volume of all the calls in one "identical pair" and sum up the volume of all the puts in one "identical pair".


lets say 1 identical call put pair looks like that:


C/P Price Volume
c1 10 20
c2 20 15
c3 30 15
p1 20 5
p2 10 5



then the code should summarize the pair to 2 positions:


C/P Price_AVG Volume_SUM
c123 20 50
p12 15 10


When this is done the rest is relatively "simple" (I think)

Building this procedure into the code (I have no clue how this should be done) one have to adapt the code of CISAKH16 slightly (I think) by changing:


AVG(d1.Volume *1.0) AS Avg_Volume

by

AVG(d1.Volume_SUM)*[Price_AVG(of call)-Price_AVG(of put)]AS Avg_vw_Price_diff


and

SUM(Avg_Volume)

by

SUM(Avg_vw_Price_diff

(I dont know why there is a *1.0 in the code of Visakh16,so i didn't used it but i am sure there is a reason why he used that)

for the above example call/put pair one would get the average volume weighted price difference : AVG(50,10)*(20-15)=150

When you could help me with the implementation of this extra summary which has to be done before implementing the last code of Visakh16 I think I would be finish (i cant imagine that ).
Thank you guys for the help! I really appreciate that!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-16 : 08:10:03
[code]
SELECT d1.SecurityID, d1.Date, SUM(d1.Volume) *1.0/COUNT(DISTINCT d1.Call_Put) AS Sum_Avg_Volume
FROM table_01 d1
WHERE EXISTS(
SELECT 1 FROM table_01 d2
WHERE d2.Date=d1.Date AND d2.Strike=d1.Strike AND
d2.Expiration=d1.Expiration AND d2.SecurityID=d1.SecurityID
AND d2.Call_Put=(CASE WHEN d1.Call_Put='Call' THEN 'Put'ELSE 'Call'
END)
)
GROUP BY d1.SecurityID, d1.Date
[/code]

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-17 : 11:45:33
Thank you visakh16! you re a great help.
With your help I am almost finished:


Select Date,SecurityID, VW_Price_Diff=sum(AVG_VW_PC) FROM
(Select Date,SecurityID,Strike, Expiration, AVG_VW_PC=AVG(SUM_Price)
*[AVG_Price(#OF THE CALL-ROW#)- AVG_Price(#OF THE PUT ROW#)] FROM
(Select d1.Date,d1.SecurityID,d1.Strike, d1.Expiration,d1.CallPut,
AVG_Price=AVG(Price), SUM_Volume=SUM(Volume) FROM Test_05 d1
WHERE EXISTS(SELECT 1 FROM Test_05 d2 WHERE d2.Date=d1.Date AND
d2.Strike=d1.Strike AND d2.Expiration=d1.Expiration AND
d2.SecurityID=d1.SecurityID AND d2.Call_Put=(CASE WHEN
d1.Call_Put='Call' THEN 'Put'ELSE 'Call' END)
Group BY d1.Date,d1.SecurityID,d1.Strike,
d1.Expiration,d1.Call_Put
)t1 Group BY Date,SecurityID,Strike,Expiration
)t2 Group BY Date,SecurityID


the only think I need is the substraction of the prices of the call and the puts (see in the code the term with #). I don't know how to select some specifc cells.

to put in other words I just need a command which transforms the following matrix :


Sp1 Sp2 Sp3
A C 10
A P 7
B C 8
B P 1
C C 20
C P 10


into the following:

Sp1 Sp_X_sub
A 3
B 7
C 10


addition would be:

select A, SP_X_ad=sum(Sp3) from table group by A

but substraction?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-17 : 12:22:51
[code]select A, SP_X_ad=sum(case when Sp2='C' THEN Sp3 WHEN 'P' THEN -Sp3 ELSE NULL END) from table group by A[/code]

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

umutos
Starting Member

12 Posts

Posted - 2010-05-18 : 05:12:49
That's it! perfect! Thank you visakh16, you re great!
have a nice day!
Go to Top of Page
   

- Advertisement -