Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 help with query - join,randomization

Author  Topic 

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-11 : 10:43:49
Hi,

I have two tables as shown below. Basically in one table we have the user, and then in the second we have many links.

Basically I want to run a query like such, but with some addtions.

SELECT TOP 10 * FROM [tblPartners]
WHERE approvalStatus = 1 AND inbound_clicks_total > 150 and ((inbound_points_total - 100) > outbound_points_total)


For all accounts that meet the criteria above, I want to get records from the "outboundLinks" table.

I want the row thats brought back from the outbound links table to be selected randomly, according to the "linkWeight" column.

For example if we have a row with linkweight 100 and another with linkweight 10, the first row will have a 10x better change of coming up.

Also we should only bring back maximum 1 link per account, and on top of that the results should be randomized.


Can anyone lend a hand on this one ? I'm trying but getting nowhere, and the last thing I want to do is write an inefficient query.

any help appreciated !:)

Thanks once again!!
mike123








CREATE TABLE [dbo].[tblPartners](
[partnerID] [int] IDENTITY(1,1) NOT NULL,
[AccountTitle] [varchar](100) NULL,
[EmailAddress] [varchar](50) NULL,
[password] [varchar](50) NULL,
[contactName] [varchar](100) NULL,
[inbound_clicks_total] [int] NULL,
[outbound_clicks_total] [int] NULL,
[inbound_points_total] [int] NULL,
[outbound_points_total] [int] NULL,
[approvalStatus] [tinyint] NULL
)
GO


CREATE TABLE [dbo].[tblOutBoundLinks](
[linkID] [int] IDENTITY(1,1) NOT NULL,
[partnerID] int NOT NULL,
[linkTitle] [varchar](50) NOT NULL,
[linkDestination] [varchar](50) NOT NULL,
[linkWeight] [smallint] NOT NULL.
[displayStatus] [smallint] NOT NULL.
[approvalStatus] [smallint] NOT NULL.

) ON [PRIMARY]

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-11 : 11:34:25
What you can do is to put the result of first query into a temp table or table variable.

Then you do a "running sum".

Linkweight runsum

30 30
20 50
10 60

Total is 60. Get a random value between 1 and 60 with "ABS(CHECKSUM(NEWID())) % 60"
Then select the record by
WHERE rs > @randomvalue

with this approach, a weight if 30 "occupies" half the running sum of 60 and thus has 50% chance to be selected.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-11 : 11:35:39
Or, as suggested before a few times, provide some proper sample data and expected output.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-11 : 16:25:13
Hi Peso,

Is there a good tool out to create these insert statements? I remember last time I looked they were all pretty expensive for just this purpose, however so is doing it by hand :) Any personal recommendations are appreciated.

Good idea on creating the dummy data here, its actually even more complicated than I imagined.

Instead of bringing back just the top 10 we will bring back all records due to limited results present.

Rules:

To bring back an "outbound link" its approvalStatus must be 1, and its displayStatus must be 1.

Also the parent account approvalStatus must be 1.

We are going to ignore the "inbound_clicks_total" and "outbound_clicks_total" and instead use the "inbound_points_total" and "outbound_points_total"

We want to only select accounts with minimum 15,000 inbound_points_total, We also want to make sure the "inbound_points_total" is at least 10,000 higher than "outbound_points_total".


So now we basically have a list of accounts that fit this filter, and then we have to see if they have a qualifying link in the link table. Some accounts will have many links.

The "linkWeight" column is relative to their account only. For example if the user has 1 link of weight "1" , its the same as weight "1000".

We want to bring back maximum 1 link per account in the partners table. The link should be randomly selected for each user, with links with higher weights given a higher probability of being selected in that users pool of links.

This list of links brought back, should be sorted by the biggest differential between "inbound_points_total" and "outbound_points_total", as we will throw a TOP clause in there when we have enough data.

After all this we will have a list of links, we should randomize the order in which they are brought back.


The data brought back should be in the format

OBL.linkTitle,OBL.linkDestination,OBL.partnerID

I know this is alot to take in. Please let me know if this makes sense!


Thanks very much once again!,
mike123


Thanks very much!
Mike123

--data below


INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle1','email@1.com','pass@1.com','contact1',0,0,10000,10000,1)
GO

INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle2','email@2.com','pass@2.com','contact2',0,0,250000,100000,1)
GO


INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle3','email@3.com','pass@3.com','contact3',0,0,750000,1000,1)
GO


INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle4','email@4.com','pass@4.com','contact4',0,0,750000,0,0)
GO

INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle5','email@5.com','pass@5.com','contact5',0,0,500000,495000,1)
GO

INSERT INTO [dbo].[tblPartners]
([AccountTitle],[EmailAddress],[password],[contactName],[inbound_clicks_total],[outbound_clicks_total],[inbound_points_total],[outbound_points_total],[approvalStatus])
VALUES ('accountTitle6','email@6.com','pass@6.com','contact6',0,0,60000,0,1)
GO


INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (1,'link1','www.google.com', 1, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (2,'link2','www.yahoo.com', 3, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (2,'link3','www.msn.com', 2, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (3,'link1','www.myspace.com', 2, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (3,'link1','www.facebook.com', 1, 1, 0)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (3,'link1','www.cnn.com', 5, 0, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (4,'link1','www.ebay.com', 6, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (5,'link1','www.youtube.com', 1, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (5,'link1','www.hulu.com', 2, 1, 1)

INSERT INTO [dbo].[tblOutBoundLinks]
([partnerID],[linkTitle],[linkDestination],[linkWeight],[displayStatus],[approvalStatus])
VALUES (6,'link1','www.sqlteam.com', 3, 1, 1)


Go to Top of Page

hanbingl
Aged Yak Warrior

652 Posts

Posted - 2008-11-11 : 16:29:33
lol I googled it:

http://www.generatedata.com/
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-12 : 05:35:56
quote:
Originally posted by hanbingl

lol I googled it:

http://www.generatedata.com/



only for mysql? anyways not exactly what I had in mind ..

thx anyways :)
mike123
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-12 : 05:40:29
Since the results are randomized, I can simulate what the data should look like exactly, but heres an example before randomization:

Note this is all records brought back. Results may very due to randomization and only certain records being picked via linkweight.


OBL.linkTitle,OBL.linkDestination,OBL.partnerID, linkID

link1, www.google.com, 1,1
link2, www.yahoo.com, 2,2
link3, www.myspace.com, 3,4
link5, www.youtube.com, 5,8
link6, www.sqlteam.com, 6,10


(Edited to add LinkID to data returned)

Go to Top of Page

hanbingl
Aged Yak Warrior

652 Posts

Posted - 2008-11-12 : 10:38:17
quote:
Originally posted by mike123

quote:
Originally posted by hanbingl

lol I googled it:

http://www.generatedata.com/



only for mysql? anyways not exactly what I had in mind ..

thx anyways :)
mike123




Insert syntax is the same though.
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-12 : 10:46:17
quote:
Originally posted by Peso

What you can do is to put the result of first query into a temp table or table variable.

Then you do a "running sum".

Linkweight runsum

30 30
20 50
10 60

Total is 60. Get a random value between 1 and 60 with "ABS(CHECKSUM(NEWID())) % 60"
Then select the record by
WHERE rs > @randomvalue

with this approach, a weight if 30 "occupies" half the running sum of 60 and thus has 50% chance to be selected.



E 12°55'05.63"
N 56°04'39.26"




Hey Peso,

Regarding the "first query" I'm not really sure which table I should be querying, should I be starting this with the "partners" table , or the "outboundLinks" table ?

I'm guessing with the partner table, does that mean I should do a subquery to outbound links to get a row for each row brought back in "partners"?

These queries below get me the information according to criteria, but don't take into account the "LinkWeight" and the fact we only want 1 outbound link per partnerID.


anything further greatly appreciated..

thanks again!
mike123


select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000) ORDER BY (outbound_points_total - inbound_points_total ) DESC


select * from [tblOutBoundLinks] WHERE approvalStatus = 1 AND displayStatus = 1
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-12 : 11:05:03
quote:


Insert syntax is the same though.



from what I can see this is just a web app? Ideally there would be a program that can look at my table and create the insert based on actual rows in my DB?

Thanks!
mike123
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-13 : 10:32:28
Hey Guys,

Trying to figure this one here out, but pretty stuck on how to proceed, I'm wondering if it shouuld be a "JOIN" or should I be doing a subquery ?

as posted above I am trying to get a random result from this table

select * from [tblOutBoundLinks] WHERE approvalStatus = 1 AND displayStatus = 1

after selecting qualified results from the query below.


select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000) ORDER BY (outbound_points_total - inbound_points_total ) DESC


Start of the temp table method:


CREATE TABLE #TempLinks
(
idNum int identity(1,1),
partnerID int,
linkID int,
linkTitle varchar(25),
linkDescription varchar(15)
)



INSERT INTO #TempLinks(partnerID, linkID, linkTitle, linkDescription)

SELECT TOP 10 partnerID,linkID,linkTitle,linkDescription

FROM .....








thanks once again!
mike123
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-14 : 11:28:47
bump in case anyones able to lend a hand :) .. still trying to solve this one its being quite tricky but I don't think we are far off


thanks again,
mike123
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-14 : 11:31:13
and how do the two queries link? what are related fieldS?
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-14 : 11:35:16
Hi Visakh16,

We have these 2 parts right here.


select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000) ORDER BY (outbound_points_total - inbound_points_total ) DESC


select * from [tblOutBoundLinks] WHERE approvalStatus = 1 AND displayStatus = 1

They will link from on the "partnerID" column, as posted from the table schemas above... Please let me know if any other q's appreciate the help...

Peso's logic is correct, but I'm just having difficulty implementing it as I'm not sure how I should structure the query..

thanks again :)
mike123
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-14 : 11:44:32
[code]SELECT *
FROM
(
select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000)
)t1
CROSS APPLY (select TOP 1 * from [tblOutBoundLinks] WHERE approvalStatus = 1 AND displayStatus = 1
AND PartnerID = t1.PartnerID
ORDER BY NEWID()
)t2
ORDER BY (t1.outbound_points_total - t1.inbound_points_total ) DESC[/code]
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-14 : 11:59:18
Hi Visakh16,

This is a great step in the right direction, but there just one piece of functionality missing. I think Peso has directed the logic pretty good below.

Basically as you can see from the "outboundlinks" table, we have a weight for each link. We want to randomize the link brought back, and keep it so in the final results there is max 1 link per partnerID.

Please let me know if that makese sense



quote:

What you can do is to put the result of first query into a temp table or table variable.

Then you do a "running sum".

Linkweight runsum

30 30
20 50
10 60

Total is 60. Get a random value between 1 and 60 with "ABS(CHECKSUM(NEWID())) % 60"
Then select the record by
WHERE rs > @randomvalue

with this approach, a weight if 30 "occupies" half the running sum of 60 and thus has 50% chance to be selected.



Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-14 : 12:18:10
did you mean this?
SELECT *
FROM
(
select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000)
)t1
CROSS APPLY (select TOP 1 *,o.Points+COALESCE(o1.Points,0) AS RunningSum from [tblOutBoundLinks] o
OUTER APPLY (SELECT SUM(Points)
FROM [tblOutBoundLinks]
WHERE PartrnerID=o.PartnerID
AND linkID <o.linkID)o1
WHERE approvalStatus = 1 AND displayStatus = 1
AND PartnerID = t1.PartnerID
AND o.Points+COALESCE(o1.Points,0)>ABS(CHECKSUM(NEWID())) % 60
ORDER BY NEWID()
)t2
ORDER BY (t1.outbound_points_total - t1.inbound_points_total ) DESC
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-14 : 12:37:08
Hi Visahk16,

This looks like it could be it, I am getting a bunch of errors tho. I assume by "points" column you meant the relative weight for the link which is actually the "linkWeight" colun. I changed this name and still experience a bunch of execution errors .


Msg 8155, Level 16, State 2, Line 1
No column was specified for column 1 of 'o1'.
Msg 207, Level 16, State 1, Line 13
Invalid column name 'linkWeight'.
Msg 207, Level 16, State 1, Line 13
Invalid column name 'linkWeight'.
Msg 207, Level 16, State 1, Line 6
Invalid column name 'linkWeight'.
Msg 207, Level 16, State 1, Line 6
Invalid column name 'linkWeight'.
Msg 8155, Level 16, State 2, Line 6
No column was specified for column 11 of 't2'.


Any idea ?

Thanks again! much appreciated :)
mike123
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-14 : 12:43:01
[code]SELECT *
FROM
(
select * from [tblPartners] WHERE approvalStatus = 1 AND inbound_points_total > 1500 AND (outbound_points_total - inbound_points_total > 1000)
)t1
CROSS APPLY (select TOP 1 *,o.linkweight+COALESCE(o1.Points,0) AS RunningSum from [tblOutBoundLinks] o
OUTER APPLY (SELECT SUM(linkweight) AS Points
FROM [tblOutBoundLinks]
WHERE PartrnerID=o.PartnerID
AND linkID <o.linkID)o1
WHERE approvalStatus = 1 AND displayStatus = 1
AND PartnerID = t1.PartnerID
AND o.linkweight+COALESCE(o1.Points,0)>ABS(CHECKSUM(NEWID())) % 60
ORDER BY NEWID()
)t2
ORDER BY (t1.outbound_points_total - t1.inbound_points_total ) DESC[/code]

also please replace * with actual columns you want
Go to Top of Page

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2008-11-14 : 13:28:28
Hi Visakh16,

This looks great.. at first I was getting 0 results back. My dummy data has 2 links with a weight of 1 each. I think this hard coded value of 60 is whats causing it to mess up. When I change "60" to "2" things seem to be working properly.

Should this value of 60, be the sum of the linkweight ? I tried to change it to SUM(linkWeight) but get an error:


Thanks once again this is a HUGE help :):)

mike123
Msg 147, Level 15, State 1, Line 1
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.



Go to Top of Page
   

- Advertisement -