Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 problem with group by (?)

Author  Topic 

eva_blue
Starting Member

2 Posts

Posted - 2008-07-12 : 21:37:01
Hello to all,

I am quite new to sql, and so the time has come that I don't understand why a query has a certain output.I apologise for the long post but I'd rather describe exactly what I am doing in, hoping that someone can spot the problem that I fail to.
I have created and populated two tables in my database that look like:


id start_bindsite end_bindsite
1 10212250 10212257
2 10105334 10105368
3 101556063 101556073


gene_start gene_end gene_name
10212248 10312253 gene1
10212258 10322250 gene2
101556070 102556060 gene3

What I want is to find which gene from table genome is closer to each of the sites 1, 2, 3 in table binding_sites.

I have created a nested query that calculates the linear distances between the gene_start and both the start_bindsite and end_bindsite and then with the outer select I wish to choose the minimum distance. The problem is that when I group by the temp.start field the output is indeed for each binding site the minimum distance but the gene_name is always gene1 , whereas for the last two binding sites it is gene2 and gene3 respectively that should be presented. And when grouping by other fields or combination of other fields doesn't seem to help.

The query:


select   temp.start, temp.end, temp.gene_name, min(temp.distance) as minimum_distance
from (select start_bindsite, end_bindsite, gene_name,abs(gene_start-start_bindsite) as distance
from genome, binding_sites
union
select start_bindsite, end_bindsite, gene_name, abs(gene_start-end_bindsite) as distance
from genome, binding_sites) as temp
group by temp.start


And the output
+-----------+-----------+-----------+------------------+
| start | end | gene_name | minimum_distance |
+------------+-----------+-----------+------------------+
| 10105334 | 10105368 | gene1 | 106880 |
| 10212250 | 10212257 | gene1 | 1 |
| 101556063 | 101556073 | gene1 | 3 |
+-----------+-----------+------------+------------------+
3 rows in set (0.00 sec)


Does anyone have any idea what am I doing wrong here? Any hint or suggestion would be hugely appreciated!!
Thanks for your time in advance,

Eva

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-13 : 02:41:21
I'm still unclear with your rules for finding out matching records. Are you looking at comparing values of both start and end column values of tow tables or are you looking at min value of their difference from both?b/w your posted query does even involve genome table at all!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-07-13 : 03:04:04
Are you using SQL Server 2000 or SQL Server 2005?


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

eva_blue
Starting Member

2 Posts

Posted - 2008-07-13 : 09:00:57
Hello again,

Thank you both for your replies.
I solved the problem after all by applying an inner join to the table described as temp and the table occuring from my whole previous query (only this time I had the same fields to the select and the group by).

I have mysql server 5.0 and what I wished to obtain from the query is first to calculate the linear distance between the gene_start and the binding site's location But the binding site has a start and an end so I wanted to calculated the distance from the gene to both. That I did with the temp table that had as an output:

+-----------+-----------+-----------+----------+
| gene_name | start | end | distance |
+-----------+-----------+-----------+----------+
| gene1 | 10212250 | 10212257 | 2 |
| gene2 | 10212250 | 10212257 | 8 |
| gene3 | 10212250 | 10212257 | 91343820 |
| gene1 | 10212250 | 10212257 | 9 |
| gene2 | 10212250 | 10212257 | 1 |
| gene3 | 10212250 | 10212257 | 91343813 |
.........................................................................
..........................................................................
rows omitted
+-----------+-----------+-----------+----------+

So what I wanted then is to find for each binding site which gene was nearest to either its start or end, i.e to pick

+-----------+-----------+-----------+----------+
| gene_name | start | end | distance |
+-----------+-----------+-----------+----------+
| gene2 | 10212250 | 10212257 | 1 |
...........................................................................
............................................................................
rows omitted
+-----------+-----------+-----------+----------+

As I said above the inner join did the trick. I am sorry for the time that you spent on my account; as soon as I left the house for a couple of hours, the solution was oh so obvious. Again, thanks a lot.

Cheers,
Eva
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-13 : 09:12:54
quote:
Originally posted by eva_blue

Hello again,

Thank you both for your replies.
I solved the problem after all by applying an inner join to the table described as temp and the table occuring from my whole previous query (only this time I had the same fields to the select and the group by).

I have mysql server 5.0 and what I wished to obtain from the query is first to calculate the linear distance between the gene_start and the binding site's location But the binding site has a start and an end so I wanted to calculated the distance from the gene to both. That I did with the temp table that had as an output:

+-----------+-----------+-----------+----------+
| gene_name | start | end | distance |
+-----------+-----------+-----------+----------+
| gene1 | 10212250 | 10212257 | 2 |
| gene2 | 10212250 | 10212257 | 8 |
| gene3 | 10212250 | 10212257 | 91343820 |
| gene1 | 10212250 | 10212257 | 9 |
| gene2 | 10212250 | 10212257 | 1 |
| gene3 | 10212250 | 10212257 | 91343813 |
.........................................................................
..........................................................................
rows omitted
+-----------+-----------+-----------+----------+

So what I wanted then is to find for each binding site which gene was nearest to either its start or end, i.e to pick

+-----------+-----------+-----------+----------+
| gene_name | start | end | distance |
+-----------+-----------+-----------+----------+
| gene2 | 10212250 | 10212257 | 1 |
...........................................................................
............................................................................
rows omitted
+-----------+-----------+-----------+----------+

As I said above the inner join did the trick. I am sorry for the time that you spent on my account; as soon as I left the house for a couple of hours, the solution was oh so obvious. Again, thanks a lot.

Cheers,
Eva


Ok. no probs. But your sample output suggests you want to take difference with each of record in first table without any relation. SO i guess what you need is a cross join or inner join with a trivial condition.
Go to Top of Page
   

- Advertisement -