Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-21 : 12:36:53
|
| I have a table containing animals and their parents:CREATE TABLE animal ( [Name] [varchar](100) NOT NULL, [SireName] [varchar](100) NULL, [DamName] [varchar](100) NULL, [IC] [float] NULL, CONSTRAINT [PK_animal] PRIMARY KEY ( [Name] ASC ))insert into animalTestselect 'Iris', 'Minstrel', 'Lucky Lady',0 union allselect 'Minstrel', 'Adam', 'Grace',0 union allselect 'Lucky Lady', 'Cuillin', 'Lucky 11',0 union allselect 'Adam', 'Saturn', 'Lave',0 union allselect 'Cuillin', 'Huxter', 'Lyn',0 union allselect 'Lucky 11', 'Mansie', 'Lucky',0 union allselect 'Saturn', 'Mansie', 'Sheena 11',0 union allselect 'Huxter', 'Copper', 'Delila',0 union allselect 'Sheena 11', 'Viking', 'Sheena', 0.0469 union allselect 'Copper', 'Tim', 'Sheena 11',0... and I want to be able to calculate the in-breeding coefficient (IC) of the animals. An animal is considered in-bred if it has a parent common to both its maternal and paternal lines: on a pedigree ('family tree') chart, you would be able to trace a path up through one line to the common ancestor and back down through the other line. The degree ('coefficient') of in-breeding is a factor of the number of ancestors in this circuitous path, specifically 0.5^[ancestor_count].For example, with the above sample data it can be seen that 'Sheena 11' appears both in Iris's maternal and paternal lines:[Iris]->Minstrel->Adam->Saturn->Sheena 11[Sheena 11]->Copper->Huxter->Cuillin->Lucky LadyWith eight ancestors in the 'loop' (the common ancestor only gets counted once), that gives Iris an IC of 0.5^8, or 0.0039 (0.39%). If the common ancestor is also inbred, the result is inflated by that amount so as Sheena 11 has an IC of 4.69%, Iris's IC becomes 0.39% + 4.69% = 0.41%If there are additional common ancestors, the same process is followed and all the ICs are added together to get the overall IC. So, how to achieve this using SQL? It seems like it's a job well suited to a set-based approach, but I'm not sure how best to go about it. Any suggestions? |
|
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-25 : 09:52:30
|
| I think perhaps I made the question more specific than it ought to have been. I guess what I really need to do is find common ancestors and their 'distance' from the subject. From that I think it would be simple arithmetic. I've created a CTE that gets all ancestors of a given subject. If I join the ancestors of the subject's sire with those of its dam then that gives me all of the subject's common ancestors, but that includes ancestors of the common ones. For example, if the following paths are maternal and paternal lines for H:A>B>C>D>E>F>G>HA>B>C>I>J>K>L>H... then I only want to get C, not all of C's ancestors. I guess what I need is the shortest path from G->L. I've found some stuff on Set Theory and 'transitive closure', but it's all a bit out of my depth... |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-25 : 15:10:57
|
| Thanks, I'll take a look (I'll be offline until next week though). |
 |
|
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-31 : 03:56:36
|
| OK, I'm back and I've taken a look. Your solution in http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290 (11/27/2008 : 09:25:09) is brilliant and looks like it should be just the job. The 'path' that it builds would be really useful too (in my case it would show the path from the subject's sire through a common ancestor and back to the subject's dam). I can't quite get my head around it enough to convert it to my structure though. My equivalent of the 'links' are the sire & dam, and they're held in the animal row rather than being in a separate table. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-31 : 05:33:37
|
What is the expected result for the sample data posted 08/21/2010 : 12:36:53 ? N 56°04'39.26"E 12°55'05.63" |
 |
|
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-31 : 06:39:42
|
| I'd want to be able to search for a common ancestor of two individuals and find all independent paths between them. By 'independent' I mean paths that don't involve 'back-tracking' (all ancestors of a common ancestor will also be common, but are ony of interest if there's a separate route back).In the sample data I posted, if I were to search for ancestors common to Iris's parents ('Minstrel' and 'Lucky Lady'), there would be just one such path so I'd just want one row returned:CommonAncestor AncestorIC Hops IC Path'Sheena 11' 0.0469 8 0.004089453 'Minstrel-Adam-Saturn-Sheena 11-Copper-Huxter-Cuillin-Lucky Lady'[The IC is just calculated as 0.5^[Hops] * (1+AncestorIC)]Does that make sense? |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-31 : 09:01:13
|
quote: Originally posted by Boxersoft [Iris]->Minstrel->Adam->Saturn->Sheena 11[Sheena 11]->Copper->Huxter->Cuillin->Lucky Lady
This should be true for Mansie also (as for Sheena 11)?[Iris]->Lucky Lady->Lucky 11->Mansie[Mansie]->Saturn->Adam->Minstrel N 56°04'39.26"E 12°55'05.63" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-31 : 09:29:00
|
[code]DECLARE @Animal TABLE ( Name VARCHAR(100) PRIMARY KEY CLUSTERED, SireName VARCHAR(100) NOT NULL, DamName VARCHAR(100) NOT NULL, IC [float] NULL )INSERT @AnimalVALUES ('Iris', 'Minstrel', 'Lucky Lady', 0), ('Minstrel', 'Adam', 'Grace', 0), ('Lucky Lady', 'Cuillin', 'Lucky 11', 0), ('Adam', 'Saturn', 'Lave', 0), ('Cuillin', 'Huxter', 'Lyn', 0), ('Lucky 11', 'Mansie', 'Lucky', 0), ('Saturn', 'Mansie', 'Sheena 11', 0), ('Huxter', 'Copper', 'Delila', 0), ('Sheena 11', 'Viking', 'Sheena', 0.0469), ('Copper', 'Tim', 'Sheena 11', 0);WITH cteHierarchy(Name, SireName, DamName, Hops)AS ( SELECT Name, SireName, DamName, 0 AS Hops FROM @Animal WHERE Name = 'Iris' UNION ALL SELECT a.Name, a.SireName, a.DamName, h.Hops + 1 FROM cteHierarchy AS h INNER JOIN @Animal AS a ON a.Name IN (h.SireName, h.DamName)), cteDuplicates(Name, SireName, DamName, Hops)AS ( SELECT Name, SireName, DamName, Hops FROM ( SELECT Name, SireName, DamName, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS RecID, Hops FROM cteHierarchy ) AS d WHERE RecID = 1), cteFamilyAS ( SELECT u.theName AS Name, 1 + SUM(u.Hops) AS Hops FROM cteDuplicates AS f UNPIVOT ( theName FOR theCol IN (f.SireName, f.DamName) ) AS u GROUP BY u.theName HAVING COUNT(*) > 1)SELECT Name, Hops, POWER(0.5E, Hops) AS ICFROM cteFamily[/code] N 56°04'39.26"E 12°55'05.63" |
 |
|
|
Boxersoft
Starting Member
28 Posts |
Posted - 2010-08-31 : 14:35:59
|
| Sorry for the delay in responding - was out tending to the Iris in question (and the rest of the herd).You're quite right, that data sample would also yield Mansie as a common ancestor - I didn't spot it because it's a data error on my part (they are actually two different Mansies but I simplified the data for clarity and failed to notice). Your code produces the correct results with the sample data. I haven't a clue how it works yet, but it's very impressive. It's also very fast, even on the larger dataset from which I prepared that sample - I expected this to be quite a grind, but it evidently isn't.Unfortunately when I run it against the full dataset it doesn't produce the expected results. It returns the path through 'Sheena II', for example, as being 15 hops rather than the expected 8 that it produces from the sample data and I'm not sure why. Is there any way your code could be extended to produce the kind of path that you had in the train route problem? That might help trace what's going on, and would also be interesting and useful in its own right. |
 |
|
|
|
|
|
|
|