Calculating in-breeding coefficient

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Calculating in-breeding coefficient

Author

Topic

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-21 : 12:36:53

I have a table containing animals and their parents:

CREATE TABLE animal (
[Name] [varchar](100) NOT NULL,
[SireName] [varchar](100) NULL,
[DamName] [varchar](100) NULL,
[IC] [float] NULL,
CONSTRAINT [PK_animal] PRIMARY KEY
( [Name] ASC ))

insert into animalTest
select 'Iris', 'Minstrel', 'Lucky Lady',0 union all
select 'Minstrel', 'Adam', 'Grace',0 union all
select 'Lucky Lady', 'Cuillin', 'Lucky 11',0 union all
select 'Adam', 'Saturn', 'Lave',0 union all
select 'Cuillin', 'Huxter', 'Lyn',0 union all
select 'Lucky 11', 'Mansie', 'Lucky',0 union all
select 'Saturn', 'Mansie', 'Sheena 11',0 union all
select 'Huxter', 'Copper', 'Delila',0 union all
select 'Sheena 11', 'Viking', 'Sheena', 0.0469 union all
select 'Copper', 'Tim', 'Sheena 11',0
... and I want to be able to calculate the in-breeding coefficient (IC) of the animals.

An animal is considered in-bred if it has a parent common to both its maternal and paternal lines: on a pedigree ('family tree') chart, you would be able to trace a path up through one line to the common ancestor and back down through the other line. The degree ('coefficient') of in-breeding is a factor of the number of ancestors in this circuitous path, specifically 0.5^[ancestor_count].

For example, with the above sample data it can be seen that 'Sheena 11' appears both in Iris's maternal and paternal lines:

[Iris]->Minstrel->Adam->Saturn->Sheena 11
[Sheena 11]->Copper->Huxter->Cuillin->Lucky Lady

With eight ancestors in the 'loop' (the common ancestor only gets counted once), that gives Iris an IC of 0.5^8, or 0.0039 (0.39%). If the common ancestor is also inbred, the result is inflated by that amount so as Sheena 11 has an IC of 4.69%, Iris's IC becomes 0.39% + 4.69% = 0.41%

If there are additional common ancestors, the same process is followed and all the ICs are added together to get the overall IC.

So, how to achieve this using SQL? It seems like it's a job well suited to a set-based approach, but I'm not sure how best to go about it. Any suggestions?

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-25 : 09:52:30

I think perhaps I made the question more specific than it ought to have been. I guess what I really need to do is find common ancestors and their 'distance' from the subject. From that I think it would be simple arithmetic.

I've created a CTE that gets all ancestors of a given subject. If I join the ancestors of the subject's sire with those of its dam then that gives me all of the subject's common ancestors, but that includes ancestors of the common ones. For example, if the following paths are maternal and paternal lines for H:

A>B>C>D>E>F>G>H
A>B>C>I>J>K>L>H

... then I only want to get C, not all of C's ancestors.

I guess what I need is the shortest path from G->L. I've found some stuff on Set Theory and 'transitive closure', but it's all a bit out of my depth...

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-25 : 10:53:10

I have posted a number of queries to find "Friends" in social networks.
See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=125959
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=89323

N 56°04'39.26"
E 12°55'05.63"

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-25 : 15:10:57

Thanks, I'll take a look (I'll be offline until next week though).

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-31 : 03:56:36

OK, I'm back and I've taken a look. Your solution in http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290 (11/27/2008 : 09:25:09) is brilliant and looks like it should be just the job. The 'path' that it builds would be really useful too (in my case it would show the path from the subject's sire through a common ancestor and back to the subject's dam).

I can't quite get my head around it enough to convert it to my structure though. My equivalent of the 'links' are the sire & dam, and they're held in the animal row rather than being in a separate table.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-31 : 05:33:37

What is the expected result for the sample data posted 08/21/2010 : 12:36:53 ?

N 56°04'39.26"
E 12°55'05.63"

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-31 : 06:39:42

I'd want to be able to search for a common ancestor of two individuals and find all independent paths between them. By 'independent' I mean paths that don't involve 'back-tracking' (all ancestors of a common ancestor will also be common, but are ony of interest if there's a separate route back).

In the sample data I posted, if I were to search for ancestors common to Iris's parents ('Minstrel' and 'Lucky Lady'), there would be just one such path so I'd just want one row returned:

CommonAncestor AncestorIC Hops IC Path
'Sheena 11' 0.0469 8 0.004089453
'Minstrel-Adam-Saturn-Sheena 11-Copper-Huxter-Cuillin-Lucky Lady'

[The IC is just calculated as 0.5^[Hops] * (1+AncestorIC)]

Does that make sense?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-31 : 09:01:13

quote:
Originally posted by Boxersoft

[Iris]->Minstrel->Adam->Saturn->Sheena 11
[Sheena 11]->Copper->Huxter->Cuillin->Lucky Lady

This should be true for Mansie also (as for Sheena 11)?

[Iris]->Lucky Lady->Lucky 11->Mansie
[Mansie]->Saturn->Adam->Minstrel

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-31 : 09:29:00

[code]DECLARE @Animal TABLE
(
Name VARCHAR(100) PRIMARY KEY CLUSTERED,
SireName VARCHAR(100) NOT NULL,
DamName VARCHAR(100) NOT NULL,
IC [float] NULL
)

INSERT @Animal
VALUES ('Iris', 'Minstrel', 'Lucky Lady', 0),
('Minstrel', 'Adam', 'Grace', 0),
('Lucky Lady', 'Cuillin', 'Lucky 11', 0),
('Adam', 'Saturn', 'Lave', 0),
('Cuillin', 'Huxter', 'Lyn', 0),
('Lucky 11', 'Mansie', 'Lucky', 0),
('Saturn', 'Mansie', 'Sheena 11', 0),
('Huxter', 'Copper', 'Delila', 0),
('Sheena 11', 'Viking', 'Sheena', 0.0469),
('Copper', 'Tim', 'Sheena 11', 0)

;WITH cteHierarchy(Name, SireName, DamName, Hops)
AS (
SELECT Name,
SireName,
DamName,
0 AS Hops
FROM @Animal
WHERE Name = 'Iris'

UNION ALL

SELECT a.Name,
a.SireName,
a.DamName,
h.Hops + 1
FROM cteHierarchy AS h
INNER JOIN @Animal AS a ON a.Name IN (h.SireName, h.DamName)
), cteDuplicates(Name, SireName, DamName, Hops)
AS (
SELECT Name,
SireName,
DamName,
Hops
FROM (
SELECT Name,
SireName,
DamName,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS RecID,
Hops
FROM cteHierarchy
) AS d
WHERE RecID = 1
), cteFamily
AS (
SELECT u.theName AS Name,
1 + SUM(u.Hops) AS Hops
FROM cteDuplicates AS f
UNPIVOT (
theName
FOR theCol IN (f.SireName, f.DamName)
) AS u
GROUP BY u.theName
HAVING COUNT(*) > 1
)
SELECT Name,
Hops,
POWER(0.5E, Hops) AS IC
FROM cteFamily[/code]

N 56°04'39.26"
E 12°55'05.63"

Boxersoft
Starting Member

28 Posts

Posted - 2010-08-31 : 14:35:59

Sorry for the delay in responding - was out tending to the Iris in question (and the rest of the herd).

You're quite right, that data sample would also yield Mansie as a common ancestor - I didn't spot it because it's a data error on my part (they are actually two different Mansies but I simplified the data for clarity and failed to notice). Your code produces the correct results with the sample data. I haven't a clue how it works yet, but it's very impressive. It's also very fast, even on the larger dataset from which I prepared that sample - I expected this to be quite a grind, but it evidently isn't.

Unfortunately when I run it against the full dataset it doesn't produce the expected results. It returns the path through 'Sheena II', for example, as being 15 hops rather than the expected 8 that it produces from the sample data and I'm not sure why. Is there any way your code could be extended to produce the kind of path that you had in the train route problem? That might help trace what's going on, and would also be interesting and useful in its own right.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources