Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Similarity Matrix and score

Author  Topic 

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2010-11-05 : 10:32:04
Greetings

How would I do a similarity matrix using TSQL based on objects that share similar attributes. (http://en.wikipedia.org/wiki/Similarity_matrix)

The more attributes they share the higher the score. I am trying to use this as a similarity algorithm

Thanks

sample data

declare @books TABLE(book_id int, descr nvarchar(50))
insert into @books
SELECT 1, 'Gone with the wind'
UNION ALL
SELECT 2, 'Mars'
UNION ALL
SELECT 3, 'Do Androids Dream of Electric Sheep?'
UNION ALL
SELECT 4, 'Ender''s Game'
UNION ALL
SELECT 5, 'Dune'
UNION ALL
SELECT 6, 'Foundation'
UNION ALL
SELECT 7, 'Hitch Hiker''s Guide to the Galaxy'
UNION ALL
SELECT 8, '1984'
UNION ALL
SELECT 9, 'Star Wars'
UNION ALL
SELECT 10, 'Battlestar Galactica'
UNION ALL
SELECT 11, 'Fringe'

--SELECT * FROM @books
declare @terminologies TABLE(terminology_id int, descr nvarchar(50))
INSERT INTO @terminologies
SELECT 1, 'Futuristic'
UNION ALL
SELECT 2, 'Propulsion'
UNION ALL
SELECT 3, 'Guidance'
UNION ALL
SELECT 4, 'Life Support'
UNION ALL
SELECT 5, 'Cabin Structure'
UNION ALL
SELECT 6, 'Communications'
UNION ALL
SELECT 7, 'Thermal Protection'
UNION ALL
SELECT 8, 'Displays And Controls'
UNION ALL
SELECT 9, 'Space Craft'
UNION ALL
SELECT 10, 'Light Speed'
UNION ALL
SELECT 11, 'Warp Speed'
UNION ALL
SELECT 12, 'Jedi'
UNION ALL
SELECT 13, 'Force is weak with you rookie Jedi'
UNION ALL
SELECT 14, 'Civial War'

--SELECT * FROM @terminologies

DECLARE @book_terminologies TABLE(book_id int, terminology_id int)

INSERT INTO @book_terminologies
( book_id, terminology_id )
SELECT 1, 10
UNION ALL
SELECT 2, 1
UNION ALL
SELECT 2, 2
UNION ALL
SELECT 2, 4
UNION ALL
SELECT 2, 7
UNION ALL
SELECT 2, 8
UNION ALL
SELECT 3, 1
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 3, 4
UNION ALL
SELECT 3, 5
UNION ALL
SELECT 3, 6
UNION ALL
SELECT 9, 1
UNION ALL
SELECT 9, 2
UNION ALL
SELECT 9, 4
UNION ALL
SELECT 9, 5
UNION ALL
SELECT 9, 6
UNION ALL
SELECT 9, 7
UNION ALL
SELECT 9, 8
UNION ALL
SELECT 9, 9
UNION ALL
SELECT 9, 10
UNION ALL
SELECT 9, 11
UNION ALL
SELECT 9, 12
UNION ALL
SELECT 9, 13


SELECT *
FROM @book_terminologies bt
INNER JOIN @books b
ON bt.book_id = b.book_id
INNER JOIN @terminologies t
ON bt.terminology_id = t.terminology_id



If you don't have the passion to help people, you have no passion

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2010-11-05 : 21:12:29
ok so I looked into this a bit more and the link for similarity matrix points me to Sequence alignment (http://en.wikipedia.org/wiki/Sequence_alignment)
So in my case I could concatenate the terminology id for each book sequentially.
So the book Star Wars would have the following sequence of ids in ascending order
1245678910111213 as you can see terminology_id 3 is missing because Star Wars does not have a terminology relationship to Guidance. But I still want to somehow indicate that that association is missing so my sequence will be
12X45678910111213. So now let us take another book 'Mars' it sequence would be
12478. The full sequence would be 12X4XX78XXXXXX. if you compare this books terminology_id sequence to Star Wars' you will notice the sparseness
but also some of the similarities

1 2 X 4 5 6 7 8 9 10 11 12 13
1 2 X 4 X X 7 8 X X X X X


So I would like to capture the differences and compare them in order to establish the following. Given a certain book and it's attributes which other book is it similar to and rank the results in highest similarity score? Is this possible to accomplish in SQL is my question?




If you don't have the passion to help people, you have no passion
Go to Top of Page

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2010-11-09 : 10:27:26
ok seems like I am close to figuring this out. Not so much similarity matrix but more proximity matrix would solve the problem

http://www.sqlmag.com/blogs/puzzled-by-t-sql/tabid/1023/entryid/12950/Solutions-to-Proximity-Puzzle.aspx

If you don't have the passion to help people, you have no passion
Go to Top of Page
   

- Advertisement -