Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Not quite hash

Author  Topic 

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2007-04-30 : 08:57:23
I wrote this function:
(It takes a string and sums the Scrabble letter values)

CREATE function dbo.fc_GetWordHashScore(
@strWords text)
returns int
AS
begin
declare @lngReturn As int
declare @strLetter As varchar(8000)
declare @lngValue As int
declare @cnt int

set @lngReturn = 0
set @cnt=0

while @cnt<=datalength(@strWords)
begin
set @lngValue = 0
set @strLetter = substring(@strWords, @cnt, 1)

if @strLetter='E' or @strLetter= 'A' or @strLetter= 'I'or @strLetter='O'or @strLetter= 'N'or @strLetter= 'R'or @strLetter= 'T'or @strLetter= 'L'or @strLetter= 'S'or @strLetter= 'U' set @lngValue = 1
if @strLetter='D' or @strLetter= 'G' set @lngValue = 2
if @strLetter='B' or @strLetter= 'C' or @strLetter='M' or @strLetter='P' set @lngValue = 3
if @strLetter='F' or @strLetter='H' or @strLetter='V' or @strLetter='W' or @strLetter='Y' set @lngValue = 4
if @strLetter='K' set @lngValue = 5
if @strLetter='J' or @strLetter='X' set @lngValue = 8
if @strLetter='Q' or @strLetter='Z' set @lngValue = 10

set @lngReturn = @lngReturn + @lngValue
set @cnt=@cnt+1
end

return @lngReturn
end
GO

It means I can scan hundreds of thousands of records and check a text column for being similar to a text file I am comparing it too. This can be done by checking for records that have a summed Scrabble score 100 less than or greater than the Scrabble score of my text file.

I want to put it in a computed & (here's the thing) persisted column.
Trouble is, it's obviously not deterministic.

Any suggestions on how I can conquer the "deterministic" hurdle?

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-30 : 09:57:38
What is this algorithm being used for?

You really want:

eeeeeeeeee

to be considered the same or similar to:

z

?? Both have a "scrabble score" of 10.

But, anyway, this function is deterministic. (same input = same output).


- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2007-04-30 : 10:03:22
Yes it doesn't matter that 10 e's equal 1 z. It doesn't have to be perfectly accurate.
I know it's logically deterministic. But in isn't deterministic from the SQL Engine's point of view:
Server: Msg 4936, Level 16, State 1, Line 1
Computed column 'WordHash' in table 'Doc' cannot be persisted because the column is non-deterministic.

Am I breaking either of these requirements? (found in SQL 2000 documentation - I presume it's still a requirement in 2005):
The function is schema-bound
The body of the function references no database objects outside the scope of the function
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-30 : 10:24:36
Since you are using SQL 2005, just add WITH SCHEMA BINDING to your function definition. Then, you mark your computed column as PERSISTED and you should be good to go.



- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2007-04-30 : 10:59:54
Thanks Jeff. You're the man. [Sorry about my MS Access Rants. Was having a bad day. :-) ]
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-30 : 11:43:02
That's OK, happens to the best of us, me included for sure!


- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-30 : 11:59:06
[code]CREATE FUNCTION dbo.fnGetWordHashScore
(
@Word VARCHAR(8000)
)
RETURNS INT
AS
BEGIN
DECLARE @Sum AS INT,
@Char CHAR(1),
@Index INT

SELECT @Sum = 0,
@Index = 0

WHILE @Index <= LEN(@Word)
SELECT @Index = @Index + 1,
@Char = SUBSTRING(@Word, @Index, 1),
@Sum = CASE
WHEN @Char IN ('E', 'A', 'I', 'O', 'N', 'R', 'T', 'L', 'S', 'U') THEN 1
WHEN @Char IN ('D', 'G' ) THEN 2
WHEN @Char IN ('B', 'C', 'M', 'P') THEN 3
WHEN @Char IN ('F', 'H', 'V', 'W', 'Y') THEN 4
WHEN @Char IN ('K') THEN 5
WHEN @Char IN ('J', 'X') THEN 8
WHEN @Char IN ('Q', 'Z') THEN 10
ELSE 0
END + @Sum

RETURN @Sum
END[/code]

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2007-05-01 : 03:48:13
Very tidy Pete, thanks.

Problem I have now is that my short script to add the column has been executing for the past 17 hours!:

BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
GO
ALTER TABLE dbo.[Doc] ADD
WordHash AS dbo.fc_GetWordHashScore(DocText) PERSISTED
GO
COMMIT


There are 775 000 records with an average of two printed pages worth of text in the DocText column.
Normally I would batch an operation this big. Can you batch the population of a persisted computed column?
Could I maybe populate a standard column in bathces first and then turn that column into a persisted computed column?
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-05-01 : 08:50:29
If that's the case, you may want to create a regular column, populate it in batches, but then use triggers to populate the column going forward, instead of a computed column.

- Jeff
http://weblogs.sqlteam.com/JeffS
Go to Top of Page

coolerbob
Aged Yak Warrior

841 Posts

Posted - 2007-05-01 : 09:26:46
Well it completed after 20 hours, so that's a relief.
On the production 64-bit server it should be faster.

I think the computed column is more scaleable than the trigger approach. Am I right? Or is there no clear difference?

Also, what happens if I need to change the logic in the function? Because it's schema-bound, will I have to go through the whole 20 hour population process again?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-12-21 : 07:17:26
This calculates the Scrabble Score for the 60,387 words in dbo.WordList table in less than 2 seconds.
CREATE TABLE	#Lookup
(
Letter CHAR(1) PRIMARY KEY CLUSTERED,
Items TINYINT NOT NULL,
Points SMALLINT NOT NULL
)

INSERT #Lookup
(
Letter,
Items,
Points
)
VALUES ('A', 9, 1), ('B', 2, 3), ('C', 2, 3), ('D', 4, 2), ('E', 12, 1), ('F', 2, 4), ('G', 3, 2),
('H', 2, 4), ('I', 9, 1), ('J', 1, 8), ('K', 1, 5), ('L', 4, 1), ('M', 2, 3), ('N', 6, 1),
('O', 8, 1), ('P', 2, 3), ('Q', 1, 10), ('R', 6, 1), ('S', 4, 1), ('T', 6, 1), ('U', 4, 1),
('V', 2, 4), ('W', 2, 4), ('X', 1, 8), ('Y', 2, 4), ('Z', 1, 10)

SELECT y.Word,
SUM(l.Points * CASE WHEN y.Items <= l.Items THEN y.Items ELSE l.Items END) AS Score
FROM (
SELECT w.Word,
SUBSTRING(w.Word, v.Number, 1) AS Letter,
COUNT(*) AS Items
FROM dbo.WordList AS w
INNER JOIN master..spt_values AS v ON v.Type = 'P'
WHERE v.Number BETWEEN 1 AND LEN(w.Word)
GROUP BY w.Word,
SUBSTRING(w.Word, v.Number, 1)
) AS y
INNER JOIN #Lookup AS l ON l.Letter = y.Letter
GROUP BY y.Word
HAVING SUM(CASE WHEN y.Items > l.Items THEN y.Items - l.Items ELSE 0 END) <= 2
ORDER BY SUM(l.Points * CASE WHEN y.Items <= l.Items THEN y.Items ELSE l.Items END) DESC,
y.Word

DROP TABLE #Lookup



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -