Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Like, but from a table

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-08 : 21:42:28
No, I haven't gone valley.

I've currently got a user defined function that checks email subject lines to see if they're likely to be spam. It's very straightforward:

create function f_spam (@vcSubject varchar(255))
returns int AS

if @vcSubject like '%your credit%'
return 1
if @vcSubject like '%viagra%'
return 1
if @vcSubject like '%hot teens%'
return 1

return 0


As you can imagine, that function has grown to be about 1000 lines, and it's time to do this right.

What I'd like to do is have a table of spam phrases:

create table spam_phrases (
phrase varchar(20) PRIMARY KEY CLUSTERED
)

insert into spam_phrases (phrase) VALUES ('your credit')
insert into spam_phrases (phrase) VALUES ('viagra')
insert into spam_phrases (phrase) VALUES ('hot teens')


And then change my function, or the calling stored procedure, to see if any of the phrases are in the subject line. Is this possible without dynamic SQL? I worry that building a dynamic SQL WHERE clause using the phrases will exceed the 8000 character varchar limit, and I don't think you can execute a text variable.

Can anyone see a way to do this? I need to match phrases of arbitrary word lengths, so splitting it into words and then doing word lookups doesn't work. I suppose I could test the whole subject, then every combination of phrases in the subject, but that's *really* ugly.

Ideas?

Thanks
-b

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-05-08 : 21:47:00
why not just:



if exists
(select 1 from spam_phrases where @vcSubject LIKE '%' + phrase + '%')
return 1
else
return 0


something like that ....

- Jeff
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-09 : 19:47:23
Er, yes, that would do it. Thanks! Didn't know you could do concatenation like that.

Thanks
-b

Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-05-09 : 21:17:08
BY the way --

if you are testing a lot of different subject lines for a bunch of emails at once, it might be faster to do a join right between your two tables (using the LIKE as the operator) and do it all at once rather than using a UDF ...

The SQL can probably optimize a join or a correlated subquery better than it can repeatedly calling a UDF.

Something like:

select email.from, email.to, email.subject, ... etc ...
MAX(case when p.phrase is null then 0 else 1 end) as spam
FROM
email
LEFT OUTER JOIN
spam_phrases p
on
email.subject LIKE '%' + p.phrase + '%'
GROUP BY email.from, email.to, email.subject, ..etc ...

or you could use the NOT EXISTS() in your a case expression or something like that ... this would be the way to go if you wish to tag or divert a bunch of email at once, instead of calling the UDF each time.

but test out both ways.

- Jeff

Edited by - jsmith8858 on 05/09/2003 21:17:31
Go to Top of Page
   

- Advertisement -