LIKE comparison at end of string

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

LIKE comparison at end of string

Author

Topic

ron2112
Starting Member

44 Posts

Posted - 2010-01-15 : 10:46:22

I need to do a LIKE comparison between two strings.

String A might look like this: 'B6+C10+C125'
String B would look something like this: 'C12'

I need to determine if String A contains String B, where the instance of String B is not followed by a digit. For example, in the values above String B is clearly contained within String A, but it's part of a longer value (C125). I don't want that C125 to trigger a true result -- only a C12 in isolation should trigger a true result. So by doing this:

[String A] LIKE '%' + [String B] + '[^0-9]%'

I would get a false for the values given above, but I'd get a true if String A was 'B6+C12+C125'. So far so good.

The problem occurs if String A ends with String B. For example, if String A = 'B1+C12' I get a false, because there is not a character after C12 to correspond to [^0-9].

I've tried two things:

[String A] + ' ' LIKE '%' + [String B] + '[^0-9]%'

...and...

[String A] LIKE '%' + [String B] + '[^0-9]%'
OR
[String A] LIKE '%' + [String B]

...but both of those options introduce an enormous performance hit. So I need something to replace the [^0-9] -- something that says there may or may not be a character following String B, but if there is, it's not a digit. I didn't find anything like this in the documentation on wild cards, but I'm hoping someone might know a trick?

Kristen
Test

22859 Posts

Posted - 2010-01-15 : 11:09:37

[code]
DECLARE @strFind varchar(100)

SELECT @strFind = 'C12'

DECLARE @Data TABLE
(
Value varchar(100)
)

INSERT INTO @Data
SELECT 'B6+C10+C125' UNION ALL
SELECT 'B6+C10+C12' UNION ALL
SELECT 'B6+C10+C12+D123'

SELECT *
FROM @Data
WHERE Value + 'x' LIKE '%' + @strFind + '[^0-9]%'
[/code]
Poor performance?

[code]
SELECT *
FROM @Data
WHERE Value LIKE '%' + @strFind + '[^0-9]%'
UNION ALL
SELECT *
FROM @Data
WHERE Value LIKE '%' + @strFind
[/code]
Any better?

I think the basic query is a table scan, so I don't see that using

WHERE Value + 'x' LIKE '%' + @strFind + '[^0-9]%'

is any different

Kristen
Test

22859 Posts

Posted - 2010-01-15 : 11:10:54

~~OK, So this has unacceptable performance hit?~~

I edited my original post

Kristen
Test

22859 Posts

Posted - 2010-01-15 : 11:21:56

I added an index to the Value column. That gives a query plan of Index Scan, instead of Table Scan.

The UNION ALL gives a pair of Index Seeks, using some complicated parameter mangling for LikeRangeStart and LikeRangeEnd. I don't believe its doing anything clever., so I'm subious why Query Plan is reporting Index Seek.

Its a table (or Index if you have one) scan whichever way I cut it.

Finding specifically "at the end" would only be accelerated by storing a REVERSE version of the Value, and indexing that, and doing a LIKE REVERSE('%CT12') on it!

Searching for Contains "CT12" is going to use some form of brute force.

Hopefully I won't have to eat my hat in 5 minutes

ron2112
Starting Member

44 Posts

Posted - 2010-01-15 : 11:25:03

Thanks Kristen... the first technique (adding 'x' to the end of the string) causes things to grind to a crawl. The second technique using UNION won't work because I'm doing this within a recursive CTE... more than one UNION in there will cause an error.

Kristen
Test

22859 Posts

Posted - 2010-01-15 : 12:28:43

Ah ...

WHERE Value + 'X' LIKE ...

is probably avoiding the index?

Do the query plans suggest that? I've only got a couple of rows in my test table, so the query plans are really "useful"!

Can you put a nested SUB SELECT in your CTE, in place of the current FROM? (No skills with CTE, so I don't know the answer to that)

ron2112
Starting Member

44 Posts

Posted - 2010-01-15 : 12:35:10

Even if I physically add the X to the end of the Value column itself (and then remove it afterward) there's still a performance ding, so I'm not sure what to make of that. Maybe it's just the fact of comparing to a string that's one character longer... seems unlikely but it's all I've got at the moment. Adding to the confusion is the fact that if I index that column, I get a marked performance decrease, believe it or not.

The query plans seem not to be terribly useful where CTEs are involved. Or I don't know how to read them, which is also a possibility.

Kristen
Test

22859 Posts

Posted - 2010-01-15 : 12:52:59

"I index that column, I get a marked performance decrease, believe it or not"

The index won't really help here, although SQL's stats may hint that it will, and make it use the index.

Looking for 'foo%' can use an index to find anything beginning with "foo".

For '%foo%' every index entry has to be checked, and then every match has to do a corresponding lookup in the actual data (lots of disk head travel etc.).

So just scanning the table itself would be as good, and for any matches the actual data columns can be grabbed at the same time.

So I'm kinda not surprised to hear that!

ron2112
Starting Member

44 Posts

Posted - 2010-01-15 : 12:56:50

It turns out I was wrong about the second UNION ALL in the CTE. So your second suggestion will work, but it takes about twice as long to run. I think ultimately I just need to keep tweaking it and trying different things until I get it down to a minimum.

Thanks for all your help!

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources