Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
ron2112
Starting Member
44 Posts |
Posted - 2010-01-15 : 10:46:22
|
I need to do a LIKE comparison between two strings.String A might look like this: 'B6+C10+C125'String B would look something like this: 'C12'I need to determine if String A contains String B, where the instance of String B is not followed by a digit. For example, in the values above String B is clearly contained within String A, but it's part of a longer value (C125). I don't want that C125 to trigger a true result -- only a C12 in isolation should trigger a true result. So by doing this:[String A] LIKE '%' + [String B] + '[^0-9]%' I would get a false for the values given above, but I'd get a true if String A was 'B6+C12+C125'. So far so good.The problem occurs if String A ends with String B. For example, if String A = 'B1+C12' I get a false, because there is not a character after C12 to correspond to [^0-9].I've tried two things:[String A] + ' ' LIKE '%' + [String B] + '[^0-9]%' ...and...[String A] LIKE '%' + [String B] + '[^0-9]%'OR[String A] LIKE '%' + [String B] ...but both of those options introduce an enormous performance hit. So I need something to replace the [^0-9] -- something that says there may or may not be a character following String B, but if there is, it's not a digit. I didn't find anything like this in the documentation on wild cards, but I'm hoping someone might know a trick? |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-15 : 11:09:37
|
| [code]DECLARE @strFind varchar(100)SELECT @strFind = 'C12'DECLARE @Data TABLE( Value varchar(100))INSERT INTO @DataSELECT 'B6+C10+C125' UNION ALLSELECT 'B6+C10+C12' UNION ALLSELECT 'B6+C10+C12+D123'SELECT *FROM @DataWHERE Value + 'x' LIKE '%' + @strFind + '[^0-9]%'[/code]Poor performance?[code]SELECT *FROM @DataWHERE Value LIKE '%' + @strFind + '[^0-9]%'UNION ALLSELECT *FROM @DataWHERE Value LIKE '%' + @strFind[/code]Any better?I think the basic query is a table scan, so I don't see that usingWHERE Value + 'x' LIKE '%' + @strFind + '[^0-9]%'is any different |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-15 : 11:10:54
|
OK, So this has unacceptable performance hit?I edited my original post |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-15 : 11:21:56
|
I added an index to the Value column. That gives a query plan of Index Scan, instead of Table Scan.The UNION ALL gives a pair of Index Seeks, using some complicated parameter mangling for LikeRangeStart and LikeRangeEnd. I don't believe its doing anything clever., so I'm subious why Query Plan is reporting Index Seek.Its a table (or Index if you have one) scan whichever way I cut it.Finding specifically "at the end" would only be accelerated by storing a REVERSE version of the Value, and indexing that, and doing a LIKE REVERSE('%CT12') on it!Searching for Contains "CT12" is going to use some form of brute force.Hopefully I won't have to eat my hat in 5 minutes |
 |
|
|
ron2112
Starting Member
44 Posts |
Posted - 2010-01-15 : 11:25:03
|
| Thanks Kristen... the first technique (adding 'x' to the end of the string) causes things to grind to a crawl. The second technique using UNION won't work because I'm doing this within a recursive CTE... more than one UNION in there will cause an error. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-15 : 12:28:43
|
| Ah ...WHERE Value + 'X' LIKE ...is probably avoiding the index?Do the query plans suggest that? I've only got a couple of rows in my test table, so the query plans are really "useful"!Can you put a nested SUB SELECT in your CTE, in place of the current FROM? (No skills with CTE, so I don't know the answer to that) |
 |
|
|
ron2112
Starting Member
44 Posts |
Posted - 2010-01-15 : 12:35:10
|
| Even if I physically add the X to the end of the Value column itself (and then remove it afterward) there's still a performance ding, so I'm not sure what to make of that. Maybe it's just the fact of comparing to a string that's one character longer... seems unlikely but it's all I've got at the moment. Adding to the confusion is the fact that if I index that column, I get a marked performance decrease, believe it or not.The query plans seem not to be terribly useful where CTEs are involved. Or I don't know how to read them, which is also a possibility. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-15 : 12:52:59
|
| "I index that column, I get a marked performance decrease, believe it or not"The index won't really help here, although SQL's stats may hint that it will, and make it use the index.Looking for 'foo%' can use an index to find anything beginning with "foo".For '%foo%' every index entry has to be checked, and then every match has to do a corresponding lookup in the actual data (lots of disk head travel etc.).So just scanning the table itself would be as good, and for any matches the actual data columns can be grabbed at the same time.So I'm kinda not surprised to hear that! |
 |
|
|
ron2112
Starting Member
44 Posts |
Posted - 2010-01-15 : 12:56:50
|
| It turns out I was wrong about the second UNION ALL in the CTE. So your second suggestion will work, but it takes about twice as long to run. I think ultimately I just need to keep tweaking it and trying different things until I get it down to a minimum.Thanks for all your help! |
 |
|
|
|
|
|
|
|