
 Split Function performance


Kristen
Test

22859 Posts

Posted - 2007-10-08 : 19:46:44
I had an interesting read of "Arrays and Lists in SQL Server 2005" by Erland Sommarskog, SQL Server MVP.

I was particularly interested in:

Separate functions to return INT / (n)VARCHAR (to avoid casting Varchar to Int in a JOIN between the Split function and MyTable)

In fact he advocates that the String version should return two columns - Varchar and Nvarchar - to be doubly useful, avoid a Cast between Varchar / Nvarchar, and only cost a few percent extra at run-time.

Then there is the issue that (n)Varchar(MAX) is slower than max-length Varchar(8000) / Nvarchar(4000); so it is worth having a separate function for (n)Varchar(MAX) and only using it when the string to be split could be > 8,000 / 4,000 characters
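A sketch of what such typed variants might look like (function and column names here are my own invention, not from the article):

```sql
-- Illustrative only; names (SplitInt etc.) are made up, not from the article.
-- INT version: the JOIN to an int key column then needs no implicit cast.
CREATE FUNCTION dbo.SplitInt (@List varchar(8000))
RETURNS @Result TABLE (IntPosition int IDENTITY(1,1), IntValue int)
AS
BEGIN
    DECLARE @Pos int, @Next int
    SELECT @Pos = 1
    WHILE @Pos <= LEN(@List)
    BEGIN
        SELECT @Next = CHARINDEX(',', @List, @Pos)
        IF @Next = 0 SELECT @Next = LEN(@List) + 1
        INSERT @Result (IntValue)
        SELECT CAST(SUBSTRING(@List, @Pos, @Next - @Pos) AS int)
        SELECT @Pos = @Next + 1
    END
    RETURN
END

-- String version: same body, but returning BOTH Varchar and Nvarchar columns
-- so either can be joined without a cast:
--   RETURNS @Result TABLE (IntPosition int IDENTITY(1,1),
--                          StrValue  varchar(8000),
--                          NStrValue nvarchar(4000))
-- Plus a separate Nvarchar(MAX) variant, used only when the list can
-- exceed the 8,000 / 4,000 character limits.
```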

Also worth forcing some comparisons to COLLATE Binary (where the outcome is not influenced by the collation) - e.g. a CHARINDEX for a collation-independent character - such as ","
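For example (a sketch; Latin1_General_BIN picked arbitrarily as the binary collation):

```sql
DECLARE @List nvarchar(4000)
SELECT @List = N'10,20,30'

-- The comma compares identically under any collation, so forcing a binary
-- collation is safe here and skips the more expensive linguistic comparison:
SELECT CHARINDEX(N',' COLLATE Latin1_General_BIN, @List)   -- first delimiter position
```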

But the key thing I noticed was that table variables (@TableVar) have no statistics, so the optimiser uses a Blind Assumption.

That sounds terrible!

Coupled with that is that it may be better to Split into a #TempTable and then JOIN #TempTable to MyTable so that Auto Stats can be used (in SQL2005 at least) to cause recompile of the relevant statement(s) [only] to good effect.
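That pattern might look like this (table and column names are illustrative, and dbo.SplitInt stands in for whatever Split function is in use):

```sql
-- Illustrative sketch: split into a #temp table first, so auto-created
-- statistics on it can trigger a recompile (statement-level in SQL 2005)
-- of the JOIN with real row counts.
CREATE TABLE #Split (IntValue int NOT NULL PRIMARY KEY)

INSERT #Split (IntValue)
SELECT IntValue
FROM   dbo.SplitInt(@List)

SELECT M.*
FROM   MyTable M
JOIN   #Split S ON S.IntValue = M.ID

DROP TABLE #Split
```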

So that leads to my question:

His article says nothing about creating a PK on the @TableVar, which I would hope would improve on the Query Planner's Blind Assumption.

I have two Split functions which return a @TableVar with IntPosition and either IntValue or StrValue.

If I make a PK, in my @TableVar, of (Value, IntPosition) - and given that I nearly always join on Value rather than Position - is that going to be as good as some AutoStats? It's a Clustered PK after all ...

Slight snag is that currently I do allow NULL as a Value - but I also have a Parameter to my Split function that removes any "empty" values, and that is the Default, so I could easily have a variant of the Function that defines the Value to be NOT NULL and make that the PK.
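i.e. something along these lines (a sketch; the NOT NULL on StrValue is what makes the PK legal):

```sql
DECLARE @Result TABLE
(
    IntPosition int IDENTITY(1,1) NOT NULL,
    StrValue    varchar(8000)     NOT NULL,   -- NULL / "empty" values stripped by the function
    PRIMARY KEY (StrValue, IntPosition)       -- clustered by default; matches a JOIN on Value
)
```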

The article also raises the question of whether it is better to be "clever" and create the Split Function as an Inline function (i.e. expanded and expressed within the Query), or to have something potentially "slicker" but Multi-statement, and thus computed separately.

The Query Planner may be able to create a better plan with the Inline function; however, it has no idea how many rows the Function is going to return, and no statistics either.
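The two shapes in question look like this (skeletons only; the actual split logic is omitted and the names are invented):

```sql
-- Inline: a single RETURN (SELECT ...), which the optimiser expands
-- directly into the calling query:
CREATE FUNCTION dbo.SplitInline (@List varchar(8000))
RETURNS TABLE
AS
RETURN
(
    SELECT 1 AS IntPosition, @List AS StrValue   -- real split logic goes here
)
GO

-- Multi-statement: the body runs separately and hands back @Result as an
-- opaque row source - no statistics, and a guessed row count:
CREATE FUNCTION dbo.SplitMulti (@List varchar(8000))
RETURNS @Result TABLE (IntPosition int, StrValue varchar(8000))
AS
BEGIN
    INSERT @Result SELECT 1, @List               -- real split logic goes here
    RETURN
END
GO
```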

Discuss

Kristen

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2007-10-08 : 23:07:46
Just a thought... "Split" functions may suffice for GUI code, which is inherently "RBAR" (pronounced "ree-bar", a "Modenism" for "Row By Agonizing Row"). Why would anyone use such a function to split a whole table's worth of data, with the RBAR associated with functions? (Not asking the real question of "Why would anyone store data that needs to be split in a table to begin with?"...)

--Jeff Moden

Kristen
Test

22859 Posts

Posted - 2007-10-09 : 04:25:22
We encounter this all the time: checkboxes on a web page relating to rows with a matching Status column to be selected. They only have a few hits, of course, and even the crappiest algorithm is sufficiently performant.

But then we occasionally get longer lists, and there the difference between a good algorithm and a bad one is huge. Indeed, when I started checking this we discovered that our UDF was disastrous for long lists: 2 seconds vs. 57 ms for a better algorithm.

Blow-by-blow timings here: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648&whichpage=2#305425

In the particular example I was up against recently it was:

Display all "dealers" with checkboxes, user selects the ones they want, and the list then gets sent back to SQL to JOIN. Typically it runs to hundreds or thousands of IDs.

The list does not need to be "sticky", so there isn't any suitable server-side version of the data

Kristen

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-09 : 04:29:50
We have a solution for this kind of "checkbox selects".
We have a table in the database which we

1) delete for current user
2) insert options
3) call the normal sp which joins the table mentioned in 1) and 2)
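
As a sketch (table, procedure, and column names invented for illustration):

```sql
-- 1) clear any previous selection for the current user
DELETE dbo.UserSelection WHERE UserID = @UserID

-- 2) insert the options the user ticked
INSERT dbo.UserSelection (UserID, OptionID)
SELECT @UserID, IntValue
FROM   dbo.SplitInt(@List)           -- whatever Split function is in use

-- 3) the normal sp just joins dbo.UserSelection like any other table
EXEC dbo.GetSelectedDealers @UserID = @UserID
```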



E 12°55'05.25"
N 56°04'39.16"

Kristen
Test

22859 Posts

Posted - 2007-10-09 : 04:43:46
(2) Still got to get "the list" to the DB.

Have you got a smart/fast way of doing that?

We also use this type of solution. However:

The continuous Delete/Insert jacks up the TLog size/backup. So we store that table in a separate "Simple Recovery Model" database. That in turn gives us some risk in a disaster recovery (we allow the user choices to be "sticky", which was a bit silly, because if we hadn't done that we wouldn't have to back it up at all!)

Kristen
   
