Kimberly
Starting Member
3 Posts
Posted - 2009-04-07 : 15:42:56
How do I determine when a statistic outside of the indexes might help more than it might hurt? I am trying to figure out which stats outside of the indexes, if any, would actually help, but I'm struggling a bit with my understanding of how indexes and non-index stats interact. Examples of cases I'm trying to figure out:

1.) There is a field, RS, that exists in every table; it is defined as an int. This field only holds one of 3 values, and typically 90% or more of the records in a table have the value set to 1 – this becomes 100% during nightly maintenance activities, but during the day additional record additions can change that percentage. The field is included in virtually every query, limiting to RS = 1. The other fields in the WHERE and JOIN clauses of a typical query come from whatever the clustered and/or non-clustered indexes are on that table. SQL complains about missing stats on the RS field. But, with what I know of the actual distribution of the data, I'm guessing that a stat on this field will cost more to keep updated than any value it may add – though I'm not entirely sure my logic matches what SQL actually does.

2.) When a query includes "SomeField IS NOT NULL", SQL typically complains that it has no stats on that field. This is not going to be a field that is in any of the existing indexes on a table. Since the query isn't looking for a specific value and I know that the vast majority of records have a value there (but for some business rule reasons those without a value are being excluded here), is there any performance/efficiency gain from having stats on such a field? (This is similar to case 1, where the typical query is looking for the majority value for the field at issue, not the minority case.)

3.) A query that did use a clustered index seek, but the query plan also showed missing stats on one of the fields. The field is included in the clustered index.
Clustered index on: Table_ID, Field_ID, Record_ID, L_ID
Query: FROM dbo.KeyValueTable WHERE Table_ID = 22 AND Field_ID = @FieldID AND L_ID = @LID AND Record_ID IN (1222, 451) AND RS = 1
SQL complained of missing stats on Record_ID. Since the field is in the index already, why does it want more stats on it?

None of the fields included in the index are modified after the record is created. The Table_ID field is the best "grouping" field for the index; there are roughly 500 Table_ID values for the application, and this does not vary. Record_IDs are only unique within a Table_ID; the number belonging to a Table_ID can vary from as few as 2 records to many thousands, and they may or may not repeat in other Table_IDs. Field_IDs are unique, each belongs to only one Table_ID, and the number of Field_IDs can vary as the application modifies field information. A typical query is looking for multiple Record_IDs that belong to a specified Table_ID, Field_ID, L_ID combination. Rarely is the Record_ID determined in an IN clause or from a variable (Record_ID = @Var); most often it is part of the INNER JOIN criteria linking this table to some other set of data (Record_ID = OtherTable.SomeFieldName).

I'm not really looking for a black & white answer, more for some general guidelines to limit the number of things I try.

Other considerations (and yes, I am very aware this is not ideal, but it is what it is): the DB is used in ways that are "not normal" for a DB, and that has led to issues under SQL 2005 related to its statistics handling. Turning off auto-create stats and auto-update stats led to over a 50% performance improvement in the application (this was done in consultation with Microsoft). Since no automatic updating occurs, the application has an intra-day procedure that runs stats updates after a user has made data changes. All tables have clustered indexes on them; a handful of tables have additional non-clustered indexes as well.

I am on a very tight timeline (about a week), so I really don't have much time for a trial & error approach. I have a single-processor machine with 2 GB of RAM and a single hard drive where I can try different approaches. This does not have enough room for more than 1-3 examples of installations; for testing with a larger installation, it's just 1. I have some limited access to a test environment with dual processors, but it presents set-up challenges that take up time I don't have much of. Production is typically quad-processor, lots of RAM, efficient disk arrays, etc. (i.e. it looks nothing like where I am testing).
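For reference, this is the kind of statement I would be evaluating for case 1 – just a sketch, using the dbo.KeyValueTable and RS names from above (the statistic name is made up), with NORECOMPUTE because auto-update stats is turned off here:

-- Sketch only: a manual single-column statistic on the skewed RS column.
-- NORECOMPUTE because auto-update stats is off in this environment;
-- the intra-day routine would refresh it explicitly.
CREATE STATISTICS st_KeyValueTable_RS
    ON dbo.KeyValueTable (RS)
    WITH FULLSCAN, NORECOMPUTE;

-- Manual refresh, e.g. from the intra-day procedure.
UPDATE STATISTICS dbo.KeyValueTable st_KeyValueTable_RS
    WITH FULLSCAN, NORECOMPUTE;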
tkizer
Almighty SQL Goddess
38200 Posts
Kimberly
Starting Member
3 Posts
Posted - 2009-04-07 : 16:02:36
I completely understand statistics for indexes; it's adding stats for fields that are not in indexes that I'm trying to get a feel for. I can't rely on SQL telling me – it is more wrong than right (for this particular database use).
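Something along these lines might help take that inventory – a sketch only, assuming the dbo.KeyValueTable name from my first post; adjust as needed:

-- Sketch: list statistics on a table and keep only the ones whose leading
-- column is not the leading key of any index (i.e. stats "outside the indexes").
SELECT  s.name  AS stats_name,
        c.name  AS leading_column,
        s.auto_created
FROM    sys.stats AS s
JOIN    sys.stats_columns AS sc
          ON  sc.object_id = s.object_id
          AND sc.stats_id = s.stats_id
          AND sc.stats_column_id = 1          -- leading column of the statistic
JOIN    sys.columns AS c
          ON  c.object_id = sc.object_id
          AND c.column_id = sc.column_id
WHERE   s.object_id = OBJECT_ID('dbo.KeyValueTable')
  AND   NOT EXISTS (SELECT 1
                    FROM sys.index_columns AS ic
                    WHERE ic.object_id = s.object_id
                      AND ic.column_id = sc.column_id
                      AND ic.key_ordinal = 1); -- no index leads on this column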
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-04-07 : 16:12:21
Statistics are built on columns and indexes in a database and describe the data distribution and selectivity of the data in tables. Statistics are normally updated automatically for every JOIN and WHERE clause that is run against the table. Each statistic is stored as a histogram of up to about 200 evenly distributed data points. Statistics exist for permanent and temporary tables but do not exist for table variables.

E 12°55'05.63"N 56°04'39.26"
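You can look at the histogram and the existing statistics yourself – the statistic name below is only a placeholder, and dbo.KeyValueTable is the table from your question:

-- Illustration: view the header, density vector and up-to-200-step histogram
-- for one statistic. Replace the placeholder name with a real one.
DBCC SHOW_STATISTICS ('dbo.KeyValueTable', st_KeyValueTable_RS);

-- List the statistics that exist on the table and when each was last updated.
SELECT  s.name,
        s.auto_created,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.KeyValueTable');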
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2009-04-07 : 16:16:26
For the SQL Server Query Engine to decide to use an index, the selectivity (or granularity) of the data has to be less than 25% - 30% of the total values. This is because a SEEK is an unordered method of getting data from the database, and unordered fetches are random and costly. If the selectivity is greater than 25% - 30%, the SQL Server Query Engine decides to do a SCAN, which is ordered and much more efficient for a larger set of data.

E 12°55'05.63"N 56°04'39.26"
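A rough way to see that on the RS column from your question (the value 3 below is only an assumed rare value; the 90%+ skew is from your post):

-- Rough illustration: compare the plans and I/O for a low-selectivity and a
-- high-selectivity predicate on RS.
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.KeyValueTable WHERE RS = 1;  -- ~90%+ of rows: a scan is usually chosen
SELECT COUNT(*) FROM dbo.KeyValueTable WHERE RS = 3;  -- rare value: a seek could pay off,
                                                      -- but only if an index leads on RS

SET STATISTICS IO OFF;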
Kimberly
Starting Member
3 Posts
Posted - 2009-04-08 : 00:03:33
I believe a seek is a more efficient operation than a scan for a query plan (unless the table is small, in which case it probably doesn't matter which is used). Thanks for the info on selectivity; scanning blogs for that keyword is helping. Since stats are used to calculate the cost of various query operators by estimating row counts, that leads me to conclude that adding stats on any column where I know the "typical" query looks for the majority value is not going to be of real use (my examples 1 & 2). But my third case I'm still unsure of.
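(From what I've been reading, the histogram for a multi-column index only covers its leading column – Table_ID here – which may be why it still asks for a stat on Record_ID even though Record_ID is in the index.) If I do end up trying one for case 3, it would look something like this – a sketch only, with a made-up statistic name:

-- Sketch only: a manual single-column statistic on Record_ID, since the
-- clustered index's histogram covers just its leading column (Table_ID).
-- NORECOMPUTE matches the no-auto-update setup described above.
CREATE STATISTICS st_KeyValueTable_RecordID
    ON dbo.KeyValueTable (Record_ID)
    WITH FULLSCAN, NORECOMPUTE;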