Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Help with query and index

Author  Topic 

hummer123
Starting Member

5 Posts

Posted - 2013-04-02 : 02:17:17
Hi,

I am trying to understand a issue we have with a large database and a query that we need to perform. I am not a database expert as such but know my way around a little ;-)

Server. 24 core CPU with 64GB memory and SAS drives.

The database has 2 tables.

Table 1 – domains_com contains a list of domains with approx. 160 million rows

Columns:

Domainid
Domainame
Locked
Ignored

This table already has an index and a query on any column is instant.

Table 2

This table contains multiple results of nameserver and whois data for the domains.

Columns

[domainid]
[registrar]
[whois_server]
[registrar_homepage]
[date_created]
[date_expiring]
[date_updated]
[nameservers]
[domain_status]
[domain_ip]
[temp_whois_result]
[last_checked]
[updated]
[app_name]
[crawl_ip]
[info_registrant]
[info_admin]
[info_tech]
[info_billing]

The problem I am having is with a queries where I need to retrieve the temp_whois_result column like below. This column can be quite large up 3k characters.

I have tried creating a non clustered index on results_com as follows and already have a index on domains_com.

Index Key Columns: domainid,domainame
Included columns: Last_Checked,App_Name,info_registrant, whois_server

declare @hours int set @hours = 3
SELECT TOP 200 domainame,Last_Checked,App_Name,info_registrant, whois_server,temp_whois_result
FROM domains_com_joined_results_com WHERE last_checked >= dateadd(hour, -1*@hours, getdate()) order by last_checked desc

When I use “Display estimated execution plan” for this query it tells me I should include the column temp_whois_result . The problem if I include this column is the index grows to over 80gb which I think is far too large.

If I don’t include the temp_whois_result in the index, then the query is instant.

The thing I don’t understand is that the where condition is on the “last_checked” column, so I thought that the query would find the records in the index with a pointer to the records in the table where the temp_whois_result column would be retrieved.

I have read quite a few articles on indexes, clustered v non clustered and am still not sure where I am going wrong.

Any help appreciated.

Thanks

Scott

chadmat
The Chadinator

1974 Posts

Posted - 2013-04-02 : 03:27:31
If you don't include everything in the select list, then a seek on a NC index has to do a Bookmark Lookup to the clustered index (Which can be expensive if there are many results). This is why you are getting a suggestion to include the temp_whois_result column.

-Chad
Go to Top of Page

hummer123
Starting Member

5 Posts

Posted - 2013-04-02 : 04:12:36
Thanks, I think I understand this better now, I found this article which helped explain it well..

http://connectsql.blogspot.com/2011/01/sql-server-bookmark-lookups-query.html

So, is there anyway around this or better alternative ?
thanks

Scott
Go to Top of Page

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2013-04-02 : 05:40:21
I don't see a reason for index implementation. What you said is that "The problem I am having is with a queries where I need to retrieve the temp_whois_result column like below. This column can be quite large up 3k characters." Is it that the column being quite large upto 3k? I dont think its an issue, data can be in MBs and GBs ..
Is this query performing bad? I mean takes longer time to execute and bring back the desired data? If so .. is there any index implemented on last_checked column?

Cheers
MIK
Go to Top of Page

hummer123
Starting Member

5 Posts

Posted - 2013-04-02 : 06:51:38
quote:
Originally posted by MIK_2008



It maybe my lack of understanding of the SQL indexing. I will never perform a select query on the temp_whois_result, I just need to view the data. I was presuming that as the select is on the last_checked column then that is all I needed.

The actual size of the index is not really an issue, the problem is that the results table gets 250k inserts per day so the indexes get fragmented very quickly so when I do an index rebuild it takes much longer.

The index I have is below:

Index Key Columns: domainid,domainame
Included columns: Last_Checked,App_Name,info_registrant, whois_server

Thanks

Scott



Go to Top of Page

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2013-04-02 : 08:11:25
Well my original question is still unanswered

Secondly, I don't think the specified index would be utlized by the query

SELECT TOP 200 domainame,Last_Checked,App_Name,info_registrant, whois_server,temp_whois_result
FROM domains_com_joined_results_com WHERE last_checked >= dateadd(hour, -1*@hours, getdate()) order by last_checked desc

Even if you include temp_whois_result column in the index.

And yes if you add/include more columns in an index it would have direct impact on the DML transactions.

Lastly, check for indexes that are not in-use, remove them at all in order to avoid unnecessary maintanance ... http://www.sqlservercentral.com/blogs/basits-sql-server-tips/2012/06/15/find-unused-indexes-using-sysdm_db_index_usage_stats/

Cheers
MIK
Go to Top of Page
   

- Advertisement -