Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Query speed improvement problem

Author  Topic 

cy163
Starting Member

2 Posts

Posted - 2009-08-13 : 10:33:44
Hi All,

The execution time for the following query statement exceeds 9s. Please tell me of where I am wrong

I have 3 tables DocsFeature, CandidateDocs, ExampleDoc.

Generally, the query is used to find similar documents which are relevant to an example document. The example document is represented by a set of feature words listed in table ExampleDoc. Only those documents are SELECTED that share at least one word with the example document and whose DocID appear in CandidateDocs.

1.
DocsFeature contains feature words extracted from a collection of documents.

The create statement for DocsFeature is


CREATE TABLE `DocsFeature` (
`DocID` varchar(8) COLLATE latin1_bin NOT NULL,
`Word` varchar(20) COLLATE latin1_bin NOT NULL,
`Freq` tinyint(3) unsigned NOT NULL DEFAULT '0',
`TitleWord` tinyint(1) NOT NULL,
`CommonWord` tinyint(1) NOT NULL,
`NrNsNtNz` char(2) COLLATE latin1_bin NOT NULL,
KEY `idx_DocID` (`DocID`),
KEY `idx_Word` (`Word`),
KEY `idx_NrNsNtNz_TitleWord` (`NrNsNtNz`,`TitleWord`),
KEY `idx_TitleWord` (`TitleWord`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin


sample records are as follows



....
A111 Computer 9 n (Document A111 contains 'Computer',and it occurres 9 times)
C27 Dolphin 5 n (Document C27 contains 'Dolphin',and it occurres 5 times)

D21 Food 2 n (Document D21 contains 'Food',and it occurres 2 times)

.....

2.

CandidateDocs contains a list of DocID, having only one field DocID

It is created as a temp table on the fly when the program is running.

CandidateDocs has index on DocID

3. ExampleDoc table was created using the following statement.



CREATE TABLE `ExampleDoc` (
`Word` varchar(20) DEFAULT NULL,
`Freq` tinyint(3) unsigned NOT NULL DEFAULT '0',
`TitleWord` tinyint(1) DEFAULT NULL,
`CommonWord` tinyint(1) DEFAULT NULL,
`NrNsNtNz` char(2) DEFAULT NULL,
KEY `idx_Word` (`Word`),
KEY `idx_NrNsNtNz` (`NrNsNtNz`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I do the following query



CREATE TABLE TempTable_1
AS SELECT
DocsFeature.DocID AS DocID, SUM( DocsFeature.Freq+ ExampleDoc.Freq) Score
FROM
DocsFeature, ExampleDoc, CandidateDocs
WHERE
DocsFeature.Word = ExampleDoc.Word
AND CandidateDocs.DocID = DocsFeature.DocID
group by
DocsFeature.DocID
HAVING COUNT(DocID) > 1;

There are 10 million records in DocsFeature, about 90 records in ExampleDoc, and 20 thousand records in CandidateDocs.

In this case, It would takes 9s.

Please help me to improve this statement.
any help will be appreciated very much.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-08-13 : 10:36:30
If you are using MySQL, please ask over at www.dbforums.com
since this is a Microsoft SQL Server forum (with some support for Microsoft Access).



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -