Hi All,The execution time for the following query statement exceeds 9s. Please tell me of where I am wrongI have 3 tables DocsFeature, CandidateDocs, ExampleDoc. Generally, the query is used to find similar documents which are relevant to an example document. The example document is represented by a set of feature words listed in table ExampleDoc. Only those documents are SELECTED that share at least one word with the example document and whose DocID appear in CandidateDocs.1.DocsFeature contains feature words extracted from a collection of documents. The create statement for DocsFeature isCREATE TABLE `DocsFeature` ( `DocID` varchar(8) COLLATE latin1_bin NOT NULL, `Word` varchar(20) COLLATE latin1_bin NOT NULL, `Freq` tinyint(3) unsigned NOT NULL DEFAULT '0', `TitleWord` tinyint(1) NOT NULL, `CommonWord` tinyint(1) NOT NULL, `NrNsNtNz` char(2) COLLATE latin1_bin NOT NULL, KEY `idx_DocID` (`DocID`), KEY `idx_Word` (`Word`), KEY `idx_NrNsNtNz_TitleWord` (`NrNsNtNz`,`TitleWord`), KEY `idx_TitleWord` (`TitleWord`)) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin
sample records are as follows....A111 Computer 9 n (Document A111 contains 'Computer',and it occurres 9 times)C27 Dolphin 5 n (Document C27 contains 'Dolphin',and it occurres 5 times)D21 Food 2 n (Document D21 contains 'Food',and it occurres 2 times).....
2.CandidateDocs contains a list of DocID, having only one field DocIDIt is created as a temp table on the fly when the program is running.CandidateDocs has index on DocID3. ExampleDoc table was created using the following statement.CREATE TABLE `ExampleDoc` ( `Word` varchar(20) DEFAULT NULL, `Freq` tinyint(3) unsigned NOT NULL DEFAULT '0', `TitleWord` tinyint(1) DEFAULT NULL, `CommonWord` tinyint(1) DEFAULT NULL, `NrNsNtNz` char(2) DEFAULT NULL, KEY `idx_Word` (`Word`), KEY `idx_NrNsNtNz` (`NrNsNtNz`)) ENGINE=MyISAM DEFAULT CHARSET=latin1
I do the following queryCREATE TABLE TempTable_1 AS SELECT DocsFeature.DocID AS DocID, SUM( DocsFeature.Freq+ ExampleDoc.Freq) Score FROM DocsFeature, ExampleDoc, CandidateDocs WHERE DocsFeature.Word = ExampleDoc.Word AND CandidateDocs.DocID = DocsFeature.DocID group by DocsFeature.DocID HAVING COUNT(DocID) > 1;
There are 10 million records in DocsFeature, about 90 records in ExampleDoc, and 20 thousand records in CandidateDocs.In this case, It would takes 9s.Please help me to improve this statement.any help will be appreciated very much.