 Need to find distinct word count

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 14:04:13
OK folks - I need some help here. We are planning to localize our site into 8 languages. The vendor is asking for a word count. I need a way (statements/scripts) to go through all my tables and give me a distinct word count.

I am looking to do this for all our PR tables, headers, tool tips... basically the entire site.

Any help here is much appreciated!!!!!

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-08-19 : 14:31:49
Not sure I'm following - are you saying that you want to go through all tables, all (character-based) columns, all rows, and store each word, then get a distinct count of those words? Is a "word" defined as any string separated by a space, tab, linefeed, period, semicolon, question mark, etc.?
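
One way to handle several delimiters with a splitter that only accepts one is to map them all to a space first. A minimal sketch, assuming the delimiter list above is what you settle on; the variable @s and its sample value are made up for illustration:

```sql
-- Sketch only: @s stands in for a single column value.
DECLARE @s nvarchar(max);
SET @s = N'Save changes? Yes; save' + NCHAR(9) + N'now.' + NCHAR(13) + NCHAR(10) + N'Done';

-- Map each assumed delimiter to a space so a single-delimiter splitter can be used.
SET @s = REPLACE(@s, NCHAR(9),  N' ');  -- tab
SET @s = REPLACE(@s, NCHAR(13), N' ');  -- carriage return
SET @s = REPLACE(@s, NCHAR(10), N' ');  -- linefeed
SET @s = REPLACE(@s, N'.', N' ');       -- period
SET @s = REPLACE(@s, N';', N' ');       -- semicolon
SET @s = REPLACE(@s, N'?', N' ');       -- question mark

SELECT @s AS normalized;
```

Note that treating every period as a delimiter will also break up things like decimals and abbreviations, so the list probably needs tuning against real data.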

Be One with the Optimizer
TG

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 14:42:32
Yes, I want to go through all tables, all columns, and get a distinct count of those words - basically the end result is we have a site (SaaS) that we are looking to translate - all the words on each page into language x, y, z. The translation vendor is asking for our word count.

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 14:43:22
Since they charge us per word - they have a memory tool - they don't charge us for the same words twice over...
Thanks!

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-08-19 : 14:48:03
Are any of the columns of datatype: text, ntext, varchar(max), nvarchar(max)?
Do you have columns that should be ignored, like names, emails, URLs, etc.?
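
The datatype question can be answered from the catalog rather than by inspection. A rough sketch, assuming the standard INFORMATION_SCHEMA views and that only base tables (not views) matter here:

```sql
-- List every character-based column in the current database.
-- CHARACTER_MAXIMUM_LENGTH = -1 marks the (max) types.
SELECT  c.TABLE_SCHEMA,
        c.TABLE_NAME,
        c.COLUMN_NAME,
        c.DATA_TYPE,
        c.CHARACTER_MAXIMUM_LENGTH
FROM    INFORMATION_SCHEMA.COLUMNS AS c
JOIN    INFORMATION_SCHEMA.TABLES  AS t
          ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
          AND t.TABLE_NAME   = c.TABLE_NAME
WHERE   t.TABLE_TYPE = 'BASE TABLE'   -- skip views
  AND   c.DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar', 'text', 'ntext')
ORDER BY c.TABLE_SCHEMA, c.TABLE_NAME, c.ORDINAL_POSITION;
```

The result also doubles as a checklist for deciding which columns (names, emails, URLs) to leave out.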

EDIT:
How big is the database?

Be One with the Optimizer
TG

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 15:00:23
YES WE HAVE ALL THOSE COLUMNS.
IN TERMS OF SIZE WE HAVE 2 MAIN FILES, BUT THE SIZE WOULD CHANGE ONCE WE EXPORT... WE WANT TO IGNORE NAMES, EMAILS, URLS AND ANY DATA THAT WOULD BE ENTERED BY THE END USER (WE HAVE A GOOGLE TRANSLATOR FOR THIS).

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-08-19 : 15:01:11
Please, dear God, tell me that this vendor is not merely going to do a search and replace on the words?
What does a distinct word count have to do with the difficulty of translation? I think this is a red flag that you should find another vendor, lest you find all your strings translated into Engrish: http://www.engrish.com/

Boycott Beijing Olympics 2008

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 15:07:42
No no, definitely not... they work with an editor to make sure it all makes sense... but we're shopping for a vendor, and they want to know a word count... I'd rather give them a word count than vice versa.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2008-08-19 : 15:16:30
The tough part will be to split any value into a set of words. You can probably use any one of the many "split functions" that have been posted here. Here is one thread on the subject:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648
I think there are some functions in this thread that split ntext datatypes though I've never used them...

For example, for one table and one column, this would get every "word" as delimited by a space, assuming you have a function called fnParseString and a table called [words]:

-- <table> and <characterColumn> are placeholders: substitute a real table and column
insert words (words)
select ca.val
from <table> t
cross apply dbo.fnParseString(t.<characterColumn>, ' ') ca
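
The statement above assumes dbo.fnParseString already exists. For reference, here is a minimal sketch of what such a function could look like. This is not the function from the linked thread: the parameter names, the val column width, and the loop-based approach are all assumptions, chosen for readability rather than speed.

```sql
-- Hypothetical splitter: returns one row per non-empty piece of @text.
CREATE FUNCTION dbo.fnParseString
(
    @text      nvarchar(max),
    @delimiter nchar(1)
)
RETURNS @words TABLE (val nvarchar(4000))
AS
BEGIN
    DECLARE @start int, @pos int;
    SET @start = 1;

    -- Append a trailing delimiter so the last word is handled like the others.
    SET @text = @text + @delimiter;
    SET @pos  = CHARINDEX(@delimiter, @text, @start);

    WHILE @pos > 0
    BEGIN
        -- Skip empty pieces produced by consecutive delimiters.
        IF @pos > @start
            INSERT @words (val)
            -- The explicit CAST cuts any piece longer than 4000 characters.
            VALUES (CAST(SUBSTRING(@text, @start, @pos - @start) AS nvarchar(4000)));

        SET @start = @pos + 1;
        SET @pos   = CHARINDEX(@delimiter, @text, @start);
    END;

    RETURN;
END;
```

Called as `SELECT val FROM dbo.fnParseString(N'the quick  brown fox', N' ');` it returns four rows, since the empty piece between the two spaces is skipped. A NULL input returns no rows. On large tables a row-by-row loop like this will likely be slow, so one of the set-based split functions from the linked thread may be a better fit.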


Once you have that working, it is just a matter of setting up a couple of nested loops over the information_schema views: for each table and each character column, generate and exec a dynamic statement like the one above.
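
A rough sketch of that loop follows. It uses a single cursor over INFORMATION_SCHEMA.COLUMNS instead of two nested loops, which visits the same table/column pairs. It assumes dbo.fnParseString exists, that the [words] table is dbo.words with a column words nvarchar(4000), and that all character columns should be scanned; none of these names are confirmed beyond what is mentioned above.

```sql
DECLARE @schema sysname, @table sysname, @column sysname, @sql nvarchar(max);

DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
    FROM   INFORMATION_SCHEMA.COLUMNS AS c
    JOIN   INFORMATION_SCHEMA.TABLES  AS t
             ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
             AND t.TABLE_NAME   = c.TABLE_NAME
    WHERE  t.TABLE_TYPE = 'BASE TABLE'
      AND  c.DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar', 'text', 'ntext')
      -- Do not scan the output table itself. Add further filters here
      -- to skip columns holding names, emails, URLs, or user-entered data.
      AND  NOT (c.TABLE_SCHEMA = 'dbo' AND c.TABLE_NAME = 'words');

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @schema, @table, @column;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME guards against odd characters in object names.
    -- The CAST lets text/ntext columns go through the same code path.
    SET @sql = N'INSERT dbo.words (words) '
             + N'SELECT ca.val FROM ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + N' AS t '
             + N'CROSS APPLY dbo.fnParseString(CAST(t.' + QUOTENAME(@column)
             + N' AS nvarchar(max)), N'' '') AS ca;';

    EXEC sp_executesql @sql;

    FETCH NEXT FROM col_cursor INTO @schema, @table, @column;
END;

CLOSE col_cursor;
DEALLOCATE col_cursor;
```

Try it against a copy of the database first, since it reads every row of every character column.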

Get started and post back with any questions...have fun :)

Be One with the Optimizer
TG

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-08-19 : 15:24:55
A word count makes sense. A distinct word count makes no sense at all.
There is a huge difference between the word count of a short story and the word count of a 400-page novel. There would only be a small difference in distinct word count between the two.

Are you sure they do not need a total word count?
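
Once the split words are loaded, both figures can be read from the same table, so it costs nothing to report both. A minimal sketch, assuming the [words] table mentioned earlier lives in the dbo schema with a column named words:

```sql
-- Total words versus distinct words in one pass.
SELECT COUNT(*)              AS total_words,
       COUNT(DISTINCT words) AS distinct_words
FROM   dbo.words;
```

The distinct count follows the column's collation, so under a case-insensitive collation "Save" and "save" count as one word. Punctuation left attached to a word ("save" vs. "save,") will count as separate entries unless it was stripped before splitting.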

Boycott Beijing Olympics 2008

mukherjee12
Starting Member

6 Posts

Posted - 2008-08-19 : 15:43:09
I'll try! Thank you!!!!!!!!!!!!! You the man
   
