LINGUISTIC query - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Other Forums

MS Access

LINGUISTIC query

Author

Topic

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 12:59:56

Hi!
I need a query to this column in ACCESS database

KILLER
KILLED
BOY
BOYS
CAR
RECALL
CALL
etc.

I want
KILLER
KILLED
RECALL
CALL
etc.

simply all records that have a common string of at least 4 characters

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 13:36:33

will this do?
select *
from Table1
where ColName like '%kill%' or ColName like '%call%'

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 13:42:13

I need not only
*CALL*
*KILL*

but *any-duplicated-atleast4character-strings* :-)

I was told sql/access cannot do that (?)

anyway thanks for a sprint reply

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 13:46:02

so you want to get them all without knowing the 4 char word you want??

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 13:47:08

yes!

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 13:49:24

well i guess it can be done but that is highly resource hungry...
you'd need to put all 4 letter words in a temp table then go row by row
looking for the words from a temp table...
way messy...
why do you want to do that?

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 13:57:29

It's a long story:

My company has a database with two colmuns

1'English word' 2'Polish translation'
1 X 2 Y,Z
1 Q 2 R,Y

We are looking for the Y string in the 'Polish translation' column

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 14:01:25

that's bad db design man.... sorry for you...

so what do you want to do then?

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 14:06:57

Maybe redesign the DB but my boss would go mad about it because as you said it would be time-consuming

I quess the DB should look like
1'English word' 2'Polish translation-a' 3'Polish translation-b'
1 X 2 Y 3 Z
1 Q 2 R 3 Y

then I would simply look for duplicates

but it would still be difficult because the Polish languge has a lot of flection suffixes (Slavic languages have this problem as you know :-))

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 14:14:02

yeah i know.
your best bet would be this:
tables:
English (Eng_Id, wordName)
Polish (Pol_Id, wordName)
EnglistToPolish(Eng_Id, Pol_Id)

then you simply link the correct words in the intermediate table EnglistToPolish.
i don't think that redesign would be that lengthy.
i guess the pros would be way up than cons
searches way faster, normalized data...
redesign man.

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 14:26:58

Dislaimer: I am a Newbie to sql/access

I cannot see the reason for creating
EnglistToPolish(Eng_Id, Pol_Id)

I just want to search in only one column for duplicated strings of at least 4 characters!!! God help me! I want my Atari back!!!

PS Somebody replied to my request like this:
You'll have to use the InStr() function.
It will have to be worked a lot though to read every 4 characters of a word.

Ex.: if you have the words Recall and Caller, it will have to look for these:
Is RECA in Caller? No!
Is ECAL in Caller? No!
Is CALL in Caller? Yes!

It sounds simple, but try it with the words "Information" and "Confrontation"...

So, you'll have to learned about "Recursive" and "InStr()".
Good luck!

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-11-01 : 14:43:22

this will be incredibly slow, but here's one way to do it. It requires a table of numbers, from 1 - 100 or so. (whatever your longest word is). Create this table in your database, call it "Numbers", give it 1 column, call it "N", make it an int. store values from 1-100 in it.

ONce you have that numbers table, you can write a query to get all 4-letter combos, and put them into a newly created temp table (this may take a while to run):

select distinct mid([Word],N,4) as Segment
into Segments
from YourTable, Numbers
where Numbers.N <= Len([Word]) - 3

Once you have that, you can join that to your words table like this:

select Segments.Segment, YourTable.Word
from YourTable, Segments
where YourTable.Word LIKE "*" & Segments.Segment & "*"

Now you have a list of all 4-letter segments, and all of the words that match them. Does this help at all?

- Jeff

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 14:51:39

OK Jeff
I will try to do it now (it's a long night :-))

but for now read what I was advised elsewhere:
Unfortunately, this is one of "those" problems that even a 3.2 GHz machine won't do fast, because it is a sliding-window-subset-matching problem, in which both the object string AND THE SUBJECT string have sliding windows.

If I were to even ATTEMPT this problem for someone, I would advise them that the problem is in the category of "computable, N! * L" - i.e. computable but expensive. 'cause first, it is a factorial combination of every table member against every other table member. (That's the N! part.) And because you can match internal or trailing string elements, you can't even make it (N-1)! by pre-sorting the strings. Second, it is further proportional to the average-length (of the strings) because that governs the number of subsets you can have.

Without knowing the full rules, the only way I know to do this is to brute-force it. Since letters in your sub-strings can be duplicated (e.g. KILL duplicates 'L'), I can't even get subtle by using some form of masking generator to show the letters in the string as a pre-filter exercise.

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 14:53:41

quote:
I cannot see the reason for creating
EnglistToPolish(Eng_Id, Pol_Id)

the reson is normalization

say you have an english word:
blabla
the polish language has two meanings for that word:
blablaPL1 and blablaPL2
so in your table you have


EnglistToPolish
Eng_Id Pol_Id
   1      1
   1      2

therefore joins are much simpler. no logic is needed for parsing csv strings...

any clearer??

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 14:59:32

But spirit look:

the Eng_Ids will not be the same (X has a different id from Q)
1'English word' 2'Polish translations'
1 X 2 Y, Z
1 Q 2 R, Y

Disclaimer: I may be mistaken

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 15:08:02

[code]
English
Eng_Id wordName
1 X
2 Q

Polish
Pol_Id wordName
1 R
2 Y
3 Z

EnglistToPolish
Eng_Id Pol_Id
1 2
1 3
2 1
2 2
[/code]

any clearer?

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-01 : 15:18:49

I'm beginnig to get the point
I will check both ways (spirit's and Jeff's) on the 10000 record of vocabulary DB at my office tommorow. I will be back with a report then.

Cheers!
Grateful MARIO from Warsaw

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-01 : 15:37:09

can't wait...

Go with the flow & have fun! Else fight the flow

ACCESS-SQL_MARIO
Starting Member

18 Posts

Posted - 2004-11-02 : 01:39:56

JEFF, it works!. The DB must be cut into several smaller DBs. But it is exactly what I need.

SPIRIT, Your way is applicable for redesigning of the DB. And I'd better implement it pretty soon.

MARIO

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources