Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 How to match data from two different sources?

Author  Topic 

larryg003
Starting Member

18 Posts

Posted - 2010-07-26 : 14:53:09
Hi guys,

I'm new for work and I need to match some data from two different sources. I have a db in sql management studio which has 42,000,000 rows. There are 3 columns labeled: "Article ID, Article Text, Date and ticker"

I have a separate excel file which has 800 samples from the db. I have the ticker, Date and Article Text, how do I run a query which will match everything together and identify the Article ID? There are 20 Tickers in the sample, so there are 40 articles/ticker (each with a different date).

While the Article ID is unique, I need to make sure that the article
text in the excel row matches the one in the database. I don't need an exact match, but a 10 word string for the article text should mean the article is unique.

Thank you for your help guys,

-Larry

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-26 : 15:12:05
save the text as a csv, and load it to a staging table, then do a join

Post the DDl of the table you have, and some sample data from both sources

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

larryg003
Starting Member

18 Posts

Posted - 2010-07-26 : 16:32:52
Hi can you be a little more specific? How do i do that?

Go to Top of Page

naveengopinathasari
Yak Posting Veteran

60 Posts

Posted - 2010-07-27 : 04:32:23
Hi,

For Loading the data from Excel to DB (hope it is sql server).

STEP 1 : Import the data from excel to database.
it will create a table and the load the data to the database. Each excel sheet will be created as a table.

STEP 2 : You can query the excel sheet table with the actual table you have in your db.

for more details please provide the required files and data.

Lets unLearn
Go to Top of Page

larryg003
Starting Member

18 Posts

Posted - 2010-07-27 : 17:44:11
I have the file as a .csv file.

What is the query I need to put in the sql query?

My 43,000,000 row db is called news.dbo
and my .csv excel is called sample.dbo

If I want to match the article text in the sample to the article text in the news.dbo and its corresponding article_ID, what would my query be?

Is it:

select article_ID, Date, Article_Text
from news.dbo
UNION
Date, Article_Text
from sample.dbo
where [Article_Text] like '%BP released quarter dividend of 35 cents%'
or [C] like '%Chevron announced the retirement of CEO David Philips%'
or [C] like '%Chevron is announcing the addition of the XYZ platform, which will oversee%'
or [C] like '%Chevron is announcing a new aggressive pricing strategy%'
...
all 800 rows right?

Eternally grateful to whoever can help me out on this...



Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-28 : 13:40:17
If you want to "Match" you need to JOIN

Is the CSV file in a table now?

SELECT *
FROM news n
JOIN
sample s
ON n.Article_Text = s.Article_Text

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

larryg003
Starting Member

18 Posts

Posted - 2010-07-29 : 11:44:01
The match function uses exact matches with words/spaces, etc...

The Article_Text may vary depending on the source on which it was released. I need a partial match. I want to choose a string in each sample Article Text and match it with the identical text in the news.dbo
Thank you for all of your help,

-Larry G
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-29 : 12:08:32
quote:
Originally posted by larryg003

The match function uses exact matches with words/spaces, etc...




OK

quote:

The Article_Text may vary depending on the source on which it was released. I need a partial match.



umm...ok

quote:

I want to choose a string in each sample Article Text and match it with the identical text in the news.dbo



OK..now I'm confused...

What do you want to find a partial match against, and what do you want to be partial????

Then do you want to tkae thos partial hits, then find an exact match against your sample??

Can you describe in business terms what you are trying to do, and why?



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

larryg003
Starting Member

18 Posts

Posted - 2010-07-29 : 12:49:46
Hi X002548,

Thanks for your help.
Go to Top of Page

larryg003
Starting Member

18 Posts

Posted - 2010-07-29 : 13:04:05
What I am trying to do is match the Article Text in the Sample db with the article text in the news.dbo. I have the article ID in the news.dbo and not in the sample.dbo, so I need to figure that out. Unfortunately, the text may be slightly different in the sample.dbo, due to conversion errors. Does the match function work by matching part of the text or the full word-word string?

I wanted to join by strings because I know the full text articles may not be exactly the same (weird characters here and there). That's why I wanted to use strings, to match the stories by words and phrases.

Of course, if the match function does not look for entire matches and will accept identical matches, that will certainly do the trick.

Thank you for all of your help,
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-07-29 : 13:11:07
I think you're going to need to do some type of Hashing...I wonder about SOUNDEX..do we have that is SQL Server?

I gotta look it up in BOL

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page
   

- Advertisement -