Author |
Topic |
mikebird
Aged Yak Warrior
529 Posts |
Posted - 2013-07-17 : 10:41:10
|
How do I extract data from a public website? With a browser, or is there a different tool? Which type of file do I need to generate to use with SSIS? I've done something similar with HTML emails, because I get mountains of 'em.

I've been asked to do this, as any regular boss would want something done, with no concept of what's possible... wanting a 'can do' attitude...

I tried viewing my home page as raw source, or Save As... HTML. I have 124 contacts. They want an email marketing campaign. They say 140k or 150k contacts in LinkedIn. Where did they get those figures??

Overall, I think of my own contacts and can see some names in the HTML, which is a bit messy and needs some automated de-tagging of the text, which I've tested. Not sure what a marketing campaigner would do - their own private contact list... I think what you get from a website is very limited. I think they need to pay for a signed agreement with LinkedIn, as they have value in their database and will only reveal specific details. |
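The automated de-tagging mentioned above can be sketched in C#, the language SSIS Script Tasks use. This is only a rough illustration: the class name HtmlText and the method StripTags are made up for the example, and regex-based stripping is a shortcut that will mishandle malformed markup, so a proper HTML parser is the safer choice for anything beyond a quick test.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original thread.
public static class HtmlText
{
    // Crude de-tagging: drops script/style blocks, removes remaining tags,
    // decodes HTML entities, and collapses runs of whitespace.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Remove <script>...</script> and <style>...</style> including their contents.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Calling HtmlText.StripTags on a saved page returns one long line of visible text. That text still has to be split into fields (names, companies and so on) before SSIS can load it as a flat file.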
|
mfemenel
Professor Frink
1421 Posts |
Posted - 2013-07-17 : 15:58:57
|
I'm not sure what format you get from LinkedIn, but here is the Main() body of a Script Task that grabs an XML file from an HTTP request. I then have a data flow with an XML Source to rip the XML.

public void Main()
{
    // For each pass through the For Loop container, download a file.
    try
    {
        // Logging start of download
        bool fireAgain = true;

        // Append the file_min value to the URL to get the file.
        // file_min is adjusted with each pass through the For Loop container.
        string getfile = "http://www.sqlsaturday.com/eventxml.aspx?sat=" + Dts.Variables["User::file_min"].Value.ToString();

        // Set the URL of the HTTP connection manager to getfile (above)
        Dts.Connections["mySSISConnection"].ConnectionString = getfile;

        // Output to the execution status what is happening.
        Dts.Events.FireInformation(0, "Download File", "Start downloading " + Dts.Connections["mySSISConnection"].ConnectionString, string.Empty, 0, ref fireAgain);

        // Get your newly added HTTP Connection Manager
        Object mySSISConnection = Dts.Connections["mySSISConnection"].AcquireConnection(null);

        // Create a new connection
        HttpClientConnection myConnection = new HttpClientConnection(mySSISConnection);

        // Download file and use the Flat File Connectionstring (D:\SourceFiles\Products.csv)
        // to save the file (and replace the existing file)

        // Set the file name and location.
        string filename = "c:\\eventxml_" + Dts.Variables["User::file_min"].Value.ToString();
        Dts.Connections["eventxml"].ConnectionString = filename;

        // Execute the DownloadFile method and grab the file.
        // The second argument (true) overwrites an existing file.
        myConnection.DownloadFile(Dts.Connections["eventxml"].ConnectionString, true);
        Dts.Variables["User::xml_file_name"].Value = filename;

        // Logging end of download
        Dts.Events.FireInformation(0, "Download File", "Finished downloading " + Dts.Connections["eventxml"].ConnectionString, string.Empty, 0, ref fireAgain);

        // Quit Script Task successfully
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        // Logging why download failed
        Dts.Events.FireError(0, "Download File", "Download failed: " + ex.Message, string.Empty, 0);

        // Quit Script Task unsuccessfully
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}

Mike
"oh, that monkey is going to pay" |
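Before pointing an XML Source at the downloaded file, it can help to see which elements the file actually contains. The sketch below is not part of the script above: XmlPeek and LeafValues are hypothetical names, and the code assumes nothing about the real eventxml schema. It simply lists every leaf element together with its path from the root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for inspecting an XML file; not from the original thread.
public static class XmlPeek
{
    // Returns one (path, value) pair per leaf element, in document order.
    // The path is built from local element names, root first, joined with "/".
    public static List<KeyValuePair<string, string>> LeafValues(XDocument doc)
    {
        return doc.Descendants()
            .Where(e => !e.HasElements)
            .Select(e => new KeyValuePair<string, string>(
                string.Join("/", e.AncestorsAndSelf().Reverse().Select(a => a.Name.LocalName)),
                e.Value.Trim()))
            .ToList();
    }
}
```

To try it on a downloaded file, pass XDocument.Load(filename) to XmlPeek.LeafValues, where filename is the path the Script Task saved to. The resulting paths show roughly which columns the XML Source is likely to expose.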
|