Saturday, January 14, 2017

[WPF] Scraping Data from a Web Page

Step 1: Tools > NuGet Package Manager > Manage NuGet Packages for Solution

Step 2: Browse > search for HtmlAgilityPack > install HtmlAgilityPack

*The XPath was found with the HAPXPathFinder v0.9 tool, which is included in the sample files.

Step 3: Coding
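using HtmlAgilityPack;
using System.IO;
using System.Net;
using System.Text;              // Encoding
using System.Windows;           // MessageBox
using System.Windows.Documents; // Paragraph, Run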

string url = "http://www.just-the-word.com/main.pl?word=hello";
string xpath = "/html[1]/body[1]/div[1]/div[3]/div[1]";  // XPath found with the HAPXPathFinder v0.9 tool

HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
try
{
    string html;
    // The using blocks close the response, stream, and reader even if reading throws
    using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
    using (Stream stream = httpWebResponse.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
    {
        html = reader.ReadToEnd();
    }

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    // XPath for the 16GB / 32GB / 64GB shipping times
    HtmlNode anchors = htmlDoc.DocumentNode.SelectSingleNode(xpath);
    //HtmlNode anchors32 = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div[2]/div[3]/div/div[2]/div[2]/div[3]/ul/li[2]/label/span/span[3]/span");
    //HtmlNode anchors64 = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div[2]/div[3]/div/div[2]/div[2]/div[3]/ul/li[3]/label/span/span[3]/span");

    // Output: write the text into the RichTextBox.
    // SelectSingleNode returns null when nothing matches the XPath.
    if (anchors != null)
    {
        richTextBox.Document.Blocks.Add(new Paragraph(new Run(anchors.InnerText)));
    }
    // Read the text back out of the RichTextBox
    //string richText = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd).Text;
}
catch (WebException ex)
{
    // Error message
    MessageBox.Show(ex.Message);
}
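
SelectSingleNode returns null when the XPath matches nothing, which can easily happen with a positional path like the one above if the site changes its layout. Here is a minimal sketch of a null-safe helper, assuming HtmlAgilityPack is installed as in Step 2. The names XPathDemo and ExtractText and the inline HTML string are made up for illustration and are not part of the sample files.

```csharp
using System;
using HtmlAgilityPack;

static class XPathDemo
{
    // Hypothetical helper: returns the trimmed text of the first node that
    // matches xpath, or null when no node matches.
    public static string ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    static void Main()
    {
        // Made-up page with the same nesting as the XPath used in Step 3.
        string html =
            "<html><body><div>" +
            "<div>a</div><div>b</div>" +
            "<div><div>hello world</div></div>" +
            "</div></body></html>";

        Console.WriteLine(ExtractText(html, "/html[1]/body[1]/div[1]/div[3]/div[1]"));
        Console.WriteLine(ExtractText(html, "/html[1]/body[1]/div[9]") ?? "(no match)");
    }
}
```

Keeping the parsing in a method that takes an HTML string makes it possible to try out an XPath against a saved copy of the page without hitting the network each time.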

Done.
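
One more note: GetResponse() is a blocking call, so if the code in Step 3 runs inside a button click handler the window will freeze until the page has been downloaded. Below is a rough sketch of a different approach using HttpClient with async/await, not the HttpWebRequest code from this post. It assumes the project can reference System.Net.Http; the class name PageDownloader, the method DownloadHtmlAsync, and the 15 second timeout are my own choices for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class PageDownloader
{
    // One shared HttpClient for the whole application.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(15) };

    // Downloads the response body as a string without blocking the caller.
    // Returns null on a network error, a non-success status code, or a timeout.
    public static async Task<string> DownloadHtmlAsync(string url)
    {
        try
        {
            return await Client.GetStringAsync(url);
        }
        catch (HttpRequestException)
        {
            return null; // connection failure or non-success HTTP status
        }
        catch (TaskCanceledException)
        {
            return null; // the request timed out
        }
    }
}
```

From an async event handler this would be called as `string html = await PageDownloader.DownloadHtmlAsync(url);` and the result passed to htmlDoc.LoadHtml after a null check. Unlike the StreamReader in Step 3, which always decodes as UTF-8, GetStringAsync uses the charset declared in the response headers when there is one.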

References:
http://www.just-the-word.com/main.pl?word=walk&mode=combinations#N N*

Download the sample files
