In this article we will create a console app that crawls a webpage and reads
data from it using C#. We can also use HttpWebRequest and HttpWebResponse to get
the response from a webpage, but I used HtmlAgilityPack to read data in the
example given below. It is much easier to read the title, headers, and other
HTML tags using HtmlAgilityPack.
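For comparison, here is a minimal sketch of the HttpWebRequest/HttpWebResponse route mentioned above. The class name RawFetchSketch, the helpers CreateRequest and DownloadHtml, and the example.com URL are placeholders invented for illustration. The sketch only returns the raw HTML as a string; pulling out the title or headers would still need a parser.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class RawFetchSketch
{
    // Hypothetical helper: builds a GET request for the given URL.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        return request;
    }

    // Hypothetical helper: sends the request and returns the body decoded as UTF-8.
    public static string DownloadHtml(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Placeholder URL; replace it with the page to crawl.
        string html = DownloadHtml("http://example.com/");
        Console.WriteLine(html.Length);
    }
}
```

The response, stream, and reader are wrapped in using blocks so the connection is released even if reading fails.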
1) Create a console application.
3) Copy the HtmlAgilityPack DLL to a folder inside the project and add a reference to HtmlAgilityPack.dll.
4) Next, create a WebClient and pass it the URL to read. To read content in all languages (such as Chinese, Japanese, Indonesian, and Russian), make sure that the encoding is UTF-8.
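To see why the encoding matters, the following sketch decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1. It needs no network access; the class EncodingSketch, the helper ReadAll, and the sample string are made up for illustration, and ISO-8859-1 is just one example of a wrong guess.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: decodes a byte array as text using the given encoding.
    public static string ReadAll(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Sample markup with Japanese and Russian text, encoded the way a UTF-8 page would be.
        string original = "<title>日本語 Русский</title>";
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        string good = ReadAll(utf8Bytes, Encoding.UTF8);
        string bad = ReadAll(utf8Bytes, Encoding.GetEncoding("ISO-8859-1"));

        Console.WriteLine(good == original); // True: the text survives the round trip
        Console.WriteLine(bad == original);  // False: each multi-byte character is split into several wrong ones
    }
}
```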
The complete code for crawling and reading data from a webpage is given below:
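using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string url = "http://deebujacob.blogspot.com/2013/03/rendering-multiple-series-in-highcharts.html";
        HtmlDocument htmlDoc = new HtmlDocument();

        // Download the page and parse it as UTF-8 so non-Latin text is decoded correctly.
        using (WebClient webclient = new WebClient())
        using (Stream stream = webclient.OpenRead(url))
        {
            htmlDoc.Load(stream, Encoding.UTF8);
        }

        // SelectSingleNode returns null when no node matches the XPath expression.
        Console.WriteLine(htmlDoc.DocumentNode.SelectSingleNode("//title").InnerText);
        Console.WriteLine(htmlDoc.DocumentNode.SelectSingleNode("//body//h1").InnerText);

        // Keep the console window open until Enter is pressed.
        Console.ReadLine();
    }
}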
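SelectSingleNode returns null when no node matches the XPath expression, so it helps to guard the result before reading InnerText. Here is a small sketch of that guard, run against an inline HTML string instead of a live page. The class XPathSketch, the helper TextOrDefault, and the sample markup are invented for illustration, and the project is assumed to reference HtmlAgilityPack.dll as described in the steps above.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Hypothetical helper: returns the trimmed text of the first node
    // matching the XPath expression, or the fallback when nothing matches.
    public static string TextOrDefault(HtmlDocument doc, string xpath, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample page that has a title but no h1 element.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><title>Sample page</title></head><body><p>No heading here</p></body></html>");

        Console.WriteLine(TextOrDefault(doc, "//title", "(no title)")); // prints "Sample page"
        Console.WriteLine(TextOrDefault(doc, "//body//h1", "(no h1)")); // prints "(no h1)"
    }
}
```

Loading from a string with LoadHtml also makes it easy to try out XPath expressions without hitting the network on every run.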