Re: Simplest way to download a web page and print the content to
stdout with boost
"Francesco S. Carta" <entul...@gmail.com> wrote:
gervaz <ger...@gmail.com> wrote:
On Jun 13, 1:42 pm, "Francesco S. Carta" <entul...@gmail.com> wrote:
gervaz <ger...@gmail.com> wrote:
Hi all,
can you provide me the easiest way to download a web page (e.g.
http://www.nytimes.com) and print the output to stdout using the boost
library?
Thanks,
Mattia
Yes, we can :-)
Sorry, but you should try to find the way by yourself first - that's
not hard: split the problem up and ask Google, find pointers and follow
them, try to write some code and compile it. If you don't succeed you
can post your attempts here and someone will point out the mistakes
sooner or later.
--
FSC
http://userscripts.org/scripts/show/59948
Ok, nice advice :P
Here is what I've done (adapted from what I've found reading the docs
and googling):
#include <iostream>
#include <boost/asio.hpp>

int main()
{
    boost::asio::io_service io_service;
    boost::asio::ip::tcp::resolver resolver(io_service);
    boost::asio::ip::tcp::resolver::query query("www.nytimes.com", "http");
    boost::asio::ip::tcp::resolver::iterator iter = resolver.resolve(query);
    boost::asio::ip::tcp::resolver::iterator end;
    boost::asio::ip::tcp::endpoint endpoint;
    while (iter != end)
    {
        endpoint = *iter++;
        std::cout << endpoint << std::endl;
    }

    boost::asio::ip::tcp::socket socket(io_service);
    socket.connect(endpoint);

    boost::asio::streambuf request;
    std::ostream request_stream(&request);
    request_stream << "GET / HTTP/1.0\r\n";
    request_stream << "Host: localhost \r\n";
    request_stream << "Accept: */*\r\n";
    request_stream << "Connection: close\r\n\r\n";
    boost::asio::write(socket, request);

    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n\r\n");
    std::cout << &response << std::endl;
    return 0;
}
But I'm not able to retrieve the entire web content.
Other questions:
- the while loop looks like an iterator loop, but what does
boost::asio::ip::tcp::resolver::iterator end stand for? Is it a zero
value?
Whatever the value, in the framework of STL iterators the "end" one is
simply a marker used to match the end of the container / stream /
whatever, so that you know there isn't more data / objects to get. You
shouldn't worry about its actual value - I don't know the details
either. Maybe there is something wrong with your program and I'll have
a look, but I'm pressed for time and I wanted to drop in my 2 cents.
- to see the output I had to use &response, why?
That works because std::ostream has an operator<< overload that takes
a std::streambuf* and copies out whatever the buffer holds, and
boost::asio::streambuf derives from std::streambuf. It has nothing to
do with null-terminated c-style strings, so passing &response is fine.
Note that printing the buffer this way also empties it.
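A small sketch of how streaming a std::streambuf* behaves. It uses
std::stringbuf as a stand-in for boost::asio::streambuf (my assumption,
only to keep the example free of Boost), and drain_to_string is a
made-up helper name:

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: copies everything left in `buf` into a string.
// operator<<(std::streambuf*) reads characters from the buffer until it
// is exhausted, so the buffer's contents are consumed along the way.
std::string drain_to_string(std::streambuf& buf)
{
    std::ostringstream out;
    out << &buf;   // same form as: std::cout << &response;
    return out.str();
}
```

Calling it a second time on the same buffer gives back an empty string,
since the first call has already taken the data out.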
You will probably also have to
- call "read_until" to fill the buffer
- pick out the data from the buffer (possibly flushing / emptying it)
multiple times, until there is no more data to fill it.
Hope that helps you refine your attempt.
I've played with your program a bit. Everything up to the line:
request_stream << "GET / HTTP/1.0\r\n";
should be fine.
In particular, the loop that checks for the end of the endpoint list
is fine because, as it seems, a default-constructed resolver iterator
already means "end" - it works differently from, say, a std::list,
where you have to call the end() method of a particular list
instance.
The first problem with your code is where you send the server the
"Host" header. You should replace "localhost" with the domain name you
want to read from - in this case:
request_stream << "Host: www.nytimes.com\r\n";
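To avoid the host name in the request drifting away from the one you
resolve, you could build the request text from a single variable. A
minimal sketch, where build_request is a hypothetical helper of mine and
the headers are just the ones from your program:

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.0 GET request for `host`.
std::string build_request(const std::string& host, const std::string& path)
{
    std::string request;
    request += "GET " + path + " HTTP/1.0\r\n";
    request += "Host: " + host + "\r\n";
    request += "Accept: */*\r\n";
    request += "Connection: close\r\n\r\n";   // blank line ends the headers
    return request;
}
```

In your program that would be used as
request_stream << build_request("www.nytimes.com", "/");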
Then we have the (missing) loop to retrieve the data.
The "read_until" overload that you are calling throws a
boost::system::system_error when the connection is closed before the
delimiter is found, that is, when the socket has no more data to read.
Note also that all overloads of that function return a size_t: the
number of bytes in the buffer's get area up to and including the
delimiter.
It seems you have to catch the exception in order to know when to stop
calling it. Another option is to use the "read_until" overload that
doesn't throw (it takes an error_code argument instead) and check
whether the error_code is set or the returned size_t is zero - in that
case you break out of the loop.
So far we're just filling the buffer. To get the data out line by line
you can build a std::istream on top of it and read the data through
the istream.
Try to read_until "\r\n" rather than "\r\n\r\n", then getline from the
istream into a string.
If you want I'll post my (working?) code, but since I've learned a lot
by digging my own way, I think you can benefit from doing the same.
Happy coding and feel free to ask for further details if you want -
heck, reading boost's template declarations is not much fun...
(I can't rule out that I've said something wrong, this is new to me
too; if so, I hope more experienced users out there will correct me)
--
FSC
http://userscripts.org/scripts/show/59948