Re: fetch content from google results
On 28-9-2008 9:56, prabesh shrestha wrote:
I need to fetch the URL and the short description that Google provides
when we search for something. I found a way to fetch the content from
websites, but that didn't work with Google search. I am starting a
conceptual search project.
Are you using an HttpURLConnection to perform the search?
When connecting to Google (or any other server), Java's implementation
of HttpURLConnection identifies itself by default with a User-Agent
request header such as "Java/1.6.0_07" (the exact value depends on which
version of Java is installed).
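To see which value your own installation would send, a minimal sketch follows. It assumes the default agent is built as "Java/" followed by the java.version system property and that the http.agent system property has not been set. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of any library.
class DefaultUserAgent {

    // Rebuilds the default User-Agent value, assuming the "Java/<version>"
    // scheme and no http.agent system property override.
    static String defaultUserAgent() {
        return "Java/" + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        // Prints the value for the JVM this runs on
        System.out.println(defaultUserAgent());
    }
}
```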
Google checks for the User-Agent request header and rejects requests
issued by unsupported browsers/user-agents, including "Java/1.6.0_07".
However, if you set the User-Agent request header of the
HttpURLConnection to a value used by a modern browser (e.g. Internet
Explorer, Firefox or Safari), you should be able to obtain the results
of the Google search.
Example program:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class GoogleSearch {

    // User-Agent value modeled on Internet Explorer 7
    // (Windows NT 6.0 is Windows Vista; Windows XP would be NT 5.1)
    public static final String UA_IE7 =
        "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)";

    public static void main(String[] args) throws Exception {
        // Create search URL
        URL searchURL =
            new URL("http://www.google.com/search?hl=en&q=Foo+Bar");

        // Open connection
        HttpURLConnection httpConnection =
            (HttpURLConnection) searchURL.openConnection();

        // Set User-Agent request header (must happen before connecting)
        httpConnection.setRequestProperty("User-Agent", UA_IE7);

        // HTTP response code (200 means success)
        System.out.println(httpConnection.getResponseCode());

        // Open input stream on the search result page
        InputStream searchResultStream =
            httpConnection.getInputStream();

        // TODO: process search result stream

        // Release the stream when done
        searchResultStream.close();
    }
}
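As a starting point for the TODO in the example program, the stream can be read into a String before you parse it. This is only a sketch: the class SearchResultReader and its readAll method are made-up names, and UTF-8 is an assumed encoding (check the charset in the response's Content-Type header). The main method uses an in-memory stream as a stand-in for the real response so it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper class, not part of the original program.
class SearchResultReader {

    // Reads the whole stream into a String using the given charset,
    // then closes the stream.
    static String readAll(InputStream in, String charsetName)
            throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, charsetName));
        try {
            StringBuilder page = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                page.append(buffer, 0, count);
            }
            return page.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the search result stream
        InputStream fake =
            new ByteArrayInputStream("<html>Foo Bar</html>".getBytes("UTF-8"));
        System.out.println(readAll(fake, "UTF-8"));
    }
}
```

In the example program you would call it as SearchResultReader.readAll(searchResultStream, "UTF-8") and then extract the URLs and descriptions from the returned HTML.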
--
Regards,
Roland