Re: extracting urls
On Nov 18, 9:01 am, mnml <rdelsa...@gmail.com> wrote:
Hi, I made a little function to extract urls from any content with a
regular expression but it doesn't really work.
When I try to extract urls from http://google.com I only get 4 results
in my array:
* http://images.google.nl/imghp?oe=ISO-8859-1&hl=nl&tab=wi
* http://
* .nl
* /imghp?oe=ISO-8859-1&hl=nl&tab=wi
Here is the code of my function:
public static void find_url(String content) {
    Pattern p = Pattern.compile(
        "(@)?(http://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+"
        + "(/[#&\\n\\-=?\\+\\%/\\.\\w]+)?");
    Matcher m = p.matcher(content);
    if (m.find())
    {
        for (int i=0; i<=m.groupCount(); i++) {
            myVar.urls[i] = m.group(i);
        }
    }
}
Please don't clutter the forum with multiple posts!
Your regex code is very wrong. Study the Mnm class below and go to bed.
I didn't touch your weird regex string, but I firmly believe it is also
wrong for your purpose, whose details I don't know.
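As far as I can tell, the four results are not four URLs: they look like
the capture groups of the first match only, because find() is called once
and the loop then walks m.group(0) .. m.group(groupCount()). Here is a
small sketch that reproduces it. The class name GroupDemo, the method
groupsOfFirstMatch and the sample string are made up for illustration; the
pattern is the one from the original post, joined back together.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The pattern from the original post, joined back into one string.
    static final Pattern P = Pattern.compile(
        "(@)?(http://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+"
        + "(/[#&\\n\\-=?\\+\\%/\\.\\w]+)?");

    // Returns group 0..groupCount() of the FIRST match only (hypothetical helper).
    static String[] groupsOfFirstMatch(String content) {
        Matcher m = P.matcher(content);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // null if the group did not take part in the match
        }
        return groups;
    }

    public static void main(String[] args) {
        // Assumed sample input: the first URL from the list in the question.
        String sample = "http://images.google.nl/imghp?oe=ISO-8859-1&hl=nl&tab=wi";
        String[] g = groupsOfFirstMatch(sample);
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
    }
}
```

Group 0 is the whole match, group 1 is the optional "@" (null here), and
groups 2 to 4 are "http://", the last dotted part, and the path. That is
the same list as in the question.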
----------------------------------------------
import java.net.*;
import java.util.regex.*;
import java.io.*;
import java.util.*;

public class Mnm{
    public static void main(String[] args) throws Exception{
        // StringBuilder avoids re-copying the whole page on every appended line
        StringBuilder contStr = new StringBuilder();
        String line = null;
        Locale.setDefault(Locale.US);
        // String urlStr = "http://google.com";
        String urlStr = "http://www.google.com/ig?hl=en";
        if (args.length > 0){
            urlStr = args[0]; // the page to scan may be given on the command line
        }
        URL url = new URL(urlStr);
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))){
            while ((line = br.readLine()) != null){
                contStr.append(line);
            }
        }
        findUrl(contStr.toString());
    }
    public static void findUrl(String content) {
        int gc, counter, gcounter;
        gc = counter = gcounter = 0;
        Pattern p = Pattern.compile(
            "(@)?(http://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+"
            + "(/[#&\\n\\-=?\\+\\%/\\.\\w]+)?");
        Matcher m = p.matcher(content);
        gc = m.groupCount();
        // group 0 is the whole match; groups 1..gc are the parenthesized parts
        for (int i = 0; i <= gc; ++i){
            System.out.println("GROUP" + i + " : ");
            // each find() advances to the next match, so this visits every match
            while (m.find()){
                ++counter;
                ++gcounter;
                // prints "null" when an optional group did not take part in the match
                System.out.println(gcounter + ".> " + m.group(i));
            }
            m.reset(content); // for next group
            gcounter = 0;
        }
        if (counter == 0){
            System.out.println("--no match--");
        }
    }
}
----------------------------------------
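If all that is wanted is a list of every URL in the page, only group 0
(the whole match) of each find() is needed, and a List avoids having to
size an array up front. A minimal sketch under some assumptions: the
pattern here is a deliberately simplified one, not the regex from the
thread, and the names UrlCollector, findUrls and the sample HTML are
invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlCollector {
    // Assumed, simplified pattern: "http://" or "https://" followed by any run of
    // characters that are not whitespace, quotes, or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Collects the whole match (group 0) of every find() into a list.
    static List<String> findUrls(String content) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(content);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for a downloaded page.
        String html = "<a href=\"http://images.google.nl/imghp?hl=nl&tab=wi\">Images</a> "
                    + "<a href=\"http://maps.google.nl/maps?hl=nl\">Maps</a>";
        for (String url : findUrls(html)) {
            System.out.println(url);
        }
    }
}
```

The important part is the while (m.find()) loop: one pass over the
content, one list entry per match. The pattern itself would still need
tuning for real pages (relative links, URLs without a scheme, and so on).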