Re: Monitoring a javascript-based web page...

From:

John Ersatznom <j.ersatz@nowhere.invalid>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 18 Dec 2006 15:53:13 -0500

Message-ID:

<em6v0v$ud7$3@aioe.org>

gfr92y@yahoo.com wrote:

John Ersatznom wrote:

gfr92y@yahoo.com wrote:

I would like to "automatically" check a web page for messages that is
written in javascript and that requires me to sign-in with a username
and password, and email either the messages or a picture of the
messages to my email address.

In my utter ignorance, I would think some type of macro or "robot"
might do this for me.

Can anyone point me in the right direction for such a tool?

If it were me:

* First I'd use a tool like Wireshark to sniff the traffic while logging
in manually.
* Probably I'd find an HTTP GET www-formurl-encoded or HTTP POST, or
maybe an HTTPS transaction if I were really lucky.
* Then I'd figure out how to use HttpURLConnection to make the
connection (SSL if necessary) and send the same form submission from
inside a Java method.
* Then I'd write a method to do so, retrieve the result page, save it or
parse it in some way, and (if need be) send whatever HTTP request logs
out again.
* Sending email likewise: I'd send a test mail to myself at another
server (e.g. from my main to my gmail) while sniffing the traffic and
duplicate the protocol (this time at a low level). It probably consists
of contacting a mail host at your ISP (better make this a replaceable
string, e.g. with a GUI input form or at least a resource bundle) on
port 25 and sending stuff like HELO youraccountname MAIL FROM
youraccountname headers body Control-D or whatever they do nowadays.
* Concoct a method to send the mail, stuffing the body with whatever
data. Image encoding would be a PITA but I could probably cobble that
together too if I had to, and have it generate mail with MIME attachments.
* Googling the protocols involved (likely HTTP or HTTPS and SMTP) for
more information would probably also be in the offing.
* There'd need to be error trapping and recovery, too. Silent failure is
not acceptable as a rule.
* And I'd consider carefully how to make the bot play very nice. For
example, it should retrieve the resource and send one mail once a day or
some such, no more often than a human being doing it manually probably
would. This lowers the chance that someone will detect a bot being used
that has a dislike for people automating their end of something, as well
as that the bot will actually be a genuine problem causing excessive
loads or bandwidth use. Of course, to look like a human doing the task
it has to ignore robots.txt, which is a faux-pas, but I wouldn't
consider it a serious one as long as the bot a) never generates for the
server it hits more traffic than a human browsing the site in Firefox
and b) isn't ripping content in some way, such as an archiver or search
index, that goes into a publicly visible place (e.g. Google or the
Wayback machine). For a bot that logs on somewhere once a day and grabs
a single item for your personal use, these conditions are easily met.
(One way to state the informal rule I came up with is: "If the bot
emulates you or a single assistant doing something by hand, it can
pretend to be a human, as it makes no difference to anyone else anyway.
If it does something only massive automation could ever do, it has to
admit it's a bot.")

Is that the easiest way to go about it???

To go about automating it? Yes. Unless you've got a browser and an email
app that can be puppeted by some scripting language (I think Mac apps
sometimes have such features.) If it's too daunting you can always do it
manually every few hours or whenever. But you'd miss out on learning a
bunch about how network programming works. And neither of the latter
solutions would involve Java. :)