With a simple multi-threaded HTTP GET now implemented in the WAPT, it’s time to compare HTML parsers.
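For context, the multi-threaded GET side might look roughly like this. It is a minimal sketch, not the WAPT’s actual code, and it assumes Java 11+ (`java.net.http.HttpClient`). A fixed thread pool fans out identical GET requests and collects their status codes in submission order. The class and method names (`ConcurrentGet`, `getConcurrently`) are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentGet {
    // HttpClient is immutable and thread-safe, so all workers share one instance.
    // HTTP/1.1 is forced to keep the sketch simple against plain http:// servers.
    private static final HttpClient CLIENT =
        HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

    /**
     * Sends `count` GET requests to `url` from a pool of `threads` worker threads
     * and returns the HTTP status codes in the order the requests were submitted.
     */
    public static List<Integer> getConcurrently(String url, int count, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                // Each task performs one blocking GET on a pool thread.
                pending.add(pool.submit(() -> {
                    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
                    return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
                }));
            }
            List<Integer> statuses = new ArrayList<>();
            for (Future<Integer> f : pending) {
                statuses.add(f.get()); // blocks until that particular request has finished
            }
            return statuses;
        } finally {
            pool.shutdown(); // let the worker threads exit once their work is done
        }
    }
}
```

Calling `getConcurrently("http://localhost:8080/", 10, 4)` would push ten requests through four worker threads; the URL is only an example and should point at a server you are allowed to test.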
What I need is a parser that handles the simple things, such as HTTP GET requests and submitting forms. A few searches led me to two options.
The first one is HtmlUnit:
HtmlUnit is a “GUI-Less browser for Java programs”. It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc… just like you do in your “normal” browser.
The second one is JSoup:
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
I’m leaning towards JSoup, as its API appears to revolve around selecting and manipulating elements by CSS class and id, so the learning curve looks less steep than with other parsers.
My main open question at this point is this:
Because I’m building a generic web application tester, I need a parser I can feed variables into, which means some kind of input form where the user can paste the target page’s HTML (or something similar). The tool would then hunt down the login form in that HTML, parse out its parameters (conveniently!), and log in.
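The “hunt down the form and parse out its parameters” step can be sketched with nothing but the JDK, using the HTML parser bundled in `javax.swing.text.html` rather than HtmlUnit or JSoup. This is a hypothetical sketch. The class `LoginFormFinder` and its heuristics are my assumptions, not something either library provides: the first form containing a `type="password"` input is treated as the login form, and the first text or email input inside it is guessed to be the username field.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LoginFormFinder {

    /** What we learn about one <form>: where it submits to and which fields it carries. */
    public static final class FormInfo {
        public String action = "";
        public String method = "get";
        public final Map<String, String> fields = new LinkedHashMap<>(); // name -> default value
        public String usernameField; // guess: first text/email input in the form
        public String passwordField; // first type="password" input in the form
    }

    private static String attr(MutableAttributeSet attrs, HTML.Attribute key) {
        Object v = attrs.getAttribute(key);
        return v == null ? null : v.toString();
    }

    /** Returns the first form that contains a password input, or null if there is none. */
    public static FormInfo findLoginForm(String html) throws IOException {
        List<FormInfo> forms = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private FormInfo current; // the form we are currently inside, if any

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.FORM) {
                    current = new FormInfo();
                    String action = attr(attrs, HTML.Attribute.ACTION);
                    String method = attr(attrs, HTML.Attribute.METHOD);
                    if (action != null) current.action = action;
                    if (method != null) current.method = method.toLowerCase();
                    forms.add(current);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.FORM) current = null;
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> is an empty element, so the parser reports it here.
                if (tag != HTML.Tag.INPUT || current == null) return;
                String name = attr(attrs, HTML.Attribute.NAME);
                if (name == null) return;
                String type = attr(attrs, HTML.Attribute.TYPE);
                type = type == null ? "text" : type.toLowerCase(); // HTML default is "text"
                String value = attr(attrs, HTML.Attribute.VALUE);
                current.fields.put(name, value == null ? "" : value);
                if (type.equals("password") && current.passwordField == null) {
                    current.passwordField = name;
                } else if ((type.equals("text") || type.equals("email")) && current.usernameField == null) {
                    current.usernameField = name;
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        for (FormInfo f : forms) {
            if (f.passwordField != null) return f;
        }
        return null;
    }
}
```

Given WordPress-style login markup, the returned `FormInfo` would carry the form’s `action` and `method`, every named input with its default value (so hidden fields such as a redirect target come along for free), and the guessed username and password field names. The Swing parser only understands HTML 3.2, so on modern pages a more tolerant parser such as JSoup would be the better fit; the heuristic itself carries over.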
I’m hoping to have a fully functioning prototype by the end of this coming week (12 Sept).
Testing JSoup was unsuccessful: while it can log in to WordPress (my example target) 10 times, it requires domain knowledge of:
- The username parameter
- The password parameter
- The HTML structure of a successful login
- The HTML structure of an unsuccessful login
Because the tool would need this site-specific knowledge up front, I will look for other frameworks for testing.
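Whichever framework ends up being used, the last two items on the list above could potentially be replaced by a site-agnostic guess. The sketch below assumes one common heuristic: a login that redirects somewhere other than the login page probably worked, while a response that renders a password input again probably did not. The class `LoginOutcome`, its `classify` method, and the example path `/wp-login.php` are illustrative assumptions. Real sites can defeat this heuristic (for instance by redirecting to an error page), so it would only ever be a first guess.

```java
import java.util.regex.Pattern;

public class LoginOutcome {
    public enum Result { LIKELY_SUCCESS, LIKELY_FAILURE, UNKNOWN }

    // Matches a password <input>, e.g. type="password", type='password' or type=password.
    private static final Pattern PASSWORD_INPUT =
        Pattern.compile("<input[^>]*\\btype\\s*=\\s*[\"']?password\\b", Pattern.CASE_INSENSITIVE);

    /**
     * Guesses whether a login POST succeeded, without site-specific knowledge.
     *
     * @param status    HTTP status code of the response to the login POST
     * @param location  the response's Location header, or null if absent
     * @param body      the response body (often empty for redirects)
     * @param loginPath path of the login page, e.g. "/wp-login.php" (hypothetical example)
     */
    public static Result classify(int status, String location, String body, String loginPath) {
        boolean redirect = status >= 300 && status < 400;
        if (redirect && location != null) {
            // Bounced back to the login page: probably a failure. Sent elsewhere: probably a success.
            return location.contains(loginPath) ? Result.LIKELY_FAILURE : Result.LIKELY_SUCCESS;
        }
        boolean formShownAgain = body != null && PASSWORD_INPUT.matcher(body).find();
        if (formShownAgain) {
            return Result.LIKELY_FAILURE; // the login form was rendered again
        }
        if (status >= 200 && status < 300) {
            return Result.LIKELY_SUCCESS; // an ordinary page with no password field
        }
        return Result.UNKNOWN; // server errors, redirects without a Location header, etc.
    }
}
```

The design choice here is to fail towards `UNKNOWN` rather than guess: anything that is neither a clear redirect nor a normal page is left for the user to inspect.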