A GUI-less HTML web-browser – comparing leading Java HTML parsers

With the simple HTML GET method implemented with multi-threading in the WAPT, it’s time to compare HTML parsers.

What I need is a parser that handles simple things, such as HTML GET and submitting forms. A few searches led me to two options.

The first one is HtmlUnit:

HtmlUnit is a “GUI-Less browser for Java programs”. It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc… just like you do in your “normal” browser.

The second one is JSoup:

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

I’m leaning towards JSoup, as it appears to be based on searching for and manipulating elements by CSS class/id, so the learning curve isn’t quite as steep as with other parsers.

My main concern at this point is this:

I’m building a generic web application tester, so I need a parser I can insert variables into. That means I’ll need a form of some sort to let the user paste HTML or something similar. The tool would then hunt down the login form in that HTML, parse out its parameters (conveniently!), and log in.
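A minimal sketch of that form-hunting step, written in Python with only the standard library rather than either of the Java parsers above. The sample markup, the `FormFinder` class and the rule "the login form is the first form with a password input" are all assumptions made for illustration, not part of the WAPT.

```python
from html.parser import HTMLParser

# Hypothetical pasted HTML; field names are illustrative only.
SAMPLE_HTML = """
<form action="/wp-login.php" method="post">
  <input type="text" name="log">
  <input type="password" name="pwd">
  <input type="hidden" name="redirect_to" value="/wp-admin/">
  <input type="submit" name="wp-submit" value="Log In">
</form>
"""


class FormFinder(HTMLParser):
    """Collects each <form> with its action, method and named inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> found
        self._current = None  # the form currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
                "has_password": False,
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Keep default values so hidden fields are carried along.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""
            if (attrs.get("type") or "").lower() == "password":
                self._current["has_password"] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html):
    """Return the first form containing a password input, or None."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    for form in finder.forms:
        if form["has_password"]:
            return form
    return None


login_form = find_login_form(SAMPLE_HTML)
print(login_form["method"], login_form["action"], sorted(login_form["fields"]))
```

The user would still have to say which of the discovered fields holds the username, since nothing in the markup guarantees that.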

Hoping to have a fully functioning prototype by the end of this coming week (12 Sept)

An update:

Testing JSoup was unsuccessful – while it can log in to WordPress 10 times (as an example), it requires domain knowledge of:

  1. The username parameter
  2. The password parameter
  3. The HTML structure of a successful login
  4. The HTML structure of an unsuccessful login

As a result of these factors I will look for other frameworks for testing.


A platform for load/performance testing

Some things to consider:
After reading a blog post on the costs/benefits of desktop applications versus web applications, I’m feeling rather inclined to consider whether this application can be web-based, at least partially.

It would make rendering of results extremely easy, fluid and dynamic through the inclusion of JavaScript libraries such as D3 and dimple.js.

UX and design become easier through Bootstrap, and the application becomes accessible from all devices, not just those running Windows/.NET or Java.

The key problem I envision with this design choice is hosting/distribution. If it is hosted, service providers may become a little bit nervous about the distribution of what can become a DDoSing tool on their architecture.

  • Access can always be restricted through a login system, with limits in place to avoid abuse of the service.
  • Alternatively, just distribute the source code and give instructions on how to run your own implementation.

There’s also the problem of benchmarking performance in-cloud, as highlighted by Mukherjee et al. in their paper “Performance Testing Web Applications on the Cloud”. Basically, since you’re never guaranteed a reproducible testing platform (network architecture and load are constantly changing), providing consistent results becomes difficult.

See: Mukherjee, J., Wang, M., & Krishnamurthy, D. (2014, March 31-April 4). Performance Testing Web Applications on the Cloud. Paper presented at the 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops (ICSTW).

Using JMeter & Parameters 2: Logins over SSL

This turned out to be much simpler than I imagined. Once you’ve worked out how to use parameters (see last post), it’s as simple as creating two parameters, one for ‘username’ and one for ‘password’, and pointing JMeter to the login form’s POST URI.

For this particular test I used the ASX “MyASX” portal, due to it having a simple RESTful API and still using SSL.

A bit of digging at the home page with Chrome Developer Tools uncovers the RESTful URI: “https://myasx.asx.com.au/home/login”. Pointing JMeter to this URI, and setting parameters for username & password gives us a successful login.
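For reference, here is roughly what that POST looks like outside JMeter, as a Python standard-library sketch. It only builds the request and never sends it. The credentials are made up, and the field names ‘username’ and ‘password’ simply mirror the two parameters described above; the portal’s real form may expect something different.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URI = "https://myasx.asx.com.au/home/login"  # URI quoted in the post


def build_login_request(username, password):
    """Build (but do not send) a form-encoded POST to the login URI."""
    # Field names are assumptions taken from the two JMeter parameters.
    body = urlencode({"username": username, "password": password})
    return Request(
        LOGIN_URI,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical credentials, used only to show the encoding.
request = build_login_request("test.user", "p@ss word")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to `urllib.request.urlopen` would actually send it, which is the part JMeter repeats once per thread.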

Easy enough for someone with four years’ experience in web development, not so much for a small business owner trying to find bottlenecks in their website.

Using JMeter & Parameters

Users are able to use JMeter for advanced load testing through the use of parameters. This is, however, not a remotely user-friendly experience. JMeter requires the user to inspect the HTTP GET request themselves to find the parameters to be sent.

  1. To begin, our user must navigate to the page containing the form to be tested, and open up either Firebug in Firefox or Dev tools within Chrome.
  2. Once there, the user enters the required search term, or login details, and clicks the application’s Okay/Login/Search command button.
  3. Once the request has gone through and the new page has loaded, the user must search through the “network” tab for the specific Document request.
    1. In my test case, I searched “test” in Google, and had to find the Document request titled “search?q=test&oq=test&aqs=chrome..69i57j0l2j69i65l2j0.703j0j8&sourceid=chrome&es_sm=93&ie=UTF-8”
  4. Once the correct document is open, the user is then able to inspect their query string parameters
    1. The only really required parameter for Google is “q”, so I only tested using that
  5. The user then enters these query string parameter names into JMeter under the parameter tab of the webpage they’re testing, and JMeter will do the rest.
  6. The user of JMeter of course has no actual idea if the task was performed correctly, unless they open “Response data” and notice that the Google homepage was not returned, but rather their search term.
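The manual hunt in steps 3 to 5 can be sketched in a few lines of Python using the standard library. The path and query string are the ones captured above; the host `www.google.com` is an assumption, since the Network tab entry only shows the path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host is assumed; path and query string are as captured in step 3.
CAPTURED_URL = (
    "https://www.google.com/search?q=test&oq=test"
    "&aqs=chrome..69i57j0l2j69i65l2j0.703j0j8"
    "&sourceid=chrome&es_sm=93&ie=UTF-8"
)


def query_parameters(url):
    """Return the query string parameters of a URL as a name -> value dict."""
    parsed = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per name; keep the first value of each.
    return {name: values[0] for name, values in parsed.items()}


params = query_parameters(CAPTURED_URL)
for name in sorted(params):
    print(name, "=", params[name])

# Only "q" is really required, so the request can be reduced to that.
minimal_query = urlencode({"q": params["q"]})
print(minimal_query)
```

The parameter names printed here are what would be typed into JMeter’s parameter tab.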

In a later post we will attempt to test logins over SSL on a simple web application.

httperf – Research/Usage

httperf, in comparison to JMeter, is much more difficult to set up (for inexperienced users).

While the application can be installed using:

apt-get install httperf

and then run using

httperf --server=www.google.com --rate=10 --num-conns=50

the options required to run httperf aren’t obvious without guidance, and meaningful testing requires reading through the httperf man pages.

Output of the above command is:

smoke@Smokki-JLT:/usr/local/bin$ httperf --server=www.google.com --rate=10 --num-conns=50
httperf --client=0/1 --server=www.google.com --port=80 --uri=/ --rate=10 --send-buffer=4096 --recv-buffer=16384 --num-conns=50 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 1

Total: connections 50 requests 50 replies 50 test-duration 5.005 s

Connection rate: 10.0 conn/s (100.1 ms/conn, <=4 concurrent connections)
Connection time [ms]: min 88.3 avg 121.9 max 624.1 median 106.5 stddev 76.7
Connection time [ms]: connect 56.2
Connection length [replies/conn]: 1.000

Request rate: 10.0 req/s (100.1 ms/req)
Request size [B]: 67.0

Reply rate [replies/s]: min 9.8 avg 9.8 max 9.8 stddev 0.0 (1 samples)
Reply time [ms]: response 65.7 transfer 0.0
Reply size [B]: header 266.0 content 261.0 footer 0.0 (total 527.0)
Reply status: 1xx=0 2xx=0 3xx=50 4xx=0 5xx=0

CPU time [s]: user 1.53 system 3.47 (user 30.5% system 69.3% total 99.7%)
Net I/O: 5.8 KB/s (0.0*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

Ignoring the errors, the application provides rather useful output:

Connection time [ms]: min 88.3 avg 121.9 max 624.1 median 106.5 stddev 76.7
Reply rate [replies/s]: min 9.8 avg 9.8 max 9.8 stddev 0.0 (1 samples)

In contrast to JMeter, no options are provided for creating graphs, which is to be expected from a basic command-line application.
httperf could be expanded by wrapping a Java/C# application around it to allow for iterative testing, test plan setup and graphing output options using CSV output.
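A rough sketch of the parsing half of such a wrapper, in Python rather than Java/C# to keep it short. It pulls the two useful lines out of httperf’s text output and turns them into a CSV row. The column names and the `parse_httperf` and `to_csv` helpers are my own invention; httperf itself does not emit CSV.

```python
import csv
import io
import re

# The two summary lines quoted above, used as sample input.
SAMPLE_OUTPUT = """\
Connection time [ms]: min 88.3 avg 121.9 max 624.1 median 106.5 stddev 76.7
Connection time [ms]: connect 56.2
Reply rate [replies/s]: min 9.8 avg 9.8 max 9.8 stddev 0.0 (1 samples)
"""

# Column prefix -> pattern capturing the "min ... stddev ..." part of a line.
LINE_PATTERNS = {
    "conn_ms": re.compile(r"^Connection time \[ms\]: (min .*)$", re.M),
    "reply_rate": re.compile(r"^Reply rate \[replies/s\]: (min .*)$", re.M),
}
NAME_VALUE = re.compile(r"([a-z]+) (\d+(?:\.\d+)?)")


def parse_httperf(text):
    """Extract the summary statistics from httperf output as a flat dict."""
    row = {}
    for prefix, pattern in LINE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            for name, value in NAME_VALUE.findall(match.group(1)):
                row[f"{prefix}_{name}"] = float(value)
    return row


def to_csv(rows):
    """Render a list of parsed runs as CSV text, one line per run."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([parse_httperf(SAMPLE_OUTPUT)]))
```

For iterative testing, the text would come from running httperf once per rate (for example via `subprocess.run` with `capture_output=True`) and appending each parsed row to the list.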

HTTPS/SSL is supported, but requires quite a bit of tinkering:

The code first needs to be compiled with SSL support enabled, which requires an SSL library of some variety to be installed on the system building httperf.
SSL is then enabled using the --ssl flag. There is no guarantee that SSL will then work, but one can check whether httperf is actually using SSL by running the following command in bash:

netstat -an | grep 443 | wc -l

JMeter – Research/usage

Using JMeter:
Trying to measure HTTPS Requests to google.com (since the server supports SSL too)
Easy enough to set 150 threads, loop counts, ramp up period, etc.

Setting the web URL default + any other pages I want to request was also simple, I just had to right click and add.

Officially, JMeter is supposed to support SSL. Eventually I did figure out that to use HTTPS you type HTTPS in the Protocol[HTTP]: field.

The tool makes accidental Denial of Service attacks far too easy – a mistype resulted in 15000 threads being sent at Google.
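A tool of my own could guard against this with a simple pre-flight check. The sketch below is Python and entirely hypothetical: the `max_threads` limit of 500 is an arbitrary number I picked, not a JMeter setting, and the request count assumes a simple plan where every thread requests every page on every loop.

```python
def planned_requests(threads, loops, pages, max_threads=500):
    """Return the total request count, refusing suspicious thread counts.

    max_threads is a hypothetical safety limit, not a JMeter option.
    """
    if threads > max_threads:
        raise ValueError(
            f"{threads} threads exceeds the limit of {max_threads}; typo?"
        )
    # Simple plan: each thread requests each page once per loop.
    return threads * loops * pages


# Intended plan: 150 threads; loop and page counts are assumed values.
print(planned_requests(threads=150, loops=10, pages=2))

# The mistyped plan is rejected before any request is sent.
try:
    planned_requests(threads=15000, loops=10, pages=2)
except ValueError as error:
    print("refused:", error)
```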

The output could definitely be improved – the user must specify the format of the output (i.e. graph, results in table form, etc.) prior to testing. If it is specified afterwards, the output objects have no access to the existing test data.

Also, changing the URL of the test did not create a new graph – the tool continued graphing in the same graph object from a previous URL.

The request JMeter sends out looks like this:

GET https://www.google.com.au/

[no cookies]

Request Headers:
Connection: keep-alive
Host: http://www.google.com.au
User-Agent: Apache-HttpClient/4.2.6 (java 1.5)

And the response headers look like this:

Response headers:
HTTP/1.1 200 OK
Date: Wed, 06 May 2015 12:52:49 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=480d1186ae65ebb6:FF=0:TM=1430916769:LM=1430916769:S=reJQGYLeyGXLWT_7; expires=Fri, 05-May-2017 12:52:49 GMT; path=/; domain=.google.com.au
Set-Cookie: NID=67=Yx5XfSpqh_3dlD4olPdxxUR8PeB9sVlRAO-SKyZ5Z6q4jyw4lhKGfDvWUpxf10RMRfF9Jo75GuzmyPs5D_VzyaB46a9NDqFL2ImuLoO2rL-QZthOgdmvtacAtV8FdZnZ; expires=Thu, 05-Nov-2015 12:52:49 GMT; path=/; domain=.google.com.au; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 443:quic,p=1
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

It isn’t particularly clear whether JMeter successfully hit the site over HTTPS. Further research indicates that the only hint is the line:

Alternate-Protocol: 443:quic,p=1

which is

Alternate-Protocol: 80:quic,p=1

for HTTP

Literature Review – Potential Article #5

Mukherjee et al. highlight specific challenges to testing web applications hosted in the cloud. Despite these challenges, they then develop a framework for testing web applications in a pseudo-MVC set-up using in-cloud tools.

  • In particular, bottlenecks external to the server (cloud) need to be measured, as they can randomly affect benchmark performance
    • Things such as internal network latency could become a problem, compared to actually testing in an isolated system.
    • Virtualisation overhead for example, becomes a problem in the cloud.
  • Further, they use httperf to submit requests to their web application to perform their testing – it generates a certain number of connections (sessions) per second
    • Httperf is a relatively simplistic tool – it generates workload and sends it off to the server
    • The solution I aim to create could build off the ideas httperf uses (it’s open source), but rather create meaningful scenario testing for my application

Mukherjee, J., Wang, M., & Krishnamurthy, D. (2014, March 31-April 4). Performance Testing Web Applications on the Cloud. Paper presented at the 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops (ICSTW).