About ThruZero

v012915.2112

ThruZero is designed and implemented by George Norman using tz-commons, JavaServer Faces (JSF RI), and Bootstrap (from Twitter).

Phase-1 focused on concurrency and stability issues. It also provided a general picture of the site's performance characteristics (sub-second page response times are a goal of this site). The load tests were executed using JMeter 2.11 against a Tomcat 6.0 server running on the same machine. The operating system was OS X (v10.9) on an Intel 2.6 GHz Core i7 processor.

Each Test Plan accessed the same 22 pages, but used an increasing number of users and ramp-up times. The Ramp-Up Period for each plan was set to N seconds, where N was the number of users (e.g., the ramp-up period for 15 concurrent users was 15 seconds). The Loop Count was set to 5, so each page would be hit 5 times by each user (e.g., a 50-user test would hit each page 250 times). Total throughput (TTP) is recorded for each test.
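The request counts in the test headings below follow directly from that arithmetic. A trivial sketch (PlanMath is not part of the site, just an illustration):

```java
// Sanity check of the test-plan arithmetic: each plan issues
// users x Loop Count (5) x pages (22) total requests.
public class PlanMath {
    static final int PAGES = 22;
    static final int LOOPS = 5;

    public static int totalRequests(int users) {
        return users * LOOPS * PAGES;
    }
}
```

So a 1-user plan issues 110 requests, a 50-user plan 5,500, and a 100-user plan 11,000.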

1-User (110 total requests; 5.6/sec TTP)

This simple test exposed the following issues with the RSS feed service:

  1. Since all feeds were refreshed sequentially, the total time to render any page with multiple stale RSS feeds was the sum of the time it took to refresh each stale feed panel.
  2. Even if only one panel needed to be refreshed, the lack of a read timeout caused the page to render slowly if the panel had a slow feed.

A further improvement to the RSS feed panel could be to call the RSS feed service after the page has been rendered and sent to the browser (i.e., use Ajax to refresh the panel).

At this point, the sequential-refresh issue has been fixed with an Executor, and a read timeout has been added to the connection. The results below show a blip for the first refresh of the 9 RSS feeds on the Home page, after which no further refreshes are required (each feed is cached and refreshed at its own period, measured in hours).
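The executor-based fix can be sketched roughly as follows (the class and method names here are assumptions, not the actual tz-commons code). The two key points from the fix above: all stale feeds refresh concurrently, so total latency is roughly the slowest single feed rather than the sum of all of them, and a per-feed timeout keeps one slow feed from stalling the page.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of a parallel feed refresh with a per-feed timeout.
public class FeedRefresher {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Stand-in for the real network fetch; a real implementation would open a
    // URLConnection with setConnectTimeout/setReadTimeout set.
    String fetchFeed(String url) {
        return "contents of " + url;
    }

    // Refresh every stale feed in parallel, waiting at most timeoutMs per feed.
    public Map<String, String> refreshAll(List<String> urls, long timeoutMs) {
        Map<String, Future<String>> pending = new LinkedHashMap<>();
        for (String url : urls) {
            pending.put(url, pool.submit(() -> fetchFeed(url)));
        }
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (InterruptedException | ExecutionException | TimeoutException ex) {
                results.put(e.getKey(), null); // slow or broken feed: keep the cached/stale panel
            }
        }
        return results;
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

With this shape, a page with N stale panels pays roughly max(feed times) instead of sum(feed times), bounded by the timeout.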

5-Users (550 total requests; 27.4/sec TTP)

This simple test exposed two major concurrency issues:

  1. The SimpleRssFeedService assumed non-concurrent updates, so if multiple requests arrived at the same time for a feed that needed to be updated, it would spawn a new connection for each request to update the feed.
  2. When multiple users accessed the InfoNodeService under load, intermittent IndexOutOfBoundsExceptions would occur in the InfoNodeFilterChain class. The code below shows the crash site; line 02 is where the exception occurred.
    01: if (filterIndex < filterList.size()) {
    02:   result = filterList.get(filterIndex++).applyFilter(infoNode, this);
    03: }
    An IndexOutOfBoundsException in the code above implies a concurrency error. A code inspection led to the InfoNodeService, which was found to have an InfoNodeFilterChain as a data member. Problem! Services are stateless, yet the InfoNodeService had a state variable, and that variable (the filter chain, with its mutable filterIndex) was being shared by every concurrent request. Moving the filter chain out of the service instance and into the load function, as a local variable, fixed the problem.
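The local-variable fix for issue 2 can be sketched as follows. The class and method names (InfoNodeService, InfoNodeFilterChain, load) follow the text above; everything else is an assumption about the shape of the code, not the actual tz-commons source.

```java
import java.util.List;

// Sketch: the filter chain is created as a LOCAL variable on each call to
// load(), so concurrent callers no longer share (and corrupt) one filterIndex.
public class InfoNodeService {

    public interface InfoNodeFilter {
        String applyFilter(String infoNode, InfoNodeFilterChain chain);
    }

    public static class InfoNodeFilterChain {
        private final List<InfoNodeFilter> filterList;
        private int filterIndex; // safe now: each request owns its own chain

        InfoNodeFilterChain(List<InfoNodeFilter> filterList) {
            this.filterList = filterList;
        }

        // Mirrors the crash site shown above; filters call back into the
        // chain to continue with the next filter.
        public String applyFilter(String infoNode) {
            String result = infoNode;
            if (filterIndex < filterList.size()) {
                result = filterList.get(filterIndex++).applyFilter(infoNode, this);
            }
            return result;
        }
    }

    private final List<InfoNodeFilter> filters; // immutable config only; no per-request state

    public InfoNodeService(List<InfoNodeFilter> filters) {
        this.filters = filters;
    }

    public String load(String infoNode) {
        // The fix: the chain lives on the stack, not as a service data member.
        InfoNodeFilterChain chain = new InfoNodeFilterChain(filters);
        return chain.applyFilter(infoNode);
    }
}
```

The service keeps only the immutable filter list; all mutable traversal state is per-call, which is what "stateless service" requires.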

The results below were generated without any further exceptions.
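The text above does not detail how the duplicate-connection problem in issue 1 was resolved; a common guard for that class of bug is a "single-flight" memoizer, sketched here with hypothetical names (this is not the SimpleRssFeedService code). Concurrent requests for the same stale feed share one in-flight fetch instead of each opening its own connection.

```java
import java.util.concurrent.*;

// Hypothetical single-flight guard: at most one fetch per feed URL at a time.
public class SingleFlightFeedCache {
    private final ConcurrentMap<String, Future<String>> inFlight = new ConcurrentHashMap<>();

    // Stand-in for the real network fetch.
    String fetchFeed(String url) {
        return "contents of " + url;
    }

    public String refresh(String url) {
        FutureTask<String> created = new FutureTask<>(() -> fetchFeed(url));
        Future<String> f = inFlight.putIfAbsent(url, created);
        if (f == null) {   // this thread won the race, so it performs the fetch
            created.run();
            f = created;
        }
        try {
            return f.get(); // losing threads block here on the single shared fetch
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException(ex);
        } finally {
            inFlight.remove(url, f); // allow the next scheduled refresh
        }
    }
}
```

putIfAbsent guarantees only one task is registered per URL, so N simultaneous requests for a stale feed produce one connection, not N.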

10-Users (1,100 total requests; 31.4/sec TTP)

Throughput continues to increase as more users are added.

25-Users (2,750 total requests; 32.3/sec TTP)

Throughput is starting to level off, and the average response time of some pages has exceeded the sub-second goal.

50-Users (5,500 total requests; 32.1/sec TTP at end)

Throughput has plateaued, and average page-response times are nearing 1.5 to 2 seconds.

75-Users (8,250 total requests; 23.2/sec TTP at end)

Throughput has decreased, and the site is starting to see page response times of about 4 seconds, far from the sub-second goal. This needs investigation: there could be resource contention in the app, Tomcat may need tuning, or both.
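If Tomcat tuning turns out to be part of the answer, the HTTP connector in conf/server.xml is the usual starting point. The values below are assumptions to experiment with under load, not measured recommendations:

```xml
<!-- conf/server.xml: a hedged starting point for Tomcat 6 connector tuning. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="300"
           acceptCount="100"
           connectionTimeout="20000" />
```

maxThreads caps concurrent request-processing threads, and acceptCount is the queue for connections beyond that; the right values depend on what Phase-2 profiling shows.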

100-Users (11,000 total requests; 7.4/sec TTP at end)

The site has become unusable. Throughput has fallen off a cliff and destroyed the page-response times.



Phase-2 tests will focus on improving performance under loads of 75 or more concurrent users.