Why look at Kubernetes and CoreOS

We are currently operating a service oriented architecture that is ‘dockerized’ with both host and containers running CentOS 7 when deployed straight on top of ec2 instances. We also have a deployment pipline with beanstalk + homegrown scripts. I imagine our position/maturity is similar to a lot of SMEs, we have aspirations of being on top of new technologies/practices but are some where in between old school and new school:

Old School New School
IT and Dev separate Devops (Ops and Devs have the same goals and responsibilities)
Monolithic/Large services Microservices
Big Releases Continuous Deployment
Some Automation Almost total automation with self-service
Static scaling Dynamic scaling
Config Management Image management (with immutable deployments)
IT staff have a high baseline work IT staff have low baseline work, more room for initiatives

This is not about which end of this incomplete spectrum is better… we have decided that for our place in the world, moving further the left is desirable. I know there are a lot of experienced IT operators that take this view:

Why CoreOS for Docker Hosts?

CoreOS: A lightweight Linux operating system designed for clustered deployments providing automation, security, and scalability for your most critical applications –

Our application and supporting services run in docker, there should not be any dependencies on the host operating system (apart from the docker engine and storage mounts).

Some questions I ask myself now:

  • Why do I need to monitor for and stage deployments of updates?
  • Why am I managing packages on a host OS that could be immutable (like CoreOS is, kind of)?
  • Why am I managing what should be homogeneous machines with puppet?
  • Why am I nursing host machines back to health when things go wrong (instead of blowing them away and redeploying)?
  • Why do I need to monitor SE Linux events?

I want a Docker Host OS that is/has:

  • Smaller, Stricter, Homogeneous and Disposable
  • Built in hosts and service clustering
  • As little management as possible post deployment

CoreOS looks good for removing the first set of questions and sufficing the wants.

Why Kubernetes?

Kubernetes: “A platform for automating deployment, scaling, and operations of application containers across clusters of hosts” –

Some questions I ask myself now:

  • Should my deployment, monitoring and scaling completely separate or be a platform?
  • Why do I (IT ops) still need to be around for prod deployments (no automatic success criteria for staged deploys and not automatic rollback)?
  • Why are our deployment scripts so complex and non-portable
  • Do I want a scaling solution outside of AWS Auto-Scaling groups?

I want a tool/platform to:

  • Streamline and rationalise our complex deployment process
  • Make monitoring, scaling and deployment more manageable without our lines of homebaked scripts
  • Generally make our monitoring, scaling and deployment more able to meet changing requirements

Kubernetes looks good for removing the first set of questions and sufficing the wants.

Next steps

  • Create a CoreOS cluster
  • Install Kubernetes on the cluster
  • Deploy an application via Kubernetes
  • Assess if CoreOS and Kubernetes take us in a direction we want to go

Monitoring client side performance and javascript errors

The rise of single page apps (ie AngularJS) present some interesting problems for Ops. Specifically, the increased dependence on browser executed code means that real user experience monitoring is a must.


To that end I have reviewed some javascript agent monitoring solutions:

The solution/s must have the following requirements:

  • Must have:
    • Detailed javascript error reporting
    • Negligible performance impact
    • Real user performance monitoring
    • Effective single page app (AnglularJS support)
    • Real time alerting
  • Nice to have:
    • Low cost
    • Easy to deploy and maintain integration
    • Easy integration with tools we use for notifications (icinga2, Slack)

As our application is a single page Angular app, New Relic Browser requires that we pay US$130 for any single page app capability. The JavaScript error detection was not very impressive as uncaught exceptions outside of the angular app were not reported without angular integration.

Google Analytics with custom event push does not have any real time alerting which disqualifies it as an Ops solution.

AppDynamics Browser was easy to integrate, getting javascript error details in the console was straight forward but getting those errors to communication tools like slack was surprisingly difficult. Alerts are based on health checks which are breaking of metric thresholds – so I can send an alert saying there was more than 0 javascript errors in the last minute. But no details about the error and no direct link to the error. simple to add monitoring, simple to get alerting with click through to all the javascript error info. No performance monitoring.

Conclusion sticking to the Unix philosophy, using for javascript error alerting and AppDynamics Browser Lite for performance alerting. Both have free levels to get started (ongoing, not just 30 day trial).


Getting started with Gatling – Part 2

With the basics of Simulations, Scenarios, Virtual Users, Sessions, Feeders, Checks, Assertions and Reports down –  it’s time to think about what to load test and how.

Will start with a test that tries to mimic the end user experience. That means that all the 3rd party javascript, css, images etc should be loaded. It does not seem reasonable to say our loadtest performance was great but none of our users will get a responsive app because of all those things we depend on (though, yes, most of it will likely already be cached by the user). This increases the complexity of the simulation scripts as there will be lots of additional resource requests cluttering things up. It is very important for maintainability to avoid code duplication and use the singleton object functionality available.

Using the recorder

As I want to include CDN calls, I tried the recorder’s ‘Generate CA’ functionality. This is supposed to generate certs on the fly for each CN. This would be convenient as I could just trust a locally generated CA and not have to track down and trust all sources. Unfortunately I could not get the recorder to generate its own CA, and when using a local CA generated with openssl I could not feed the CA password to the recorder. I only spent 15 mins trying this until reverting to the default self signed cert. Reviewing Firefox’s network panel (Firefox menu -> Developer -> Network ) shows any blocked sources which can then be visited directly and trusted with our fake cert (there are some fairly serious security implications of doing this, I personally only use my testing browser (firefox) with these types of proxy tools and never for normal browsing).

The recorder is very handy for getting the raw code you need into the test script, it is not a complete test though. Next up is:

  1. Dealing with authentication headers –  The recorded simulation does not set the header based on response from login attempt
  2. Requests dependent on the previous response – The recorder does not capture this dependency it only see the raw outbound requests so there will need to be consideration on parsing results
  3. Validating responses

Dealing with authentication headers

The Check API is used for verifying that the response to a request matches expectations and capturing some elements in it.

After half an hour or so of playing around the Check API, it is behaving as I want thanks to good, concise doc.

   .check(headerRegex("Set-Cookie", "access_token=(.*);Version=*").saveAs("auth_token"))

The “.check” is looking for the header name “Set-Cookie” then extracting the auth token using a regex and finally saving the token as a key called auth_token.

In subsequent requests I need to include a header containing this value, and some other headers. So instead of listing them out each time a function makes things much neater:

def authHeader (auth_token:String):Map[String, String] = {
   Map("Authorization" -> "Bearer ".concat(auth_token),
       "Origin" -> baseURL)
   .get(uri1 + "/information-requests")
   .headers(authHeader("${auth_token}")) // providing the saved key value as a string arg

Its also worth noting that to ensure that all this was working as expected I modified /conf/logback.xml to output all HTTP request response data to stdout.


Requests dependent on the previous response

With many modern applications, the behaviour of the GUI is dictated by responses from an API. For example, when a user logs in, the GUI requests a json file with all (max 50) of the users open requests. When the GUI received this, the requests are rendered. In many cases this rendering process involves many more HTTP requests that depending on the time and state of the users which may vary significantly. So… if we are trying to imitate end user experience instead of requesting the render info for the same open requests all of the time, we should parse the json response and adjust subsequent requests accordingly. Thankfully gatling allows for the use of JsonPath. I got stuck trying to get all of the id vals out of a json return and then create requests for each of them. I had incorrectly assumed that the EL Gatling provided ‘random’ function could be called on a vector. This meant I thought the vector was ‘undefined’ as per the error message. The vector was in fact as expected which was clear by printing it.

//grabs all id values from the response body and puts them in a vector accessible via "${answer_ids}" or sessions.get("answer_ids")
.get(uri1 + "/information-requests")
.headers(authHeader("${auth_token}")).check(, jsonPath("$").findAll.saveAs("answer_ids")) 
//prints all vaules in the answer_ids vector
.exec(session => {
    val maybeId = session.get("answer_ids").asOption[String]
    println(maybeId.getOrElse("no ids found"))

To run queries with all of the values pulled out of the json response we can use the foreach component. Again got stuck for a little while here. Was putting the foreach competent within an exec function, where (as below) it should be outside of an exec and reference a chain the contains an exec.

val answer_chain = exec(http("an_answer")
    .get(uri1 + "/information-requests/${item}/stores/answers")
val scn = scenario("BasicLogin")
    .get(uri1 + "/information-requests")
    .headers(authHeader("${auth_token}")).check(, jsonPath("$").findAll.saveAs("answer_ids"))),
.foreach("${answer_ids}","item") { answer_chain }

Validating responses

What do we care about in responses?

  1. HTTP response headers (generally expecting 200 OK)
  2. HTTP response body contents – we can define expectations based on understanding of app behaviour
  3. Response time – we may want to define responses taking more than 2000ms as failures (queue application performance sales pitch)

Checking response headers is quite simple and can be seen explicitly above in .check( In fact, there is no need for 200 checks to be explicit as “A status check is automatically added to a request when you don’t specify one. It checks that the HTTP response has a 2XX or 304 status code.”checks.

HTTP response body content checks are valuable for ensuring the app behaves as expected. They also require a lot of maintenance so it is important to implement tests using code reuse where possible. Gatling is great for this as we can use the scala and all the power that comes with it (ie: reusable objects and functions across all tests).

Next up is response time checks. Note that these response times are specific to the HTTP layer and do not infer a good end user experience. Javascript and other rendering, along with blocking requests mean that performance testing at the HTTP layer is incomplete performance testing (though it is the meat and potatoes).
Gatling provides the Assertions API to conduct checks globally (on all requests). There are numerous scopes, statistics and conditions to choose from there. For specific operations, responseTimeInMillis and latencyInMillis are provided by Gatling – responseTimeInMillis includes the time is takes to fully send the request and fully receive the response (from the test host). As a default I use responseTimeInMillis as it has slightly higher coverage as a test.

These three verifications/tests can be seen here:

package mwc_gatling
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._

class BasicLogin extends Simulation {
    val baseURL=""
    val httpProtocol = http
        .acceptHeader("application/json, text/plain, */*")
        .acceptEncodingHeader("gzip, deflate")
        .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0")
    def authHeader (auth_token:String):Map[String, String] = {
        Map("Authorization" -> "Bearer ".concat(auth_token), "Origin" -> baseURL)

    val answer_chain = exec(http("an_answer")
        .get(uri1 + "/information-requests/${item}/stores/answers")
        .headers(authHeader("${auth_token}")).check(, jsonPath("$..status")))
    val scn = scenario("BasicLogin")
     //... bunch of get requests for JS CSS etc
           .check(headerRegex("Set-Cookie", "access_token=(.*);Version=*").saveAs("auth_token"))
    //... another bunch of get for post auth deps
            .get(uri1 + "/information-requests")
            .headers(authHeader("${auth_token}")).check(, jsonPath("$").findAll.saveAs("answer_ids"))
    //... now that we have a vector full of ids we can request those resources
    .foreach("${answer_ids}","item") { answer_chain }
  //... finally set the simulation params and assertions

That’s about all I need to get started with Gatling! The next steps are:

  1. extending coverage (more tests!)
  2. putting processes in place to notify and act on identified issues
  3. refining tests to provide more information about the likely problem domain
  4. making a modular and maintainable test library that can be updated in one place to deal with changes to app
  5. aggregating results for trending and correlation with changes
  6. spin up and spin down environments specifically for load testing
  7. jenkins integration

Getting started with Gatling – Part 1

With the need to do some more effective load testing I am getting started with Gatling. Why Gatling and not JMeter? I have not used either so I don’t have a valid opinion. I made my choice based on:

Working through the Gatling Quickstart

Next step is working through the basic doc: Pretty simple and straightforward.

Moving on to the more advanced tutorial: This included:

  • creating objects for process isolation
  • virtual users
  • dynamic data with Feeders and Checks
  • First usage of Gatling’s Expression Language (not rly a language o_O)

The most interesting function:

object Search {
    val feeder = csv("search.csv").random
    val search = exec(http("Home")
                .check(css("a:contains('${searchComputerName}')", "href").saveAs("computerURL")))

…Simulation‘s are plain Scala classes so we can use all the power of the language if needed.

Next covered off the key concepts in Gatling:

  • Virtual User -> logical grouping of behaviours ie: Administrator(login, update user, add user, logout)
  • Scenario -> define Virtual Users behaviours ie: (login, update user, add user, logout)
  • Simulation -> is a description of the load test (group of scenarios, users – how many and what rampup)
  • Session -> Each virtual user is back by a Session this can allow for sharing of data between operations (see above)
  • Feeders -> Method for getting input data for tests ie: login values, search and response values
  • Checks -> Can verify HTTP response codes and capture elements of the response body
  • Assertions -> Define acceptance criteria (slower than x means failure)
  • Reports -> Aggregated output

Last review for today was of presentation by Stephane Landelle and Romain Sertelon,  the authors of Gatling:

Next step is to implement some test and figure out a good way to separate simulations/scenarios and reports.

InfoSec Notes ITOps

Transitioning from standard CA to LetEncrypt!

With the go-live of its time to transition from the pricy and manual standard SSL cert issuing model to a fully automated process using the ACME protocol. Most orgs have numerous usages of CA purchased certs, this post will cover hosts running apache/nginx and AWS ELBs, all of these usages are to be replaced with automated provisioning and renewal of letsencrypt signed certs.

Provisioning and auto-renewing Apache and nginx TLS/SSL certs

For externally accessible sites where Apache/Nginx handles TLS/SSL termination moving to letsencrypt is quick and simple:

1 – Install the letsencrypt client software (there are RHEL and Centos rpms – so thats as simple as adding the package to puppet policies or

yum install letsencrypt

2 – Provision the keys and certificates for each of the required virtual hosts. If a virtual host has aliases, specify multiple names with the -d arg.

letsencrypt certonly --webroot -w /var/www/sites/static -d -d

This will provision a key and certificate + chain to the letsencrypt home directory (defaults /etc/letsencrypt). The /etc/letsencrypt/live directory contains symlinks to the current keys and certs.

3 – Update the apache/nginx virtualhost configs to use the symlinks maintained by the letsencrypt client, ie:

# static Web Site

	ServerAlias # <<-- dummy alias for internal site
	ServerAdmin [email protected]

	DocumentRoot /var/www/sites/static
	DirectoryIndex index.php index.html
		AllowOverride all
		Options +Indexes
	ErrorLog /var/log/httpd/static_error.log
	LogLevel warn
	CustomLog /var/log/httpd/static_access.log combined

	ServerAdmin [email protected]

	DocumentRoot /var/www/sites/static
	DirectoryIndex index.php index.html
		AllowOverride all
		Options +Indexes	
	ErrorLog /var/log/httpd/static_ssl_error.log
	LogLevel warn
	CustomLog /var/log/httpd/static_ssl_access.log combined

	SSLEngine on
	SSLHonorCipherOrder on
	SSLInsecureRenegotiation off
	SSLCertificateKeyFile /etc/letsencrypt/live/
	SSLCertificateFile /etc/letsencrypt/live/
	SSLCertificateChainFile /etc/letsencrypt/live/

4 – Create a script for renewing these certs, something like:

# Vars
PROG_ECHO=$(which echo)
PROG_LETSENCRYPT=$(which letsencrypt)
PROG_FIND=$(which find)
PROG_OPENSSL=$(which openssl)

# Main
${PROG_ECHO} "Current expiries: "
for x in $(${PROG_FIND} /etc/letsencrypt/live/ -name cert.pem); do ${PROG_ECHO} "$x: $(${PROG_OPENSSL} x509 -noout -enddate -in $x)";done
${PROG_ECHO} "running letsencrypt certonly --webroot .. on $(hostname)"
${PROG_LETSENCRYPT} renew --agree-tos
systemctl restart httpd
if [ "$LE_STATUS" != 0 ]; then
    ${PROG_ECHO} Automated renewal failed:
    cat /var/log/letsencrypt/renew.log
    exit 1
    ${PROG_ECHO} "New expiries: "
    for x in $(${PROG_FIND} /etc/letsencrypt/live/ -name cert.pem); do echo "$x: $(${PROG_OPENSSL} x509 -noout -enddate -in $x)";done

5 – Run this script automatically everyday with cron or jenkins

6 – Monitoring the results of the script and externally monitor the expiry dates of your certificates (something will go wrong one day)

Provisioning and auto-renewing AWS Elastice Load Balancer TLS/SSL certs

This has been made very easy by Alex Gaynor with a handy python script: This is a great use-case for docker and Alex has created a docker image for the script: To use this with ease I created a layer on top creating a new Dockerfile:

# mwc letsencrypt-aws image
FROM alexgaynor/letsencrypt-aws:latest


[{\"elb\":{\"name\":\"TestExtLB\",\"port\":\"443\"}, \
\"hosts\":[\"\",\"\",\"\"], \
\"key_type\":\"rsa\"}, \
{\"elb\":{\"name\":\"ProdExtLb\",\"port\":\"443\"}, \
\"hosts\":[\"\",\"\",\"\", \
\"\",\"\"], \
\"key_type\":\"rsa\"}], \

ENV AWS_DEFAULT_REGION="ap-southeast-2"

The explanation of these values can be found at Its quite important to create a specific IAM User to conduct the required Route53/S3 and ELB actions. This images need to be build on changes:

sudo docker build -t .
sudo docker push

With this image built another cron or jenkins job can be run daily executing something like:

sudo docker pull
sudo docker run
sleep 10
sudo docker rm $(sudo docker ps -a | grep | awk '{print $1}')

Again, the job must be monitored along with external monitoring of certificates. See a complete SSL checker at


Monolithic to Microservices

Review of ‘AWS re:Invent 2015 | (ARC309) Microservices: Evolving Architecture Patterns in the Cloud’ presentation.

From monolithic to microservices, on AWS.

Screenshot 2015-11-07 22.15.36

Ruby on rails -> Java based functional services.

Screenshot 2015-11-07 22.16.52

Java based functional services resulted in requirement for everything to be loaded into memory – 20 minutes to start services. Still very large services –  with many engineers working on it. This means that commits to those repos take a long time to QA. So a commit is made then start working on something else then a week or two later have to fix what you hardly remember. Then to get specific information out of those big java apps it was required to parse the entire homepage. So…

Screenshot 2015-11-07 22.21.53

Big sql database is still there – This will mean changes with schema change are difficult to do without outages. Included in this stage of micro services was:

  • Team autonomy – give teams a problem and they  can build whatever services they need
  • Voluntary adoption – tools/techniques/processes
  • Goal driven initiatives
  • Failing fast and openly

What are the services? – Email, shipping cost, recommendations, admin, search

Anatomy of a service:

Screenshot 2015-11-07 22.30.51

A service has its own datastore and it completely owns its own datastore. This is where dependency on one big schema is avoided. Services at average 2000 lines of code and 32 source files.

Service discovery – enormously simple? – Discovery is a client needs to get to a services, how is it going to get there. ‘It has the name of the service – look up that URL’.

Screenshot 2015-11-07 22.41.14

Use ZooKeeper as a highly available store.

Moving all this to the cloud via ‘lift and shift’ with AWS Direct Connect. In AWS all services where their own ec2 instance and dockerized.

Being a good citizen in a microservices organisation:

  • Service Consumer
    • Design for failure
    • Expect to be throttled
    • Retry with exponential backoff
    • Degrade gracefully
    • Cache when appropriate
  • Service Provider
    • Publish your metrics
      • Throughput, error rate, latency
    • Protect yourself (throttling)
    • Implementation details private
    • Maintain backwards compatability
    • See Amazon API gateway

Again service discovery – simple with naming conventions, DNS and load balancers. Avoid DNS issues with dynamic service registry (ZooKeeper, Eureka, Consul, SmartStack).

Data management – moving away from the schema change problems. And the other problems (custom stored procedures, being stuck with one supplier, single point of failure). Microservices must include decentralisation of datastores, services own their datastores and do not share them. This has a number of benefit from being able to chose whatever data store technology best meets the service needs, make changes without affecting other services, scale the data stores independently. So how do we ensure transactional integrity? Distributed locking sounds horrible. Do all services really need strict transactional integrity? Use queues to retry later.

Aggregation of data – I need to do analytics on my data? AWS Kenesis firehose, Amazon SQS, custom feed.

Continuous Delivery/Deployment  – Introduced some AWS services Code Deploy — or just use Jenkins.

How many services per container/instance – Problems with monitoring granularity, scaling is less granular, ownership is less atomic, continuos deployment is tougher with immutable containers.

I/O Explosion –  Mapping dependencies between services is tough. Some will be popular/hotspots. Service consumers need to cache were they can. Dependency injection is also an option – You can only make a request from services A if you have the required data from service B and C in your request.

Monitoring –  Logs are key, also tracing request through fanned dependency can be much easier with a requirement for a header that is passed on.

Unique failures – Watched a day in the life of a Netflix engineers… good points on failures. We accept and increased failure rate to maintain velocity. We just want to ensure failures are unique. For this to happen we need to have open and safe feedback.



Scala tail recursion vs non-tail receursion performance testing with dynaTrace

The scala compiler has partial tail call optimization (

Running some like methods with and without allowing the scala compiler to optimize demonstrated the performance improvements gained by this optimization.

First up, simple factoral functions (source:

//this function will be tail recusive optimized
def tFactorial(number: Long) : Long = {
 def tfactorialWithAccumulator(accumulator: Long, number: Long) : Long = {
 if (number == 1) 
 return accumulator
 tfactorialWithAccumulator(accumulator * number, number - 1)
 tfactorialWithAccumulator(1, number)
//Non-tail recursive function
def ntFactorial(number:Long) : Long = {
 if (number == 1)
 return 1
 number * ntFactorial (number - 1)

For explanation of these functions and why/not they are tail recursive see the source above.


Non-Tail recursive average response time: approx 7 ms

Tail recursive average response time: approx 0.7ms

The number of test conducted was limited to about 10 each and the test environment was an Ubuntu VM. The results will not be highly accurate but with a difference this significant it is clear that tail optimized recursion is very different from normal recursion (On a JVM). Looking at the purepath for each method reveals tail-recursion optimization working as expected and saving alot of execution time:

Comparing the two purepaths, the 10x increase in response time is clearly caused by the lack of tail recursion on the left.

The next example looks at what the compiler does if there is extra code within a tail optimized funtion:

// Tail recursive version 
 def execTest3Tail(loopCount: Int): String = {
 def test2WithAccumulator(accumulator: Int, number: Int): String = {
 println("Hello World " + accumulator + "\n")
 if (accumulator >= loopCount)
 return "Completed"
 test2WithAccumulator(loopCount, (accumulator + 1))
 test2WithAccumulator(1, loopCount)
// Non-Tail recursive version
 def execTest3nt(loopCount: Int, cur: Int): String = {
 println("Hello World " + cur + "\n") 
 if (cur >= loopCount)
 return "Completed"
 execTest3nt(loopCount, (cur + 1))

The output of the non-tail recursive function was as expected, printing all of the output expected. The scala compiler optimized over the expected (perhaps incorrectly expected) behaviour of the optimized function. The output was:

enter num of iterations:1000
Hello World 1
Hello World 1000

As the function was optimized not to run through all of the loops – the println’s simply did not occur. The purepath comparision:

Results are similar to the first test. The right (tail-optimized) purepath call hierarchy shows why there were only 2 screen outputs..



Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence cont’

After some initial JVM monitoring yesterday, I looked at response times today. First thing I had to do was modify the respective classes slightly to get a non-constructor method doing all the work. This made it much easier to gather and chart the results within dynaTrace (though I did later find that there was a check box in dynaTrace to monitor constructors).

The tests today were again on 3 implementations of the Fibonacci sequence:

Java – see source

Scala Simple – see source

Scala Advanced – see source

Test environment info

I spent most of the day playing with dynaTraces monitoring and reporting tools so did not actually do that much testing. The tests I did run where quite short. The first being execution of the fib sequence returning the 10,000 value, the second stretching to 100,000. The charts below show the time results for each test, if more than one test was executed for an implementation in a 30 second cycle, the chart shows the average.

*Update 16-JUL: Retested on host machine (not running in VM) results where much more consistent despite having a number of other services operating at the same time. There are some clear results this time:

Java vs Scala vs Scala Adv - Fibonacci
Running on a host machine, the Java implementation has a slight performance edge over the Scala simple. The Scala advanced implementation is slower. These tests were to the 100,000th Fibonacci value


Stretching the test the 1,000,000th Fibonacci value meant that Scala Adv overran the heap but the difference between Scala simple and Java was negligible

From the results it appears that the variations in response times are most likely caused by external factors (ie: OS or other services taking resouces, Garbage collections etc). I will really need to increase the sample size to get a better understanding of what is influencing response times.

Fibonacci sequence to 10,000th value
Fibonacci sequence to 100,000th value

Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence

Trying to make some comparisons between Scala and Java. Using dynaTrace I can get very useful and granular insight into the behaviours of each.  To keep things as comparable as possible the tests will be on simple, short algorithms. Large iterations will be conducted so garbage collection and space complexity can be analysed.

Thanks to Julio Faerman for his post and the implementations I have used from that post – see:

With dynaTrace I can get information about the JVM behaviour but also about the behaviour of specific classes and methods. This will not only allow me to make some comparison between Java and Scala implementations but also understand more about what happens when Scala is compiled to byte code.

To start with, nice and simple test – Fibonacci sequence

Three implementations will be tested:

Java Fibonacci  – see code

Scala Simple Fibonacci  – see code

Scala Advanced Fibonacci  – see code

I am running the tests from a controller class with a main method. From there I can instantiate the classes above. Read more about dynaTrace to see how it uses sensors in the JVM to grab a great deal of useful data about the JVM and about classes and methods.

Just to get things rolling, checked out how running the Fib sequence to the 10 million number in the sequence would affect the JVM.  This tests would have taken a long time to complete so after I saw stable behaviour (ie: two sequences) I kill the JVM and moved on to the next sequence. 

The charts below have been modified for human readability. For example, Garbage Collection in the charts was factored up. The charts are also stacked for readability. Within the dynaTrace client this is much easier to read with mouse over pop-ups.

What can be taken from this fairly simple example is that code may work and appear to be performing well. In reality it may be wreaking havoc on the JVM. Next post will look at the response times of these implementations.

JVM Heap usage and Garbage Collection utilization
Removing the Scala Adv Sequence allows for better comparison of Java and Scala Simple
Garbage Collection utilization and Garbage collection caused suspensions (time)
GC utilization, CPU Utilization and Suspension time cause by GC – without Scala Advanced


Test details:


Virtual Machine: Ubuntu 12.04 LTS 32 bit - 3072MB, 2 Processors
Physical Host: Windows 7 64-bit - 8GB, i7-2620M @ 2.70 GHz


java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3)
OpenJDK Server VM (build 20.0-b12, mixed mode)
Scala compiler version 2.9.2

Run script:

java -Xmx2048m -cp $scala_path:. -agentpath:/home/mark/Desktop/performance_testing/dynatrace/agent/lib/,server= PerformanceController