Category: ITOps

Notes from experience working in and studying ITOps

ITOps

Transitioning from standard CA to LetEncrypt!

Post author By mark
Post date May 29, 2016
No Comments on Transitioning from standard CA to LetEncrypt!

With the go-live of https://letsencrypt.org/ its time to transition from the pricy and manual standard SSL cert issuing model to a fully automated process using the ACME protocol. Most orgs have numerous usages of CA purchased certs, this post will cover hosts running apache/nginx and AWS ELBs, all of these usages are to be replaced with automated provisioning and renewal of letsencrypt signed certs.

Provisioning and auto-renewing Apache and nginx TLS/SSL certs

For externally accessible sites where Apache/Nginx handles TLS/SSL termination moving to letsencrypt is quick and simple:

1 – Install the letsencrypt client software (there are RHEL and Centos rpms – so thats as simple as adding the package to puppet policies or

yum install letsencrypt

2 – Provision the keys and certificates for each of the required virtual hosts. If a virtual host has aliases, specify multiple names with the -d arg.

letsencrypt certonly --webroot -w /var/www/sites/static -d static.mwclearning.com -d img.mwclearning.com

This will provision a key and certificate + chain to the letsencrypt home directory (defaults /etc/letsencrypt). The /etc/letsencrypt/live directory contains symlinks to the current keys and certs.

3 – Update the apache/nginx virtualhost configs to use the symlinks maintained by the letsencrypt client, ie:

# static Web Site

	ServerName static.mwclearning.com
	ServerAlias img.mwclearning.com
	ServerAlias registry.mwclearning.ninja # <<-- dummy alias for internal site
	ServerAdmin webmaster@mwclearning.ninja

	DocumentRoot /var/www/sites/static
	DirectoryIndex index.php index.html
        
		AllowOverride all
		Options +Indexes
        
	ErrorLog /var/log/httpd/static_error.log
	LogLevel warn
	CustomLog /var/log/httpd/static_access.log combined



	ServerName static.mwclearning.com
	ServerAlias img.mwclearning.com
	ServerAlias img.mwclearning.ninja
	ServerAdmin webmaster@mchost

	DocumentRoot /var/www/sites/static
	DirectoryIndex index.php index.html
        
		AllowOverride all
		Options +Indexes	
        
	ErrorLog /var/log/httpd/static_ssl_error.log
	LogLevel warn
	CustomLog /var/log/httpd/static_ssl_access.log combined

	SSLEngine on
	SSLCipherSuite ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS:!RC4
	SSLHonorCipherOrder on
	SSLInsecureRenegotiation off
	SSLCertificateKeyFile /etc/letsencrypt/live/static.mwclearning.com/privkey.pem
	SSLCertificateFile /etc/letsencrypt/live/static.mwclearning.com/cert.pem
	SSLCertificateChainFile /etc/letsencrypt/live/static.mwclearning.com/chain.pem

4 – Create a script for renewing these certs, something like:

#!/bin/bash
# Vars
PROG_ECHO=$(which echo)
PROG_LETSENCRYPT=$(which letsencrypt)
PROG_FIND=$(which find)
PROG_OPENSSL=$(which openssl)

#
# Main
#
${PROG_ECHO} "Current expiries: "
for x in $(${PROG_FIND} /etc/letsencrypt/live/ -name cert.pem); do ${PROG_ECHO} "$x: $(${PROG_OPENSSL} x509 -noout -enddate -in $x)";done
${PROG_ECHO} "running letsencrypt certonly --webroot .. on $(hostname)"
${PROG_LETSENCRYPT} renew --agree-tos
LE_STATUS=$?
systemctl restart httpd
if [ "$LE_STATUS" != 0 ]; then
    ${PROG_ECHO} Automated renewal failed:
    cat /var/log/letsencrypt/renew.log
    exit 1
else
    ${PROG_ECHO} "New expiries: "
    for x in $(${PROG_FIND} /etc/letsencrypt/live/ -name cert.pem); do echo "$x: $(${PROG_OPENSSL} x509 -noout -enddate -in $x)";done
fi
# EOF

5 – Run this script automatically everyday with cron or jenkins

6 – Monitoring the results of the script and externally monitor the expiry dates of your certificates (something will go wrong one day)

Provisioning and auto-renewing AWS Elastice Load Balancer TLS/SSL certs

This has been made very easy by Alex Gaynor with a handy python script: https://github.com/alex/letsencrypt-aws. This is a great use-case for docker and Alex has created a docker image for the script: https://hub.docker.com/r/alexgaynor/letsencrypt-aws/. To use this with ease I created a layer on top creating a new Dockerfile:

#
# mwc letsencrypt-aws image
#
FROM alexgaynor/letsencrypt-aws:latest

MAINTAINER Mark

ENV LETSENCRYPT_AWS_CONFIG="{\"domains\": \
[{\"elb\":{\"name\":\"TestExtLB\",\"port\":\"443\"}, \
\"hosts\":[\"test.mwc.com\",\"test-app.mwc.com\",\"test-api.mwc.com\"], \
\"key_type\":\"rsa\"}, \
{\"elb\":{\"name\":\"ProdExtLb\",\"port\":\"443\"}, \
\"hosts\":[\"app.mwc.com\",\"show.mwc.com\",\"show-app.mwc.com\", \
\"app-api.mwc.com\",\"show-api.mwc.com\"], \
\"key_type\":\"rsa\"}], \
\"acme_account_key\":\"s3://config-bucket-abc123/config_items/private_key.pem\"}"

ENV AWS_ACCESS_KEY_ID="<AWS_ACCESS_KEY_ID>"
ENV AWS_SECRET_ACCESS_KEY="<AWS_SECRET_ACCESS_KEY>"
ENV AWS_DEFAULT_REGION="ap-southeast-2"
# EOF

The explanation of these values can be found at https://hub.docker.com/r/alexgaynor/letsencrypt-aws/. Its quite important to create a specific IAM User to conduct the required Route53/S3 and ELB actions. This images need to be build on changes:

sudo docker build -t registry.mwc.ninja:5000/syseng/ao-letsencrypt-aws .
sudo docker push registry.mwc.ninja:5000/syseng/ao-letsencrypt-aws

With this image built another cron or jenkins job can be run daily executing something like:

sudo docker pull registry.mwc.ninja:5000/syseng/ao-letsencrypt-aws
sudo docker run registry.mwc.ninja:5000/syseng/ao-letsencrypt-aws
sleep 10
sudo docker rm $(sudo docker ps -a | grep registry.mwc.ninja:5000/syseng/ao-letsencrypt-aws | awk '{print $1}')

Again, the job must be monitored along with external monitoring of certificates. See a complete SSL checker at https://github.com/markz0r/tools/tree/master/ssl_check_complete.

ITOps

Monolithic to Microservices

Post author By mark
Post date November 7, 2015
No Comments on Monolithic to Microservices

Review of ‘AWS re:Invent 2015 | (ARC309) Microservices: Evolving Architecture Patterns in the Cloud’ presentation.

From monolithic to microservices, on AWS.

Ruby on rails -> Java based functional services.

Java based functional services resulted in requirement for everything to be loaded into memory – 20 minutes to start services. Still very large services – with many engineers working on it. This means that commits to those repos take a long time to QA. So a commit is made then start working on something else then a week or two later have to fix what you hardly remember. Then to get specific information out of those big java apps it was required to parse the entire homepage. So…

Big sql database is still there – This will mean changes with schema change are difficult to do without outages. Included in this stage of micro services was:

Team autonomy – give teams a problem and they can build whatever services they need
Voluntary adoption – tools/techniques/processes
Goal driven initiatives
Failing fast and openly

What are the services? – Email, shipping cost, recommendations, admin, search

Anatomy of a service:

A service has its own datastore and it completely owns its own datastore. This is where dependency on one big schema is avoided. Services at gilt.com average 2000 lines of code and 32 source files.

Service discovery – enormously simple? – Discovery is a client needs to get to a services, how is it going to get there. ‘It has the name of the service – look up that URL’.

Use ZooKeeper as a highly available store.

Moving all this to the cloud via ‘lift and shift’ with AWS Direct Connect. In AWS all services where their own ec2 instance and dockerized.

Being a good citizen in a microservices organisation:

Service Consumer
- Design for failure
- Expect to be throttled
- Retry with exponential backoff
- Degrade gracefully
- Cache when appropriate
Service Provider
- Publish your metrics
  - Throughput, error rate, latency
- Protect yourself (throttling)
- Implementation details private
- Maintain backwards compatability
- See Amazon API gateway

Again service discovery – simple with naming conventions, DNS and load balancers. Avoid DNS issues with dynamic service registry (ZooKeeper, Eureka, Consul, SmartStack).

Data management – moving away from the schema change problems. And the other problems (custom stored procedures, being stuck with one supplier, single point of failure). Microservices must include decentralisation of datastores, services own their datastores and do not share them. This has a number of benefit from being able to chose whatever data store technology best meets the service needs, make changes without affecting other services, scale the data stores independently. So how do we ensure transactional integrity? Distributed locking sounds horrible. Do all services really need strict transactional integrity? Use queues to retry later.

Aggregation of data – I need to do analytics on my data? AWS Kenesis firehose, Amazon SQS, custom feed.

Continuous Delivery/Deployment – Introduced some AWS services Code Deploy — or just use Jenkins.

How many services per container/instance – Problems with monitoring granularity, scaling is less granular, ownership is less atomic, continuos deployment is tougher with immutable containers.

I/O Explosion – Mapping dependencies between services is tough. Some will be popular/hotspots. Service consumers need to cache were they can. Dependency injection is also an option – You can only make a request from services A if you have the required data from service B and C in your request.

Monitoring – Logs are key, also tracing request through fanned dependency can be much easier with a requirement for a header that is passed on.

Unique failures – Watched a day in the life of a Netflix engineers… good points on failures. We accept and increased failure rate to maintain velocity. We just want to ensure failures are unique. For this to happen we need to have open and safe feedback.

ITOps

Scala tail recursion vs non-tail receursion performance testing with dynaTrace

Post author By mark
Post date July 31, 2012
No Comments on Scala tail recursion vs non-tail receursion performance testing with dynaTrace

The scala compiler has partial tail call optimization (http://blog.richdougherty.com/2009/04/tail-calls-tailrec-and-trampolines.html).

Running some like methods with and without allowing the scala compiler to optimize demonstrated the performance improvements gained by this optimization.

First up, simple factoral functions (source: http://myadventuresincoding.wordpress.com/2010/06/20/tail-recursion-in-scala-a-simple-example/):

//this function will be tail recusive optimized
def tFactorial(number: Long) : Long = {
 def tfactorialWithAccumulator(accumulator: Long, number: Long) : Long = {
 if (number == 1) 
 return accumulator
 else
 tfactorialWithAccumulator(accumulator * number, number - 1)
 }
 tfactorialWithAccumulator(1, number)
}

//Non-tail recursive function
def ntFactorial(number:Long) : Long = {
 if (number == 1)
 return 1
 number * ntFactorial (number - 1)
}

For explanation of these functions and why/not they are tail recursive see the source above.

Results:

Non-Tail recursive average response time: approx 7 ms

Tail recursive average response time: approx 0.7ms

The number of test conducted was limited to about 10 each and the test environment was an Ubuntu VM. The results will not be highly accurate but with a difference this significant it is clear that tail optimized recursion is very different from normal recursion (On a JVM). Looking at the purepath for each method reveals tail-recursion optimization working as expected and saving alot of execution time:

tail-recursion — Comparing the two purepaths, the 10x increase in response time is clearly caused by the lack of tail recursion on the left.

The next example looks at what the compiler does if there is extra code within a tail optimized funtion:

// Tail recursive version 
 def execTest3Tail(loopCount: Int): String = {
 def test2WithAccumulator(accumulator: Int, number: Int): String = {
 println("Hello World " + accumulator + "\n")
 if (accumulator >= loopCount)
 return "Completed"
 else
 test2WithAccumulator(loopCount, (accumulator + 1))
 }
 test2WithAccumulator(1, loopCount)
 }

// Non-Tail recursive version
 def execTest3nt(loopCount: Int, cur: Int): String = {
 println("Hello World " + cur + "\n") 
 if (cur >= loopCount)
 return "Completed"
 else
 execTest3nt(loopCount, (cur + 1))
 }

The output of the non-tail recursive function was as expected, printing all of the output expected. The scala compiler optimized over the expected (perhaps incorrectly expected) behaviour of the optimized function. The output was:

enter num of iterations:1000
Hello World 1

Hello World 1000

Completed

As the function was optimized not to run through all of the loops – the println’s simply did not occur. The purepath comparision:

tail-recursion2 — Results are similar to the first test. The right (tail-optimized) purepath call hierarchy shows why there were only 2 screen outputs..

ITOps

Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence cont’

Post author By mark
Post date July 12, 2012
No Comments on Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence cont’

After some initial JVM monitoring yesterday, I looked at response times today. First thing I had to do was modify the respective classes slightly to get a non-constructor method doing all the work. This made it much easier to gather and chart the results within dynaTrace (though I did later find that there was a check box in dynaTrace to monitor constructors).

The tests today were again on 3 implementations of the Fibonacci sequence:

Java – see source

Scala Simple – see source

Scala Advanced – see source

Test environment info

I spent most of the day playing with dynaTraces monitoring and reporting tools so did not actually do that much testing. The tests I did run where quite short. The first being execution of the fib sequence returning the 10,000 value, the second stretching to 100,000. The charts below show the time results for each test, if more than one test was executed for an implementation in a 30 second cycle, the chart shows the average.

*Update 16-JUL: Retested on host machine (not running in VM) results where much more consistent despite having a number of other services operating at the same time. There are some clear results this time:

Java vs Scala vs Scala Adv - Fibonacci — Running on a host machine, the Java implementation has a slight performance edge over the Scala simple. The Scala advanced implementation is slower. These tests were to the 100,000th Fibonacci value

Test1Loop.NoAdv_JMVmon_1.000.000_HOST — Stretching the test the 1,000,000th Fibonacci value meant that Scala Adv overran the heap but the difference between Scala simple and Java was negligible

From the results it appears that the variations in response times are most likely caused by external factors (ie: OS or other services taking resouces, Garbage collections etc). I will really need to increase the sample size to get a better understanding of what is influencing response times.

Test1Loop.All3_ResponseTime_v10000 — Fibonacci sequence to 10,000th value

Test1Loop.All3_ResponseTime_v100000 — Fibonacci sequence to 100,000th value

ITOps

Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence

Post author By mark
Post date July 11, 2012
No Comments on Scala vs Java performance – dynaTrace Analysis – Test 1 Fibonacci Sequence

Trying to make some comparisons between Scala and Java. Using dynaTrace I can get very useful and granular insight into the behaviours of each. To keep things as comparable as possible the tests will be on simple, short algorithms. Large iterations will be conducted so garbage collection and space complexity can be analysed.

Thanks to Julio Faerman for his post and the implementations I have used from that post – see: http://www.infoq.com/articles/scala-java-myths-facts

With dynaTrace I can get information about the JVM behaviour but also about the behaviour of specific classes and methods. This will not only allow me to make some comparison between Java and Scala implementations but also understand more about what happens when Scala is compiled to byte code.

To start with, nice and simple test – Fibonacci sequence

Three implementations will be tested:

Java Fibonacci – see code

Scala Simple Fibonacci – see code

Scala Advanced Fibonacci – see code

I am running the tests from a controller class with a main method. From there I can instantiate the classes above. Read more about dynaTrace to see how it uses sensors in the JVM to grab a great deal of useful data about the JVM and about classes and methods.

Just to get things rolling, checked out how running the Fib sequence to the 10 million number in the sequence would affect the JVM. This tests would have taken a long time to complete so after I saw stable behaviour (ie: two sequences) I kill the JVM and moved on to the next sequence.

The charts below have been modified for human readability. For example, Garbage Collection in the charts was factored up. The charts are also stacked for readability. Within the dynaTrace client this is much easier to read with mouse over pop-ups.

What can be taken from this fairly simple example is that code may work and appear to be performing well. In reality it may be wreaking havoc on the JVM. Next post will look at the response times of these implementations.

Test1Loop.All3_memory_v1 — JVM Heap usage and Garbage Collection utilization

Test1Loop.NoAdv_memory_v1 — Removing the Scala Adv Sequence allows for better comparison of Java and Scala Simple

Test1Loop.All3_GC_v1 — Garbage Collection utilization and Garbage collection caused suspensions (time)

Test1Loop.NoAdv_GC_v1 — GC utilization, CPU Utilization and Suspension time cause by GC – without Scala Advanced

Test details:

Environment:

Virtual Machine: Ubuntu 12.04 LTS 32 bit - 3072MB, 2 Processors

Physical Host: Windows 7 64-bit - 8GB, i7-2620M @ 2.70 GHz

Versions:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3)
OpenJDK Server VM (build 20.0-b12, mixed mode)

Scala compiler version 2.9.2

Run script:

#!/bin/bash
scala_path=/usr/share/java/scala-library.jar
java -Xmx2048m -cp $scala_path:. -agentpath:/home/mark/Desktop/performance_testing/dynatrace/agent/lib/libdtagent.so=name=TestJVM_scala_vs_java,server=192.168.242.60:9998 PerformanceController