Started off looking at the basic conflict between ITOps and Dev – in essence the risk aversion of ITOps, why it's there, and why it's both good and bad. First-hand experience of being woken up many nights after a ‘great’ new release makes this quite pertinent.
ITOps needs to run systems that are tested and tightly controlled. This means that when a release requires new or significantly changed components, ITOps needs to be included in the discussion and made aware, so they can ensure stability in production.
Dev needs to adopt, trial and use new technologies to create software solutions that meet user and business requirements in an effective manner.
Performance testing needs to be conducted throughout the development iterations, and this is impossible if the development environments are significantly different from production.
How can performance tests be conducted throughout the development of new releases, particularly if these releases become more regular?
Proposed answer 1 – a ‘Golden Image’: a set image that is used for developing, testing and operating services, covering apps, libraries and the OS. Docker makes this more practical with containers.
Proposed answer 2 – apply configuration management to all machines (not sure how practical this would be).
Installed VirtualBox, Vagrant, git, ssh and Packer.
Vagrant configures VMs; Packer enables building of ‘golden images’.
Packer template sections (see the sketch after this list):
Variables – user-defined inputs to the template.
Builders – take a source image and produce the machine image.
Provisioners – install and configure software within the running machine (shell, Chef and Puppet scripts).
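A minimal sketch of what such a template and build might look like – the ISO URL, checksum and credentials below are placeholders, not values from this setup:

cat > golden-image.json <<'EOF'
{
  "variables": { "vm_name": "golden-base" },
  "builders": [{
    "type": "virtualbox-iso",
    "vm_name": "{{user `vm_name`}}",
    "guest_os_type": "Ubuntu_64",
    "iso_url": "http://releases.ubuntu.com/14.04/ubuntu-14.04-server-amd64.iso",
    "iso_checksum": "PLACEHOLDER",
    "iso_checksum_type": "sha256",
    "ssh_username": "vagrant",
    "ssh_password": "vagrant",
    "shutdown_command": "echo vagrant | sudo -S shutdown -P now"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": ["sudo apt-get update", "sudo apt-get -y upgrade"]
  }]
}
EOF
packer build golden-image.json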
Review of ‘AWS re:Invent 2015 | (ARC309) Microservices: Evolving Architecture Patterns in the Cloud’ presentation.
From monolithic to microservices, on AWS.
Ruby on Rails -> Java-based functional services.
The Java-based functional services required everything to be loaded into memory – 20 minutes to start services. They were still very large services, with many engineers working on them, which means commits to those repos take a long time to QA: a commit is made, work starts on something else, and a week or two later the developer has to fix something they hardly remember. Then, to get specific information out of those big Java apps, the entire homepage had to be parsed. So…
The big SQL database is still there – this means schema changes are difficult to do without outages. This stage of microservices included:
Team autonomy – give teams a problem and they can build whatever services they need
Voluntary adoption – tools/techniques/processes
Goal driven initiatives
Failing fast and openly
What are the services? – Email, shipping cost, recommendations, admin, search
Anatomy of a service:
A service has its own datastore and it completely owns that datastore. This is where dependency on one big schema is avoided. Services at gilt.com average 2,000 lines of code and 32 source files.
Service discovery – enormously simple? Discovery boils down to: a client needs to get to a service, so how is it going to get there? ‘It has the name of the service – look up that URL’.
Use ZooKeeper as a highly available store.
Moving all this to the cloud via ‘lift and shift’ with AWS Direct Connect. In AWS all services were given their own EC2 instance and dockerized.
Being a good citizen in a microservices organisation:
Design for failure
Expect to be throttled
Retry with exponential backoff (see the sketch after this list)
Cache when appropriate
Publish your metrics
Throughput, error rate, latency
Protect yourself (throttling)
Implementation details private
Maintain backwards compatibility
See Amazon API gateway
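As a rough shell sketch of the retry-with-exponential-backoff point above (the endpoint URL is a placeholder):

#!/usr/bin/env bash
# retry a flaky call up to 5 times, roughly doubling the wait each time,
# with a little jitter so callers don't retry in lockstep
url="https://service.example.com/api"   # placeholder endpoint
max_attempts=5
attempt=1
until curl -fsS "$url"; do
  if [ "$attempt" -ge "$max_attempts" ]; then
    echo "giving up after $attempt attempts" >&2
    exit 1
  fi
  sleep_for=$(( (1 << attempt) + RANDOM % 3 ))
  echo "attempt $attempt failed, retrying in ${sleep_for}s" >&2
  sleep "$sleep_for"
  attempt=$((attempt + 1))
done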
Again, service discovery – simple with naming conventions, DNS and load balancers. Avoid DNS issues with a dynamic service registry (ZooKeeper, Eureka, Consul, SmartStack).
Data management – moving away from the schema-change problems, and the other problems (custom stored procedures, being stuck with one supplier, a single point of failure). Microservices must include decentralisation of datastores: services own their datastores and do not share them. This has a number of benefits: being able to choose whatever datastore technology best meets the service's needs, making changes without affecting other services, and scaling the datastores independently. So how do we ensure transactional integrity? Distributed locking sounds horrible. Do all services really need strict transactional integrity? Use queues to retry later.
Aggregation of data – I need to do analytics on my data? AWS Kinesis Firehose, Amazon SQS, or a custom feed.
Continuous Delivery/Deployment – introduced some AWS services such as CodeDeploy, or just use Jenkins.
How many services per container/instance? Problems: monitoring granularity, scaling is less granular, ownership is less atomic, and continuous deployment is tougher with immutable containers.
I/O explosion – mapping dependencies between services is tough, and some will be popular hotspots. Service consumers need to cache where they can. Dependency injection is also an option: you can only make a request to service A if you have the required data from services B and C in your request.
Monitoring – logs are key; also, tracing requests through fanned-out dependencies is much easier with a required header that is passed on.
Unique failures – watched a day in the life of a Netflix engineer… good points on failures. We accept an increased failure rate to maintain velocity; we just want to ensure failures are unique. For this to happen we need open and safe feedback.
With all of the technology solutions and paradigms emerging in the IT space, it can be difficult to get a full understanding of everything, particularly before developing biases. So, from the perspective of an infosec and ops guy, I will list out some notes from my own review of the current direction of DevOps. This review is based primarily on Udacity's Intro to DevOps course and assorted blogs.
Why do it?
Reduce waste in software development and operations workflows. Simply: more value, less pain.
What is it?
Most of the definitions out there boil down to communication and collaboration between Developers, QA and IT Ops throughout all stages of the development lifecycle.
No more passing the release from Dev to IT Ops
No more clear boundaries between Dev and IT Ops people/environments/processes and tools
No more inconsistency between Dev and Prod environments
Before setting out, it is important to get some basic concepts about Snort.
This deployment will be in Network Intrusion Detection System (NIDS) mode, which performs detection and analysis on traffic. See other options and a nice, concise introduction: http://manual.snort.org/node3.html.
Again drawing from the Snort manual, a basic understanding of Snort alerts can be gained:
116 – Generator ID; tells us which component of Snort generated the alert
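For illustration, the [gid:sid:rev] triple at the front of an alert is where these IDs appear – a decoder alert might look something like this (illustrative, not from my logs):

[**] [116:56:1] (snort_decoder): T/TCP Detected [**]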
Eliminating false positives
After running PulledPork with the default snort.conf there will likely be a lot of false positives, most of them from the preprocessor rules. There are a few options for eliminating them; to retain maintainability of the rulesets and the ability to keep using PulledPork, do not edit rule files directly. I use the following steps:
Create an alternate startup configuration for Snort and Barnyard2 without -D (daemon), and a Barnyard2 config that only writes to stdout, not the database. Now Snort and Barnyard2 can be stopped and started quickly to test rule changes.
Open up the relevant documentation, especially for preprocessor tuning – see the ‘doc’ directory in the Snort source.
Have some scripts/traffic replays ready with the traffic/attacks you need to be alerting on.
Iterate: read the docs, make changes to snort.conf (for preprocessor config), add exceptions/suppressions to Snort's threshold.conf or to PulledPork's disablesid, dropsid, enablesid and modifysid confs, then run the IDS and check for false positives (see the sketch after this list).
If there are multiple operating systems in your environment, define ipvars to isolate the different OSs for best results. This will ensure you can eliminate false positives whilst maintaining a tight alerting policy.
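As a rough sketch of that test loop – the paths, pcap and suppression values below are assumptions, not from this setup:

# run snort in the foreground against replayed traffic so rule changes
# can be tested quickly (no daemon, alerts straight to console)
sudo snort -c /etc/snort/snort.conf -r replay.pcap -A console -q

And an illustrative suppression in threshold.conf, rather than editing rule files directly (the gen_id/sig_id/ip are placeholders):

suppress gen_id 1, sig_id 2014819, track by_src, ip 10.0.0.5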
From doc: HttpInspect is a generic HTTP decoder for user applications. Given a data buffer, HttpInspect will decode the buffer, find HTTP fields, and normalize the fields. HttpInspect works on both client requests and server responses.
Global config –
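For reference, the global HttpInspect line in a stock snort.conf looks something like this (check the copy shipped with your version):

preprocessor http_inspect: global iis_unicode_map unicode.map 1252 compress_depth 65535 decompress_depth 65535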
Writing custom rules using Snort's lightweight rules-description language enables Snort to be used for tasks beyond intrusion detection. This example will look at writing a rule to detect Internet Explorer 6 user agents connecting to port 443 (a sketch of the finished rule follows the option summary below).
Rule Headers -> [Rule Actions, Protocols, IP Addresses and Ports, Direction Operator]
general – informational only: msg:, reference:, gid:, sid:, rev:, classtype:, priority:, metadata:
payload – look for data inside the packet:
content: sets rules that search for specific content in the packet payload and trigger a response based on that data (Boyer-Moore pattern match). If there is a match anywhere within the packet's payload, the remainder of the rule option tests are performed (case sensitive). Content can contain mixed text and binary data; binary data is represented as hexadecimal with pipe separators – (content:”|5c 00|P|00|I|00|P|00|E|00 5c|”;). Multiple content rules can be specified in one rule to reduce false positives. content has a number of modifiers: [nocase, rawbytes, depth, offset, distance, within, http_client_body, http_cookie, http_raw_cookie, http_header, http_raw_header, http_method, http_uri, http_raw_uri, http_stat_code, http_stat_msg, fast_pattern].
non-payload – look for non-payload data
post-detection – rule specific triggers that are enacted after a rule has been matched
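Putting the header and option pieces above together, a sketch of the IE6-to-443 rule – the sid is simply in the local-use range, and note that a plain content match like this only works where the HTTP request is not yet encrypted (e.g. plaintext or proxied traffic on 443):

alert tcp $HOME_NET any -> $EXTERNAL_NET 443 (msg:"POLICY IE6 user agent to port 443"; flow:to_server,established; content:"MSIE 6.0"; nocase; sid:1000001; rev:1;)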
So here’s a simple script that will pull the cert chain from a [domain] [port] and let you know if it is invalid – note there will likely be some bugs from characters being encoded / carriage returns going missing:
# chain_collector.sh [domain] [port]
# output to stdout
# assumes you have a directory with desired trust anchors at ~/trustanchors
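A sketch of how such a script might look – it assumes openssl is installed and that ~/trustanchors has been prepared with c_rehash:

#!/usr/bin/env bash
# chain_collector.sh [domain] [port] – a reconstruction, not the original:
# pull the presented chain with openssl s_client, then verify it against
# the trust anchors in ~/trustanchors
domain="$1"
port="${2:-443}"
chain_file=$(mktemp)
openssl s_client -showcerts -connect "${domain}:${port}" \
  -servername "${domain}" </dev/null 2>/dev/null |
  awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/' > "$chain_file"
# openssl verify checks the first cert in the file (the leaf) and takes
# the intermediates from -untrusted
if openssl verify -CApath ~/trustanchors -untrusted "$chain_file" "$chain_file" >/dev/null 2>&1; then
  echo "${domain}:${port} chain: OK"
else
  echo "${domain}:${port} chain: INVALID"
fi
rm -f "$chain_file"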
server random: 1b:97:2e:f3:58:70:d1:70:d1:de:d9:b6:c3:30:94:e0:10:1a:48:1c:cc:d7:4d:a4:b5:f3:f8:78 = 1988109383203082608
Interestingly, the negotiation between youtube.com and the Chromium browser resulted in an Elliptic Curve Cryptography (ECC) cipher suite for Transport Layer Security (TLS) being chosen.
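To check the negotiated protocol and cipher yourself, something like:

openssl s_client -connect youtube.com:443 -servername youtube.com </dev/null 2>/dev/null | grep -E 'Protocol|Cipher'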
Note that no step is mentioned here for the client to verify the certificate. In the past most browsers would query a certificate revocation list (CRL), though browsers such as Chrome now either ignore CRL functionality or use certificate pinning.
Chrome will instead rely on its automatic update mechanism to maintain a list of certificates that have been revoked for security reasons. Langley called on certificate authorities to provide a list of revoked certificates that Google bots can automatically fetch. The time frame for the Chrome changes to go into effect are “on the order of months,” a Google spokesman said. – source: http://arstechnica.com/business/2012/02/google-strips-chrome-of-ssl-revocation-checking/
The issue is caused by having iptables rule/s that track connection state. If the number of connections being tracked exceeds the default nf_conntrack table size, then any additional connections will be dropped. This is most likely to occur on machines used for NAT and on scanning/discovery tools (such as Nessus and Nmap).
Symptoms: once the connection table is full, any additional connection attempts will be blackholed.
This issue can be detected by checking the kernel log (e.g. via dmesg) for repeated messages like:
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
Current conntrack settings can be displayed using:
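For example:

sysctl -a 2>/dev/null | grep conntrack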
To check the current number of connections being tracked by conntrack:
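For example:

cat /proc/sys/net/netfilter/nf_conntrack_count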
Options for fixing the issue are:
Stop using stateful connection rules in iptables (probably not an option in most cases)
Increase the size of the connection tracking table (also requires increasing the conntrack hash table)
Decreasing timeout values, reducing how long connection attempts are stored (this is particularly relevant for Nessus scanning machines that can be configured to attempt many simultaneous port scans across an IP range)
Making the changes in a persistent fashion – RHEL 6 examples:
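A sketch for RHEL 6 – the values are illustrative and should be sized to your memory and workload:

# persist a larger table and a shorter established-connection timeout
cat >> /etc/sysctl.conf <<'EOF'
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
EOF
# the hash table is a module parameter rather than a sysctl; a common
# rule of thumb is nf_conntrack_max / 4 buckets
echo 'options nf_conntrack hashsize=65536' > /etc/modprobe.d/nf_conntrack.conf
sysctl -p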
<ejbca_home>/bin/ejbca.sh ca importca <caname> existingCA1.p12
Step 3 – Verify import
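For example, listing the CAs with the stock CLI:

<ejbca_home>/bin/ejbca.sh ca listcas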
<ejbca_home>/bin/ejbca.sh ra adduser
### IMPORTANT ###
The distinguished name order of openssl may be the opposite of the EJBCA default configuration – http://www.csita.unige.it/software/free/ejbca/ … If so, this ordering must be changed in the EJBCA configuration prior to deploying (it can't be set on a per-CA basis).
Have not been able to replicate this issue in testing.
Import existing TinyCA CA
Basic Admin and User operations
Create an end entity profile for server/client entities
Step 2 – Sign CSR using the End Entity which is associated with a CA
Importing existing certificates
EJBCA can create end entities and import their existing certificates one-by-one or in bulk (http://www.ejbca.org/docs/adminguide.html#Importing Certificates). Bulk inserts import all certificates under a single user, which may not be desirable. Below is a script to import all certs in a directory one-by-one, each under a new end entity that takes the name of the certificate CN.
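A sketch of such a loop – the EJBCA path, CA name, password and cert directory are placeholders, and the importcert arguments should be checked against your EJBCA version's CLI help:

#!/usr/bin/env bash
# import every PEM cert in ./certs as its own end entity named after its CN
EJBCA_HOME=/opt/ejbca        # placeholder path
CA_NAME=ImportedTinyCA       # placeholder CA
for cert in ./certs/*.pem; do
  # pull the CN out of the subject (handles both '/' and ',' separated DNs)
  cn=$(openssl x509 -noout -subject -in "$cert" | sed -n 's|.*CN=\([^,/]*\).*|\1|p')
  "$EJBCA_HOME/bin/ejbca.sh" ca importcert "$cn" changeit "$CA_NAME" ACTIVE "$cert"
done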