Ansible and Inspec

Configuration management and validation with the push model.

Bryan
May 22, 2020

One thing I love about the DevOps world is the general availability of such a wide range of tooling. This article isn’t for the novice user of either tool; I’ll leave the introductions to better minds than mine.

This document is more about describing an important workflow in the Infrastructure as Code space: our “top of the pyramid” testing.

This project targets the end-to-end (E2E) slice of the testing pyramid. We’re aiming at two goals:

  1. Verify that the Ansible run has done everything that we think it should have done.
  2. Provide us with a tool that we can use as an “is this thing running the way it should be?” check.

The beauty of using Ansible on one side of the process and Inspec on the other is that we’re using two completely different tools to get this done. On one side we have the clumsiness of YAML to describe infrastructure requirements, and on the other we have the elegance and sophistication of an actual DSL written in Ruby.

This means that our tests are actually more sophisticated than the engine that creates the infrastructure. That’s like writing a website in bash, but writing the automated tests that exercise the entire shopping cart in Go.

For our use we have a bare metal resource running in a vehicle that needs to connect to our AWS resources. We’re testing out AWS::VPN services to see if this is feasible. So far we’ve found it to be a workable solution, but there are things about how this works that we’re not super excited about.

In order to make the connection we need to roll out 4 files:

  • OpenVPN config ( config.ovpn )
  • Cert
  • Key
  • Systemd file

Let’s take a look at an example of how this works. First let’s pull down our OpenVPN config from AWS::SecretsManager:

- name: AWS VPN config
  copy:
    mode: '0400'
    owner: root
    group: root
    dest: /etc/openvpn/aws_vpn/config.ovpn
    content: "{{ lookup('aws_secret', '/renovo/trp/' + env_name + '/vpn/config.ovpn') }}"

This is the config we’re using to connect to an AWS::VPN rig. It gives this node access to the various cloud things living in the VPC we have set up for this (InfluxDB and others).

In this case env_name is just a variable defined in a group var, so it applies to every node in that group.

Now let’s look at our test for this file:

control 'trp-hog-ops-04' do
  impact 1.0
  title 'Check AWS VPN config'
  describe file('/etc/openvpn/aws_vpn/config.ovpn') do
    it { should exist }
    its('mode') { should cmp '0400' }
    its('md5sum') { should eq '72bd0718692f4a80af4abc574fe7ca69' }
  end
end
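
The Ansible task also sets the owner and group, and the systemd file is one of the four things we roll out, so the same pattern extends naturally. Here is a rough sketch of what a companion control might look like. The control ID and the service name `aws-vpn` are made up for illustration; they are not from our actual profile.

```ruby
# Hypothetical control ID and service name, for illustration only.
control 'trp-hog-ops-05' do
  impact 1.0
  title 'Check AWS VPN config ownership and service'

  # Mirrors the owner/group set by the Ansible copy task.
  describe file('/etc/openvpn/aws_vpn/config.ovpn') do
    its('owner') { should eq 'root' }
    its('group') { should eq 'root' }
  end

  # Assumed unit name; substitute whatever the systemd file installs.
  describe service('aws-vpn') do
    it { should be_installed }
    it { should be_enabled }
    it { should be_running }
  end
end
```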

Our use case for the config check would be something like this. Let’s say Bob is working on the VPN connection at the same time that Bobby is working on something else on that same vehicle machine. Bobby comes to me and says that something is wrong with the box: his microservice can’t connect to the InfluxDB instance.

The first thing I do is run Inspec…

[Screenshot: Inspec run output, captioned “A problem!”]

This tells me that the OpenVPN config has changed. If it hadn’t, I would get something like this:

[Screenshot: Inspec run output, captioned “OpenVPN config is not changed”]

The obvious problem here is: what if someone legitimately changes the OpenVPN file? We hope that along with the merge request to change the file, we also see a merge request to update the test, just like we’d expect with any software development process. (Or at least we hope! ;) )
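
When the file does change on purpose, the pinned checksum in the control has to be regenerated. Here is a minimal sketch in plain Ruby of computing the value the md5sum check compares against. This is not part of our project, and the function names `file_md5` and `config_unchanged?` are invented for illustration.

```ruby
require 'digest'

# Hex MD5 digest of a file's contents, in the same format as the
# checksum pinned in the control.
def file_md5(path)
  Digest::MD5.file(path).hexdigest
end

# True when the file exists and still matches the pinned checksum.
def config_unchanged?(path, pinned_md5)
  File.exist?(path) && file_md5(path) == pinned_md5
end
```

Calling `file_md5('/etc/openvpn/aws_vpn/config.ovpn')` on the node (as root, since the file is mode 0400) would give the string to paste into the updated test.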

So now we know that something has changed, and I as the DevOps guy can start pinging people on Slack. We find out pretty quickly that Bob is working on the VPN tunnel, which explains why Bobby can’t connect to InfluxDB. Problem solved, to some degree: at least we know what’s going on and we can ask Bob for an ETA.

Monitoring and Alarming

For the most part, the developers have a really amazing way of handling service-level alarming very quickly, in near real time as events are happening. I think they use a combination of Prometheus and custom “glue” code to get this done.

This tooling is more geared for a longer tail of operations consistency. We want to give ourselves a way of saying “is this a sane environment?” before we even touch the box or start troubleshooting. Also, there’s nothing stopping us from letting developers and engineers use this to test their own things.

The dream is to start having developers submit their own Inspec tests to help us further validate the operating environment. And of course, eventually roll this into our CI/CD rigs for automated validation when we get to that point.

Integration with Sensu is another possible approach. We could wire up a job to run a few times during the day to see where we’re at and maybe just have it bark at Slack if anything looks wrong.
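
We haven’t built this yet, but a job like that could be fairly small. Here is a rough sketch, assuming the job saves the output of `inspec exec` with the JSON reporter and that Slack is reached through an incoming webhook. The function names and the host label are hypothetical, and the webhook URL would have to come from wherever you keep secrets.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Controls with at least one failed result, taken from a report parsed
# from `inspec exec <profile> --reporter json`.
def failed_controls(report)
  report.fetch('profiles', []).flat_map do |profile|
    profile.fetch('controls', []).select do |control|
      control.fetch('results', []).any? { |r| r['status'] == 'failed' }
    end
  end
end

# Message text for Slack, or nil when nothing failed.
def slack_text(report, host)
  failed = failed_controls(report)
  return nil if failed.empty?

  lines = failed.map { |c| "- #{c['id']}: #{c['title']}" }
  "Inspec found #{failed.size} failing control(s) on #{host}:\n#{lines.join("\n")}"
end

# Post the text to a Slack incoming webhook (URL supplied by the caller).
def post_to_slack(webhook_url, text)
  Net::HTTP.post(URI(webhook_url), { text: text }.to_json,
                 'Content-Type' => 'application/json')
end
```

The scheduled job would parse the saved report with `JSON.parse(File.read(...))`, call `slack_text`, and only call `post_to_slack` when the result is not nil, so a healthy box stays quiet.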
