Tuning Ansible
Ansible Configuration Management
Special Thanks: This article was made possible by support from Linux Professional Institute
Among configuration management tools, the most well known are Puppet, Chef, and Ansible, the latter of which was acquired by Linux powerhouse Red Hat at the end of 2015. Years ago I worked with Puppet on a large server estate and embraced its ability to automate away much of the daily sweat and grind associated with running hundreds of servers. At times, however (and this applies to all configuration management software to one degree or another), hitting the big, red execute button induced a level of stress because the configuration did not always behave as expected. With lots of testing and a healthy amount of experience, however, it is possible to build up enough trust to run these tools on your production estates.
My tool of choice these days is without question Ansible. Be warned, though, that it is a contentious topic, mainly because engineers (never mind businesses) have invested heavily in one particular tool and tend to extol its virtues vociferously over others. The company behind the mighty Puppet, for example, has been around since 2005, and that is enough time to invest in a massive number of manifests (Puppet scripts) that might be so tightly integrated with your production environment that the mere thought of extricating them is painful. In this article, I am going to explore getting the best of Ansible, in terms of performance and maintaining your sanity. To achieve this, I will be using a couple of simple plugins that are bundled with Ansible; if you then plan on using Ansible for idempotency (more later on), you will see how to tune SSH connections.
You Say Potahto
I turn to Ansible nowadays for two reasons, even though I am fully aware that Ansible might not necessarily have all the functionality or sophistication of Puppet or Chef. First, it is simple: The playbooks sometimes leave me scratching my head for a little while, but after consulting my favorite online search tool, I eventually find the answer. Second, I enjoy how lightweight it is at both the server and the client levels. If you are not aware, Ansible is a few relatively simple Python packages (Table 1) on the server side, and by default, it simply uses an SSH daemon that already exists on my destination client machine(s).
Table 1: The Server Side of Ansible on Ubuntu 16.04
ansible | python-markupsafe
ieee-data | python-minimal
libpython-stdlib | python-netaddr
libpython2.7-minimal | python-paramiko
libpython2.7-stdlib | python-pkg-resources
python | python-selinux
python-crypto | python-six
python-ecdsa | python-yaml
python-httplib2 | python2.7
python-jinja2 | python2.7-minimal
My main gripe with constantly evolving software these days is backward compatibility. Having chosen, invested in, and sung the praises of one of the configuration management tools, I quite rightly expect it to behave in a sane way: With a major version bump, the last thing I want is downtime on my server estate. Sadly, this still happens if you are not using these tools on a regular basis. If you miss an announcement, suddenly some new syntax in a “play” in your playbook could break everything horribly.
Ansible is not exempt from this behavior, but the online search I just mentioned usually at least provides a number of alternative ways to achieve your goal. If the syntax trips you up (or the YAML formatting causes eyestrain), chances are someone else has written about it in one context or another.
No Callback Jokes Please
A lesser-known fact is that Ansible provides a simple plugin type referred to as a callback plugin. According to the online Ansible docs:
Callback plugins enable adding new behaviors to Ansible when responding to events. By default, callback plugins control most of the output you see when running the command line programs, but can also be used to add additional output, integrate with other tools and marshall the events to a storage backend.
I am sure you will be glad to know that enabling callback plugins is extremely simple, and to disable them, you can just comment the config line out and run your playbook without any other changes. No reboots or daemon restarts are needed in this case, which is refreshing. Although the functionality I am going to look at might not appear to be the most cutting-edge, trust me when I say that I have found it to be a lifesaver when debugging lengthy playbooks.
If you decide to look further into the Ansible docs about callback plugins, be aware that only one plugin can “control” the output that is visible while a playbook is being run. You can change what is visible and where the information goes by altering STDOUT with another config if you need to. See the docs page link mentioned a few moments ago for more detail.
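As a minimal sketch of that last point, the plugin that owns the console output is selected with the stdout_callback setting in the [defaults] section of ansible.cfg. The skippy value below is only an illustration (it is, as far as I am aware, the example shown commented out in the stock config file); check which stdout plugins your Ansible version ships before relying on it.

```ini
[defaults]
# Exactly one plugin may own the console output at a time.
# "skippy" is an illustrative choice, not a recommendation:
# it behaves like the default output but hides skipped tasks.
stdout_callback = skippy
```

Commenting the line out again returns you to the default output, with no restarts needed.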
It Is All About Timing
My favorite callback plugin was discovered when I was first learning Ansible and dealing with, I will just say, some challenges in the environment in which I was working.
One frequently occurring issue, for example, was DNS servers timing out (in some cases, you could see a two-second timeout before an unresponsive Primary Name Server trips over to a Secondary Name Server), and I was not finding anything particularly helpful when trying to come to grips with this new tool and its syntax.
Additionally, sometimes network firewall rules would change, unbeknownst to me, and certain SSH sessions would simply stop working, only to start working again miraculously a few minutes later. Another headache I had was slow disks, so that fluctuating disk I/O (input/output) performance made it difficult to be certain that my infrastructure-as-code was not the cause of slowdown problems.
As you might imagine, none of the intermittent, recurring problems made for an ideal testing environment.
After searching online with my favorite hunter-gatherer tool, I resorted to making a very helpful but simple change in the main Ansible config file /etc/ansible/ansible.cfg. The line I changed (I am using Ansible version 2.0.0.2, so your defaults might be slightly different) went from
#callback_whitelist = timer, mail
to
callback_whitelist = timer, mail, profile_tasks
which involved uncommenting the line, making sure timer existed as a callback plugin, and loading the profile_tasks plugin. When I executed my playbooks, then, I saw a highly useful and colorful output relating to each task in a readable, human-friendly summary (Figure 1). I leave the mail plugin in place in case I want to receive email about errors; you may remove it if you want. You can find more information about the mail plugin in the Ansible docs.
As demonstrated in Figure 1, the callback plugin output for timer and profile_tasks has some unusual facets. If you cannot remember what the output usually looks like, simply add the comment character back into the config file and rerun your playbook for comparison. I ran my playbook with -vv for added verbosity; remove one or both of the v's to display less detail.
For clarity, having adjusted the config file, the standard command is:
$ ansible-playbook -vv site.yaml
First, note that using long name lines in your playbooks makes this output look unnecessarily illegible. That said, sometimes you of course need to have descriptive entries; otherwise, the next poor soul who tries to fathom what the code is doing will be lost. I try to keep these fields trimmed so I can at least tell the difference between one line and the next in the output. For example, if I am copying one file with a certain name over a second file with a similar name, I will adjust the name description to display the filenames at the start of the description, instead of putting them at the end where they disappear off the end of the text line, along the lines of:
name: Copying djbdns_file1 to /etc/tinydns
Second, note the output at the end of each line in Figure 1 that shows how long each task took to complete (thanks to the profile_tasks plugin). If you scroll back up your console history, you will see more detail for each task, especially if you preserve the -vv option. When I was troubleshooting the aforementioned DNS issues, I found that checking if and when each task was hanging was incredibly easy. I noticed that I also got so used to the approximate runtimes of each task that even relatively significant variations were no longer a problem to discern. Be warned, however, that almost no task will always complete with the same runtime, and you should expect variations, because computers and networks are rarely perfectly consistent (e.g., because of load).
In addition to the two pieces of information above, the helpful timer callback plugin reports the total playbook completion time, which is shown at the bottom of the output in the Playbook run took … line. Combining this timing data really helps you come to grips with both infrastructure and coding issues.
Hashtag Fail
I will not go into how “changes” (Figure 2) are possible even if you logically might not expect them when running your playbooks. However, the color scheme of the changed and failed output becomes recognizable if you have run a playbook a few times (and quickly if you have written the playbook yourself). Again, familiarity lets you know whether something is horribly broken somewhere.
If you want to see more than a summary of tasks displayed at the end of the output, then either snip your playbook shorter or do some online hunter-gatherer activity. I still need to explore further whether adding a larger summary is possible. If you are interested, you can find a little more information, but not a huge amount of detail, in the timer plugin docs.
Fork That
If you run your playbooks over your servers repeatedly (every 20 minutes or so), their config stays exactly as you expect it to be; because the runs are idempotent, reapplying a playbook changes nothing that is already correct, which provides greater security and predictability. An unquestionably useful tuning tip for this kind of repeated, idempotent running relates to the forks parameter, which is concerned with how Ansible copes with speaking to many remote systems at once. A great blog post notes:
Ansible works by spinning off forks of itself and talking to many remote systems independently. The forks parameter controls how many hosts are configured by Ansible in parallel. By default, the forks parameter in Ansible is a very conservative 5. This means that only 5 hosts will be configured at the same time, and it’s expected that every user will change this parameter to something more suitable for their environment. A good value might be 25 or even 100.
The default value shown at the top of the main config file, as mentioned above, is:
#forks = 5
As per the recommendation, make sure you increase it as you see fit to meet your parallelism requirements.
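A minimal sketch of that change, assuming the stock config layout in which forks lives in the [defaults] section. The value 25 is simply the lower figure suggested in the quote above, not a measured optimum.

```ini
[defaults]
# Number of hosts Ansible configures in parallel (stock default: 5).
# 25 is an illustrative starting point; size it to the CPU, memory,
# and network capacity of the machine running Ansible.
forks = 25
```

The same value can be tried for a single run with the -f (--forks) option of ansible-playbook before you commit it to the config file.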
Pipe Cleaners
Another handy hint applies to pipelining, which is also covered in the blog post quoted earlier. This setting is worth tweaking for concurrency (with OpenSSH daemons, it apparently even doubles the speed of execution!) and when you are working on a server estate that uses long hostnames (e.g., most corporate estates or even AWS estates with long dynamic DNS names). Figure 3 shows the comments in the main config file. Again, tune as needed.
For reference, the official Ansible docs report:
If running with OpenSSH, the “pipelining” setting will further double the speed of operations by optimizing the way Ansible modules are transferred. This is not enabled by default because it can’t run absolutely everywhere (different tty policies with sudo, etc), but almost everywhere can, so you should definitely try it out.
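A minimal sketch of the corresponding settings, assuming the stock config layout in which SSH tuning lives in the [ssh_connection] section. The commented control_path line is the shortened form that, as far as I recall, the stock config comments suggest for very long hostnames; treat it as an assumption and compare it with the comments in your own ansible.cfg.

```ini
[ssh_connection]
# Reduces the number of SSH operations needed per task.
# With sudo, this requires "requiretty" to be disabled in /etc/sudoers
# on the managed hosts, which is why it is off by default.
pipelining = True

# Assumption: a shorter ControlPath socket name for estates with very
# long hostnames. Uncomment only if you hit socket path length errors.
#control_path = %(directory)s/%%h-%%r
```

Run a harmless playbook against a single sandbox host after enabling pipelining; a sudo tty error at that point usually means requiretty is still set on the target.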
The End
As with all things mentioned so far, you should definitely test these Ansible tweaks first on a sandbox host or, ideally, in a lab. You have been warned!
One of the first changes I make to a fresh Ansible installation these days is enabling the timer and profile_tasks callback plugins. They are perfect for being able to say, “that task was slow, but not to worry because it had to perform a DNS lookup on flaky DNS resolvers” or “that machine’s transatlantic connectivity has always been dubious,” and not question whether the syntax you have used in a playbook is the cause of a large or small performance issue.
Along with the idempotency tweaks, these super-simple tuning tips should help you keep an eye on your playbooks while under development, as well as when they are in use in the wild.