The Configuration Cycle
The central controlling server is called the Puppet master. On each of the servers to be configured (the nodes), you install a Puppet agent, which runs as a daemon and connects to the master every 30 minutes (the interval is configurable). This TLS-encrypted connection is authenticated by certificates on both sides. The master usually also assumes the certification authority (CA) role for certificate management.
After the connection is established, the agent transmits the facts it has determined to the server (Figure 2); these are then available there as variables in a scope superordinate to all classes, the top scope. Facts contain important information about the type and version of the operating system (e.g., Listing 4, line 17), the hardware, mounted filesystems, or the contents of custom facts.
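To illustrate how facts surface in a manifest, the following minimal sketch (assuming nothing beyond the standard $facts hash) branches on the operating system family reported by the agent:

# Facts arrive as the top-scope $facts hash; older code also accesses
# them as top-scope variables such as $::osfamily.
case $facts['os']['family'] {
  'RedHat': { $web_package = 'httpd' }
  'Debian': { $web_package = 'apache2' }
  default:  { fail("Unsupported OS family: ${facts['os']['family']}") }
}

notify { "Installing ${web_package} on ${facts['networking']['fqdn']}": }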
The server uses the facts to determine the manifest for the querying node. This information can reside on the server as:
node "certname.node.one" { include role::cms } node "certname.node.two" { include role::webserver }
In most cases, though, it is obtained over an interface from an external node classifier (ENC) such as Foreman or from a configuration management database (CMDB). On the server, Puppet compiles the manifest of classes, variables, templates, and resources determined in this way into a catalog. The catalog no longer contains any classes, templates, or variables; it is sent back to the agent.
The agent receives the desired state for its node in the form of the catalog, uses the RAL to check the current state of the resources contained in the catalog, and, if necessary, converts them to the desired state. Finally, the agent sends the log of this Puppet run as a report to the master, which passes it on to a corresponding handler. Possible destinations for a report include logfiles, PuppetDB, or Foreman.
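As a minimal, hedged sketch of what the agent enforces, the resources below describe a desired state; the package and service names (chrony/chronyd) are illustrative assumptions:

# The agent compares the state reported by the RAL with this desired
# state and only changes resources that deviate from it (idempotence).
package { 'chrony':
  ensure => installed,
}

service { 'chronyd':
  ensure  => running,
  enable  => true,
  require => Package['chrony'],
}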
Maintaining Your Code
Your code, which will usually be a mixture of your own modules and many upstream modules from Puppet Forge, is best maintained in a Git repository. The standard approach is to run a control repository that stores a list of modules, along with the module versions to use, in the Puppetfile. These modules can come from the Forge or from other Git repositories.
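A minimal Puppetfile sketch (module names, versions, and the Git URL are placeholders chosen purely for illustration) might look like this:

# Modules from the Puppet Forge, pinned to specific versions
mod 'puppetlabs/stdlib', '9.4.1'
mod 'puppetlabs/concat', '9.0.2'

# A self-written or patched module from its own Git repository,
# pinned to a tag (a branch or commit reference works the same way)
mod 'role',
  :git => 'https://git.example.com/puppet/role.git',
  :tag => 'v1.2.0'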
Self-written or patched upstream modules are usually maintained in the same Git instance, each as a separate repository with its own versions. If you don't use an ENC, the control repository contains the declarations of the individual nodes, as shown above, in manifests/site.pp.
In a branch of the control repository, you can store other module versions, such as newer, untested ones. When you push to one of these branches, the r10k software [7] (in the puppet-bolt package) assembles a separate Puppet environment on the server from each branch and the module information it contains. The ENC controls which environment the agent requests. In this way, any number of different test scenarios can be set up and tested extensively before being transferred to production. A push to a Git repository belonging to a module also triggers an r10k call, which then updates only that module's version, but in all environments. Admins typically reference module versions as Git tags, branches, or commits.
The integration of r10k with a Git server and the necessary hooks is not part of the open source variant of Puppet but can easily be replicated by experienced admins.
Hiera: Separating Code and Data
Hiera is also usually maintained directly in the control repo. This hierarchical key-value store queries its dataset according to search parameters. The idea is to determine different values for $package_name, $config_file, and $service_name on different platforms (e.g., for the code in Listing 2), depending on the facts the agent submitted, such as the operating system.
To do this, first store a hierarchy in the hiera.yaml file in the control repository (Listing 5). In the simplest case, the data is also kept in YAML format in the control repository, in ./data/. Puppet's automatic parameter lookup feature performs a Hiera lookup for the values of a class's parameters each time the class is declared. The names of the keys in Hiera must match the namespace of the respective class (Listing 6).
Listing 5
Defining a Hiera Hierarchy
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: 'Node specific'
    path: 'nodes/%{trusted.certname}.yaml'
  - name: 'Operating System Family'
    path: '%{facts.os.family}.yaml'
  - name: 'common'
    path: 'common.yaml'
Listing 6
Hiera Files for Red Hat and Debian
$ cat ./data/RedHat.yaml
apache::package_name: httpd
apache::config_file: /etc/httpd/httpd.conf
apache::service_name: httpd
$ cat ./data/Debian.yaml
apache::package_name: apache2
apache::config_file: /etc/apache2/apache2.conf
apache::service_name: apache2
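The keys in Listing 6 only take effect because they mirror the parameter names of a class in the apache namespace. A hedged sketch of such a class (not necessarily identical to the article's Listing 2, with Debian-style defaults chosen for illustration) could look like this:

# Parameters that receive no value from a declaration or from Hiera
# fall back to the defaults given here.
class apache (
  String $package_name = 'apache2',
  String $config_file  = '/etc/apache2/apache2.conf',
  String $service_name = 'apache2',
) {
  package { $package_name:
    ensure => installed,
  }

  file { $config_file:
    ensure => file,
    notify => Service[$service_name],
  }

  service { $service_name:
    ensure    => running,
    enable    => true,
    subscribe => Package[$package_name],
  }
}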
Because a parameter can receive its value from an explicit declaration, from Hiera, or from a default set in the class definition, the question of evaluation order arises. The default is the weakest link in the chain: It applies only if no explicit declaration was made and no Hiera lookup returns a value. An assignment in a resource-like class declaration is strongest, leaving the middle place for the automatic parameter lookup.
The automatic parameter lookup allows a class to be declared with include or contain, as long as the lookup provides values for its parameters (e.g., with include apache). Although it can override defaults, it does not have to. Hiera can also be used within a module, but there it only stores key-value pairs for the namespace of the module itself. If necessary, all of these values can be overridden again in the environment-level Hiera data of the control repository.