OpenStack observability with Sovereign Cloud Stack
Guard Duty
Components and Tools
SCS relies on the Open Source Infrastructure and Service Manager (OSISM) [2] as its tool for deploying OpenStack and for day-2 operations. OSISM itself relies heavily on Kolla Ansible [3], a collection of Ansible playbooks for deploying OpenStack. One of the main goals of OSISM is to simplify the operation of OpenStack-based systems and, in particular, upgrades from one OpenStack version to the next, so that updates can be installed on a system at any time.
Kolla Ansible comes with a Prometheus-based [4] monitoring stack out of the box, which fit well with the Monitoring SIG's preference for an OpenMetrics-based approach from the outset. Initially, the SIG considered using traditional monitoring software, such as Zabbix or Icinga, for service state monitoring. However, it became clear relatively quickly in the discussions that these scenarios could just as easily be covered by Prometheus' Blackbox exporter. To reduce complexity, it makes sense to rely on the Blackbox exporter instead of a completely independent software solution. These changes were incorporated into OSISM, and Zabbix, which had previously been included, was dropped.
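To illustrate the kind of service state check the Blackbox exporter covers, the following minimal Python sketch asks an exporter to probe a target and evaluates the `probe_success` metric in the response. The exporter address, the target URL, and the `http_2xx` module name are assumptions for illustration; the module must exist in the exporter's own configuration, and this is not how OSISM or Kolla Ansible wires the exporter into Prometheus.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the Blackbox exporter /probe URL for one target."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter.rstrip('/')}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the exposition text reports probe_success 1."""
    for line in metrics_text.splitlines():
        # Skip comments (# HELP, # TYPE) and unrelated metrics.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


def check(exporter: str, target: str, module: str = "http_2xx") -> bool:
    """Ask the exporter to probe the target and evaluate the result."""
    with urlopen(probe_url(exporter, target, module), timeout=10) as resp:
        return probe_succeeded(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical addresses; adjust to the local environment.
    print(check("http://localhost:9115", "https://horizon.example.com"))
```

In a real deployment, Prometheus itself scrapes the `/probe` endpoint and alert rules evaluate `probe_success`; the script only shows what such a probe returns.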
As a first step, additional dashboards were provided for Grafana (Figure 4) and integrated into the Kolla Ansible project. In addition, various Prometheus exporters that are not currently part of Kolla Ansible are to be included.
Alerting
Alerting is an important component in any monitoring setup. It quickly became clear in the Monitoring SIG that every cloud service provider (CSP), apart from those just starting to build their first environment, already has an alerting system in operation. Ideally, the monitoring supplied with SCS would plug into that existing system.
Therefore, the decision was made to opt for the Prometheus Alertmanager, which is already integrated in Kolla Ansible, and to document best practices [5] for connecting it to external alerting systems. The open source Alerta [6] software is an alternative for aggregating alerts. Initially, the idea of integrating it directly was considered; however, for the time being, Alertmanager was deemed sufficient.
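One common way to connect Alertmanager to an external alerting system is its webhook receiver, which sends grouped alerts as JSON over HTTP POST. The sketch below is not taken from the SCS best practices [5]; it only shows, under that assumption, how a receiving service might flatten such a payload into simple event records. The function name, the output field names, and the example payload values are hypothetical.

```python
def summarize_alerts(payload: dict) -> list[dict]:
    """Flatten an Alertmanager webhook payload into simple event records."""
    events = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        events.append({
            "name": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "none"),
            "instance": labels.get("instance", ""),
            # Each alert carries its own status; fall back to the group status.
            "status": alert.get("status", payload.get("status", "")),
            "summary": annotations.get("summary", ""),
        })
    return events


# Hypothetical payload in the shape Alertmanager posts to a webhook receiver.
EXAMPLE_PAYLOAD = {
    "version": "4",
    "status": "firing",
    "receiver": "external-alerting",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "HostOutOfMemory",
                "severity": "warning",
                "instance": "compute-01:9100",
            },
            "annotations": {"summary": "Host out of memory"},
            "startsAt": "2024-01-01T00:00:00Z",
        }
    ],
}

if __name__ == "__main__":
    for event in summarize_alerts(EXAMPLE_PAYLOAD):
        print(event)
```

The records produced this way could then be forwarded to whatever ticketing or paging system the CSP already operates.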
Alert rules are an important part of Prometheus monitoring. To create a good starting point, several rule sets have been adopted from the Awesome alert rules [7] project, and they are now also making their way into Kolla Ansible.
There's Monitoring and Then There's Monitoring
When the conversation turns to monitoring, people tend to think first of simple process monitoring: Does the Foo process exist, and does the Bar service respond on port 42? Often, instead of simply checking whether a service responds on a port, it is a good idea to use test scenarios that carry out a functional check of the service.
For example, in an environment like OpenStack, it's helpful to check whether the Horizon web front end is being delivered correctly to the browser, or to use the API to verify that VMs can actually be started. However, checking a network component such as Open Virtual Network (OVN) for correct functionality can become complex.
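The difference between a port check and a functional check can be sketched in a few lines of Python. The first function only tests whether something accepts TCP connections; the second fetches a page and verifies that the expected content is delivered. The URL and the marker text (`"Log in"`) are assumptions chosen for illustration, not values taken from Horizon or SCS.

```python
import socket
from urllib.request import urlopen


def port_responds(host: str, port: int, timeout: float = 3.0) -> bool:
    """Shallow check: does anything accept TCP connections on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def body_looks_right(status: int, body: str, marker: str) -> bool:
    """Functional criterion: HTTP 200 and the expected text is in the page."""
    return status == 200 and marker in body


def page_is_delivered(url: str, marker: str, timeout: float = 10.0) -> bool:
    """Functional check: fetch the page and verify its content."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return body_looks_right(resp.status, body, marker)
    except OSError:
        # Covers connection failures as well as HTTP error responses.
        return False


if __name__ == "__main__":
    # Hypothetical host and marker; adjust to the local environment.
    print(port_responds("horizon.example.com", 443))
    print(page_is_delivered("https://horizon.example.com/auth/login/", "Log in"))
```

A web server that answers on port 443 but serves an error page passes the first check and fails the second, which is exactly the case a functional check is meant to catch.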
To monitor OVN efficiently, the SIG is currently working on integrating the OVN exporter [8] upstream to provide various OVN metrics for Prometheus. Figure 5 illustrates where the exporter needs to reside to capture data from the redundant components and detect failures.
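Once an exporter is scraped, failures of individual redundant components can be detected from Prometheus query results. The following sketch deliberately does not use any OVN exporter metric names. It relies only on the `up` metric that Prometheus generates for every scrape target and on the JSON format of the instant query API (`/api/v1/query`). The job name and instance labels in the example response are hypothetical.

```python
def down_targets(query_response: dict) -> list[str]:
    """Return instance labels whose `up` sample is 0 in an instant-query result."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in query_response["data"]["result"]:
        # An instant vector sample is [timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


# Hypothetical response to the query: up{job="ovn-exporter"}
EXAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "ovn-exporter", "instance": "net-01:9476"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "ovn-exporter", "instance": "net-02:9476"},
             "value": [1700000000.0, "0"]},
            {"metric": {"job": "ovn-exporter", "instance": "net-03:9476"},
             "value": [1700000000.0, "1"]},
        ],
    },
}

if __name__ == "__main__":
    print(down_targets(EXAMPLE_RESPONSE))
```

In practice, the same condition would more likely be written as an alert rule in Prometheus; the script only makes the evaluation logic explicit.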