Tracking down problems with Jaeger
Hunter
Administrators facing a container-based setup with distributed applications for the first time in their careers might hark back to the past and secretly think the old ways were better. From their perspective, at least, it is easy to understand how this misconception comes about. People who used to be responsible for troubleshooting had a few fairly obvious starting points. Large monolithic programs such as MySQL simply output error messages. A look at the logfile was therefore often all it took to get at least a hint of where to look.
If nothing useful could be found in the logfile, you still had the level below it as a starting point. For example, if communication between server and client did not work as described in the documentation, many an admin would turn to tools such as Tcpdump (Figure 1), which lets you capture data traffic down to the lowest levels of a network connection for subsequent visualization in Wireshark to check for potential issues. On top of that, the client itself could detect errors and output appropriate messages on the terminal, if need be.
Admins and developers can only dream of such simple debugging mechanisms in more modern applications. If you have ever experienced the frustration of tracking down problems in a distributed application, you will be fully aware of the complexity of this task. Realistically, the job can only be done by hand if you are tackling a reasonably simple application with just a few microapplications.
Application developers are therefore strongly advised to take a closer look at the Jaeger implementation of the OpenTelemetry standard.
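To give an idea of what this looks like in practice, consider a minimal sketch in Python. It assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and that a Jaeger collector is accepting OTLP traffic on its gRPC port (4317); the service name and endpoint are placeholders:

# Minimal OpenTelemetry setup that ships spans to Jaeger over OTLP.
# Assumes: pip install opentelemetry-sdk opentelemetry-exporter-otlp
# and a Jaeger collector with OTLP ingest enabled (default gRPC port 4317).
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Identify this microapplication; "checkout" is a placeholder name.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Each unit of work becomes a span; Jaeger stitches spans into end-to-end traces.
with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.method", "GET")
    # ... application logic goes here ...

Once several components report spans like this, the Jaeger UI can show the complete path a request took across them.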
Modern Applications
As a reminder, the cloud-native architecture does have some advantages, such as implicit redundancy and the option to dynamically integrate external solutions such as Istio. On the downside, though, the complexity of the individual application has grown enormously. A direct comparison of the old and new worlds quickly illustrates this, and a database, as mentioned earlier, is an ideal candidate.
Clients establish a persistent connection to a database once and then use it continuously until either one side officially terminates it or an error of some kind kills off all communication. In both cases, server and client immediately notice that the other side can no longer be reached and acknowledge this with a clear-cut message.
Cloud-native distributed applications are totally thrown by this scenario; even the notion of a persistent connection is alien to them. Cloud-native applications are built as microcomponents instead of large monoliths. In a cloud-ready environment, no single application handles all tasks. Instead, many small, highly specific applications, each designed for a single task, are at work.
Several approaches compete for the role of gold standard for communication between these components. RESTful APIs based on HTTP(S) are widely used today; solutions such as the high-performance gRPC Remote Procedure Call framework also play a role. What they have in common is that, unlike the database example, they do not rely exclusively on stateful connections.
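Because each request stands on its own, any trace context has to travel inside the request itself. The following sketch, building on the setup above, shows how OpenTelemetry injects the W3C traceparent header into an outbound RESTful call; the backend URL is invented, and the requests package is assumed to be installed:

# Propagating trace context across a stateless HTTP call.
# Assumes the tracer setup shown earlier and: pip install requests
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("call-backend"):
    headers = {}
    inject(headers)  # writes the W3C traceparent header into the dict
    # The receiving microapplication reads the header and continues the trace.
    requests.get("http://backend:8080/api/items", headers=headers)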
As the number of microcomponents in an application increases, the number of potential communication links grows quadratically: n components can maintain up to n(n-1)/2 point-to-point connections, so 5 microapplications already yield 10 potential links, and 20 of them yield 190. Recent cloud-native applications in particular are anything but frugal in terms of the number of microapplications they contain.
Many Apps
One microapplication serves as the point of contact (for example, for communication with clients). A second microapplication in the background receives input forwarded by the first, evaluates it, and sends it to a third microapplication, which then stores the data somewhere on disk.
A fourth application could monitor the content of the stored data and sound the alarm if certain content appears or certain events occur during a write. A fifth microapplication could deliver the alerts generated by the fourth component as text messages by email, SMS, or a messenger service.
What this relatively simple example already shows is that data can travel a long way through the mesh of a microservices architecture, being transformed repeatedly en route and ending up as fragments with a variety of recipients.
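In Jaeger's data model, every one of these hops appears as a span within a single trace. A sketch of the receiving side, again with purely invented names, shows how a component continues a trace that arrived over HTTP and nests its own work underneath it:

# Receiving side: continue the trace started by the upstream component.
# Assumes the tracer setup shown earlier; names are purely illustrative.
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

def handle(request_headers, payload):
    # Rebuild the caller's context from the incoming traceparent header.
    ctx = extract(request_headers)
    with tracer.start_as_current_span("evaluate-input", context=ctx):
        # ... evaluate the forwarded input ...
        with tracer.start_as_current_span("store-result"):
            pass  # ... hand the data to the storage microapplication ...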
Complex Network
The end of the line is still a long way off in terms of complexity. In the past, ADMIN has regularly featured solutions, such as the sidecar pattern and Istio, that do their best to add to this chaos.
Istio, for example, comes with a sidecar component that dynamically inserts itself into the communication between the components of a microarchitecture. While doing so, it handles a whole gamut of tasks: Istio can implement firewall rules, add SSL encryption to communication endpoints on the fly, and distribute the incoming load for each of the application's individual components across the available instances (Figure 2).
From the client's point of view, it remains unclear which instance of an application component it is currently communicating with. Because most components in distributed applications also exchange data among themselves, the majority of communication paths remain completely opaque to the admins and developers of an application.