Application Observability and What it Means for Software Delivery
In May 2019, a new open-source project called OpenTelemetry was announced. This Cloud Native Computing Foundation (CNCF) project would merge two existing open-source projects: the CNCF’s OpenTracing and Google’s OpenCensus. The two projects carry a lot of influence in the space of Application Performance Monitoring (APM) solutions, and so many eyes are turned towards OpenTelemetry as it stakes its claim.
APM solutions address the challenges of observability for modern enterprise applications. As enterprises lean into DevOps, CI/CD, and cloud technologies, it becomes crucial that stakeholders understand their services. The New Stack calls OpenTelemetry “a wider and deeper project.” But what is this OpenTelemetry project, why the buzz, and how does it affect enterprise software delivery?
To answer these questions, I’ll take three steps back and discuss APMs, OpenTracing, and OpenCensus.
The goal of an Application Performance Management (APM) solution is to collect state data on live applications in order to understand and manage their performance. APMs are a solution to the challenge of observability. They offer user interfaces and additional features for understanding your application.
What is Observability?
Observability pertains to Metrics, Logs, and Tracing. The figure below has the main takeaways for each of the concepts related to observability.
Metrics are aggregatable. Gathering benchmarks can be useful for improving processes and application performance. You do not have to constrain yourself to only collecting runtime metrics either. Some tools expose and collect an application’s platform or infrastructure data.
Logs are events. They are reasonably straightforward, but make sure you have standards around logging format.
Tracing is observability from the point of view of a request. Effectively, tracing takes on the perspective of a client or user of your service. Here is a short snippet regarding tracing from another blog post I wrote:
“It is important in distributed landscapes that we can still observe requests being handled by the application. Consider an e-commerce application, for example. A single checkout request may be passed to tens or hundreds of services before the application is finished handling that process; whether in development or production environments, developer and support teams need tools to understand and debug issues that may arise within their services.” (Red Hat Developers blog post: Building and understanding reactive microservices using Eclipse Vert.x and distributed tracing)
Metrics, Logs, and Tracing are the three core principles of application observability. I highly recommend understanding these components if you plan to invest in an APM solution.
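To make the three concepts concrete, here is a minimal sketch of a single checkout request that emits a metric, a log line, and a trace span for each step. It uses only the Python standard library. The in-memory stores and the names `span`, `checkout`, `inventory`, and `payment` are assumptions made for illustration; they are not part of any APM product or of OpenTelemetry.

```python
import json
import time
import uuid
from contextlib import contextmanager
from statistics import mean

# Hypothetical in-memory stores, standing in for a real telemetry backend.
latencies_ms = {}   # metrics: operation name -> list of durations
log_lines = []      # logs: one JSON document per event
spans = []          # traces: one record per unit of work


@contextmanager
def span(name, trace_id, parent_id=None):
    """Record one unit of work and emit all three signals for it."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        # Trace: which request this work belongs to, and who called it.
        spans.append({"trace_id": trace_id, "span_id": span_id,
                      "parent_id": parent_id, "name": name})
        # Metric: a number that can be aggregated later.
        latencies_ms.setdefault(name, []).append(duration_ms)
        # Log: a discrete event in a consistent, structured format.
        log_lines.append(json.dumps({"event": "span_finished", "name": name,
                                     "trace_id": trace_id}))


def checkout():
    trace_id = uuid.uuid4().hex  # one ID shared by every hop of the request
    with span("checkout", trace_id) as root:
        with span("inventory", trace_id, parent_id=root):
            pass  # a call to the inventory service would go here
        with span("payment", trace_id, parent_id=root):
            pass  # a call to the payment service would go here
    return trace_id


trace_id = checkout()
# Spans are recorded when they finish, so children appear before the root.
call_chain = [s["name"] for s in spans if s["trace_id"] == trace_id]
average_ms = {name: mean(values) for name, values in latencies_ms.items()}
```

The shared `trace_id` is what lets a tool reassemble the call chain for one request, while the latency lists show why metrics are described as aggregatable.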
What are APM solutions?
APM solutions solve the challenges of observability through metrics, logs, and tracing. APMs offer control over the data observed for a particular service or function within your application’s ecosystem. This granularity of scope is particularly useful for finding bottlenecks when you have SLAs to meet. Another benefit of using APMs is increased visibility, which enables quicker troubleshooting should any issues occur. For example, if a service request fails within a chain of service calls, it is quite useful to have a dashboard that shows the services called and the context propagation that led to the failure.
Similarly, having a pulse on the internals of my application reduces the time for performance optimizations for issues such as memory leaks. The goal is to acquire full context into which service failed and what caused the error. APM solutions enable developer teams to go into their tools, reports, or dashboards to find these issues.
These are some use cases for APM solutions. Observability and APM solutions solve difficult challenges around distributed or polyglot environments.
How do APMs relate to OpenTracing and Open Source?
OpenTracing provides a vendor-neutral standard for tracing applications in a distributed architecture. Specifically, it is an API specification for distributed tracing. Standardization became useful as more distributed tracing solutions and tools came into use and teams needed a shared terminology for tracing. Within the same organization, you could have a wide variety of applications written in different frameworks or languages that all need tracing implemented. It also helped to have design guidance and a standard from a community that included APM vendors.
If you are interested in learning more about distributed tracing, check out these OpenTracing Guides.
APM vendors used the specification built by OpenTracing and implemented their solutions for collecting metrics on applications. Tools that support the OpenTracing API specification for tracing include CNCF Jaeger, along with the vendor APM solutions.
What about OpenCensus?
The OpenCensus project by Google was another solution for observability. It provides APIs for metrics collection and tracing, with libraries that implement those APIs for languages like Java, Go, Node.js, and Python. There is also support for HTTP, gRPC, and frameworks built on top of a variety of programming languages. If you already use other monitoring solutions, there are maintained exporters for third-party backends such as Azure, Datadog, Jaeger, and Prometheus.
Google Cloud shared a technical presentation of OpenCensus on YouTube.
API specifications are like contracts or blueprints for software. You’ll need to build out what’s on the paper to get the desired functionality. OpenCensus is a well-supported implementation that provides a comprehensive look into applications.
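The contract-versus-implementation distinction can be sketched in a few lines. In this hypothetical and heavily simplified example, `Tracer` plays the role of the specification and `InMemoryTracer` is one implementation of it. Neither name comes from OpenTracing or OpenCensus; they are assumptions made for illustration.

```python
from typing import Protocol


class Tracer(Protocol):
    """The 'contract': what any tracer must offer (hypothetical, simplified)."""

    def start_span(self, name: str) -> None: ...
    def finish_span(self, name: str) -> None: ...


class InMemoryTracer:
    """One possible implementation; a vendor backend would be another."""

    def __init__(self) -> None:
        self.events = []

    def start_span(self, name: str) -> None:
        self.events.append(("start", name))

    def finish_span(self, name: str) -> None:
        self.events.append(("finish", name))


def handle_request(tracer: Tracer) -> str:
    # Application code depends only on the contract, not on a vendor.
    tracer.start_span("handle_request")
    result = "ok"
    tracer.finish_span("handle_request")
    return result


tracer = InMemoryTracer()
status = handle_request(tracer)
```

Because `handle_request` is written against the contract, a different tracer could be passed in without changing the application code, which is the portability a vendor-neutral specification aims for.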
The buzz for OpenTelemetry
Across competing vendors, it’s always great to see a commitment to the problem. Support for the OpenTelemetry project is just that. One of the promises of OpenTelemetry is a “single standard for observability instead of two competing standards.” This project means standardizing a higher bar for the observability of your applications. Across the observability space, you won’t be locked into a single platform or solution to gather and use telemetry data.
It will also be interesting to see what the next few years hold for APM vendors (like AppDynamics, New Relic, and Datadog), especially as we move to automate and enable software excellence. In an industry report, Gartner predicted that across enterprise organizations the number of applications monitored with APM tools would increase from 5% in 2018 to 20% by 2021. Vendors have also made firm commitments to OpenTelemetry, paving this path for standardization.
It’s fair to expect the solution space to expand into more parts of the software delivery life cycle (SDLC) as well.
Now that we understand more about the concepts behind OpenTelemetry and observability, we can use these concepts to reduce the risk associated with delivering software. Continuous delivery relies on automated operational and monitoring tasks.
According to a Riverbed-sponsored study by Enterprise Management Associates (EMA), 63% of the organizations surveyed had automated less than 50% of their continuous deployment processes. Only 6% of organizations reported 90% to 100% automation for their operations. For those deploying more software, more often, post-deployment activities such as operationalizing and monitoring applications do not scale when done by hand. This includes having one or two engineers watch dashboards for regressions in code performance and quality after each deployment.
Reducing risk within the SDLC process is what software delivery is all about. This means optimizing the stages: plan, develop, build, test, release, deploy, operate, and monitor. For organizations adopting DevOps practices, the cycle for software delivery is shown in the figure below:
What can scale your organization is automating post-deployment activities. This is also the purpose of continuous verification. Continuous verification is the process of using intelligent data to verify deployments. Use continuous verification to trigger a rollback based on rules and regressions you define.
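As a rough illustration of a rule-based rollback decision, the sketch below compares metrics gathered before and after a deployment against thresholds you define. It is not how any particular platform implements continuous verification. The metric names, thresholds, and the `Rule`, `regressions`, and `should_roll_back` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    metric: str            # metric name as reported by the APM tool
    max_increase: float    # allowed relative increase, e.g. 0.20 for 20%


def regressions(baseline, current, rules):
    """Return the metrics whose relative increase exceeds their rule.

    Assumes every baseline value is non-zero.
    """
    failed = []
    for rule in rules:
        before = baseline[rule.metric]
        after = current[rule.metric]
        increase = (after - before) / before
        if increase > rule.max_increase:
            failed.append(rule.metric)
    return failed


def should_roll_back(baseline, current, rules):
    """A deployment is rolled back if any rule reports a regression."""
    return bool(regressions(baseline, current, rules))


# Hypothetical rules and measurements, for illustration only.
rules = [Rule("p95_latency_ms", 0.20), Rule("error_rate", 0.50)]
baseline = {"p95_latency_ms": 200.0, "error_rate": 0.010}
healthy = {"p95_latency_ms": 210.0, "error_rate": 0.011}
degraded = {"p95_latency_ms": 320.0, "error_rate": 0.012}
```

With these numbers, `should_roll_back(baseline, healthy, rules)` is false, while the `degraded` deployment breaches the latency rule and would trigger a rollback. In practice the baseline and current values would come from your APM tool rather than from hard-coded dictionaries.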
If you’re looking for a platform that provides this, Harness’s out-of-the-box connectors let you plug in your APM solutions so that a rollback of your deployment is triggered should any regressions occur. Harness provides a way of doing release testing and rollbacks so that your business can scale. If you are interested, try Harness for free.