Enter the Mesh:
In 2019 we saw a lot of buzz around service mesh technology as it dominated conference tracks. Popular service mesh implementations include Istio, Linkerd, Maesh, and HashiCorp Consul. Service meshes provide many features for distributed applications that are well worth considering, including service discovery, routing, service identity, authorization, service retries, circuit breaking, and observability.
Even if you do not decide to utilize a service mesh today, it is still worth understanding the concepts behind mesh solutions. This blog post will cover service mesh fundamentals and offer a range of references and descriptions to expand your knowledge.
The Pillars of Service Mesh:
The goal of a service mesh is to provide a smart network for services. Four pillars make up that smart network, or rather mesh: Connect, Secure, Control, and Observe.
Connect offers resilience for applications by providing configurations for traffic between services. This function enables fault injection, traffic mirroring, A/B testing, and deployment patterns such as canary and staged rollouts.
Secure offers application-independent security. The idea is to take the responsibility for security out of the application and back into the infrastructure. This can include service-to-service encryption and service-to-service authentication (both transport and origin authentication).
Control offers a uniform abstraction for policy. Policies can be configured to redirect traffic in response to real-time events or to apply rule-based processing based on request headers.
Observe offers visibility into application deployments: end-to-end monitoring, logging, metrics, and distributed tracing bundled in at the service mesh level.
Now that we know the pillars of service mesh let’s discuss how each of these components works in detail.
How does Service Mesh work?
A service mesh provides a proxy for every service in the mesh. The proxy intercepts traffic to and from services to provide end-to-end resilience, security, policy control, and visibility. Istio's network proxy is an instance of Envoy, a layer 7 (L7) network proxy, deployed as a sidecar container alongside each service. Having an L7 proxy for service-to-service communication provides features like traffic shaping, service discovery, and network policy control. In other mesh implementations, the proxy could be an L4 network proxy or could be deployed as a DaemonSet rather than a sidecar.
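In Istio, for example, the sidecar is typically injected automatically by an admission webhook when a namespace carries the injection label. A minimal sketch, assuming a hypothetical namespace name:

```yaml
# Opting a namespace into Istio's automatic sidecar injection.
# Pods created in this namespace get an Envoy sidecar container
# added by Istio's mutating admission webhook.
apiVersion: v1
kind: Namespace
metadata:
  name: demo-apps            # hypothetical namespace
  labels:
    istio-injection: enabled
```

Other meshes use a similar mechanism (Linkerd, for instance, uses an annotation rather than a label), so check your implementation's injection docs.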
Mesh implementations provide a control plane to configure these proxies. The control plane's components interact with the data plane (the proxies) to provide the four pillars of a service mesh, much like a Kubernetes control plane manages its nodes. Control plane components manage certificates and keys for authentication and enable additional plugin configurations via adapters. Importantly, the control plane also exposes APIs for command line interface (CLI) use; Istio provides istioctl as its CLI utility.
There are two components of a mesh: the control plane and the data plane. Service mesh users have a CLI tool to interact with the service mesh's control plane. Service mesh configurations are written in YAML and applied by that CLI tool, which calls the control plane's APIs to modify the data plane.
Author’s Note: You will notice explanations of how things work with Istio or other specific service mesh implementations. Service mesh implementations can differ, so this blog post will hopefully provide insight into service mesh concepts that carry across them. Service mesh is an emerging technology that will mature with time and adoption; our hope is that the technology continues to provide the fundamental features behind the four pillars discussed in this post.
Istio is a popular service mesh implementation, driving much of the adoption of service mesh due to its feature set and production readiness. This de facto status means many references and architectures are biased towards Istio. This blog post will call out Istio-specific details where they appear.
To the point of differing mesh implementations, the Service Mesh Interface (SMI) was released in 2019 for mesh interoperability. SMI is a “specification for service meshes that run on Kubernetes. It defines a common standard that can be implemented by a variety of providers. This allows for both standardization for end-users and innovation by providers of Service Mesh Technology,” as defined by the specification here. The SMI allows the reuse of service mesh configurations across solutions.
Now let’s continue with how service meshes provide the four pillars in more detail.
Providing Resilience through Connect:
Service meshes allow for HTTP/TCP routing and traffic management. TrafficSplit, defined by the SMI, allows you to “incrementally direct percentages of traffic between various services. It will be used by clients such as an ingress controller or proxy sidecars to split the outgoing traffic to different destinations.” A traffic split needs three things: a root service that clients use to direct traffic, two (or more) Kubernetes service resources that potentially have different selectors and types, and a weight for each of those Kubernetes services.
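Those three pieces map directly onto the TrafficSplit resource. A minimal sketch (service names are hypothetical; the exact `apiVersion` and weight format vary between SMI spec versions):

```yaml
# SMI TrafficSplit: shift a slice of traffic from v1 to a v2 canary.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: checkout-split        # hypothetical name
spec:
  service: checkout           # the root service clients address
  backends:
  - service: checkout-v1      # Kubernetes service for the stable version
    weight: 900               # ~90% of traffic
  - service: checkout-v2      # Kubernetes service for the canary
    weight: 100               # ~10% of traffic
```

Adjusting the weights over time gives you the incremental canary or staged rollout described above.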
Policies around retries, timeouts, and rate limits are currently out of the scope of the SMI, but Istio provides these additional features. We will discuss some policies that are of importance. Timeouts allow a service to give up on a request after a defined amount of time when calling another service: the Envoy proxy can be configured to wait a predefined amount of time before quitting and returning a 504 status code.
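In Istio, such a timeout is set on a VirtualService. A hedged sketch, assuming a hypothetical `ratings` service:

```yaml
# Istio VirtualService: have Envoy abandon calls to "ratings"
# after 2 seconds and return a 504 to the caller.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-timeout       # hypothetical name
spec:
  hosts:
  - ratings                   # hypothetical in-mesh service
  http:
  - route:
    - destination:
        host: ratings
    timeout: 2s               # per-request deadline enforced by the proxy
```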
Another feature is application circuit breaking. Circuit breakers act as a wrapper for calls to another service; should the circuit breaker trip due to some failure, it prevents the application from making a call that is bound to fail. When the circuit breaker trips, Istio takes the pod out of the Envoy load-balancing pool.
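Istio expresses circuit breaking through a DestinationRule. A sketch under assumed names (a hypothetical `reviews` service; exact field names have shifted across Istio versions, e.g. `consecutiveErrors` was later renamed):

```yaml
# Istio DestinationRule: trip the circuit and eject a failing pod
# from the Envoy load-balancing pool.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker   # hypothetical name
spec:
  host: reviews                   # hypothetical in-mesh service
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1    # queue at most one pending request
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 5        # trip after 5 consecutive 5xx responses
      interval: 10s               # how often hosts are analyzed
      baseEjectionTime: 30s       # minimum time an ejected pod stays out
      maxEjectionPercent: 100
```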
Fault injection allows the definition of artificial error rates and latency delays. Linkerd 2.x is a service mesh implementation that also provides fault injection: “Fault injection is a form of chaos engineering where the error rate of a service is artificially increased to see what impact there is on the system as a whole. Traditionally, this would require modifying the service’s code to add a fault injection library that would be doing the actual work. Linkerd can do this without any service code changes, only requiring a little configuration.” Read about fault injection as defined by Linkerd here.
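As an illustration of the same idea in Istio's syntax (service name hypothetical), both a latency delay and an error rate can be injected declaratively:

```yaml
# Istio VirtualService fault injection: delay half of all requests
# by 5s and fail 10% of them outright with a 503.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-fault          # hypothetical name
spec:
  hosts:
  - ratings                    # hypothetical in-mesh service
  http:
  - fault:
      delay:
        percentage:
          value: 50.0          # inject latency into 50% of requests
        fixedDelay: 5s
      abort:
        percentage:
          value: 10.0          # abort 10% of requests
        httpStatus: 503
    route:
    - destination:
        host: ratings
```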
Resilient systems are able to cope with failures in downstream systems. Service mesh aims to build resilience through connection configurations.
Providing Security through Control:
Service meshes provide policy control scoped to three levels: the service mesh level, the namespace level, and the service level. The service mesh level is the broadest scope and allows policies to be applied mesh-wide.
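As one illustration of this scoping, recent Istio versions express mutual TLS policy at all three levels with the same resource kind; where it is applied, and whether it carries a workload selector, determines the scope. Namespace and label names below are hypothetical:

```yaml
# Mesh scope: applied in Istio's root namespace with no selector.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system      # root namespace => mesh-wide policy
spec:
  mtls:
    mode: STRICT
---
# Namespace scope: the same resource in an application namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo-apps         # hypothetical namespace
spec:
  mtls:
    mode: PERMISSIVE
---
# Service/workload scope: narrowed further with a label selector.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-service
  namespace: demo-apps
spec:
  selector:
    matchLabels:
      app: legacy              # hypothetical workload label
  mtls:
    mode: DISABLE
```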
Control over configuring routes depends on a custom resource called TrafficTarget. There are two parts to a TrafficTarget: a Service Role, which is a set of rules, like GET or POST, and a Service Role Binding, which binds those roles to Kubernetes service accounts.
The SMI’s Traffic Spec defines how to configure the mesh with rules based on the types of traffic that flow through it. Currently, this spec only covers the HTTP/1 and HTTP/2 protocols.
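The two SMI pieces work together: a Traffic Spec resource names the kinds of requests, and a TrafficTarget binds them to source and destination service accounts. A sketch with hypothetical names, using the early `v1alpha1` shape of these resources (field layout has changed in later SMI versions):

```yaml
# Traffic Spec: describe the kinds of HTTP traffic by route.
apiVersion: specs.smi-spec.io/v1alpha1
kind: HTTPRouteGroup
metadata:
  name: api-routes             # hypothetical name
matches:
- name: metrics
  pathRegex: "/metrics"        # only the metrics endpoint
  methods:
  - GET
---
# Traffic Access: allow one identity to use those routes on another.
apiVersion: access.smi-spec.io/v1alpha1
kind: TrafficTarget
metadata:
  name: metrics-scrape         # hypothetical name
destination:
  kind: ServiceAccount
  name: api-service            # hypothetical destination identity
  namespace: default
specs:
- kind: HTTPRouteGroup
  name: api-routes
  matches:
  - metrics                    # only the GET /metrics route
sources:
- kind: ServiceAccount
  name: prometheus             # hypothetical source identity
  namespace: default
```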
Ingress gateways allow verified traffic from outside a cluster into the service mesh and help ensure end-to-end encryption for incoming traffic. Implementations differ on the kind of ingress, as the SMI does not specify one, and some mesh solutions may not have an opinionated approach to ingress at all. Solo’s Gloo product is an ingress controller built on top of Envoy that can be used as an API gateway where you do not get an out-of-the-box implementation. In an Istio architecture, this component is a standalone instance of Envoy.
Encrypted traffic enters the ingress proxy, where it is authenticated and TLS-terminated. The request is then re-encrypted with the service mesh’s internal encryption and sent to its target within the mesh. This target is a virtual service or another gateway that will route the traffic to its destination within the mesh.
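In Istio terms, that flow is configured with a Gateway (terminating TLS on the standalone Envoy ingress) plus a VirtualService that routes the decrypted request on to an in-mesh target. A sketch with hypothetical hostnames, certificate paths, and service names:

```yaml
# Istio Gateway: terminate TLS for external traffic at the
# standalone Envoy ingress instance.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway             # hypothetical name
spec:
  selector:
    istio: ingressgateway          # bind to Istio's ingress Envoy
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                 # TLS termination at the proxy
      serverCertificate: /etc/certs/server.pem   # hypothetical paths
      privateKey: /etc/certs/key.pem
    hosts:
    - "shop.example.com"           # hypothetical external hostname
---
# VirtualService: route the terminated traffic to an in-mesh target,
# where it travels re-encrypted over the mesh's mutual TLS.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: shop-routes                # hypothetical name
spec:
  hosts:
  - "shop.example.com"
  gateways:
  - public-gateway
  http:
  - route:
    - destination:
        host: shop-frontend        # hypothetical in-mesh service
```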
Istio uses a service mesh control plane component called Citadel for key and certificate management. Citadel handles the creation and rotation of certificates that are used for encrypted communications between services in the mesh. It does this at the service account level, based on the Kubernetes namespaces managed by the service mesh. Currently, the certificate is mounted as a volume from a Kubernetes secret resource for the Envoy proxy to use. In the future, there will be support for the Secret Discovery Service (SDS), a more secure method for identity provisioning.
Service meshes provide controllable security for application systems.
Monitoring and Tracing for Observability:
Service meshes also provide pluggable backends for telemetry capture, typically using an instance of Prometheus. Service mesh solutions like Istio also provide application tracing with an instance of Jaeger. An added benefit of service mesh observability is that even commercial off-the-shelf (COTS) applications gain baseline visibility into their performance.
Monitoring support is provided through a Prometheus instance living in the control plane. This instance is dedicated to the service mesh, so there may be other instances of Prometheus running in the cluster. The control plane’s Prometheus instance automatically determines what and where to scrape for the services in your data plane; this auto-discovery utilizes the Kubernetes APIs. The idea is that you can then expand your monitoring environment through federation. Applications within the mesh are currently responsible for exposing app-specific data to the Prometheus instance. Traffic metrics are specified by the SMI.
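Federation here means a cluster-wide Prometheus scraping selected series from the mesh's dedicated instance via its `/federate` endpoint. A sketch of the scrape configuration, assuming hypothetical job names and the mesh Prometheus living in `istio-system`:

```yaml
# prometheus.yml fragment on the cluster-wide Prometheus:
# pull mesh-level series from the control plane's instance.
scrape_configs:
- job_name: 'mesh-federation'      # hypothetical job name
  honor_labels: true               # keep labels from the source instance
  metrics_path: '/federate'
  params:
    'match[]':
    - '{job=~"istio-mesh|envoy-stats"}'   # mesh job names vary by install
  static_configs:
  - targets:
    - 'prometheus.istio-system.svc:9090'  # the mesh's Prometheus service
```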
A service mesh implementation with tracing capabilities is Red Hat’s OpenShift Service Mesh, which uses the Cloud Native Computing Foundation’s Jaeger project. The service mesh runs a version of Jaeger's all-in-one image, which includes the collector, UI, and query components; here, Envoy acts as the Jaeger agent component. Implementation details of distributed tracing for the service mesh can be found here.
Finally, another aspect of observability is visibility into the service mesh’s data plane. Tools like Kiali can be installed on top of existing service meshes to add further observability and configuration. Kiali provides a service topology graph of the mesh along with wizards for easily configuring Istio routing. Solo’s Service Mesh Hub is a similar service-mesh-level tool.
Service meshes provide observability of systems involving technologies such as Prometheus, Jaeger, and Kiali.
Service mesh technologies support applications through added resilience, security, control, and observability at the mesh infrastructure level. While a service mesh isn’t needed to implement these pillars across a distributed environment, meshes do take the burden off of developers by offering an appealing set of capabilities. Application developers would otherwise have to implement these features in their software themselves, a process that produces redundant or infrastructure-specific code and often additional dependencies.
There is quite a bit to consider when adopting and utilizing a service mesh, and implementations should continue to mature and manage the complexity associated with their use. One heavy consideration for adopting a service mesh is cost: service meshes consume compute and memory resources, and the proxies add latency to every request.
There are also caveats around application architecture and design when adopting service mesh solutions. One example is applications that utilize Kubernetes StatefulSets: these resources use direct pod-to-pod communication and are not good candidates for service mesh tenancy. I highly recommend having a plan for installing the service mesh and onboarding applications. I shared some DevOps lessons learned in this blog post, where I talk about developing a big-picture diagram for architecting away from your current state.
In summary, this post was my write-up of the fundamentals of service mesh. I had the opportunity to attend a pilot of Red Hat’s Advanced OpenShift Service Mesh boot camp back in December 2019. All in all, it was great to add to the working experience I had with service mesh. More importantly, the opportunity to speak with the engineering business unit responsible for Red Hat’s OpenShift Service Mesh provided much insight into the future of meshes. It got me thinking about the need for simplicity amongst the complexity of service mesh, and I hope to see the Service Mesh Interface continue to expand in scope and specification to promote this. I expect we may see growing differences in mesh implementations this year. I highly recommend taking a look at, and perhaps contributing to, servicemesh.es, a site that compares service mesh implementations.
If you’d like to learn how Istio integrates with Harness, catch our three-part series here. Software delivery is at the heart of delivering business value and staying competitive in the marketplace; if you solidify that process, you create many opportunities to innovate and explore other technologies.