Development and Operations (DevOps) is a combination of culture, tools, and practices that increases an organization’s ability to deliver its products and services to clients. There are many tools and stacks under the DevOps umbrella, and you can select and master any combination of them, as long as it aligns with your organization’s needs. That’s the great thing about DevOps: there’s something for everyone, which is what makes DevOps adoption a breeze. But we’re not here to talk about tools today; we have plenty of posts on that already!
There are best practices that DevOps engineers and teams should follow to create an efficient environment. These practices allow them to deliver products and services with high availability, and they enable feedback from multiple sources long before the product or service goes GA (generally available). In this blog post, we’ll go over DevOps adoption at Harness and take a deep look at the strategies we implemented over time to embrace DevOps principles and culture.
Code Verification Process
Shift Left Approach
Harness takes advantage of Git hooks to verify certain things even before developers push their code and create PRs. Git hooks are simply scripts that Git runs when certain commands are executed (for example, before a commit or before a push). They live in each repository’s .git/hooks directory, so they can be copied to every developer’s machine. These scripts can contain any kind of logic (for example, verifying that the developer has the latest code before pushing, or checking commit message format and size), and they can be written in any programming or scripting language. This catches PR-related issues such as merge conflicts or improper commit messages early, and saves a lot of time and build/compute resources.
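As an illustration of the commit-message checks mentioned above, here is a minimal sketch of a commit-msg hook written in Python. Git runs this hook with a single argument, the path to the file containing the proposed message, and aborts the commit if the hook exits with a non-zero status. The ticket-prefix format and the 72-character subject limit are assumptions chosen for the example, not Harness’s actual rules.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook (save as .git/hooks/commit-msg, chmod +x).

Git runs commit-msg with one argument: the path to the file that holds the
proposed commit message. A non-zero exit status aborts the commit.
"""
import re
import sys
from typing import List

# Assumed conventions for illustration only: a ticket prefix such as
# "CI-1234: " and a subject line of at most 72 characters.
SUBJECT_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*-\d+: \S.*$")
MAX_SUBJECT_LENGTH = 72


def check_message(message: str) -> List[str]:
    """Return the problems found in a commit message (empty list if it is OK)."""
    # Drop comment lines that Git adds from its message template.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    # Ignore leading blank lines.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ["commit message is empty"]
    subject = lines[0].strip()
    problems = []
    if not SUBJECT_PATTERN.match(subject):
        problems.append("subject must look like 'ABC-123: short description'")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject is {len(subject)} characters; limit is {MAX_SUBJECT_LENGTH}")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the subject and the body")
    return problems


def main(argv: List[str]) -> int:
    with open(argv[1], encoding="utf-8") as message_file:
        problems = check_message(message_file.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Git invokes the hook with exactly one argument (the message file path).
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Because hooks are not versioned with the repository by default, teams often keep them in a tracked directory and point Git at it with git config core.hooksPath.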
Nobody likes buggy or vulnerable code. To prevent it, all code that goes into the release or master branches should be verified properly before it is merged. At Harness, as soon as developers raise a pull request (PR) in GitHub, a series of jobs starts running in Harness CIE (for example: Bazel Build, UnitTests, FunctionalTests, StaticChecks, and other miscellaneous checks). If all of these jobs pass, the PR goes through manual review by at least two senior developers, and only if both approve is the changed or added code allowed to merge into the release or master branch.
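The merge rule described above (all CI jobs green, then at least two senior approvals) can be expressed as a small gate function. This is a hypothetical model written for illustration: the PullRequest type, the branch-name matching, and the reviewer sets are assumptions, not the Harness CIE or GitHub API. In practice, the same policy is usually enforced with branch protection rules (required status checks and required reviews) rather than custom code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Job names come from the checks listed above; everything else is a
# hypothetical model, not the Harness CIE or GitHub API.
REQUIRED_CHECKS = ("Bazel Build", "UnitTests", "FunctionalTests", "StaticChecks")
REQUIRED_SENIOR_APPROVALS = 2
PROTECTED_PREFIXES = ("master", "release")  # assumed branch naming, e.g. "release/1.2"


@dataclass
class PullRequest:
    target_branch: str
    check_results: Dict[str, bool]  # check name -> passed?
    approvals: Set[str] = field(default_factory=set)  # logins of approving reviewers


def can_merge(pr: PullRequest, senior_devs: Set[str]) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons) for merging pr into its target branch."""
    if not pr.target_branch.startswith(PROTECTED_PREFIXES):
        return True, []  # the gate only guards master and release branches
    reasons = [
        f"check '{name}' has not passed"
        for name in REQUIRED_CHECKS
        if not pr.check_results.get(name, False)
    ]
    # Only approvals from the senior-developer group count toward the quota.
    senior_approvals = pr.approvals & senior_devs
    if len(senior_approvals) < REQUIRED_SENIOR_APPROVALS:
        reasons.append(
            f"needs {REQUIRED_SENIOR_APPROVALS} senior approvals, has {len(senior_approvals)}"
        )
    return not reasons, reasons
```

Returning the reasons alongside the verdict makes it easy to report exactly what is still blocking a PR.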
At Harness, over 100 PRs are raised every day, and CI pipelines verify each of them within 35 minutes (depending on the changeset). Below is a high-level diagram of the PR process that Harness follows.
The best way to deliver a stable, bug-free product or service is to test it thoroughly in an environment similar to production. This is best achieved by creating multiple environments where people can test new code and give proper feedback. Here are the four environments that Harness uses to test its products and services before releasing them to clients:
- PR Environment: This is the first environment where developers test their bug fixes and new features.
- QA Environment: Here, the testing team runs automations and manual test cases for upcoming releases. Beta testers use this environment as well.
- Stage/Staging Environment: This environment runs one or two releases behind the current release and is used as the final step before deploying to the production environment.
- Production Environment: If a product or service passes testing in all of the above environments, it is deployed here and made GA to all Harness customers.
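The promotion order through these environments can be sketched as a loop that deploys a build to each stage and stops at the first failing gate. The deploy and gate callables are hypothetical placeholders for whatever deployment step and test suites (automation, manual test cases, beta feedback) each environment actually uses.

```python
from typing import Callable, Dict, List, Tuple

# Pre-production stages in the order listed above; Production comes last.
PRE_PRODUCTION = ["PR", "QA", "Staging"]


def promote(
    build_id: str,
    deploy: Callable[[str, str], None],
    gates: Dict[str, Callable[[str], bool]],
) -> Tuple[bool, List[str]]:
    """Deploy build_id through each stage; release to Production only if every gate passes.

    Returns (released, environments_passed).
    """
    passed: List[str] = []
    for env in PRE_PRODUCTION:
        deploy(build_id, env)
        if not gates[env](build_id):
            # Stop here: the build never reaches the later environments.
            return False, passed
        passed.append(env)
    deploy(build_id, "Production")  # made GA to customers
    return True, passed
```

Keeping the stage order in one list means a new environment can be added without touching the promotion logic.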
Backup and Failover Strategies
Data gathered over time is valuable to every organization, so it is very important to have backup strategies in place for databases and storage disks. If your databases and storage disks are in the cloud, take advantage of the regular backup or snapshot features provided by cloud vendors. If they are not in the cloud, set up a cron job (using the scp or rsync utility) to take backups at regular intervals. At Harness, we use the snapshot feature provided by GCP to back up all storage disks used by our servers.
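For the non-cloud case, a backup job boils down to two pieces: a command that copies the data somewhere else, and a retention rule that decides which old copies to remove (the same rule applies to cloud snapshots). The sketch below assumes rsync is installed and the destination (a local path or host:path) is reachable; the paths, the timestamp format, and the retention count are illustrative assumptions. A crontab entry starting with 0 2 * * * would run such a script nightly at 02:00.

```python
from datetime import datetime
from typing import List


def rsync_backup_command(src_dir: str, dest: str, now: datetime) -> List[str]:
    """Build an rsync command that copies src_dir into a timestamped folder under dest.

    dest may be a local path ("/backups/db") or a remote one ("backup-host:/backups/db").
    `now` is expected to be a UTC timestamp, hence the trailing "Z".
    """
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # -a: archive mode (recursive, keeps permissions and timestamps); -z: compress in transit.
    # The trailing slash on the source copies the directory's contents, not the directory.
    return ["rsync", "-az", src_dir.rstrip("/") + "/", f"{dest.rstrip('/')}/{stamp}/"]


def snapshots_to_delete(snapshot_times: List[datetime], keep: int) -> List[datetime]:
    """Keep the newest `keep` backups and return the older ones, newest first."""
    return sorted(snapshot_times, reverse=True)[keep:]


# A cron-driven script would execute the command with
# subprocess.run(rsync_backup_command(...), check=True) and then prune old copies.
```

Separating "build the command" from "run it" keeps the script easy to test and to dry-run before scheduling it.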
No one can guarantee 100% uptime. However, availability very close to it (such as 99.99999%) is achievable by putting redundancies in place. The primary and redundant systems should be in different zones (cloud availability zones), at full parity, and completely isolated from each other. If the primary system goes down, a redundant one takes its place and serves clients.
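It helps to translate availability targets into actual downtime, and to see why isolated redundant zones raise availability. The worked example below assumes a 365.25-day year and, for the redundancy formula 1 − (1 − a)^n, that zone failures are independent and failover is instantaneous; real systems only approximate both.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds (assumes a 365.25-day year)


def downtime_budget_seconds(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target (0.9999 = 99.99%)."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def redundant_availability(zone_availability: float, zones: int) -> float:
    """Availability of N redundant zones when the service is up if any zone is up.

    Assumes zone failures are independent and failover is instantaneous.
    """
    return 1.0 - (1.0 - zone_availability) ** zones
```

Under these assumptions, 99.99999% leaves a budget of only about 3.2 seconds of downtime per year (99.99% allows about 52.6 minutes), while two independent zones at 99.9% each combine to about 99.9999%. This is why isolation between zones matters: correlated failures break the independence assumption.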
Infrastructure as Code
Setting up the entire infrastructure to host a product or service is not an easy task. It involves many components stacked together to create an ecosystem: virtual machines, databases, storage disks, load balancers, VPCs, subnets, firewalls, and more. The larger the infrastructure, the harder it is to recreate if something goes wrong. Hence, it is important to capture all of the components in the form of scripts.
Terraform and Ansible are good open-source tools for infrastructure provisioning and configuration management, respectively. They can create multiple environments with just a few commands, and they can upgrade or downgrade your infrastructure as needed. They definitely make DevOps engineers’ lives easier!
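The core idea behind declarative IaC tools such as Terraform is to describe the desired infrastructure as data, compare it with what currently exists, and derive a plan of changes. The toy function below illustrates only that idea; it is not how Terraform is implemented, and the resource names and attributes are hypothetical.

```python
from typing import Dict, List

Resources = Dict[str, dict]  # resource name -> its attributes


def plan(desired: Resources, current: Resources) -> Dict[str, List[str]]:
    """Compute which resources to create, update, or delete to reach the desired state."""
    return {
        # In the code but not yet in the cloud.
        "create": sorted(desired.keys() - current.keys()),
        # In both, but the live attributes have drifted from what the code says.
        "update": sorted(
            name for name in desired.keys() & current.keys() if desired[name] != current[name]
        ),
        # In the cloud but no longer in the code.
        "delete": sorted(current.keys() - desired.keys()),
    }
```

For example, if the code declares vm-frontend as e2-medium plus a new lb-public, while the live environment has vm-frontend as e2-small and a leftover vm-legacy, the plan creates lb-public, updates vm-frontend, and deletes vm-legacy. Recreating an environment is then just planning against an empty current state.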
Version Controlled Scripts
When infrastructure is managed as code (IaC), it behaves like any other code: any change can break it. That’s why it’s best to manage IaC with a version control system (VCS) such as Git or SVN.
VCSs record every change that developers make to the scripts. This makes it easy to revert breaking changes to the last stable state, and to maintain different versions of the infrastructure for different environments.
Verify IaC Scripts
Like any other code, IaC scripts should undergo static checks before being merged into the master or release branch, along with a manual review by senior team members. This helps ensure that the scripts are bug-free and follow proper linting and coding conventions.
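For Terraform code, a pre-merge static check can be as simple as running the CLI’s own formatting and validation commands and failing the job if any of them fail. The sketch below assumes the Terraform CLI is on the PATH; the run parameter exists only so the function can be exercised without Terraform installed. It is an illustration, not Harness’s pipeline configuration.

```python
import subprocess
from typing import Callable, List

# Standard Terraform CLI commands, run in order from the module directory.
TERRAFORM_CHECKS: List[List[str]] = [
    ["terraform", "fmt", "-check", "-recursive"],  # fails if any file is not canonically formatted
    ["terraform", "init", "-backend=false"],  # installs providers; skips remote state
    ["terraform", "validate"],  # checks syntax and internal consistency
]


def failed_checks(
    workdir: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run each check in workdir and return the commands that exited non-zero."""
    failures = []
    for cmd in TERRAFORM_CHECKS:
        result = run(cmd, cwd=workdir, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures
```

A CI job would fail the PR whenever the returned list is non-empty, leaving the manual review for design questions rather than formatting.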
Nowadays, many organizations are moving their infrastructure to the cloud and offering their products as SaaS, typically through subscriptions. Think: Uber, Netflix, Spotify, etc.
Using the cloud removes the headache of purchasing and maintaining infrastructure to host applications or services. The cloud provides tons of services to choose from, like compute resources, database services, monitoring services, logging services, and high-speed internet connectivity.
The cloud is managed by vendors, but they are only responsible for securing the infrastructure their customers use, not the applications running on it. Securing those applications is up to each organization, and everybody appreciates products and services that are securely hosted in the cloud.
At Harness, we take security extremely seriously. Here are some of the best practices that we follow in that realm. I know this is more of SecOps than DevOps, but in some organizations, security falls under DevOps as well.
Secured Application Endpoints
- VPCs and Subnets: In the cloud, almost nothing can be deployed without a VPC. VPCs segregate your environment, which is a good practice to implement. There can be multiple subnets that host your servers, databases, and storage disks. Recommendation: use a public subnet for frontend servers and a private subnet for backend servers.
- HTTPS Instead of HTTP: Frontend servers that serve public-facing URLs for an organization’s products and services, and any other communication over HTTP, should be accessible only via HTTPS (HTTP secured with TLS). Install TLS certificates so these endpoints serve HTTPS instead of plain HTTP.
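To illustrate the subnet recommendation, the sketch below carves one public (frontend) and one private (backend) subnet out of a VPC address range using Python’s standard ipaddress module. The CIDR ranges and subnet names are assumptions for the example.

```python
import ipaddress
from typing import Dict


def plan_subnets(vpc_cidr: str, new_prefix: int = 24) -> Dict[str, str]:
    """Split a VPC range into fixed-size blocks and assign the first two to tiers."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of non-overlapping subnets
    return {
        "public-frontend": str(next(blocks)),  # frontend servers / load balancers
        "private-backend": str(next(blocks)),  # backend servers, databases
    }


def in_backend_subnet(ip: str, subnets: Dict[str, str]) -> bool:
    """True if the address belongs to the private backend subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnets["private-backend"])
```

Keep in mind that in most clouds a subnet is "public" because of its routing (for example, a route to an internet gateway), not because of its address range; both subnets here use private RFC 1918 addresses.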
Secured Server Endpoints
- Minimal Access to Backend Servers: Backend servers, like databases or servers running the organization’s business logic, should have minimal access to and from other machines (both ingress and egress), tightly controlled by firewall rules, security groups, or network ACLs (NACLs). These servers should not be visible or accessible to anyone other than the employees responsible for them.
- Use a Bastion Host: Avoid logging in directly to backend or frontend servers. Instead, use a dedicated, hardened intermediary host, known as a bastion host, to log in to those servers securely via SSH. Note: SSH runs on port 22 by default, and this port should be open only to traffic from the bastion host.
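The port 22 rule above is easy to audit automatically. The sketch below flags any firewall rule that opens SSH to a source other than the bastion host. The FirewallRule shape (one port and one source range per rule) and the example addresses are simplifying assumptions; real firewall rules, security groups, and NACLs allow port ranges and multiple sources.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

SSH_PORT = 22


@dataclass
class FirewallRule:
    name: str
    port: int
    source_cidr: str  # range allowed to connect, e.g. "0.0.0.0/0"


def ssh_violations(rules: List[FirewallRule], bastion_ip: str) -> List[str]:
    """Return names of rules that expose SSH to anything other than the bastion host."""
    # A bare address becomes a single-host network (/32 for IPv4, /128 for IPv6).
    bastion_only = ipaddress.ip_network(bastion_ip)
    return [
        rule.name
        for rule in rules
        if rule.port == SSH_PORT and ipaddress.ip_network(rule.source_cidr) != bastion_only
    ]
```

A check like this can run alongside the IaC static checks, so an overly broad SSH rule is caught before it is ever applied.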
Monitoring and Observability
Once the product or service is rolled out to production, always monitor its metrics (incoming/outgoing traffic, server load, etc.) and set alert thresholds so you can act before an outage happens. At Harness, we use multiple systems, such as Prometheus, Grafana, and GCP monitoring services.
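Threshold alerts usually fire only when a metric stays above its limit for a while, so a single spike doesn’t page anyone. Prometheus alerting rules express this with an expression and a for duration; the sketch below mimics the same idea with a count of consecutive samples. The metric names and limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    metric: str
    limit: float
    for_samples: int  # consecutive samples above the limit before the alert fires

    def __post_init__(self) -> None:
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")


def firing_alerts(samples: Dict[str, List[float]], thresholds: List[Threshold]) -> List[str]:
    """Return the metrics whose most recent samples all exceed their limit."""
    firing = []
    for t in thresholds:
        recent = samples.get(t.metric, [])[-t.for_samples:]
        # Too few samples means we cannot claim a sustained breach yet.
        if len(recent) == t.for_samples and all(value > t.limit for value in recent):
            firing.append(t.metric)
    return firing
```

With a CPU-load threshold of 0.9 over three samples, a series ending in 0.95, 0.97, 0.99 fires, while an error rate that spikes once and recovers does not.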
Numbers are good, but graphs are better! Graphs make it easier to understand trends over time within the organization (for example, build timings, pull request changeset size, lines of code, etc.). These metrics help identify flaws in existing processes and design more efficient ones. At Harness, various metrics from GitHub and Harness CI are ingested into Elasticsearch, and graphs are then created in Kibana for visualization. Harness also used Looker to build dashboards containing various metrics.
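Ingesting CI and GitHub metrics into Elasticsearch is commonly done through the _bulk API, which takes newline-delimited JSON: for each document, an action line followed by the document itself, with a trailing newline at the end of the body. The sketch below only builds that request body; the index name and document fields are assumptions, and sending it (a POST to /_bulk with Content-Type application/x-ndjson) is left to your HTTP client of choice.

```python
import json
from typing import Iterable


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))  # the document itself
    if not lines:
        raise ValueError("nothing to index")
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Batching many build records into one bulk request is far cheaper than indexing them one at a time.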
Dockerization and Orchestration
Application architecture has evolved from monolithic to microservices, which is good. However, microservices architecture comes with its own baggage, since it involves many components running with different configurations, and managing all of those configurations can be tricky. The best way to manage them is in the form of Docker images. This way, you document what’s in each component along with the versions of any third-party dependencies, and it becomes much easier to ship components from one environment to another. At Harness, we write all component configurations as Dockerfiles and deploy them as Docker images in every environment.
Since all components run independently and on different machines, you need a way to make sure all of them (running as Docker containers) work as expected. There are many orchestration tools on the market today, like Docker Swarm, Kubernetes, and more. At Harness, we use Google Kubernetes Engine (GKE) to deploy our services. All environments deploy to GKE, which provides high availability.
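Orchestrators such as Kubernetes (including GKE) decide whether a container is working by probing it, for example with an HTTP GET configured under a livenessProbe or readinessProbe using httpGet. The sketch below is a minimal health endpoint built only on Python’s standard library; the /healthz path and port 8080 are conventional choices assumed for the example, and a real service would check its own dependencies before answering ok.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers probe requests on /healthz; every other path returns 404."""

    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        # A real service would check its own dependencies (database, queues) here.
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging so frequent probes don't flood the logs


def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    # Bind to all interfaces so the kubelet can reach the container's port.
    return HTTPServer((host, port), HealthHandler)
```

Running make_server().serve_forever() in the container and pointing an httpGet probe at path /healthz on port 8080 lets Kubernetes restart an unhealthy container (liveness) or stop routing traffic to it (readiness).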
Regular Meetings With Teams
A DevOps team doesn’t serve just one team; it works across many areas of a product. Hence, it is necessary to discuss what the other teams need, what issues they face, and what can be done to improve their developer experience. At Harness, we hold open forums on a regular basis, where leads from different teams discuss their issues and requirements over a Zoom call.
Slack Groups For Developers
We created several Slack groups where anyone can participate, start threads, discuss solutions to issues, and make suggestions. This keeps everyone aware of how problems were solved.
DevOps engineers help the business tremendously, and they help keep products and services secure and bug-free. They enable faster feedback at every step of development and operations. Following some of the best practices above in your DevOps adoption process will improve everything related to your infrastructure, enabling high availability of your products and services and shorter recovery times after failures.
It seems you like reading best practices pieces! Have you read our CI/CD Best Practices article? It’s a great place to start when you’re ready to dig deeper into CI/CD.