AIOps is the intersection of Artificial Intelligence and Operations. You might be saying to yourself “one more operations-related portmanteau for us, like DevOps or GitOps,” and you would not be wrong. AIOps has been gaining momentum in the last few years.

As our systems become more complex, the operational load grows with them. Decades ago, one engineer might have been responsible for tens of physical machines; today, with containerization, that ratio is one to thousands and beyond. Help is certainly needed. Our systems and platforms have grown so complex that a “Fog of Development” starts to appear, which AIOps will help clear.

Fog of Development

An item that I tend to speak about is what I call the Fog of Development. The Fog of Development is based on the military term Fog of War, which relates to uncertainty in situational awareness. The systems we work on, much like a military campaign, are so complex that no one person has an entire end-to-end view.

Boiling this down to the systems and platforms we deal with today: as we work on feature development, it can seem like we only focus on a few features/endpoints that are part of a much larger topology. With the microservice bloom, the topologies we deal with are certainly becoming more complex because the number of endpoints is increasing. With seemingly every member of the team creating new endpoints, someone has to maintain those endpoints, and more often than not it is “you write it, you run it,” aka Full Lifecycle development.

Full Lifecycle development is here

The nature of the beast is changing. The Full Lifecycle developer, which has been made popular by Netflix, represents a paradigm shift to an “operate what you build” model. Call the beast by another name; “run what you write” is just that. You write the code, so you know best how to run and operate the code.

Platforms such as Kubernetes are certainly in place and can be viewed as a great equalizer bridging the application/infrastructure divide; simply explain what you need in a trusty YAML file and you are off to the races. Anyone who is going through, or has gone through, a modernization effort to get their workload on Kubernetes or another Cloud Native technology can tell you that there are lots of moving pieces. Untangling decisions that were most likely made before your time on a team or project can rack up the story points.

One of my favorite places to learn about the developer community as a whole is Stack Overflow’s yearly survey. Taking a look at the recently published 2019 survey and comparing it to years gone by, we can ascertain a few trends. One obvious trend is that we are spending less and less time on projects; it makes sense to change projects to gain new skill sets and challenges. A second, less obvious trend is that the amount of time it takes a software engineer to get ramped up is on the rise. We have to deal with a plethora of new tools and platforms outside of our development languages, most of which are related to getting our ideas into the wild, aka off of our laptops. Shouldn’t the machines be helping us reduce the time to value? AI, or Artificial Intelligence, is the first part of the AIOps portmanteau.

AI

Artificial Intelligence can be one of those over-used terms. If you are like me, when someone mentions AI you immediately think back to Minority Report, which came out in 2002 starring Tom Cruise: a group of “Precogs” with enough cognitive power to predict crimes before they happen.

We certainly don’t live in a dystopian future where AI has taken over. For semantics, Artificial Intelligence can be defined as systems that mimic human cognitive ability, e.g., making a decision similar to how we would make a decision. If you look at a traditional operations team, a lot of decisions are by the book, aka in some sort of runbook. Modern operations teams, though, are starting to look a lot like software engineering teams because of the innovation needed.

Ops

The role of your operations team is certainly shifting with the times. More often now, operations teams include engineering efficiency roles such as platform engineering/DevOps resources and resiliency roles such as an SRE.

A good majority of newer operations jobs focus on action. Slowly fading away is the stigma of the old, stodgy system administrator not allowing any changes except for operating system patches during a maintenance window.

Even the folks who are in charge of the innovation of our infrastructure can face their own fog, similar to the Fog of Development: the Fog of System Development. If software engineers face endpoint overrun, the infrastructure folks have the same trouble with the infrastructure and engineering efficiency items needed to support and scale those endpoints.

One reason for decision fatigue in operations-centric roles is that traditionally there is no lower environment, e.g., a dev environment, in which to fully vet changes. Since scale and environmental concerns are core to infrastructure, a lot of the time changes are very production-impacting, e.g., made directly on production. What if there was a way to reduce decision fatigue? That, my friends, is AIOps.

AI + Ops = AIOps

A major focus of AIOps is to reduce decision fatigue. If the system itself can make decisions and take action on the mundane, and assist operators in times of reduced or violated SLAs, keeping a system robust becomes easier and incidents can be avoided.

AppDynamics has an excellent piece that was written a year ago about their vision of AIOps. AppDynamics breaks AIOps into three pillars: insight, analysis, and action. Combining all three means taking in data, making a decision, and taking some sort of action based on the data.
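To make the three pillars a bit more concrete, here is a minimal Python sketch of one pass through an insight, analysis, and action loop. Everything in it is an assumption made for illustration: the function names, the simple standard-deviation check, the threshold of 3.0, and the "page_operator" action are hypothetical and are not taken from AppDynamics or any other product.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Decision:
    anomalous: bool
    action: str


def insight(latencies_ms):
    """Insight: summarize raw observations into a baseline (mean, spread)."""
    return mean(latencies_ms), pstdev(latencies_ms)


def analysis(baseline, latest_ms, threshold=3.0):
    """Analysis: flag the latest sample if it sits far above the baseline.

    The threshold (in standard deviations) is an illustrative assumption.
    """
    avg, spread = baseline
    if spread == 0:
        # A perfectly flat history: anything above the average is unusual.
        return latest_ms > avg
    return (latest_ms - avg) / spread > threshold


def action(anomalous):
    """Action: choose a response. A real system would call out to tooling here."""
    return Decision(anomalous, "page_operator" if anomalous else "no_op")


def aiops_step(history_ms, latest_ms):
    """One pass through the loop: take in data, decide, act."""
    return action(analysis(insight(history_ms), latest_ms))
```

Calling aiops_step with a history of response times and the newest sample returns a Decision. A production-grade system would likely swap the simple statistical check for a learned model, but the shape of the loop stays the same.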

As an engineer, think of AIOps like autopilot in your car or an airplane. Becoming an airline captain takes thousands of flying hours. Most of the time when you are flying, the autopilot is engaged, since the autopilot system can react to events quicker than a human pilot. The human pilot is still in charge, but when split-second decisions come up, the pilot has more time to focus on making the right choice.

Not an airline pilot? Most new cars have some sort of collision avoidance/city safety system. The Audi that I own has a slew of those features. I have cross-traffic detection: when I back up, if an object starts to cross behind me outside of my field of vision, my car will apply the brakes. In the three years that I have owned my car, this system has rightfully triggered more times than I would like to admit. I have much more confidence backing up my car than I used to, knowing the watchful eye is there. With all of the application and infrastructure changes you make, you should feel more confident knowing that Harness is there.

AIOps and Harness, a future together

A big part of an AIOps system is to react, and to react similarly to a human. As forward-thinking human engineers, running changes through our Continuous Delivery pipeline is a natural way of building confidence. One of the main purposes of your Continuous Delivery pipeline is to orchestrate all the confidence-building steps needed to deliver your changes.

An AIOps system needs systemic confidence in the potential changes/adjustments/interventions it makes. Running those changes through a Continuous Delivery pipeline is the same cognitive approach an AIOps system will need to take to behave like a human engineer.
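As a rough sketch of what “orchestrating confidence-building steps” could look like in code, the Python below runs a list of checks in order and only promotes a change if every check passes. The run_pipeline function, the check names, and the "promote"/"rollback" outcomes are hypothetical names made up for this example; this is not how Harness or any particular pipeline tool is implemented.

```python
from typing import Callable, List, Tuple

# A check is a (name, function) pair; the function returns True on success.
Check = Tuple[str, Callable[[], bool]]


def run_pipeline(checks: List[Check]) -> Tuple[str, List[str]]:
    """Run each confidence-building step in order and stop at the first failure.

    Returns the outcome plus the names of the steps that passed.
    """
    passed: List[str] = []
    for name, check in checks:
        if not check():
            # Confidence was not established, so the change is backed out.
            return "rollback", passed
        passed.append(name)
    return "promote", passed
```

A human engineer or an automated system could feed the same list of checks into run_pipeline, which is the point: the pipeline, not the caller, is where the confidence gets built.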

The Harness Platform has lots of building blocks in place to start moving towards cognitive automation. Continuous Verification is there to take the cognitive load off of engineers wondering what is actually happening during and after a deployment, by taking insights from best-of-breed tools. As our responsibilities increase, Harness will be there instilling confidence in the changes we make to the applications and infrastructure we design and maintain.

Cheers!

-Ravi
