What you should know before installing an agent, and how it affects your code
When building a scalable server-side application, we spend a considerable amount of time thinking about how we’ll monitor, operate and update our code in production. A new breed of tools has evolved to help Java and Scala developers do just that. Many of them are built on one of the most powerful ways in which external code can integrate with the JVM at runtime – Java agents.
Agents are OS native or Java libraries (we’ll describe the differences below) to which the JVM provides capabilities that aren’t available to normal application code. To get a sense of how fundamental these are, let’s look at just a few of the tools we use that rely on them:
As I alluded to above, there are two kinds of agents – Java and native. While both are loaded into the JVM in almost the same manner (using a special JVM startup argument: -javaagent for Java agents, -agentpath or -agentlib for native ones), they’re almost completely different in how they’re built and what they’re meant to do.
Let’s look at the two.
Java agents are .jar files that define a special static premain method, which the JVM invokes before the application’s main method. The magical part comes in with the Instrumentation object, which the host JVM passes as an argument to this method. By holding on to this object, the agent’s code (which otherwise behaves like any Java code loaded by the system class loader) can do some really powerful things.
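To make the entry point concrete, here is a minimal sketch of such an agent. The class name HelloAgent and the jar name used below are made up for illustration; only the premain signature and the Instrumentation type come from the java.lang.instrument API.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; the name is illustrative only.
final class HelloAgent {
    // Keep the Instrumentation object around so the rest of the agent can use it later.
    private static volatile Instrumentation instrumentation;

    // The JVM calls this before the application's main method
    // when the process is started with -javaagent:<jar>=<agentArgs>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
        System.out.println("HelloAgent loaded, args=" + agentArgs);
    }

    // Accessor for other agent code; returns null if premain never received an instance.
    static Instrumentation instrumentation() {
        return instrumentation;
    }
}
```

For the JVM to find the method, the jar’s manifest needs a Premain-Class: HelloAgent entry, and the application is then started with something like java -javaagent:hello-agent.jar=verbose -jar app.jar, where both jar names are placeholders.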
The most powerful capability given to the agent is the ability to dynamically rewrite the body of a target class’s methods while the code is running (the class’s field structure is immutable). This process is known as bytecode instrumentation.
Some examples include adding calls to specific methods to profile performance (e.g. endtime – starttime) or record parameter values (e.g. the URL passed to a servlet). Another example would be to reload a new version of a class without restarting the JVM, as done by JRebel.
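As a rough illustration of the profiling case, the sketch below shows a made-up method next to a source-level equivalent of what it might look like after instrumentation. A real agent injects this at the bytecode level rather than in source, and every name here (TimingSketch, handleRequest, the log parameter) is hypothetical.

```java
// Source-level picture of injected timing and parameter recording (names are illustrative).
final class TimingSketch {
    // The method as the developer wrote it.
    static int handleRequest(String url) {
        return url.length();
    }

    // Roughly what the same method does once timing and parameter recording are woven in.
    static int handleRequestInstrumented(String url, StringBuilder log) {
        long start = System.nanoTime();           // injected: start time
        try {
            return url.length();                  // original method body, unchanged
        } finally {
            long elapsed = System.nanoTime() - start;  // injected: end time - start time
            log.append("handleRequest url=").append(url)
               .append(" nanos=").append(elapsed);
        }
    }
}
```

The try/finally shape matters: it records the timing even when the original body throws, which is typically what a profiler wants.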
For an agent to modify the code of a loaded class, it essentially triggers a class reload by the JVM, in which the class’s bytecode is replaced with a new version. This requires that the agent be able to provide the JVM with new bytecode that is verifiable (i.e. conforms to the JVM specification). Unfortunately, generating correct bytecode at runtime isn’t very simple – there are a lot of requirements and edge cases. For this, agents usually use a library for reading and writing bytecode. That library enables them to load an existing class’s bytecode into a DOM-like structure, modify it by adding things like profiling calls, and then save the DOM back to raw bytecode.
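The hook through which an agent hands bytecode back to the JVM is the ClassFileTransformer interface, registered via Instrumentation.addTransformer. The sketch below is a deliberately inert example: it only records the names of classes in a given package and returns null, which tells the JVM to keep the original bytes. The class name LoggingTransformer and the package prefix are assumptions for illustration; the comment marks where a bytecode library would do the actual rewriting.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical transformer that observes classes without changing them.
final class LoggingTransformer implements ClassFileTransformer {
    private final String packagePrefix;   // internal form, e.g. "com/example/"
    final List<String> seen = new CopyOnWriteArrayList<>();

    LoggingTransformer(String packagePrefix) {
        this.packagePrefix = packagePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className uses '/' separators and can be null for some generated classes.
        if (className != null && className.startsWith(packagePrefix)) {
            seen.add(className);
            // A real agent would pass classfileBuffer to a bytecode library here
            // and return the rewritten, verifiable bytes instead of null.
        }
        return null; // null means "no change": the JVM keeps the original bytecode
    }

    // Called from the agent's premain; 'true' allows later retransformation of loaded classes.
    static LoggingTransformer install(Instrumentation inst, String packagePrefix) {
        LoggingTransformer transformer = new LoggingTransformer(packagePrefix);
        inst.addTransformer(transformer, true);
        return transformer;
    }
}
```

Returning null for classes the agent doesn’t care about keeps the overhead low, since the JVM calls every registered transformer for every class it loads.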
ASM is a popular choice for this: an all-purpose Java bytecode manipulation and analysis framework that can be used to modify existing classes or dynamically generate new ones, directly in binary form. It’s so widely used that even some of the JDK’s own internal code (originally Sun’s) has relied on it to parse bytecode.
Native agents are completely different beasts. If you thought Java agents can let you do cool things, hold onto your socks, because native agents operate on a whole different level. Native agents are not written in Java, but mostly in C++, and are not subject to the rules and restrictions under which normal Java code operates. Not only that, they’re provided with an extremely powerful set of capabilities called the JVM Tool Interface (JVMTI).
This set of APIs, exposed by jvmti.h, enables a C++ library dynamically loaded by the JVM to obtain an extremely high level of visibility into the real-time workings of the JVM. It spans a wide range of areas, including GC, locking, code manipulation, synchronization, thread management, compilation, debugging and much more.
The JVMTI has been designed to make the JVM as transparent as possible while still maintaining the design flexibility to allow JVM vendors to provide different underlying implementations. This set of APIs is very wide, containing literally hundreds of callbacks and functions into the JVM. You can use these to do extremely powerful things that Java agents can’t do, such as writing your own debugger, or building low-level, real-time error analysis tools.
The vast majority of monitoring performed by Harness Service Reliability Management (SRM) takes place in native code, at a level lower than most other agents. This means that for the most part, SRM is not even visible to other agents at the bytecode level.
Since SRM has a minimal CPU, memory, and network footprint on your server, the performance of other agents will not be affected by its monitoring activity.
Back to the JVMTI – one example of what it offers is the Exception event callback. Whenever an exception is thrown anywhere inside the JVM, the agent receives the thread that threw it, the method and bytecode location at which it was thrown, the exception object, and if / where it will be caught. Powerful stuff indeed.
If everything I described sounds absolutely peachy, you may ask: why aren’t all agents written as native agents? There are a few reasons to be aware of, so here they are (in no particular order):
The first is that the JVMTI API is very complex, with a lot of small moving parts. For the most part, if you’re not writing an agent that requires very low-level capabilities, you’re fine with the Java agent API, which is more straightforward and can help you get the job done more quickly.
Since native agents are written and compiled as native libraries (.so / .dll), they need to be compiled and tested on every operating system you want to support. If you look at Windows, OS X and the different flavors Linux comes in, that can translate to a lot of work. Compare that with Java agents, which are executed as Java code by the JVM and are therefore portable by design.
Since native agents are usually written in C++, it means they can’t use tried and true Java bytecode manipulation libraries (such as ASM) directly without going back into the JVM using JNI, which does take some of the fun out.
The JVM provides strong safeguards to prevent code from doing the kinds of things that will cause an angry OS to terminate the process. Memory access violations that would under normal circumstances cause a SIGSEGV and crash the process get wrapped up as a nice NullPointerException. Since native agents run at the same level as the JVM (vs. Java agents, whose code is executed by it), any mistakes they make can potentially terminate the JVM.
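To make the contrast concrete, here is a small sketch (the class and method names are made up) of how a null dereference surfaces on the Java side: as an ordinary exception that the code can catch and recover from. The equivalent mistake in a native agent dereferences a bad pointer inside the JVM process itself, where there is nothing comparable to catch it.

```java
// Illustrative only: a null dereference in Java becomes a catchable exception.
final class NullSafetyDemo {
    static String describeLength(String s) {
        try {
            // If s is null, the JVM raises NullPointerException here
            // instead of letting the process touch invalid memory.
            return "length=" + s.length();
        } catch (NullPointerException e) {
            return "recovered from NullPointerException";
        }
    }
}
```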
I hope this helps highlight some of the differences between the two kinds. It’s good to know what agents are and how they’re built; even if you never end up writing one, you may be relying on one or more of them to power your application today.