The modern software industry is advancing rapidly. In a monolithic world, it was easy to put everything on the same machine, store the application state in a couple of databases (relational/non-relational), and scale servers horizontally. As the scale and the variety of use cases increased, the industry started adapting and introduced microservices – one service does a concentrated job and communicates just enough to other services to get the ball rolling.

How does  state transfer happen with multiple services? One easy answer would be to make every service expose REST endpoints and make HTTP calls as and when required to the relevant services. This will definitely work well for some kinds of applications, but not all. If your application communicates synchronously (user-service as well as service-service communication), there is no need for additional complexity and you can just rely on HTTP calls. However, dealing with such a pattern can be challenging in the longer run for reasons like: 

  • Every service needs to know how to talk to all other services.
  • How do you handle single service degradation?
  • How many retries do you make in case an HTTP call fails? 

These are some of the issues which event-driven architecture fundamentally tries to address by making sure each service is doing what it is supposed to do and when it is supposed to do.

Event-driven architecture uses events to communicate among various independent microservices. It enables applications to act on these events as they occur.

Implementing Event-Driven Architecture

There are various options available to implement event-driven architecture. Some of them are:

  • Apache Kafka
  • Redis Streams
  • Cloud options, like:
    • Google Pub/Sub
    • AWS EventBridge

The most suitable one can be selected based on time to production, ease of adoption, message ordering, resilience, event replay, persistence, retries, at-least/at-most once delivery, cost etc of your use case.

We will discuss event-driven architecture at Harness using Redis Streams in this blog. Before diving into event-driven architecture, it is important to understand what Redis and Redis Streams are.

What Is Redis?

Redis is an open source in-memory data structure store, used as a database, cache, and message broker.

Important features of Redis:

  • Feature-rich: Redis can help with caching, distributed locking, pub-sub, real time analytics, geospatial querying, and the list goes on.
  • Production grade: On top of flexibility of fitting into a variety of use cases, Redis also offers various deployment methods to use it in production with native high-availability & scalability solutions.
  • Simplicity: Getting started with Redis is as simple as three commands (equivalent available in other operating systems).
    $ brew install redis
    $ brew services start redis
    $ redis-cli
  • Wide client support: You can connect to Redis via 40+ clients in your favorite language of choice (a lot of them are officially maintained).

Redis Streams

Redis Streams is an append-only data structure that helps in various streaming use cases like real-time messaging (RTM), message broker, etc. Before diving into Redis Streams, let us first go through one of the other constructs which Redis provides for inter service communication: Redis Pub/Sub.

Pub/Sub

Redis Pub/Sub implements the Publish/Subscribe messaging paradigm. Let’s play around a bit with multiple terminal windows. Open four terminal tabs and enter redis-cli in all of them. We will PUBLISH to a channel from one window (Redis client) and SUBSCRIBE to that channel from the other three.

As you can see, the first client is publishing a message “hello world“ on channel1. The other three clients are endlessly subscribed to channel1 and receive the message sent via the publisher. A quick note here is that the subscribe command is a forever blocking operation and endlessly waits for a new message to appear on the channel.

Key things to note about Pub/Sub:

  • Fan-out only. All active subscribers get all messages instantly.
  • No message is persisted in memory (fire-and-forget, or at-most-once delivery).
  • If a client subscribes after a message is published, previous messages are not delivered.

Streams

Redis Streams was introduced as a new data structure in Redis 5.0, which models an append-only log file like construct. Note the key difference between Redis Streams and Apache Kafka here is, Streams is merely an append-only list data structure in Redis with advanced operations, while on the other hand, Kafka is an entire platform of various components. Let us discuss the operations that we can perform on a Stream:

  1. Add data to stream (returns a unique ID)
    XADD * key1 value1 key2 value2
  2. Read data from stream
    XREAD BLOCK <blocktimeout> COUNT <count> STREAMS <streamname> <id>
  3. Consumer groups
    • Create a group
      XGROUP CREATE <streamname> <groupname> $
    • Read as a consumer
      XREADGROUP GROUP <groupname> <consumername> COUNT <count> STREAMS <streamname> >
    • Acknowledge a message with a specific ID
      XACK <streamname> <groupname> <id>
  4. Range queries
    XRANGE <streamname> <start-id> <end-id>
  5. Truncate the stream
    XTRIM <streamname> MAXLEN <maxlen>

Pub/Sub With Streams

Event-driven architecture: pub/sub

PUBLISH can be realized by XADD channel1 * message "hello world" and

SUBSCRIBE can be realized by XREAD BLOCK 5000 STREAMS channel1 $ and then keep passing the last fetched id in this loop.

Consumer Group Reading With Streams

Event-driven architecture: streams

  1. Push a message to a stream
    XADD channel1 * message "hello world"
  2. Create a consumer group
    XGROUP CREATE channel1 group1 0-0
  3. Read messages from the stream as a consumer of a group
    XREADGROUP GROUP group1 consumer1 COUNT 5 STREAMS channel1 > # Read an unread message of group1

Other commands like XPENDING, XCLAIM, XTRIM, etc. are discussed in the Redis Stream Intro page.

Key things to note about Streams:

  • Pull mechanism for message delivery.
  • Persistence of messages until truncated explicitly.
  • Consumer groups can come at a later stage to fetch messages after a certain ID.
  • Explicit acknowledgement is required for messages, else the messages stay in a pending queue for re-delivery.

Event-Driven Architecture at Harness

We at Harness have recently moved to a microservices world for a coming release. Multiple services publish events, and the others selectively consume them by listening to a particular stream to drive end-user functionality.

Why Redis Streams Over Other Options?

  • Infrastructure setup: We already have Redis in our infra for other use cases, like caching and distributed locking. This enabled us to focus on faster delivery as far as our rollout to production was considered because local, staging, and on-prem environments didn’t require any special handling.
  • Scale of messages: Based on our estimations from various use cases across our services, we found our initial scale to be around a few thousand messages per minute. Introducing Kafka or a similar new tool at the very beginning would be overkill. Nevertheless, our end-user APIs were made generic so that the backend data store could be switched at a later stage with minimal changes to the adopters.
  • Throughput: Since Redis is an in-memory data store, it gave us a very fast performing backend, which helped stream consumers to act immediately as and when messages were published.

Implementation

I will give you a sneak peek into how easy it is to get it up and running with just a Redis instance and a few lines of code:

Here’s a gist with the above code.

Challenges

While implementing event-driven architecture at Harness using Redis Streams, these are some of the challenges that we faced:

  • Monitoring:
    • Commands that can be used for monitoring streams:
      • XINFO, XLEN: Can be used for basic monitoring over the streams.
      • MEMORY USAGE: Can be used for memory usage per stream (costly).
    • Apart from the above, it’s a little tricky to check some of the other aspects of a streaming architecture – how far behind the consumer groups are from the head of the stream, speed of consumption/acknowledgements, etc. – and might require a custom monitoring application implemented on top of it.
  • Persistence:
    • Since Redis is an in-memory datastore, persisting a lot of events would be costly. Proactive monitoring is required to keep a tab on the inflow of events.
    • There are a few ways to ensure total memory usage is not spiking:
      • Event payload size: Get the total memory usage of the stream (MEMORY USAGE) and divide it by length of the stream (XLEN). Make sure this metric doesn’t drastically increase after any release (this is costly on the Redis server if the stream size is huge).
      • Stream length: You can use the XADD command with a MAXLEN or MINID argument to ensure the stream is capped at all times and the oldest entries are cleared automatically (XTRIM for reactively truncating manually).
  • Resiliency:
    • With Redis Streams, you will have to worry about a few things from the beginning of your architecture design:
      • Deployment of Redis (preferably a hosted solution or a self-hosted Sentinel deployment for high availability).
      • Head-of-line blocking: You will have to use XAUTOCLAIM (or a combination of XPENDING and XCLAIM) to figure out the messages that are being processed again due to any error in the consumer processing. To avoid such issues, based on a metric (delivered count/time elapsed after first delivery/custom metric), you would need to drop the message at the top of the stream because new messages might starve to be processed. This message can either be completely forgotten or be stored in a secondary storage for further triaging.

Conclusion

We achieved an event-driven architecture by using Redis Streams and have deployed it to production for more than a few months now. Redis Streams is a more lightweight solution for implementing event-driven architecture, as compared to advanced solutions like Apache Kafka. You need to decide which would be the best implementation based on your use case and the features that you expect out of an event-driven architecture.

It seems you enjoy reading technical deep dives! How about reading our piece on Architecting Feature Flags for Performance at Scale, or Audit Trails 201: Technical Deep Dive?

Thanks for reading, and I hope you’re excited to use Redis Streams in your projects!

-Raj