Event Streaming Platform

Companies are rarely built on a single data store and a single application to interact with it. Typically, a company may have hundreds or thousands of applications, databases, data warehouses, or other data stores. The company's data is spread across these resources, and the interconnection between them is immensely complicated. In larger enterprises, multiple lines of business can complicate the situation even further. Modern software architectures, such as microservices and SaaS applications, also add complexity, as engineers are tasked with weaving the entire infrastructure together cohesively.

Furthermore, companies can no longer survive without reacting to Events within their business in real time. Customers and business partners expect immediate reactions and rich interactive applications. Today, data is in motion, and engineering teams need to model applications to process business requirements as streams of events, not as data at rest, sitting idly in a traditional data store.

Problem

What architecture can we use to model everything within our business as streams of events, creating a modern, fault-tolerant, and scalable platform for building modern applications?

Solution

event streaming platform

We can design business processes and applications around Event Streams. Everything, from sales, orders, trades, and customer experiences to sensor readings and database updates, is modeled as an Event. Events are written to the Event Streaming Platform once, allowing distributed functions within the business to react in real time. Systems external to the Event Streaming Platform are integrated using Event Sources and Event Sinks. Business logic is built within Event Processing Applications, which are composed of Event Processors that read events from and write events to Event Streams.

Implementation

Apache Kafka® is the most popular Event Streaming Platform, designed to address the business requirements of a modern distributed architecture. You can use Kafka to read, write, process, query, and react to Event Streams in a way that's horizontally scalable, fault-tolerant, and simple to use. Kafka is built upon many of the patterns described in Event Streaming Patterns.

Fundamentals

Data in Kafka is exchanged as events, which represent facts about something that has occurred. Examples of events include orders, payments, activities, and measurements. In Kafka, events are sometimes also referred to as records or messages, and they contain data and metadata describing the event.

Events are written to, stored in, and read from Event Streams. In Kafka, these streams are called topics. Topics have names and generally contain "related" records of a particular use case, such as customer payments. Topics are modeled as durable, distributed, append-only logs in the Event Store. For more details about Kafka topics, see the Apache Kafka 101 course.

Applications that write events to topics are called Producers. Producers come in many forms and represent the Event Source pattern. Reading events is performed by Consumers, which represent Event Sinks. Consumers typically operate in a distributed, coordinated fashion to increase scale and fault tolerance. Event Processing Applications act as both event sources and event sinks.

Applications which produce and consume events as described above are referred to as "clients." These client applications can be authored in a variety of programming languages, including Java, Go, C/C++, C# (.NET), Python, Node.JS, and more.

Stream Processing

Event Processing Applications can be built atop Kafka using a variety of techniques.

ksqlDB

The streaming database ksqlDB allows you to build Event Processing Applications using SQL syntax. It ships with an API, command line interface (CLI), and GUI. ksqlDB's elastic architecture decouples its distributed compute layer from its distributed storage layer, which uses and tightly integrates with Kafka.

Kafka Streams

The Kafka client library Kafka Streams allows you to build elastic applications and microservices on the JVM, using languages such as Java and Scala. An application can run in a distributed fashion across multiple instances for better scalability and fault-tolerance.

Data Integrations

The Kafka Connect framework allows you to scalably and reliably integrate cloud services and data systems external to Kafka into the Event Streaming Platform. Data from these systems is set in motion by being continuously imported and/or exported as Event Streams through Kafka connectors. There are hundreds of ready-to-use Kafka connectors available on Confluent Hub. On-boarding existing data systems into Kafka is often the first step in the journey of adopting an Event Streaming Platform.

Source Connectors pull data into Kafka topics from sources such as traditional databases, cloud object storage services, or SaaS products such as Salesforce. Advanced integrations are possible with patterns such as Database Write Through and Database Write Aside.

Sink Connectors are the complementary pattern to Source Connectors. While source connectors bring data into the Event Streaming Platform continuously, sinks continuously deliver data from Kafka streams to external cloud services and systems. Common destination systems include cloud data warehouse services, function-based serverless compute services, relational databases, Elasticsearch, and cloud object storage services.

Considerations

Event Streaming Platforms are distributed computing systems made up of a diverse set of components. Because building and operating such a platform requires significant engineering expertise and resources, many organizations opt for a fully-managed Kafka service such as Confluent Cloud, rather than self-managing the platform, so that they can focus on creating business value.

References

  • This pattern is derived from Message Bus in Enterprise Integration Patterns, by Gregor Hohpe and Bobby Woolf.
  • Confluent Cloud is a cloud-native service for Apache Kafka®.
  • The Apache Kafka 101 course provides a primer on what Kafka is and how it works.