A single Event Stream or Table can be used by multiple Event Processing Applications, and its Events may go through multiple processing stages along the way (e.g., filters, transformations, joins, aggregations) to implement more complex use cases.
How can a single processing objective for a set of Event Streams and/or Tables be achieved through a series of independent processing stages?
We can compose Event Streams and Tables in an Event Streaming Platform via an Event Processing Application to a create a pipeline—also called a topology—of Event Processors, which continuously process the events flowing through them. Here, the output of one processor is the input for one or more downstream processors. Pipelines, notably when created for use cases such as Streaming ETL, may include Event Source Connectors and Event Sink Connectors, which continuously import and export data as streams from/to external services and systems, respectively. Connectors are particularly useful for turning data at rest in such systems into data in motion.
Taking a step back, we can see that pipelines in an Event Streaming Platform help companies build a "central nervous system" for data in motion.
As an example we can use Apache Flink® SQL to run a stream of events through a series of processing stages, thus creating a Pipeline that continuously processes data in motion.
CREATE TABLE orders (
customer_id INTEGER NOT NULL,
order_id STRING NOT NULL,
item_id STRING,
price DOUBLE
);
We'll also create a (continuously updated) customers table that will contain the latest profile information about each customer, such as their current home address.
CREATE TABLE customers (
customer_id INTEGER NOT NULL,
customer_name STRING,
address STRING
);
Next, we create a new stream by joining the orders stream with our customer table:
CREATE TABLE orders_enriched AS
SELECT o.customer_id AS cust_id, o.order_id, o.price, c.customer_name, c.address
FROM orders o
LEFT JOIN customers c
ON o.customer_id = c.customer_id;
Next, we create a stream, where we add the order total to each order by aggregating the price of the individual items in the order:
CREATE TABLE orders_with_totals AS
SELECT cust_id, order_id, sum(price) AS total
FROM orders_enriched
GROUP BY cust_id, order_id;
This pattern was influenced by Pipes and Filters in Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. However, it is much more powerful and flexible because it is using Event Streams as the pipes.