Projection Table

One of the first questions we want to ask of a stream of events is, "Where are we now?"

If we have a stream of sales events, we'd like to have the total sales figures at our fingertips. If we have a stream of login events, we'd like to know when each user last logged in. If our trucks send GPS data every minute, we'd like to know where each truck is right now.

How do we efficiently roll up data? How do we preserve a complete event log and enjoy the fast queries of an "update-in-place" style database?

Problem

How can a stream of change events be efficiently summarized to give the current state of the world?

Solution

Projection Table

We can maintain a projection table that behaves just like a materialized view in a traditional database. As new events come in, the table is automatically updated, constantly giving us a live picture of the system. Events with the same key are considered related; newer events are interpreted, depending on their contents, as updates to or deletions of older events.

As with a materialized view, projection tables are read-only. To change a projection table, we change the underlying data by recording new events to the table's underlying stream.

Implementation

Apache Flink® supports dynamic tables as a core concept in its Table API and SQL support. A continuous query on a dynamic table in Flink is very similar to a materialized view in a traditional database.

As an example, imagine that we are shipping packages around the world. As a package reaches each point on its journey, it is logged with its current location.

Let's start with a stream of package check-in events:

CREATE TABLE package_checkins (
    package_id INT,
    location STRING
);

To track each package's most recent location:

CREATE TABLE current_package_locations AS
  SELECT
    package_id,
    LAST_VALUE(location) OVER w AS location
  FROM package_checkins
  WINDOW w AS (
    PARTITION BY package_id
    ORDER BY $rowtime
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  );

As new data is inserted into the package_checkins table, the current_package_locations table is updated, so we can see the current location of each package without scanning through the event history every time.

References

Flink SQL [window functions][https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/hive-compatibility/hive-dialect/queries/window-functions/]
See also the State Table pattern.

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Language Guides

Tutorials

Demos

Language Guides

Tutorials

Demos

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog