Command Query Responsibility Segregation (CQRS)

Databases conflate the writing of data and the reading of data in the same place: the database. In some situations, it is preferable to separate reads from writes. There are several reasons to do this but the most prevalent is that the application can now save data in the exact form in which it arrives, accurately reflecting what happened in the real world, while reading it in a different form, one that is optimized for reading.

For example, a user adding and removing items from their cart would all be recorded as a stream of immutable events: t-shirt added, t-shirt removed, etc. These are then summarized into a separate view that used to serve reads, for example summarizing the various user events to represent the accurate contents of the cart.

Problem

How can we store and hold data in the exact form in which it arrived but read from a summarized and curated view?

Solution

command-query-responsibility-segregation

Represent changes that happen in the real world as Events - an order is shipped, a ride is accepted, etc. - and retain these events as the system of record. Subsequently, aggregate those Events into a view that summarizes the events to represent the current state, allowing applications to query the current values. So for example, the current balance of an account would be the total of all the payment events that added money to or removed it from the account. The system of record is the stream of payment events. The view we read from would be the account balance.

Implementation

The streaming database ksqlDB can implement a CQRS using an Event Stream and Table.

Event Streams are built into to the streaming database design. Creating a new stream is straightforward:

CREATE STREAM purchases (customer VARCHAR, item VARCHAR, qty INT WITH (kafka_topic='purchases-topic', value_format='json', partitions=1);

Events can be directly using familiar SQL syntax.

INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'hats', 1);
INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'hats', 1);
INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'pants', 1);
INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'sweaters', 1);
INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'pants', 1);
INSERT INTO purchases (customer, item, qty) VALUES ('jsmith', 'pants', -1);

We can create a Materialized View of the data as a Table:

CREATE TABLE customer_purchases WITH (KEY_FORMAT='JSON') AS
  SELECT customer, item, SUM(qty) as total_qty from purchases GROUP BY customer, item emit changes;

And continuously query for changes to the state of the customer_purchases table:

SELECT * FROM customer_purchases EMIT CHANGES;

Considerations

  • CQRS adds complexity over a traditional simple CRUD database implementation.
  • High performance applications may benefit from a CQRS design. Isolating the load of writing and reading of data may allow us to scale those aspects independently and properly.
  • Microservices applications often use CQRS to scale-out with many views provided for different services. The same pattern is applicable to geographically dispersed applications such as a flight booking system which are read heavy across many locations.
  • A write to a CQRS system is eventually consistent. Writes cannot be read immediately as there is a delay between the write of the command Event and the query-model being updated. This can cause complexity for some client applications, particularly online services.

References