Get Started Free
‹ Back to courses
course: Building Apache Flink® Applications in Java

Working with Keyed State in Flink

5 min
Wade Waldron

Wade Waldron

Staff Software Practice Lead

Working with Keyed State in Flink

Overview

One of the powerful features of Flink is its ability to maintain state in a datastream. This state can be kept local to the operation being performed which can improve performance by eliminating network hops. In this video, we'll introduce keyed state in Flink and show you how you can use it to maintain state across messages and even windows.

Topics:

  • Keyed State
  • Value/List/Map State
  • Reducing State/Aggregating State
  • Descriptors
  • Loading/Updating State

Code

Descriptors

private ValueStateDescriptor<MyClass> descriptor;

@Override
public void open(Configuration parameters) {
	descriptor = new ValueStateDescriptor<>(
		"Identifier", 
		MyClass.class
	);
	descriptor.enableTimeToLive(ttlConfig);
	...
}	

Accessing/Updating State

public void process(...) {
	ValueState<MyClass> state = context
		.globalState()
		.getState(descriptor);
	MyClass value = state.value();

	if(value == null) {
		...
	} else {
		...
	}

	state.update(value);
}

Resources

Use the promo codes FLINKJAVA101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.

Working with Keyed State in Flink

Hi, I'm Wade from Confluent. In this video, we'll be discussing how to work with Keyed State in Flink. (upbeat music) We've been focusing on stateless operations so far. These operations treat each message as independent. They don't worry about past or future messages. However, there have been some exceptions. The reduce operator looks at the previous message and combines it with the current message to produce something new. But reduce is a relatively simple operation and is only looking at two adjacent messages. What if we wanted to do something more complex, that involved a deeper history? Let's consider the example of fraud detection. If you've ever had your credit card temporarily denied due to an odd pattern in spending, you'll know what I mean. Credit card fraud detection doesn't look for a single transaction. Instead, it looks for patterns in the transactions. For example, making a single small purchase to test the card, followed by a series of large purchases. In this sort of situation we need to be able to keep some kind of state that persists beyond the current message. It might be as simple as setting a flag, or it might be a more complex object that is several layers deep. If our streaming operation is windowed, we may even need to persist the state beyond the current window. Flink offers built-in support for stateful operations. It includes a mechanism for storing state that is both durable and fast. It does this using an embedded key-value store. The keys are determined using the keyBy operation in Flink. One of the advantages to this is that Flink also uses keyBy for distribution and parallelism. Ensuring these keys match means the state can be kept local to the task manager. This allows for in-memory caching and speeds up disk access. Essentially, it prevents the necessity for slower network hops. Flink supports several different types of state storage, including: ValueState which stores a single object. ListState which stores a list of objects. And MapState which stores a map of key-value pairs. Just remember, the state is already keyed using the keyBy operator. The intent of the MapState would be to handle objects that include a secondary key of some kind. Flink also supports more complex states such as ReducingState and AggregatingState. These will automatically combine states based on a reduce or aggregate function. To use our state, the first thing we need is a StateDescriptor. It contains a name that uniquely identifies the type of state we will be storing. Think of it like a database table where the descriptor identifies the column, and the keyBy operator identifies the row. The descriptor also defines important details about the state itself. This is dependent on what type of State you are using. For ValueState it's simply the class of object that will be contained in the state. For ReducingState, it includes the class of object, but also the reduce function that is applied on each insert. The descriptor also contains other functionality such as setting a timeToLive value. This allows the state objects to expire after a certain period. Usually, the descriptor is initialized once when the function using it is created. Often, this would be done by overriding the open method in a KeyedProcessFunction or similar. Once we have our descriptor, the next step is to make use of the state inside our KeyedProcessFunction or whatever function we are implementing. The state is available through the context object in the function. It provides two different storage mechanisms. WindowState persists only while the current window is open. Once the window closes, the state will be cleared. GlobalState persists beyond the life of the window. This is useful when you have state that needs to persist even after the window has closed. To get access to the state, use the getState method on the state store. This returns a container for the stored value. The actual value can be obtained using the value method. Be aware that if this is the first message for a given key, then there may not be an existing state object. In that case, you may get back a null and you will have to handle that appropriately. The state object also contains an update method for updating the stored state. Keyed state can be quite powerful. However, it does require a bit of caution when using it. If your key space is large or even unbounded, and you aren't careful, you can see an explosion of state objects. These objects can eat through your disk and memory. This is especially true when working with global state rather than window state. You can mitigate some of the risk using the time-to-live settings. Otherwise, be careful to use window state where possible, and when using global state try to keep your objects small, and your keyspace limited. Now, let's try these out in an exercise. If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.