course: ksqlDB 101

Converting Data Formats with ksqlDB

Allison Walther

Integration Architect (Presenter)

There are different ways to serialize data written to Apache Kafka topics. Common options include Avro, Protobuf, and JSON.

You can use ksqlDB to create a new stream of data identical to the source but serialized differently. This can be useful in several cases:

  • If the source stream is JSON or delimited (CSV), then it does not have an explicit schema, which can make it more difficult for consumers to work with. Using ksqlDB, you can apply a schema to the data and write it in a format more suitable for all consumers.
  • Your stream may use one format, such as Avro, while a consuming application requires another, such as delimited (CSV).
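
The second case can be sketched as follows. This is a minimal example, not a definitive recipe: the stream names source_avro_stream and target_csv_stream and the topic name source_avro_topic are hypothetical, and it assumes the Avro schema for the topic is already registered in Schema Registry so that ksqlDB can infer the columns.

```sql
-- Hypothetical names throughout; assumes the topic's Avro schema
-- is registered in Schema Registry, so no column list is needed.
CREATE STREAM source_avro_stream
  WITH (KAFKA_TOPIC='source_avro_topic',
        VALUE_FORMAT='AVRO');

-- Continuously copy every event into a new stream whose values
-- are serialized as comma-delimited text.
CREATE STREAM target_csv_stream
  WITH (VALUE_FORMAT='DELIMITED')
  AS SELECT * FROM source_avro_stream;
```

Delimited output carries no field names and is best suited to flat records, so a source with nested fields would likely need to be flattened in the SELECT first.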

To write a stream of data from its CSV source to a stream using Protobuf, you would first declare the schema of the CSV data:

CREATE STREAM source_csv_stream (ITEM_ID INT,
                                 DESCRIPTION VARCHAR,
                                 UNIT_COST DOUBLE,
                                 COLOUR VARCHAR,
                                 HEIGHT_CM INT,
                                 WIDTH_CM INT,
                                 DEPTH_CM INT)
                           WITH (KAFKA_TOPIC='source_topic',
                                 VALUE_FORMAT='DELIMITED');

Then you would use a continuous query to write all of these events to a new ksqlDB stream serialized as Protobuf:

CREATE STREAM target_proto_stream
WITH (VALUE_FORMAT='PROTOBUF')
AS SELECT * FROM source_csv_stream;
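
A variation on the statement above, shown as a sketch under stated assumptions: the stream name target_proto_stream_named and the topic name target_proto_topic are hypothetical, and the offset setting assumes you also want events that were already in the source topic before the query started.

```sql
-- Assumption: existing events should be converted too, not only
-- events that arrive after the query starts.
SET 'auto.offset.reset' = 'earliest';

-- Same conversion as above, but with an explicit (hypothetical)
-- name for the output topic.
CREATE STREAM target_proto_stream_named
  WITH (KAFKA_TOPIC='target_proto_topic',
        VALUE_FORMAT='PROTOBUF')
  AS SELECT * FROM source_csv_stream;

-- Inspect the new stream's columns and serialization format.
DESCRIBE target_proto_stream_named EXTENDED;

-- Look at a few converted records in the output topic.
PRINT 'target_proto_topic' FROM BEGINNING LIMIT 3;
```

Naming the output topic explicitly makes it easier to point consumers at the converted data, since they do not have to rely on the name ksqlDB would otherwise derive from the stream.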


Converting Data Formats with ksqlDB

Hi, I'm Allison Walther with Confluent. Let's talk about converting data formats with ksqlDB. Now, depending on your use case, you may want your data in different formats, and that's exactly what we'll cover in this lesson. ksqlDB provides a way for us to specify the data format for our event streams and to convert event data from one format to another. Many event streams flowing through Kafka are in Avro format, which is very efficient and useful, but we may have a legacy system or a particularly masochistic co-worker that needs this event data in a comma-delimited format. We can do that by creating a new stream and using the VALUE_FORMAT property in a WITH clause. Our new stream will mirror the existing Avro stream, but the data will be comma-separated. This feature enables ksqlDB to provide data for a wide variety of applications. That's it for this lesson, let's move into an exercise.