Tutorial

How to handle heterogeneous JSON with ksqlDB

Suppose you have a topic with records formatted in JSON, but not all the records have the same structure and value types. In this tutorial, we'll demonstrate how to work with JSON of different structures.

Set Up

For context, imagine you have three different JSON formats in a Kafka topic:

  "JSONType1": {
    "fieldA": "some data",
    "numberField": 1.001,
    "oneOnlyField": "more data", 
    "randomField": "random data"
  }

  "JSONType2": {
    "fieldA": "data",
    "fieldB": "b-data",
    "numberField": 98.6 
  }

  "JSONType3": {
    "fieldA": "data",
    "fieldB": "b-data",
    "numberField": 98.6,
    "fieldC": "data",
    "fieldD": "D-data"    
  }

From these three different JSON structures you want to extract oneOnlyField, numberField, and fieldD from JSONType1, JSONType2, and JSONType3 respectively.

Your first step is to create a stream that declares the outermost element of each JSON type as a VARCHAR column, so that ksqlDB keeps each nested object as a raw JSON string.

CREATE STREAM data_stream (
    JSONType1 VARCHAR,
    JSONType2 VARCHAR,
    JSONType3 VARCHAR
) WITH (KAFKA_TOPIC='data_stream',
        VALUE_FORMAT='JSON',
        PARTITIONS=1);

Then you can access the fields using the EXTRACTJSONFIELD function and cast them into the appropriate types by selecting from data_stream:

SELECT EXTRACTJSONFIELD (JSONType1, '$.oneOnlyField') AS special_info,
       CAST(EXTRACTJSONFIELD (JSONType2, '$.numberField') AS DOUBLE) AS runfld,
       EXTRACTJSONFIELD (JSONType3, '$.fieldD') AS description
FROM data_stream
EMIT CHANGES;
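Because the topic is heterogeneous, some records may not carry all three top-level objects. The following is a minimal sketch of one way to tolerate that, relying on EXTRACTJSONFIELD returning NULL when the requested path does not exist. The default value 'n/a' and the choice to filter on JSONType2 are illustrative assumptions, not part of the original tutorial.

```sql
-- Sketch: tolerate records in which a JSON object or field is absent.
-- 'n/a' is an illustrative default; pick whatever suits your data.
SELECT COALESCE(EXTRACTJSONFIELD(JSONType1, '$.oneOnlyField'), 'n/a') AS special_info,
       CAST(EXTRACTJSONFIELD(JSONType2, '$.numberField') AS DOUBLE) AS runfld,
       EXTRACTJSONFIELD(JSONType3, '$.fieldD') AS description
FROM data_stream
-- Skip records that have no JSONType2 object at all.
WHERE JSONType2 IS NOT NULL
EMIT CHANGES;
```

With the sample events used in this tutorial every record contains all three objects, so this query should return the same rows as the one above.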

Running the example

Prerequisites

Docker running via Docker Desktop or Docker Engine
Docker Compose. Ensure that the command docker compose version succeeds.

Run the commands

Clone the confluentinc/tutorials GitHub repository (if you haven't already) and navigate to the tutorials directory:

git clone git@github.com:confluentinc/tutorials.git
cd tutorials

Start ksqlDB and Kafka:

docker compose -f ./docker/docker-compose-ksqldb.yml up -d

Create the data_stream topic:

docker exec -it broker kafka-topics --bootstrap-server localhost:29092 --create --topic data_stream

Open a console producer:

docker exec -it broker kafka-console-producer --bootstrap-server localhost:29092 --topic data_stream

Enter the following four events at the prompt:

{ "JSONType1": { "fieldA": "some data", "numberField": 1.001, "oneOnlyField": "more data", "randomField": "random data" }, "JSONType2": { "fieldA": "data", "fieldB": "b-data", "numberField": 98.6 }, "JSONType3": { "fieldA": "data", "fieldB": "b-data", "numberField": 98.6, "fieldC": "data", "fieldD": "D-data" }}
{ "JSONType1": { "fieldA": "some data", "numberField": 2.001, "oneOnlyField": "more data", "randomField": "random data" }, "JSONType2": { "fieldA": "data", "fieldB": "b-data", "numberField": 99.6 }, "JSONType3": { "fieldA": "data", "fieldB": "b-data", "numberField": 98.6, "fieldC": "data", "fieldD": "D-data-2" }}
{ "JSONType1": { "fieldA": "some data", "numberField": 3.001, "oneOnlyField": "more data", "randomField": "random data" }, "JSONType2": { "fieldA": "data", "fieldB": "b-data", "numberField": 100.6 }, "JSONType3": { "fieldA": "data", "fieldB": "b-data", "numberField": 98.6, "fieldC": "data", "fieldD": "D-data-3" }}
{ "JSONType1": { "fieldA": "some data", "numberField": 4.001, "oneOnlyField": "more data", "randomField": "random data" }, "JSONType2": { "fieldA": "data", "fieldB": "b-data", "numberField": 101.6 }, "JSONType3": { "fieldA": "data", "fieldB": "b-data", "numberField": 98.6, "fieldC": "data", "fieldD": "D-data-4" }}

Next, open the ksqlDB CLI:

docker exec -it ksqldb-cli ksql http://ksqldb-server:8088

Enter the following statement. This will create a stream backed by the data_stream topic.

CREATE STREAM data_stream (
    JSONType1 VARCHAR,
    JSONType2 VARCHAR,
    JSONType3 VARCHAR
) WITH (KAFKA_TOPIC='data_stream',
        VALUE_FORMAT='JSON',
        PARTITIONS=1);
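Before querying, it can help to confirm that the stream was registered and that the events reached the topic. A short, optional check using standard ksqlDB statements might look like this:

```sql
-- Show the columns ksqlDB registered for the stream.
DESCRIBE data_stream;

-- Print the first four raw records from the topic, then stop.
PRINT 'data_stream' FROM BEGINNING LIMIT 4;
```

DESCRIBE should list the three VARCHAR columns, and PRINT should show the JSON events entered in the console producer.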

Now you can access the fields using the EXTRACTJSONFIELD function. Note that we first tell ksqlDB to consume from the beginning of the stream.

SET 'auto.offset.reset'='earliest';

SELECT EXTRACTJSONFIELD (JSONType1, '$.oneOnlyField') AS special_info,
       CAST(EXTRACTJSONFIELD (JSONType2, '$.numberField') AS DOUBLE) AS runfld,
       EXTRACTJSONFIELD (JSONType3, '$.fieldD') AS description
FROM data_stream
EMIT CHANGES;

The query output should look like this:

+------------------------+------------------------+------------------------+
|SPECIAL_INFO            |RUNFLD                  |DESCRIPTION             |
+------------------------+------------------------+------------------------+
|more data               |98.6                    |D-data                  |
|more data               |99.6                    |D-data-2                |
|more data               |100.6                   |D-data-3                |
|more data               |101.6                   |D-data-4                |
+------------------------+------------------------+------------------------+
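The query above is a push query, so its results go only to the CLI session. To write the extracted fields to a new stream instead, one option is a persistent query, sketched below. The stream name extracted_fields is a hypothetical name chosen for illustration. Stop the running push query with Ctrl-C before entering it.

```sql
-- Sketch: persist the extracted fields into a new stream.
-- extracted_fields is an illustrative name, not part of the original tutorial.
CREATE STREAM extracted_fields AS
    SELECT EXTRACTJSONFIELD(JSONType1, '$.oneOnlyField') AS special_info,
           CAST(EXTRACTJSONFIELD(JSONType2, '$.numberField') AS DOUBLE) AS runfld,
           EXTRACTJSONFIELD(JSONType3, '$.fieldD') AS description
    FROM data_stream
    EMIT CHANGES;
```

Downstream queries can then select special_info, runfld, and description from extracted_fields as ordinary typed columns, without repeating the JSON extraction.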

When you are finished, exit the ksqlDB CLI by pressing Ctrl-D, then clean up the containers used for this tutorial by running:

docker compose -f ./docker/docker-compose-ksqldb.yml down
