Automate the Road to Production

If you need to remember one thing before you start building the platform, it would be this:

Investing early on in automation will save you a lot of time and money.

Now that I have your attention ;-).

Engineering practices such as Continuous Integration, Continuous Delivery, and now GitOps have become the de facto approaches for many teams to cope with the fast-paced changes in software development.

In this module, we’re going to see what GitOps means and what a modern, automated CI/CD pipeline should look like when building a data streaming system.

GitOps extends the principles of DevOps, utilizing Git to describe and automate the desired state of infrastructure and applications, enhancing practices like Infrastructure as Code and CI/CD pipelines.

By adopting GitOps, organizations can achieve numerous benefits, including repeatable deployments, easier auditability, increased developer and operational productivity, improved stability and reliability, and stronger security guarantees.

There are four principles of GitOps:

It’s Declarative: In GitOps, the system's desired state is expressed declaratively using configuration files.

Instead of outlining the steps to achieve a state, these files describe the final configuration.

It’s Versioned and Immutable: The desired state is stored in a way that enforces immutability, versioning and retains a complete history, which is great for auditability.

It’s Pulled Automatically: Software agents automatically pull the desired state declarations from the source.

It’s also Continuously Reconciled: Software agents continuously observe actual system state and attempt to apply the desired state.

You will need the following components to implement GitOps:

A Git Hosting Service, to manage your Git repositories.

A Container Platform, to run, manage, and scale containerized applications.

Usually that will be Kubernetes.

A Container Image Registry, to store your application images.

A CI Server to build the application, package it up in an image and publish it to the Image Registry.

And finally, a GitOps Agent, which is a piece of software that continuously monitors a Git repository and applies changes to the target environment.

CI/CD pipelines are automated sequences to build, test, and deploy software, with customizable stages to perform, for example, security checks or performance tests.

In data streaming systems, it’s best to create two kinds of pipelines.

The platform pipeline is owned by operators; it creates and updates the platform infrastructure and manages control plane resources.

And the application pipeline is owned by developers; it builds and deploys applications but also manages data plane resources such as topics, schemas, connectors, and business metadata.

Alright, let’s see first how we can create a CI/CD pipeline to deploy Apache Kafka on Kubernetes in the cloud.

First, we will need a Git Hosting Service.

GitHub and GitLab are two very popular options.

We will also need a CI service to run the CI pipeline.

There are plenty of options, like Travis CI, CircleCI, or Drone, but GitHub and GitLab can also provide that service.

For the CD part, we will need a few tools from the Cloud Native community: Terraform, Kustomize and FluxCD or ArgoCD.

Terraform is an open source infrastructure-as-code tool that lets you build, change, and version your cloud data infrastructure in a safe and efficient way.

With Terraform you can write human-readable, declarative configuration files that you can version, reuse, share, and deploy in your CI/CD pipelines.

Kustomize is a Kubernetes tool for composing manifests tailored to different environments.
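
As a rough sketch, a per-environment overlay might look like this; the folder layout and the my-streaming-app image name are hypothetical:

```yaml
# staging/kustomization.yaml -- hypothetical overlay for a staging environment
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Reuse the shared manifests and apply environment-specific changes on top
resources:
  - ../base

# Pin the application image to an immutable tag (never 'latest')
images:
  - name: my-streaming-app
    newTag: "1.4.2"

# Environment-specific patches, e.g. replica counts or resource limits
patches:
  - path: replica-count.yaml
```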

FluxCD and ArgoCD are GitOps agents designed to sync the state between Git and Kubernetes in real time.

Both are Cloud Native Computing Foundation projects, so there’s a large community behind each tool, committed to providing support and helping developers get started.

This pipeline will automate the creation of the Kubernetes Cluster, the installation of the GitOps agent, and then the creation and maintenance of our Streaming Platform based on Apache Kafka.

The first step is to create a Git repository to store all the infrastructure configuration files and start defining the CI part of the pipeline as code using a CI tool, for example GitHub Actions.

The CI can, for example, verify the validity of the Terraform and Kubernetes files.
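
As a sketch, assuming GitHub Actions and a hypothetical repository layout with a “terraform” folder and a “kubernetes” folder, such a validation job could look like this:

```yaml
# .github/workflows/validate.yaml -- hypothetical CI checks for the platform repository
name: validate
on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Check Terraform formatting and configuration validity
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive terraform/
      - run: terraform -chdir=terraform init -backend=false
      - run: terraform -chdir=terraform validate

      # Check that the Kubernetes manifests still render correctly
      - run: kubectl kustomize kubernetes/prod
```

You could go further with a dedicated schema validator such as kubeconform, but even these basic checks catch a lot of mistakes before they ever reach the GitOps agent.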

With Terraform, we will be able to create the Kubernetes cluster in our cloud provider of choice.

Next up, in the bootstrap phase, you must install ArgoCD or FluxCD once the cluster has been provisioned.

Note that FluxCD can be installed with a dedicated Flux Terraform provider.

Once the GitOps agent has been installed in the Kubernetes cluster, you just have to configure it to start tracking the changes in your Git repository.

We can now create a “/prod” folder in Git, and Flux or Argo will automatically create the Kafka resources: brokers, ZooKeeper, Kafka Connect, Schema Registry, etc.

This can be done via the Kubernetes operator for Kafka of your choice.

There are a few free options available with varying levels of support, so you might want to check which one works best for you.
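
For illustration, assuming you pick the Strimzi operator (one of those free options), a minimal Kafka custom resource committed to the “/prod” folder might look like this:

```yaml
# prod/kafka-cluster.yaml -- hypothetical cluster definition for the Strimzi operator
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: streaming-platform
spec:
  kafka:
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator:
    # Also manage topics and users declaratively from Git
    topicOperator: {}
    userOperator: {}
```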

When you open a pull request in the repository and push a commit, the CI pipeline will trigger on the branch and perform quality checks.

Whenever you merge a pull request in the repository, the GitOps agent will detect that and update the target environment.

You can build a similar CI/CD pipeline when using Confluent Cloud for your Data Streaming Platform infrastructure.

As you can see, the setup is even simpler as you don’t need Kubernetes.

It’s just Terraform to create and update all the infrastructure.

With the Terraform Provider for Confluent Cloud, you can choose your cloud provider and manage all resources, including Environments, Clusters, API keys, Role Bindings, Service Accounts, Quotas, and more.

You can leverage Terraform workspaces to create multiple environments, for example: development, staging and production.

So, for example, whenever you make a change in the prod folder, the pipeline triggers and updates the production environment with Terraform.
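
Here is a sketch of that trigger, again assuming GitHub Actions; the folder and workspace names are hypothetical, and the Confluent Cloud API credentials are assumed to be stored as CI secrets and exposed as the environment variables the provider reads:

```yaml
# .github/workflows/deploy-prod.yaml -- hypothetical pipeline for the Confluent Cloud infrastructure
name: deploy-prod
on:
  push:
    branches: [main]
    paths:
      - "prod/**"

jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      # Credentials consumed by the Terraform Provider for Confluent Cloud
      CONFLUENT_CLOUD_API_KEY: ${{ secrets.CONFLUENT_CLOUD_API_KEY }}
      CONFLUENT_CLOUD_API_SECRET: ${{ secrets.CONFLUENT_CLOUD_API_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      # Select the workspace that maps to the production environment, then apply
      - run: terraform -chdir=prod init
      - run: terraform -chdir=prod workspace select prod
      - run: terraform -chdir=prod plan -out=tfplan
      - run: terraform -chdir=prod apply -auto-approve tfplan
```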

Now, let’s talk about the CI/CD pipeline for streaming applications.

The first important thing is that GitOps best practices recommend having two repositories: one for the application code and another for the application configuration.

The application code repository contains the code, schema definitions, and ksqlDB migration scripts.

The application config repository contains Kubernetes files, property files, and Terraform files, organized in distinct folders such as "staging" and "prod".

You need at least two Kubernetes clusters, staging and production, each with a GitOps agent (either ArgoCD or FluxCD) installed.

The agent in each environment will monitor the Git repository for changes in the matching folder.

The developer workflow involves committing in the application source code repository, which triggers the CI pipeline to build the app, run the tests, check the quality, assess vulnerabilities and package the app in a container image with a new tag.

For deploying to staging, a pull request is opened against the application config repository, updating the Kubernetes deployment manifest with the new image tag and other required changes for deployment, such as topics, schemas, connectors or ACLs.
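
For instance, sticking with the Strimzi assumption from earlier, a topic for the application could be declared in the config repository like this:

```yaml
# staging/topics/orders.yaml -- hypothetical topic managed by the Strimzi topic operator
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  labels:
    # Tells the topic operator which Kafka cluster this topic belongs to
    strimzi.io/cluster: streaming-platform
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000   # 7 days
```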

It’s up to you to decide how you want to make the schema definitions available to this repository, either by copying them manually over from the application code repo or by using git submodules.

After merging this pull request, the GitOps agent updates the staging environment as per the new desired state.

Pre-deploy hooks allow additional tasks, such as applying infrastructure changes with Terraform or running ksqlDB migrations.

Likewise, you can specify hooks after the sync, for example to run acceptance tests or performance tests.
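
With ArgoCD, for example, a post-sync hook is just a Kubernetes Job carrying a hook annotation; the test image and its arguments below are hypothetical:

```yaml
# staging/hooks/acceptance-tests.yaml -- hypothetical post-sync acceptance test job
apiVersion: batch/v1
kind: Job
metadata:
  name: acceptance-tests
  annotations:
    # Run this Job after ArgoCD has finished syncing the application
    argocd.argoproj.io/hook: PostSync
    # Clean the Job up once it has completed successfully
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: acceptance-tests
          image: registry.example.com/acceptance-tests:1.0.0
          args: ["--target", "staging"]
```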

To promote changes to Production, a new pull request on the application config repo is required.

You need to copy the relevant files to the "/prod" folder.

The GitOps agent will then sync the changes to the production environment.

Rollbacks are as easy as reverting a commit.

Here are a few tips for managing environments.

Keep your environments secure and organized by clearly separating Dev, Staging, Production, and DR.

The GitOps community recommends the folder-per-environment approach on the main branch, which is more flexible than using different Git branches.

You can promote across environments by copying files across directories.

Just use your favorite diff tool for that.

You want to avoid having to do any manual edits or conflict resolution.

So, it’s best to keep the different types of configuration files separate.

For example, some files contain values that never change across environments, while other files must have different values in each environment.

Just keep them in different subfolders.

You can find more best practices on various GitOps community websites.

Secrets can also be stored safely in the repository with a utility like sealed-secrets. But if your infosec department requires it, you can also store secrets externally and use the ‘external-secrets’ operator to integrate with services such as AWS Secrets Manager or HashiCorp Vault.
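
As a sketch, assuming the external-secrets operator is installed and a secret store named “aws-secrets-manager” is already configured, pulling a Kafka API key into the cluster could look like this:

```yaml
# staging/secrets/kafka-credentials.yaml -- hypothetical ExternalSecret synced from AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kafka-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    # Name of the Kubernetes Secret the operator will create
    name: kafka-credentials
  data:
    - secretKey: api-key
      remoteRef:
        key: staging/kafka
        property: api-key
    - secretKey: api-secret
      remoteRef:
        key: staging/kafka
        property: api-secret
```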

Another important point is that you should never use non-deterministic tags such as ‘latest’ when referring to images in your deployment config as they prevent reproducible deployments.

Last but not least, just let the GitOps agent manage the cluster, so don’t run manual ‘kubectl’ commands or scripts unless you have a very good reason to do so.

In case rogue commands are issued on the cluster, configure the GitOps agent to automatically revert those changes and restore the desired state.
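
With ArgoCD, for instance, that behaviour is enabled through the automated sync policy on the Application resource; the repository URL, paths, and namespaces here are placeholders:

```yaml
# argocd/streaming-platform.yaml -- hypothetical ArgoCD Application with self-healing enabled
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: streaming-platform
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: prod
  destination:
    server: https://kubernetes.default.svc
    namespace: kafka
  syncPolicy:
    automated:
      # Revert manual changes made outside of Git
      selfHeal: true
      # Delete resources that were removed from the repository
      prune: true
```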

Let’s focus a bit on the testing stages.

There’s a variety of testing tools that you can use in the Kafka space.

For the unit tests, you can use MockProducer and MockConsumer if you’re using Java, or librdkafka’s mock cluster if you’re using C, .NET, or Python.

For the integration tests, Testcontainers can run lightweight, disposable Kafka clusters and is easier to use than tweaking Docker Compose files.

The rub is that unit and integration tests use a mocked or local Kafka broker, so for example, the records are not sent across a network. For later stage application testing, it’s vital to test the actual client application by sending messages to a real cluster.

It’s really up to you to decide how to do this, maybe by spinning up ephemeral test environments or using a shared staging environment which can be reset from time to time.

Most GitOps agents have a deploy hook mechanism.

With a post-sync hook you can run the tests when all deployments are done and all services are ready.

You can report back the test results by updating the commit status with your Git Hosting Service, for example GitHub or GitLab.

When you have ksqlDB code, there’s the question of how you should create and update the persistent queries in the target environment.

Confluent has built a tool to do exactly that: ksql-migrations. It’s very straightforward to set up.

Run the tool’s ‘create’ command in your application code repository to write your first ksqlDB migration file, and then add the ksqlDB statements you need.

If you’re using ArgoCD, you can just configure a pre-sync hook.

Configure this hook with the target ksqlDB server and have it run the ksqlDB migration tool with your app’s ksqlDB migration files.
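
A sketch of such a pre-sync hook, assuming a container image that bundles the ksql-migrations tool together with your migration files and a properties file pointing at the target ksqlDB server (the image name and paths are hypothetical):

```yaml
# staging/hooks/ksql-migrations.yaml -- hypothetical pre-sync job applying ksqlDB migrations
apiVersion: batch/v1
kind: Job
metadata:
  name: ksql-migrations
  annotations:
    # Run the migrations before ArgoCD syncs the rest of the application
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ksql-migrations
          image: registry.example.com/my-app-ksql-migrations:1.4.2
          command:
            - ksql-migrations
            - --config-file
            # Properties file baked into the image, containing the ksqlDB server address
            - /migrations/ksql-migrations.properties
            - apply
            - --all
```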

If you’re using FluxCD, you can arrive at the same end result with pre-deployment jobs running in Kubernetes pods.

Schema validation is an important step in the development and deployment process.

First, you must understand the various schema compatibility types like FORWARD, BACKWARD, FULL, and the transitive variants of those.

So, make sure to read the documentation to know which changes you are allowed to make and in which order you should update your producers and consumers.

To have better control over the evolution and registration of schemas, it is recommended to register schemas outside the client application.

We really recommend setting the ‘auto.register.schemas’ config to ‘false’.

Now, to check the schema compatibility, you have two options:

First, you can call the Schema Registry ‘compatibility’ endpoint directly.

But most of the time, it’s just easier to use the Schema Registry Maven Plugin.

It has a Maven goal to run the compatibility check locally, without a Schema Registry. This can be helpful to get fast feedback on a feature branch.

For example, if you’ve just committed a schema breaking change in your application code repository you would like to know this as soon as possible and not wait for the deployment.

Now, when the time comes to deploy your application, you just need to run the “test-compatibility” goal prior to that.

This will call the environment’s Schema Registry API to verify that your schema updates didn’t introduce a breaking change.
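
For example, assuming the Schema Registry Maven Plugin is already configured in your pom.xml with the subjects, schema files, and the environment’s Schema Registry URL, the check is just another Maven invocation in the pipeline:

```yaml
# .github/workflows/schema-check.yaml -- hypothetical compatibility check before deployment
name: schema-check
on:
  pull_request:

jobs:
  test-compatibility:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"

      # Fails the build if a schema change breaks compatibility with the
      # subjects registered in the target Schema Registry
      - run: mvn --batch-mode schema-registry:test-compatibility
```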

Let’s finish this module by having a look at a few more GitOps tips.

Avoid using the UI or the CLI to create or update the Platform infrastructure for staging or production environments.

The UI and the CLI are fine, though, for querying or for fiddling with a development environment.

Keep in mind that GitOps requires discipline and collective effort.

Start with a small and straightforward process, as too many rules and steps might deter developers and operators from following them.

Avoid creating your own tools, as they’re hard to maintain as ecosystems mature.

Use a publicly available tool designed to be extended instead.

Finally, create infrastructure using service accounts only, not user accounts.

The classic problem is that the person who created the infrastructure leaves the company, and everyone is locked out.

A variant of that: they’re on holiday and their password has just expired.

If you aren’t already on Confluent Developer, head there now using the link in the video description to access other courses, hands-on exercises, and many other resources.