
Monitoring Kafka FAQs

Frequently asked questions and answers about Kafka monitoring, best tools, and more.

How do I monitor Kafka?

When monitoring Kafka, you need to be able to answer questions such as:

  • Are my applications receiving all data?
  • Are my business applications showing the latest data?
  • Why are my applications running slowly?
  • Do I need to scale up?
  • Can my data get lost?

There are many monitoring options for your Kafka cluster and related services. If you are using Confluent, you can use Confluent Health+, which provides a cloud-based dashboard and many built-in triggers and alerts, can send notifications to Slack, PagerDuty, and generic webhooks, and integrates with other monitoring tools.

To use Health+, you'll need to enable Confluent Telemetry Reporter, which is already part of Confluent Platform, or you can install it separately:

yum install confluent-telemetry

This documentation page shows the collected metadata that powers Health+.

There are also various open source tools that can be combined to build powerful monitoring solutions, such as Prometheus and Grafana, or Beats, Elasticsearch, and Kibana, as well as various other tools discussed here.

How to enable JMX in Kafka

Remote monitoring tools that read JMX metrics, such as Prometheus, can connect to your Kafka services (brokers, Kafka Connect workers, and so on) and to your clients (producers and consumers). For this to work, each service must be started with its JMX port open.

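For example, to start a broker with JMX port 9999 open, set JMX_PORT in your shell and then launch the broker from that same shell (the script and properties paths shown here assume an Apache Kafka tarball layout and may differ in your installation):

export JMX_PORT=9999
bin/kafka-server-start.sh config/server.properties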

For more detail on configuring JMX, see the documentation.
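Once a service is running, it can be useful to confirm that its JMX port is actually reachable before pointing a monitoring tool at it. The following is a minimal sketch in Python, not an official utility: the helper names `jmx_service_url` and `is_port_open` are hypothetical, and `localhost` is an assumed host. It builds the standard JMX-over-RMI service URL that many JMX clients accept and checks whether anything is listening on the port.

```python
import socket


def jmx_service_url(host: str, port: int) -> str:
    """Return the standard JMX-over-RMI service URL for host:port."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: a broker on this machine, with the port from JMX_PORT=9999.
    host, port = "localhost", 9999
    state = "open" if is_port_open(host, port) else "not reachable"
    print(f"JMX port {port} on {host} is {state}")
    print("Service URL:", jmx_service_url(host, port))
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the listener speaks JMX, so treat this as a first sanity check rather than a full verification.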

How to monitor Kafka consumer lag

Consumer lag is an important performance indicator. It tells you the offset difference between the producer’s last produced message and the consumer group’s last commit. A large consumer lag, or a quickly growing lag, indicates that the consumer is not able to keep up with the volume of messages on a topic.

The key metrics for consumer lag are exposed by the MBean kafka.consumer:type=consumer-fetch-manager-metrics,client-id=<client_id>, in particular its records-lag-max attribute.

To watch consumer lag in action, see the scenario in the linked example.
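The definition of lag given earlier (the latest produced offset minus the consumer group's last committed offset) can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names `partition_lag` and `is_lag_growing` and the offset values are hypothetical, and in a real system the offsets would come from your monitoring tool or Kafka's admin tooling rather than hard-coded dictionaries.

```python
from typing import Dict, List


def partition_lag(
    log_end_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> Dict[int, int]:
    """Per-partition lag: log end offset minus committed offset, floored at 0.

    Simplification: a partition with no committed offset is treated as
    committed at offset 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


def is_lag_growing(samples: List[int]) -> bool:
    """True if each total-lag sample is strictly larger than the one before."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition topic.
    end = {0: 1500, 1: 1200, 2: 900}
    committed = {0: 1450, 1: 1200, 2: 400}

    lag = partition_lag(end, committed)
    print(lag)                # {0: 50, 1: 0, 2: 500}
    print(sum(lag.values()))  # 550

    # Total lag sampled at three successive points in time.
    print(is_lag_growing([100, 250, 550]))  # True
```

Here partition 2 accounts for most of the lag, which is the kind of skew that a single group-wide total can hide. A steadily rising total across samples is the "quickly growing lag" signal that suggests the consumer cannot keep up.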

What tools are there for monitoring Kafka?

A wide variety of tools can monitor your Kafka cluster and applications. If you're already using Confluent Cloud or Confluent Platform, a good place to start is Confluent Health+, which offers a cloud-based dashboard and numerous built-in triggers, alerts, and integrations.

For Apache Kafka deployments, you can use JMX-based monitoring tools, or you can build your own integration with other tools such as Datadog or the open source Prometheus.

