Frequently asked questions and answers about Kafka monitoring, best tools, and more.
When monitoring Kafka, you need to be able to answer questions such as:
There are many monitoring options for your Kafka cluster and related services. If you are using Confluent, you can use Confluent Health+, which includes a cloud-based dashboard, has many built-in triggers and alerts, has the ability to send notifications to Slack, PagerDuty, generic webhooks, etc., and integrates with other monitoring tools.
To use Health+, you'll need to enable Confluent Telemetry Reporter, which is already part of Confluent Platform, or you can install it separately:
yum install confluent-telemetry
This documentation page shows the collected metadata that powers Health+.
There are also various open source tools that can be combined to build powerful monitoring solutions, such as Prometheus and Grafana, or Beats, Elasticsearch, and Kibana, as well as various other tools discussed here.
You can enable remote JMX-based monitoring tools such as Prometheus to connect to your Kafka services, including brokers, Kafka Connect workers, etc., as well as to your clients, e.g. producers and consumers. Be sure to start the services with the JMX port open.
For example, to start a broker with JMX port 9999 open, run the following from your prompt:
export JMX_PORT=9999
./bin/kafka-server-start.sh config/server.properties
For more detail on JMX, see the documentation.
Consumer lag is an important performance indicator. It tells you the offset difference between the producer’s last produced message and the consumer group’s last commit. A large consumer lag, or a quickly growing lag, indicates that the consumer is not able to keep up with the volume of messages on a topic.
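The arithmetic behind that definition can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function names partition_lag and total_lag are hypothetical helpers, not part of Kafka, and the offsets are made-up numbers.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the lag definition; not part of Kafka.

# Lag for one partition = log end offset - last committed offset.
partition_lag() {
  end_offset=$1
  committed_offset=$2
  echo $((end_offset - committed_offset))
}

# Total lag for a consumer group: sum of per-partition lag.
# Reads "end_offset committed_offset" pairs from stdin, one per line.
total_lag() {
  total=0
  while read -r end_offset committed_offset; do
    total=$((total + $(partition_lag "$end_offset" "$committed_offset")))
  done
  echo "$total"
}
```

For example, with three partitions at (1500, 1200), (800, 800), and (950, 900), running printf '1500 1200\n800 800\n950 900\n' | total_lag prints 350. A total that keeps rising between samples is the "quickly growing lag" signal described above.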
The key metric to monitor for consumer lag is records-lag-max, which is exposed by the MBean object: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=<client_id>
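As a complement to JMX, the kafka-consumer-groups.sh tool that ships with Kafka prints a LAG column per partition when run with --describe. The sketch below totals that column. The function name sum_lag is a hypothetical helper, and the bootstrap address and group name in the comment are placeholders you would replace with your own.

```shell
#!/bin/sh
# Hypothetical helper: totals the LAG column of `--describe` output on stdin.
# Example invocation (placeholder address and group name):
#   ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#     --describe --group my-group | sum_lag
sum_lag() {
  awk '
    # Until the header row is seen, look for the column labelled LAG.
    lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Data rows: add numeric lag values; rows showing "-" are skipped.
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}
```

Finding the column by its header name rather than by a fixed position keeps the sketch from depending on the exact column order, which may differ between Kafka versions.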
To see consumer lag in action, see the scenario in this example.
A wide variety of tools can monitor your Kafka cluster and applications. If you're already using Confluent Cloud or Confluent Platform, a good place to start is Confluent Health+, which offers a cloud-based dashboard and numerous built-in triggers, alerts, and integrations.
For Apache Kafka deployments, you can consider JMX-based monitoring tools, or you can build your own integration with other tools such as Datadog or the open source Prometheus.