Senior Developer Advocate (Presenter)
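There are two broad ways to monitor Kafka Connect: with Confluent tooling (the Confluent Cloud Console, Confluent Platform Control Center, and the Confluent Cloud Metrics API with its third-party integrations), or with the data that Kafka Connect itself exposes over JMX and REST.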
From the Confluent Cloud UI, there are several views for monitoring an individual connector’s status. A good place to start is the Connector Overview window. Here you can find a connector’s current status, how many messages it has processed, whether there is any lag occurring, and also whether any potentially problematic messages have been written to the dead letter queue.
The Overview window also includes an option to open the Stream lineage window which shows where the connector fits within related event streams.
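Stream lineage provides a graphical UI of event streams and data relationships with both a bird’s eye view and drill-down magnification for answering questions like: Where is my data coming from? Where is it going, and who is consuming it? Where, when, and how has it been transformed?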
Viewing a connector in Stream lineage lets you easily identify its relationship within event streams.
Mousing over the connector displays a popup with its details. If the connector or other elements of the event stream are clicked, a corresponding connector overview tab opens.
The Stream lineage connector overview tab displays many of the same details as the primary Connector Overview window.
Consumers and Tasks tabs are also available.
The Consumers tab displays the list of consumer clients being used by the connector instance to read from the Kafka topic partitions. If a connector instance has multiple tasks, this list will contain multiple consumer clients.
The Tasks tab shows the connector’s tasks and their status.
You can view consumption lag information related to the connector by navigating to the Clients window under Data Integration. To see the consumer lag for a particular connector, navigate to the Consumer lag tab and select the consumer group whose ID includes the connector ID.
The window that appears shows the current lag for each partition of the topic being consumed by the connector.
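The per-partition lag shown in that window is, in essence, the gap between the latest offset in a partition and the offset the connector's consumer group has committed. The following is a minimal sketch of that arithmetic only; the offsets are made-up values, the function name is hypothetical, and treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset is treated as unread from 0.
        done = committed.get(partition, 0)
        # Clamp at zero so a stale end offset never yields negative lag.
        lag[partition] = max(end - done, 0)
    return lag


# Hypothetical offsets for a three-partition topic read by a sink connector.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 1200, 2: 1505}

print(partition_lag(end_offsets, committed))  # {0: 0, 1: 280, 2: 5}
```

A partition whose lag keeps growing while the others stay flat usually points at the task that owns that partition.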
Most of the connector information that is provided by the Confluent Cloud UI is available in the Confluent Control Center for self-managed connectors being run in conjunction with a Confluent Platform Kafka cluster.
Datadog and Grafana Cloud provide integrations and Dynatrace provides an extension that allows users to input a Cloud API key, select resources to monitor, and see metrics in minutes in prebuilt dashboards.
Prometheus servers can scrape the Confluent Cloud Metrics API directly by making use of the export endpoint. This endpoint returns the single most recent data point for each metric, for each distinct combination of labels, in the Prometheus exposition or OpenMetrics format.
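To make the shape of that response concrete, here is a small sketch that parses a few lines of Prometheus exposition text into (name, labels, value) tuples. The sample text, including the metric and label names, is illustrative rather than captured from the real endpoint, and the parser is deliberately simplified (it does not handle a closing brace inside a label value). In practice a Prometheus server does this parsing for you.

```python
import re
from typing import Dict, List, Tuple

# One sample line: name, optional {labels}, value, optional timestamp.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Illustrative payload; metric and label names here are assumptions.
SAMPLE = """\
# HELP confluent_kafka_connect_sent_records Records sent by the connector.
# TYPE confluent_kafka_connect_sent_records gauge
confluent_kafka_connect_sent_records{connector_id="lcc-abc123",kafka_id="lkc-xyz789"} 42.0 1700000000000
confluent_kafka_connect_dead_letter_queue_records{connector_id="lcc-abc123",kafka_id="lkc-xyz789"} 0.0 1700000000000
"""


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse exposition-format text, skipping comments and blank lines."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # Ignore anything this simplified parser cannot read.
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels["connector_id"], value)
```

Each combination of metric and labels appears once, which matches the "single most recent data point" behavior described above.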
The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. This is a queryable HTTP API in which the user will POST a query written in JSON and get back a time series of metrics specified by the query.
The object model for the Metrics API is designed similarly to the OpenTelemetry standard.
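As a rough illustration of "POST a query written in JSON", the sketch below only builds a query body and prints it; it does not send anything. The field names follow the v2 query shape as best understood here and should be checked against the Metrics API reference before use. The connector ID, metric name, and time interval are placeholder values, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict


def build_connector_query(
    connector_id: str,
    metric: str,
    interval: str,
    granularity: str = "PT1M",
    limit: int = 25,
) -> Dict[str, Any]:
    """Build a JSON-serializable query body for one metric of one connector."""
    return {
        "aggregations": [{"metric": metric}],
        # Restrict results to a single connector resource.
        "filter": {"field": "resource.connector.id", "op": "EQ", "value": connector_id},
        "granularity": granularity,  # ISO-8601 duration per data point
        "intervals": [interval],     # ISO-8601 interval to query
        "limit": limit,
    }


# Placeholder values, not taken from the lesson.
body = build_connector_query(
    connector_id="lcc-abc123",
    metric="io.confluent.kafka.connect/sent_records",
    interval="2024-01-01T00:00:00Z/PT1H",
)
print(json.dumps(body, indent=2))
```

The response to such a query is a time series, one data point per granularity step within the interval, which is what distinguishes the query endpoint from the export endpoint.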
Kafka Connect exposes various data for monitoring over JMX and REST, and this collection is ever expanding (see, for example, KIP-475). This is an option when Confluent tools are not available, e.g. with self-managed Kafka Connect. To use JMX, you just need to be familiar with the MBeans that are exposed, and you need a tool for gathering data from them and visualizing it.
In the JMX-based setup in this image, we see data for total messages read by a sink connector, as well as information about Connect error totals, source records read, and dead letter queue requests. With a little bit of tooling, you can build alerting on this data and expand your own observability framework. All of this setup can be a lot of work, though, so if you can do it in a fully managed way, it is far easier.
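The alerting mentioned above can start very simply. The sketch below assumes some collector has already read Connect's task error attributes over JMX into plain dictionaries (how that collection happens is outside this example) and flags any error counter that grew between two snapshots. The function name and snapshot values are hypothetical.

```python
from typing import Dict, List

# Attribute names from Kafka Connect's task error metrics.
ERROR_COUNTERS = (
    "total-record-errors",
    "total-record-failures",
    "deadletterqueue-produce-requests",
)


def error_alerts(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return one message per error counter that increased between snapshots."""
    alerts = []
    for name in ERROR_COUNTERS:
        delta = current.get(name, 0.0) - previous.get(name, 0.0)
        if delta > 0:
            alerts.append(f"{name} increased by {delta:g}")
    return alerts


# Hypothetical snapshots taken one scrape interval apart.
previous = {"total-record-errors": 3, "deadletterqueue-produce-requests": 1}
current = {"total-record-errors": 3, "deadletterqueue-produce-requests": 4}

for message in error_alerts(previous, current):
    print(message)  # deadletterqueue-produce-requests increased by 3
```

Comparing snapshots rather than absolute values matters because these attributes are running totals that only reset when the task restarts.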
Kafka Connect also exposes information about the status of tasks and connectors on its REST interface, which we covered in the previous module.
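As a minimal sketch of using that REST interface for monitoring, the code below picks failed tasks out of a connector status document. The sample payload mirrors the shape returned by GET /connectors/{name}/status, but its values are invented. The fetch helper is defined and not called, and the base URL you pass to it (for example http://localhost:8083 on a default local worker) is an assumption about your deployment.

```python
import json
from typing import Any, Dict, List
from urllib.parse import quote
from urllib.request import urlopen


def fetch_status(base_url: str, connector: str) -> Dict[str, Any]:
    """Fetch the status document for one connector from a Connect worker."""
    url = f"{base_url.rstrip('/')}/connectors/{quote(connector, safe='')}/status"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def failed_tasks(status: Dict[str, Any]) -> List[int]:
    """Return the IDs of tasks whose state is FAILED."""
    return [task["id"] for task in status.get("tasks", []) if task.get("state") == "FAILED"]


# Invented payload in the shape of a status response.
sample_status = {
    "name": "orders-sink",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.5:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.5:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "10.0.0.6:8083", "trace": "..."},
    ],
    "type": "sink",
}

print(failed_tasks(sample_status))  # [1]
```

Note that a connector can report RUNNING while one of its tasks has failed, so checking the task list is more informative than checking the connector state alone.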
Hi, I'm Danica Fine, here to introduce you to methods for monitoring your Kafka Connect instances. Now that you know a bit more about how to configure and run Kafka Connect, the next step is to understand how to monitor Connect clusters and connectors and act on any irregularities you may encounter. Let's dive in. Depending on how you're running Kafka Connect, there are two main ways to monitor it. Within the Confluent Kafka ecosystem, the Confluent Cloud Console and Confluent Platform Control Center are the easiest options to get started with monitoring a connector instance. The Confluent Metrics API is another option that can be used to collect metrics that may then be integrated with third-party monitoring tools, such as Datadog, Dynatrace, Grafana, and Prometheus. Another option that's compatible with both Confluent Kafka and Apache Kafka is to monitor data exposed directly by Kafka Connect, such as JMX and REST. From the Confluent Cloud UI, there are several views for monitoring an individual connector's status. A good place to start is the Connector Overview window. Here you can find a connector's current status, how many messages it has processed, whether there is any lag occurring, and also whether any potentially problematic messages have been written to the dead letter queue. The Overview window also includes an option to open the Stream Lineage window, which shows where the connector fits within related event streams. Using Stream Lineage, you can view a graphical UI of event streams and data relationships with both a bird's eye view and drill-down magnification. With it, you can easily answer questions like, "Where is my data coming from? Where is it going, and who is consuming it? Where, when, and how has it been transformed?" Viewing the connector in Stream Lineage is probably the easiest way to identify its relationship within event streams and the wider context.
For source connectors, you can see which topic or topics the connector is producing records to. For sink connectors, you can see which topic or topics the connector is consuming records from. Hovering over the connector displays a popup with more details. If the connector or other elements of the event stream are clicked, a corresponding Connector Overview tab opens. This tab shows many of the same details that the primary Connector Overview window displays, but you'll also find the convenient Consumers and Tasks tabs. The Consumers tab displays the list of consumer clients being used by the connector instance to read from the Kafka topic partitions. If a connector instance has multiple tasks, this list will contain multiple consumer clients. The Tasks tab shows the connector's tasks and their status. You can view consumption lag information related to the connector by navigating to the Clients window under Data Integration. To see the consumer lag for a particular connector, navigate to the Consumer Lag tab and select the consumer group whose ID includes the connector ID. The window that appears shows the current lag for each partition of the topic being consumed by the connector. Most of the connector information that is provided by the Confluent Cloud UI is available in the Confluent Control Center for self-managed connectors being run in conjunction with a Confluent Platform Kafka cluster. Now, these tools are great, but sometimes it's nice to export your metrics to do more detailed monitoring. Thankfully, third-party integrations do just that. Datadog and Grafana Cloud provide integrations and Dynatrace provides an extension that allows users to input a Cloud API key, select resources to monitor, and see metrics in minutes in pre-built dashboards. Prometheus servers can scrape the Confluent Cloud Metrics API directly, by making use of the Export Endpoint.
This Endpoint returns the single most recent data point for each metric, for each distinct combination of labels, in the Prometheus exposition or OpenMetrics format. To get metrics outside of the Confluent Cloud Console, the Confluent Cloud Metrics API is there to provide actionable, operational metrics about your Confluent Cloud deployment, wherever you need them. This is a queryable HTTP API, in which the user will POST a query written in JSON and get back a time series of metrics specified by the query. The object model for the Metrics API is designed similarly to the OpenTelemetry standard. Metrics API endpoints are available to list metric descriptors, list resource descriptors, query metric values, export metric values, and query label values. For use across any deployment, Kafka Connect exposes various data for monitoring over JMX and REST, and this collection is ever expanding. As an example, check out KIP-475. These metrics are a great option when Confluent tools are not available, for example, with self-managed Kafka Connect. To use JMX, you just need to be familiar with the MBeans that are exposed, and you need a tool for gathering data from the metrics and visualizing them. In the JMX-based setup seen here, we see data for total messages read by a sink connector, as well as information about Connect error totals, source records read, and dead letter queue requests. With a little bit of tooling, you can build alerting on top of this data and expand your own observability framework. All of this setup can be a lot of work though, so if you can do it in a fully managed way, it might be a bit easier. Kafka Connect also exposes information about the status of tasks and connectors on its REST interface, which we covered in a previous module. Regardless of what your monitoring needs are, you should have enough to get on out there and start monitoring your own Kafka Connect instance.
See you in the following modules, where we'll discuss further methods for debugging and error handling of Kafka Connect.