Danica Fine

Senior Developer Advocate (Presenter)

Metrics and Monitoring for Kafka Connect

There are two broad ways to monitor Kafka Connect:

  • Within the Confluent Kafka ecosystem, the Confluent Cloud Console and Confluent Platform Control Center are the easiest options to get started with monitoring a connector instance. The Confluent Metrics API is another option that can be used to collect metrics that may then be integrated with third-party monitoring tools such as Datadog, Dynatrace, Grafana, and Prometheus.
  • Another option, compatible with both Confluent Kafka and Apache Kafka, is to monitor the data that Kafka Connect exposes directly over JMX and its REST interface.

Managed Connector Overview

From the Confluent Cloud UI, there are several views for monitoring an individual connector’s status. A good place to start is the Connector Overview window. Here you can find a connector’s current status, how many messages it has processed, whether there is any lag occurring, and also whether any potentially problematic messages have been written to the dead letter queue.

The Overview window also includes an option to open the Stream lineage window, which shows where the connector fits within related event streams.

Confluent Stream Lineage

Stream lineage provides a graphical UI of event streams and data relationships with both a bird’s eye view and drill-down magnification for answering questions like:

  • Where did data come from?
  • Where is it going?
  • Where, when, and how was it transformed?

Viewing a connector in Stream lineage lets you easily identify its relationship within event streams.

  • For source connectors, you can see what topic the connector is producing records to.
  • For sink connectors, you can see what topic or topics the connector is consuming records from.

Mousing over the connector displays a popup with its details. Clicking the connector, or another element of the event stream, opens the corresponding overview tab.

Stream Lineage - Connector Overview

The Stream lineage connector overview tab displays many of the same details as the primary Connector Overview window.

Consumers and Tasks tabs are also available.

Stream Lineage - Connector Consumers

The Consumers tab displays the list of consumer clients being used by the connector instance to read from the Kafka topic partitions. If a connector instance has multiple tasks, this list will contain multiple consumer clients.

Stream Lineage - Connector Tasks

The Tasks tab shows the connector’s tasks and their status.

Confluent Consumer Lag Tab

You can view consumption lag information related to the connector by navigating to the Clients window under Data Integration. To see the consumer lag for a particular connector, navigate to the Consumer lag tab and select the consumer group whose ID includes the connector ID.

Confluent Consumer Lag Details

The window that appears shows the current lag for each partition of the topic being consumed by the connector.

Most of the connector information provided by the Confluent Cloud UI is also available in Confluent Control Center for self-managed connectors that run alongside a Confluent Platform Kafka cluster.

Third-Party Monitoring Integration

Datadog and Grafana Cloud provide integrations, and Dynatrace provides an extension, that let users enter a Cloud API key, select the resources to monitor, and see metrics in prebuilt dashboards within minutes.

Prometheus servers can scrape the Confluent Cloud Metrics API directly through its export endpoint. For each metric and each distinct combination of labels, this endpoint returns the single most recent data point, in either the Prometheus exposition format or the OpenMetrics format.
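To get a feel for what a scrape of the export endpoint returns, the minimal Python sketch below parses text in the Prometheus exposition format into (name, labels, value) tuples. The metric name, label names, and IDs in `SAMPLE_TEXT` are made up for illustration and are not taken from a real export response. The parser handles only simple sample lines, which is enough for inspecting a payload by hand.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# One sample line: name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'  # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional label set
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+-?\d+)?\s*$'                     # optional timestamp
)
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Return one (name, labels, value) tuple per sample line."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP" and "# TYPE".
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical payload: metric name, labels, and IDs are illustrative only.
SAMPLE_TEXT = """
# TYPE example_connect_received_records gauge
example_connect_received_records{connector_id="lcc-12345",kafka_id="lkc-abcde"} 42.0 1700000000000
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE_TEXT):
        print(name, labels, value)
```

Because the endpoint returns only the most recent data point per label combination, each scrape yields at most one tuple per metric and label set.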

Confluent Metrics API

The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. It is a queryable HTTP API: you POST a query written in JSON and get back a time series of the metrics specified by the query.

The object model for the Metrics API is designed similarly to the OpenTelemetry standard.

Metrics API endpoints are available to:

  • List metric descriptors
  • List resource descriptors
  • Query metric values
  • Export metric values
  • Query label values
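As a rough illustration of the "POST a query written in JSON" flow, here is a minimal Python sketch that builds a query body for one connector and sends it. The URL, the metric name, the `resource.connector.id` filter field, and the body layout are assumptions based on the public Metrics API documentation, so check them against the current API reference before relying on them. The connector ID and credentials are placeholders.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

# Assumed query endpoint for Confluent Cloud metrics; verify before use.
QUERY_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query"


def build_connector_query(connector_id: str, metric: str, interval: str,
                          granularity: str = "PT1M") -> Dict[str, Any]:
    """Build a JSON-serializable query for one metric of one connector."""
    return {
        "aggregations": [{"metric": metric}],
        # Assumed filter field that restricts results to a single connector.
        "filter": {"field": "resource.connector.id", "op": "EQ",
                   "value": connector_id},
        "granularity": granularity,  # ISO-8601 duration per data point
        "intervals": [interval],     # ISO-8601 "start/end" interval
        "limit": 100,
    }


def post_query(body: Dict[str, Any], api_key: str, api_secret: str) -> Dict[str, Any]:
    """POST the query using HTTP Basic auth with a Cloud API key."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        QUERY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    query = build_connector_query(
        connector_id="lcc-12345",  # placeholder connector ID
        metric="io.confluent.kafka.connect/received_records",  # assumed name
        interval="2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
    )
    print(json.dumps(query, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect or log the query before sending it.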

Monitoring Self-Managed Kafka Connect

Kafka Connect exposes various data for monitoring over JMX and REST, and this collection is ever expanding (see, for example, KIP-475). This is an option when Confluent tools are not available, e.g. with self-managed Kafka Connect. To use JMX, you just need to be familiar with the MBeans that are exposed, and you need a tool for gathering data from them and visualizing it.

In the JMX-based setup in this image, we see data for total messages read by a sink connector, as well as information about Connect error totals, source records read, and dead letter queue requests. With a little bit of tooling, you can build alerting on this data and expand your own observability framework. All of this setup can be a lot of work, though, so if you can do it in a fully managed way, it is far easier.
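If you want to locate those values yourself, the Python sketch below maps each item to the MBean type and attribute that the Apache Kafka documentation lists for Kafka Connect, and formats an object name for a given connector and task. The dictionary keys and the `my-sink` connector name are illustrative. The object name follows the quoted form shown in the Kafka documentation; a running worker may register simple values without quotes, so treat the string as a starting point when browsing with a JMX tool.

```python
from typing import Dict, Tuple

# Dashboard item -> (MBean type, attribute), per the Apache Kafka
# monitoring documentation for Kafka Connect. Keys are illustrative labels.
CONNECT_METRICS: Dict[str, Tuple[str, str]] = {
    "sink records read": ("sink-task-metrics", "sink-record-read-total"),
    "source records polled": ("source-task-metrics", "source-record-poll-total"),
    "record errors": ("task-error-metrics", "total-record-errors"),
    "dead letter queue requests": ("task-error-metrics",
                                   "deadletterqueue-produce-requests"),
}


def object_name(mbean_type: str, connector: str, task: int) -> str:
    """Format a task-level object name in the documented, quoted form."""
    return f'kafka.connect:type={mbean_type},connector="{connector}",task="{task}"'


if __name__ == "__main__":
    # "my-sink" is a hypothetical connector name.
    for label, (mbean_type, attribute) in CONNECT_METRICS.items():
        print(f"{label}: {object_name(mbean_type, 'my-sink', 0)} -> {attribute}")
```

A collector such as a JMX exporter can then be pointed at these MBeans so the attributes end up in whatever dashboarding tool you already use.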

Kafka Connect also exposes information about the status of tasks and connectors on its REST interface, which we covered in the previous module.
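As a small example of using the REST interface for monitoring, the Python sketch below fetches a connector's status from `GET /connectors/{name}/status` and reports whether the connector and all of its tasks are running. The worker URL and connector name are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def fetch_status(base_url: str, connector: str) -> Dict[str, Any]:
    """GET /connectors/{name}/status from a Kafka Connect worker."""
    url = f"{base_url.rstrip('/')}/connectors/{urllib.parse.quote(connector)}/status"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def failed_tasks(status: Dict[str, Any]) -> List[int]:
    """Return the IDs of all tasks whose state is FAILED."""
    return [task["id"] for task in status.get("tasks", [])
            if task.get("state") == "FAILED"]


def is_healthy(status: Dict[str, Any]) -> bool:
    """True only if the connector and every one of its tasks are RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks_ok = all(task.get("state") == "RUNNING"
                   for task in status.get("tasks", []))
    return connector_ok and tasks_ok


if __name__ == "__main__":
    # Placeholder worker address and connector name.
    status = fetch_status("http://localhost:8083", "my-sink")
    print("healthy:", is_healthy(status), "failed tasks:", failed_tasks(status))
```

Run on a schedule, a check like this can feed a simple alert when a task moves to the FAILED state.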
