Staff Software Practice Lead
The Confluent Stream Catalog provides a set of tools that make it easier to discover and use streams. Using the Catalog, we can tag streams with relevant details to make them easier to find. We can search through schemas to determine exactly what information is stored in a given stream. And, once we locate the stream, we can add business metadata to help us determine who is responsible for the stream, and what its purpose is. In this video, we'll explore how to use the Stream Catalog. We'll see how to tag streams, add business metadata, search for streams, and use the REST and GraphQL APIs.
# REST API: search the catalog for schema fields matching "orderId"
curl -u <API-KEY>:<API-SECRET> \
--request GET \
--url '<SCHEMA-REGISTRY-URL>/catalog/v1/search/basic?type=sr_field&query=orderId'
# GraphQL API: list fields tagged PII, returning each field's name and qualified name
curl -u <API-KEY>:<API-SECRET> \
--request POST \
-H 'content-type: application/json' \
--url '<SCHEMA-REGISTRY-URL>/catalog/graphql' \
--data '{ "query":"{ sr_field(tags: [\"PII\"]) { name qualifiedName } }" }'
The Confluent Stream Catalog provides a set of tools that make it easier to discover and use streams. They allow us to tag our streams and apply business metadata to them. This is combined with rich search capabilities so that we can easily find the streams that we need.
Before we look at tagging and metadata, let's look at some of the search basics. The Confluent Cloud UI includes a search box where you can enter any keyword you wish to find. It will search environments, topics, schemas, and more for that term, and then provide a set of results. For example, if we had a topic named "Customers" with a schema attached to it, searching for "Customers" would return both in the results. But we can do more than just search for a schema. We can even search within the schema for a specific field. For example, we could search for "address" and it would show us that the "Customers-value" schema contains that field. These searches even include older versions of the schema.
However, sometimes we need to search for something less specific. For example, what if we wanted to search for all Personally Identifiable Information, or PII? It's unlikely there would be a field or a topic named "PII", so our search would fail. What if we want to search for categories rather than specific values?
The Stream Catalog provides support for tags that can fulfill our need to categorize the data. We can create freeform tags for any category we want. There are also some built-in tags, such as PII, that can be found in the "Recommended" section. Once a tag has been created, it can be applied to a topic, a schema record, or even to individual fields. For example, we could apply the PII tag to the "address" field in the "Customers-value" schema, or we could apply it to the schema record itself. This would allow anyone searching for the PII tag to locate the specific field in the schema.
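To make the tagging step a little more concrete, here is a minimal sketch of what defining a tag and attaching it to a field might look like outside the UI. It only prints the requests rather than sending them. The endpoint paths (`/catalog/v1/types/tagdefs` and `/catalog/v1/entity/tags`) and the payload fields are assumptions to check against the Stream Catalog REST API reference, and the helper names `tag_definition` and `tag_assignment` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint paths and payload fields are assumptions to verify
# against the Stream Catalog REST API reference before use.

# Print a JSON body that defines a tag (hypothetical helper).
# Arguments are inserted as-is, so they must not contain quotes or backslashes.
tag_definition() {
  local name="$1" description="$2"
  printf '[{"entityTypes":["sr_schema","sr_record","sr_field"],"name":"%s","description":"%s"}]\n' \
    "$name" "$description"
}

# Print a JSON body that attaches a tag to a single field (hypothetical helper).
# qualified_name is the field's qualifiedName, as returned by a catalog search.
tag_assignment() {
  local tag="$1" qualified_name="$2"
  printf '[{"entityType":"sr_field","entityName":"%s","typeName":"%s"}]\n' \
    "$qualified_name" "$tag"
}

# Dry run: show each request instead of sending it.
echo "POST <SCHEMA-REGISTRY-URL>/catalog/v1/types/tagdefs"
tag_definition "PII" "Personally identifiable information"
echo "POST <SCHEMA-REGISTRY-URL>/catalog/v1/entity/tags"
tag_assignment "PII" "<FIELD-QUALIFIED-NAME>"
```

To actually send one of these bodies, we could pass it to curl with `--data`, using the same credentials and content-type header as any other JSON request to the catalog. Built-in tags such as PII may already be defined, in which case only the assignment step would be needed.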
Once we've located a stream, it can be helpful to provide some additional business data for that stream. As an example, we may want to include contact information such as a name and email address. That way, if someone is looking at the stream, they will know who to talk to if they have questions. Business metadata is different from tags in a couple of ways. First, it can contain multiple custom values rather than just a name and description. Second, it is applied to a schema or topic as a whole rather than to an individual field. Once we have created the metadata, we can attach it to our schema or topic. If someone is viewing the schema, they will be able to see the metadata we defined. We can even apply tags directly to the schema at the same level as the metadata. These tags will be visible to anyone viewing the schema.
Up to this point, we've been interacting with the catalog entirely through the Confluent Cloud Console. However, many of the operations we have seen can also be performed through APIs. The REST API supports operations such as creating and managing tags and business metadata, as well as searching the catalog. Meanwhile, the GraphQL API provides a more flexible and powerful form of search. All of these operations can be performed using a standard HTTP client.
Let's put all of this together. Imagine we have created a "Customers" stream. We have added the appropriate PII tags, and potentially others, to the stream. We have also recorded our name and email address in the business metadata. Now, another user has gone to the catalog, perhaps looking to see if they can get access to customer data such as the address. Using the search, they look up "address" and quickly find our stream. Opening the schema, they can see that this stream contains personal information. They can also see the name and email address of the person responsible for the data. At this point, they could approach us with questions, but maybe they don't need to.
They already have access to the name of the topic and the schema for the data. They could easily download the schema, use it to generate application code, and immediately start consuming the stream. This means that in a span of a few minutes, they could go from having no idea where the data lives to consuming that data in their application, all without ever needing outside help. This kind of self-serve discovery is the primary goal of the Stream Catalog. If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.