Learn how the Data Portal and Apache Flink® in Confluent Cloud can help developers and data practitioners find the data they need to quickly create new data products.

Software Practice Lead
Hey, I’m Gilles from Confluent.
Have you ever been in a situation where there are so many topics in your
Data Streaming Platform that you spend a lot of time searching for the data you want to process?
If so, you might have a Data  Discoverability problem.
Stick around and I'll show you how the  Data Portal in Confluent Cloud can fix it.
Data Portal Overview
If you have only a handful of topics, you probably know their names and which team owns them,
and you can probably find the fields you're looking for just by browsing through them.
But once you get into the dozens, it becomes quite hard to remember which one is which, and it's kind
of cumbersome to have to open each topic to see  what the data looks like, or if it contains PII.
As a developer, I've often spent  a lot of time trying to find the
right data that I needed to  consume in my applications.
I remember documenting which data  my team produced in Wiki pages.
Unfortunately, no one knew where to find the documentation in the
first place, and it quickly went out of date, so that wasn't ideal.
But today, there's a better way.
I'm going to show you how the Data Portal  can make the lives of developers and data
practitioners much easier when they want to  discover which data is available for consumption.
Let's get into a quick demo.
I have created a fresh Basic Cluster with  the Advanced Stream Governance package.
I've created several topics and  produced data in each one to
simulate a Data Streaming Platform  after a few months in production.
The Data Portal is located in  the sidebar for quick access.
Let's open it.
It displays the recently added  and recently modified topics.
If you have deployed connectors,
there's also a section here which displays  source and sink topics grouped by connector.
You can tell quickly whether all your connectors are running properly
or whether the data isn't going through because an error occurred or the connector has been stopped.
Let's view all the topics  by clicking on this link.
First, you need to select an environment, for example Staging, QA, or Production.
I can filter by tag or business metadata.
We will get to that in a moment.
I can also filter by Cloud provider, Region,
Cluster, Creation date, Modified  date and even retention period.
I also have some sorting options.
I want them sorted by name.
Finally, I can type a few  characters in the textbox
and the matching topic names  will show up in the dropdown.
I can click the preview icon to get a  sneak peek of each topic and check the
last message produced and the associated schema.
Of course, I can click on any  tile below to do the same.
Query with Flink
But having just the last message might not be enough to
get a good sense of what the data looks like.
So I'm going to show you how easy it  is to query the data with Flink SQL.
If I click on the "Query" button,  I can write a Flink query.
I'm not limited to just querying  the topic I've selected.
I can join these two topics together for example:
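The query looked something like this; the topic and column names here are illustrative, so adapt them to your own schemas:

-- A streaming join across two topics.
-- Column names are placeholders; use the fields from your own schemas.
SELECT
  activity.customer_id,
  activity.activity_type,
  offer.offer_name
FROM insurance_customer_activity AS activity
JOIN insurance_offer AS offer
  ON activity.offer_id = offer.offer_id;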
Note that I am producing data to both topics as
I speak, so this query is streaming the results to the UI in real time.
Requesting access to data
Of course, you'll need to be granted access  to the data before being able to query it.
The required role is DeveloperRead.
It might have been assigned to you  by your environment admin – and that
was the case in the example I've  shown before – but if it hasn't,
it's very easy to request access  to a particular topic from the UI.
Let me show you how.
For example, let's say William from the Audit team
wants to check out the data  in the insurance_offer topic.
He can request access directly by sending a
message to the topic owner  who will receive an email.
In this case, the topic owner is me.
Once I've received the notification email, I can  go to the Access Requests tab in the settings,
select the line, and click  the "Approve access" button.
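By the way, if you manage permissions as code rather than through the UI, the same DeveloperRead grant can be declared with the Confluent Terraform Provider. Here's a minimal sketch; the user ID and the cluster resource reference are placeholders:

# Grant DeveloperRead on the insurance_offer topic to a user.
# "u-abc123" and the confluent_kafka_cluster.main reference are placeholders
# for your own user ID and cluster resource.
resource "confluent_role_binding" "audit_read_insurance_offer" {
  principal   = "User:u-abc123"
  role_name   = "DeveloperRead"
  crn_pattern = "${confluent_kafka_cluster.main.rbac_crn}/kafka=${confluent_kafka_cluster.main.id}/topic=insurance_offer"
}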
Okay, time to get started  organizing and documenting!
Create tags via UI
I'm going to create a tag called  PII and assign it to a topic.
To add a tag to a topic, click on a tile  and then View Topic next to the topic name.
You can now select the tag that  you want to assign to this topic.
Let's add a description too while we're at it.
If I go back to the data portal  home page, and click on the PII tag,
it will only display topics which have this tag.
Add Business Metadata
Tags are great for categorizing, but at times,
you would like to have a way to add some additional information to your topics.
That's exactly what "business metadata" are for.
The difference from tags is that business metadata has attributes in the form of key-value pairs.
You can give a value to each  key when assigning the business
metadata to an entity such as a topic,  a schema or even an environment.
First, let's head over to the environment and create a "business metadata" object called
"Domain" with name, team-owner, and slack-channel attributes.
Next, we're going to assign it to a topic.
In the data portal, I'm going to search for  the insurance_customer_activity topic.
Ok, let's see: "Name" is the "Contracts" domain, "Team-owner" is "Contracts Engineering",
and "contracts-eng" is the Slack channel where the team can be reached.
If I go back to the data portal page, I  can now filter by this business metadata.
For example, I can show only the topics that belong to the "Contracts" domain.
Likewise, I can display only the topics that belong to the "Offers" domain.
Create business metadata and tags via Terraform
Now, it would be a bit cumbersome to add dozens of tags and business metadata for
each environment and assign them to topics or schemas via the UI.
There are two additional ways to do that.
You can use the Confluent Cloud API,
for example with cURL in a shell script or from a Python program.
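Creating a tag like PII that way looks something like this; it goes through the Stream Catalog API hosted on your Schema Registry endpoint, and the endpoint and credentials below are placeholders for your own:

#!/bin/sh
# Create a "PII" tag definition through the Stream Catalog API.
# Replace the endpoint and the Schema Registry API key/secret with your own.
curl -s -X POST "https://<SCHEMA_REGISTRY_ENDPOINT>/catalog/v1/types/tagdefs" \
  -u "<SR_API_KEY>:<SR_API_SECRET>" \
  -H "Content-Type: application/json" \
  -d '[{"name": "PII", "description": "Contains personally identifiable information"}]'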
But it's even better to take a declarative approach with the Confluent Cloud Terraform Provider.
It's the recommended approach  in a production setup and it
works particularly well as a CI/CD step.
I have used the template available in  the Terraform Provider GitHub repo to
import all Kafka resources from  my cluster into a Terraform file.
Here are my insurance-related topics, plus all the other topics.
I can now create additional  tags or business metadata,
associate them to topics  and run terraform apply.
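For example, the PII tag and the "Domain" business metadata from earlier could be declared like this. It's a sketch: it assumes the provider is already configured with Schema Registry credentials, the cluster IDs in the entity names are placeholders, and the attribute names use underscores in place of the hyphenated names from the demo:

# Declare a tag and a business metadata definition, then bind them to a topic.
# Assumes the provider block is configured with the Schema Registry endpoint and
# credentials; otherwise each resource also needs schema_registry_cluster,
# rest_endpoint, and credentials blocks.

resource "confluent_tag" "pii" {
  name        = "PII"
  description = "Contains personally identifiable information"
}

resource "confluent_business_metadata" "domain" {
  name = "Domain"
  attribute_definition {
    name = "name"
  }
  attribute_definition {
    name = "team_owner"
  }
  attribute_definition {
    name = "slack_channel"
  }
}

# Entity names follow "<schema-registry-cluster-id>:<kafka-cluster-id>:<topic-name>";
# the IDs below are placeholders.
resource "confluent_tag_binding" "pii_customer_activity" {
  tag_name    = confluent_tag.pii.name
  entity_name = "lsrc-abc123:lkc-def456:insurance_customer_activity"
  entity_type = "kafka_topic"
}

resource "confluent_business_metadata_binding" "domain_customer_activity" {
  business_metadata_name = confluent_business_metadata.domain.name
  entity_name            = "lsrc-abc123:lkc-def456:insurance_customer_activity"
  entity_type            = "kafka_topic"
  attributes = {
    "name"          = "Contracts"
    "team_owner"    = "Contracts Engineering"
    "slack_channel" = "contracts-eng"
  }
}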
Now, with little effort, I've added more  tags and business metadata via Terraform.
This documents the data and helps teams quickly discover what they need.
I have five more tags that clearly identify which topics are data products,
which ones contain sensitive data, and which ones are deprecated.
Speaking of deprecation, I've even added a  "deprecation notice" business metadata which
indicates when a topic has been retired or will be  retired and which topic people should use instead.
So, even if folks missed the memo, it's right  there for them to see, on the topic itself.
Conclusion
As you can see, the Data Portal brings visibility into what data exists and
makes it much faster for teams to build new streaming applications and pipelines.
It's available in Confluent  Cloud for users with a Stream
Governance package enabled in their environments.
Alright, if you have any questions about the Data Portal, put them in the comments below.
I hope you enjoyed the video, please like,  share and subscribe to support this content.
And if you want a deep dive into data streaming with Apache Kafka or stream processing with
Apache Flink, check out our YouTube channel and the courses on developer.confluent.io.
Thank you for watching and see you next time!