Amazon S3 Kafka Connector Setup & Configuration

Learn how to set up and configure the Amazon S3 Sink connector with Confluent Cloud.

8 min

Dan Weston

Senior Curriculum Developer

Resources

Kafka Connect tutorials, courses, and other resources on real-time data streaming and integration: Confluent Developer

Introduction

Hi, I'm Dan Weston from Confluent. And in this video, we'll take a look at how to set up the S3 Sink connector in Confluent Cloud. If you'd like to follow along, you'll need both an Amazon and a Confluent Cloud account.

Prerequisites

If you don't have one, that's totally okay. Feel free to set up a trial. And then when you're ready to put the S3 Sink connector into production, you'll know what you need to do and the steps necessary to put it into use. Before we get into the details, I wanted to talk a little bit about what a connector is, specifically the S3 Sink connector.

Kafka Connect intro

A connector is exactly what it sounds like. It's a way to connect your Kafka cluster with other sources or sinks. It provides an easy way to get data into or out of your Kafka cluster. The S3 Sink connector specifically allows you to take messages in a topic and send and store them in an S3 bucket.

Demo configuration

For this demo, I'm using a trial AWS account and a demo Confluent Cloud account. While I won't go into detail about how I set up the AWS account, I will show you the basic configuration of mine and hope that you can work with your organization to do something similar. Last, you'll need to make sure that your Confluent Cloud cluster is in the same region as your AWS S3 bucket. Otherwise, the two won't be able to connect. With all that out of the way, let's see how quick and easy setting up the S3 connector can be.

Getting data into Confluent Cloud

We'll start here on Confluent Cloud, where, as you can see, there's no data and not even any topics. I could just go and create a topic, but since I know that I'm going to be using the Datagen Source connector, I'm going to go ahead and click on Connectors and select the Datagen Source connector. And you can see I can add a new topic from here. I'll click add a new topic, give the topic a name, leave the partitions at the default of six, and click create with defaults. Now you can see my topic is available right there inside the connector. I'll select that topic and click continue.

I'm going to want this to have global access, and then of course I'll need to generate and download the API key. Make sure to give it a description and click continue. I know the format that I want these messages stored in is Avro, so I'll select Avro, and then I'll go ahead and select the Orders template. I'll then click continue one more time. For the connector sizing, I'll leave it at the default of one and click continue. And now I can review all of the configuration. If I'm happy with everything and everything looks good, which in my case it does, I'll click continue and the connector will be created.

Now I can wait until this is fully provisioned, or I can go over to my topics, click on my topic name, click on messages, and wait until I start seeing messages come in. And there they are. At this point, let's switch over to my S3 console.
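The choices made in the Datagen Source wizard can also be written down as a connector configuration. The Python sketch below assembles one as a dictionary and prints it as JSON. The property names reflect my understanding of the Confluent Cloud Datagen Source connector and should be checked against the current documentation. The connector name, the topic name `orders`, and the credential placeholders are hypothetical and do not come from the video.

```python
import json


def datagen_source_config(topic, api_key, api_secret):
    """Build a Datagen Source config mirroring the choices made in the demo."""
    return {
        "connector.class": "DatagenSource",
        "name": "DatagenSourceConnector_0",  # hypothetical connector name
        "kafka.auth.mode": "KAFKA_API_KEY",  # assumed: authenticate with the generated API key
        "kafka.api.key": api_key,
        "kafka.api.secret": api_secret,
        "kafka.topic": topic,                # the topic created from the wizard
        "output.data.format": "AVRO",        # message format selected in the demo
        "quickstart": "ORDERS",              # the Orders template
        "tasks.max": "1",                    # default connector sizing
    }


# Placeholder values only; never commit real keys.
config = datagen_source_config("orders", "<KAFKA_API_KEY>", "<KAFKA_API_SECRET>")
print(json.dumps(config, indent=2))
```

Keeping the configuration in a function like this makes it easy to review the same settings the wizard shows on its final page before the connector is created.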

Amazon S3 configuration

As you can see, I've already created a bucket. If we take a look at this bucket, you can see there's no data in there. And I've turned off blocking all public access. I've also, on the management side, created a new policy. And if we take a look at the JSON, you can see the permissions that I've granted. I've allowed anyone with this policy to list all the buckets and to see my connector sample bucket's information, location, and multipart uploads. I've also allowed putting objects, object tagging, getting objects, uploading multipart files, and listing multipart uploads. I've also created an S3 user, assigned them the policy, and given them an access key that we'll use to connect.

All right, now back to Confluent Cloud. Our messages are still coming in, so that's good news.
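The policy described above can be sketched as a small Python helper that builds the JSON document for a given bucket. This is an approximation: the video shows the JSON only briefly, so the exact action list is an assumption based on the permissions named in the narration. In particular, "uploading multipart files" is mapped here to `s3:AbortMultipartUpload` and `s3:ListMultipartUploadParts`, which is my reading of what an S3 sink typically needs. The hyphenated bucket name `connector-sample` is also a guess at how the spoken name is written.

```python
import json


def s3_sink_policy(bucket):
    """Return an IAM policy document granting the access described in the demo."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List all buckets in the account.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "arn:aws:s3:::*",
            },
            {
                # Bucket-level information, location, and multipart uploads.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level access (assumed action names, see lead-in).
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = s3_sink_policy("connector-sample")  # assumed bucket spelling
print(json.dumps(policy, indent=2))
```

Splitting the statements by resource keeps bucket-level actions on the bucket ARN and object-level actions on the `/*` ARN, which is how IAM expects S3 permissions to be scoped.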

S3 connector setup

And we'll go ahead and go over and create the new Sink connector. I'll click on the Amazon S3 Sink. If it's not appearing here, you, of course, can either click add a connector or see all connectors. I'll select the topic. I'll also give it global access and of course generate and download the API key. I'll click continue.

Now it's asking for my Amazon access key ID and the secret access key as well as the bucket name. Now, this is the access key that I've given to my user over here. Since I've already downloaded it, all I have to do is copy and paste the key ID and the secret access key. Now, remember, my bucket name is connector sample, so I can either copy that or enter it in manually. Then I'll click continue.

Again, our messages were in Avro format. For the time interval, in this case I want it to go over at an hourly cadence, and the flush size we'll leave at the default of 1000. I'll click continue. I'll leave the task size at the default of one, click continue, and then I can review my configuration. In this case, I'm happy with everything, so I'll click continue one more time to actually set up the connector. Now I can click on the connector and I can start watching it as the messages are processed and it starts sending them over to S3.

Now, depending on what time you created this, it's going to be a while before you actually start to see the messages appear over on Amazon S3. In this case, I created this at 1:04 p.m., so since I set up an hourly cadence, I'm going to have to wait until 2:00 my time in order for the messages to start showing up. So through the magic of editing, we'll skip to the future where we can see our messages being processed and sent over to Amazon S3. As you can see, it's been just about an hour and we've processed almost 6000 messages.

You'll also notice that as part of adding this connector, we've also created a dead letter queue topic. If you've misconfigured anything along the way, this is where your messages will end up until the issue is fixed. Let's pop over to our S3 bucket to see if the messages have started to appear.
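The sink settings chosen above, and the wait until the top of the hour, can be sketched in Python. The property names are my understanding of the Confluent Cloud S3 Sink configuration and should be verified against the documentation. The connector name, the topic name `orders`, the bucket spelling, the credential placeholders, the Avro output format, and the calendar date are all assumptions. Only the Avro input, the hourly interval, the flush size of 1000, the single task, and the 1:04 p.m. creation time come from the video.

```python
from datetime import datetime, timedelta


def s3_sink_config(topic, bucket, kafka_key, kafka_secret, aws_key_id, aws_secret):
    """Build an S3 Sink config mirroring the choices made in the demo."""
    return {
        "connector.class": "S3_SINK",
        "name": "S3SinkConnector_0",       # hypothetical connector name
        "topics": topic,
        "kafka.auth.mode": "KAFKA_API_KEY",
        "kafka.api.key": kafka_key,
        "kafka.api.secret": kafka_secret,
        "aws.access.key.id": aws_key_id,
        "aws.secret.access.key": aws_secret,
        "s3.bucket.name": bucket,
        "input.data.format": "AVRO",       # messages in the topic are Avro
        "output.data.format": "AVRO",      # assumed; not stated in the video
        "time.interval": "HOURLY",         # hourly cadence
        "flush.size": "1000",              # default flush size
        "tasks.max": "1",                  # default task count
    }


def next_hour_boundary(created_at):
    """Return the first top-of-the-hour strictly after created_at."""
    top_of_hour = created_at.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


sink = s3_sink_config(
    "orders", "connector-sample",
    "<KAFKA_API_KEY>", "<KAFKA_API_SECRET>",
    "<AWS_ACCESS_KEY_ID>", "<AWS_SECRET_ACCESS_KEY>",
)

created = datetime(2024, 1, 1, 13, 4)  # hypothetical date, 1:04 p.m. as in the demo
print(next_hour_boundary(created).strftime("%H:%M"))  # prints 14:00
```

The helper only reproduces the reasoning in the demo, that a connector created at 1:04 p.m. with an hourly interval starts showing objects around 2:00. It does not model the connector's actual flush behavior.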

Verify your messages

If I hit refresh, we'll see a new topics folder has been created. Inside it are folders for the name of our topic, the year, the month, the day, and the hour. And there we go. We can see all of our messages appearing in our S3 bucket. That's it.
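The folder layout just described can be sketched as a function that builds the object key prefix for a given topic and hour. The `year=`, `month=`, `day=`, and `hour=` folder names are an assumption based on the usual Kafka Connect time-based partitioning layout. The video only says that the folders are named after the topic, year, month, day, and hour. The topic name `orders` and the date are hypothetical.

```python
from datetime import datetime


def hourly_prefix(topic, ts):
    """Return the assumed S3 key prefix for records written in the hour of ts."""
    return (
        f"topics/{topic}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
    )


print(hourly_prefix("orders", datetime(2024, 1, 1, 13, 4)))
# prints topics/orders/year=2024/month=01/day=01/hour=13/
```

A prefix like this is handy when listing or querying a single hour of data in the bucket without scanning the whole topic folder.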

It's a wrap!

You've now successfully connected and started to send your messages to S3. Be sure to check out the documentation for the S3 connector for more details and information. You'll also want to subscribe to this channel and click the bell icon to be notified of all of our new videos. You can also head to developer.confluent.io to see other courses and resources for learning Apache Kafka. Last, be sure to comment below if you have another connector video you'd like to see. Until next time, have fun processing messages.
