Staff Solutions Engineer (Presenter)
Let’s assume you are a Confluent Cloud user and need to transfer data back and forth between it and your apps. Confluent Cloud runs in its own virtual network in one of three cloud providers: AWS, Google Cloud, or Azure. Your apps run in another network—in the cloud or on premises. During this course, you will learn about the available network connectivity options for connecting your network with the Confluent Cloud network. You will also learn about the benefits of each option as well as any trade-offs that need to be considered. At the end of this course, you will be better prepared to decide which option is right for you.
Before we continue, here is a little background for the two key audiences this course is intended for: Apache Kafka® users who may be less familiar with cloud computing or cloud networking concepts, and networking people who may be less familiar with Kafka.
There are a couple of key networking fundamentals that we need to understand before we discuss Confluent Cloud networking in depth. The first is IP addresses. If you’re already familiar with public and private IP addresses, and CIDR notation, you can skip this section. If CIDR is a new term for you, then hang around.
Devices that are part of a network are uniquely identified by an IP address. A portion of the address identifies the network the device belongs to and the remainder of the address identifies the device on that network. Traffic that is destined for the device is first routed to its network and then to the device itself.
IPv4 addresses are 32 bits that are broken down into four “octets” of 8 bits each—for example, 10.0.10.20. Following the address is a network mask, which is separated from the address by a slash. The network mask—for example, /16—determines what portion of the address is the network ID and what portion is the host ID. A mask is applied from the left side of the address. A /16 mask, for example, says that the leftmost 16 bits of an address refer to the network ID, while the remaining 16 bits refer to the host ID.
A network is commonly referred to by its CIDR block, where CIDR stands for Classless Inter-Domain Routing. A /16 CIDR network is one where the first two octets are the network ID (two octets of 8 bits each is 16 bits) and the last two octets are the host ID. A /16 CIDR network can be broken down into smaller networks, e.g., /24 CIDR networks. The larger the mask value, the smaller the range of host IDs in the network: a /16 network has 65,536 addresses, while a /24 network has 256.
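To make the mask arithmetic concrete, here is a small sketch using Python's standard `ipaddress` module. It reuses the example address 10.0.10.20 with a /16 mask from above; the variable names are only illustrative.

```python
import ipaddress

# The example address from the text, written with its /16 mask.
iface = ipaddress.ip_interface("10.0.10.20/16")

# Network ID: the leftmost 16 bits, with the host bits zeroed.
network = iface.network

# Host ID: the remaining 16 bits of the address.
host_id = int(iface.ip) & int(network.hostmask)

print(network)                # 10.0.0.0/16
print(network.num_addresses)  # 65536
print(host_id)                # 2580, i.e. 10 * 256 + 20

# Break the /16 down into smaller /24 networks.
subnets = list(network.subnets(new_prefix=24))
print(len(subnets))               # 256
print(subnets[10])                # 10.0.10.0/24
print(subnets[10].num_addresses)  # 256
```

The larger mask (/24) leaves only 8 bits for host IDs, which is why each smaller network holds 256 addresses instead of 65,536.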
Devices can be assigned both private and public IP addresses.
We will cover methods for communicating with devices that are part of a private network later in this course.
Depending on the networking option(s) you choose for your Confluent Cloud network architecture, you may have to fit Confluent Cloud IP address range(s) into your existing network infrastructure.
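One practical consequence is that a new address range must not collide with ranges you already use. The sketch below shows one way to check this with Python's standard `ipaddress` module. The CIDR blocks and the helper name `find_overlaps` are hypothetical, chosen only for illustration; they are not Confluent Cloud defaults.

```python
import ipaddress

def find_overlaps(candidate, existing):
    """Return the existing CIDR blocks that overlap the candidate block."""
    cand = ipaddress.ip_network(candidate)
    return [cidr for cidr in existing
            if cand.overlaps(ipaddress.ip_network(cidr))]

# Hypothetical ranges already in use in your own networks.
existing_ranges = ["10.0.0.0/16", "10.1.0.0/16", "192.168.0.0/24"]

# A candidate that falls inside 10.1.0.0/16, so it collides.
print(find_overlaps("10.1.128.0/20", existing_ranges))  # ['10.1.0.0/16']

# A candidate that is free of collisions.
print(find_overlaps("10.2.0.0/16", existing_ranges))    # []

# Both candidates are private address ranges.
print(ipaddress.ip_network("10.2.0.0/16").is_private)   # True
```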
The second networking fundamental concept we need to understand is the Domain Name System (DNS). Again, if you’re already familiar with DNS, you can skip this section.
Referring to devices by their IP addresses can be very tedious, so we typically assign them a friendly name, known as a domain name, which is much easier to use.
The Domain Name System is a distributed, hierarchical system that provides information about domain names and their corresponding IP addresses. This hierarchy consists of root nameservers at the top. At the next level are top-level domain (TLD) nameservers for top-level domains such as .com, .net, and .io. Below TLD nameservers are additional nameservers, which store domain name information for their respective domains (such as confluent.io). Your organization may also host its own nameservers, which share information with the rest of the DNS network.
On the client side, each client that needs to reach servers by domain name is configured with a set of DNS servers, which it uses to resolve DNS names. The client sends DNS requests to these servers; if a DNS server doesn’t know the answer, it forwards the request to another DNS server, and so on, until a response is found and returned to the client.
DNS records describe the relationship between domain names and IP addresses.
A name resolution request occurs when establishing a network connection with a device by its domain name. This name resolution is an iterative process that works its way through the DNS hierarchy until it reaches the authoritative nameserver that holds the DNS record associated with the request. It is that nameserver that returns the corresponding IP address for the requested domain name.
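The walk through the hierarchy can be sketched as a toy model in Python. This is not a real resolver: the server names, the domain, and the address (taken from a range reserved for documentation) are all made up for illustration, and real DNS involves caching, record types, and network requests that are left out here.

```python
# Toy model of the DNS hierarchy. Every name and address is hypothetical.
ROOT = {"com.": "tld-com"}                         # root refers to TLD nameservers
TLD = {"tld-com": {"example.com.": "ns-example"}}  # TLD refers to domain nameservers
AUTHORITATIVE = {"ns-example": {"broker.example.com.": "192.0.2.10"}}

def resolve(name):
    """Walk root -> TLD -> authoritative; return (address, servers asked)."""
    labels = name.rstrip(".").split(".")
    fqdn = ".".join(labels) + "."
    tld = labels[-1] + "."
    domain = ".".join(labels[-2:]) + "."

    asked = ["root"]
    tld_server = ROOT.get(tld)
    if tld_server is None:
        return None, asked          # no such top-level domain in this model

    asked.append(tld_server)
    auth_server = TLD[tld_server].get(domain)
    if auth_server is None:
        return None, asked          # the TLD knows no nameserver for this domain

    asked.append(auth_server)
    # Only the authoritative nameserver holds the actual record.
    return AUTHORITATIVE[auth_server].get(fqdn), asked

address, asked = resolve("broker.example.com")
print(address)  # 192.0.2.10
print(asked)    # ['root', 'tld-com', 'ns-example']
```

Each step only narrows down which nameserver to ask next; the IP address itself comes from the last server in the chain.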
Let’s now take these basic networking concepts and apply them to the cloud.
A cloud service provider (CSP) such as AWS, Google Cloud, or Azure runs multiple services for customers, including Infrastructure as a Service (IaaS), which means it runs virtual infrastructure for a customer. This often includes virtual machines as well as the networks through which those machines can interact with other services. Different cloud providers have different patterns around how virtual networks and virtual machines interact with each other.
Different cloud providers have different names for the logical private networks they provide. AWS and Google call them Virtual Private Clouds or VPCs. Azure calls them Virtual Networks, or VNets. It isn’t strictly true, but it’s often helpful to think of cloud virtual machines as running “in” a VPC or VNet.
The implementation and terminology vary from cloud to cloud, but generally speaking, the following are true (in the context of IPv4 networking):
The VPC/VNet is typically assigned a /16 CIDR block of IPv4 addresses.
Cloud providers break their infrastructure into cloud regions across the globe, which correspond to groups of datacenters where infrastructure is running.
When you create a virtual network, you use policies or rules to control traffic in the VPC/VNet:
Your network architecture can selectively control what can talk to what—for example, in some environments, you can directly access the internet, while in other environments, access to the internet may be blocked.
Separately, there may be scenarios where you expose parts of your infrastructure to the wider internet. For example, if your business has a public website, that website is accessible from anywhere in the world.
Also, in some cases, you may directly connect your cloud network to other cloud networks, or to your on-premises infrastructure, either directly or through a VPN or tunnel.
Now that we have covered some basic networking concepts as they apply to Confluent Cloud, let’s take a look at a few Kafka concepts that are also important from a Confluent Cloud perspective.
When designing a network architecture for Confluent (or Kafka), there are a few things to be aware of.
Kafka uses a binary protocol over TCP. It does not use HTTP (or HTTPS). It does support TLS, with either of these two options:
Kafka clients need to be able to access all of the brokers in a given Kafka cluster, and will initiate direct connections to individual brokers to produce or consume partitions that are active on those brokers. This means that you cannot access Kafka through a traditional load balancer, which assumes that your client doesn’t care which backend server it connects with. The data flow looks like this:
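As a rough illustration of that flow, here is a toy Python model. It is not the Kafka protocol or a real client: the broker hostnames, the topic name, and both functions are hypothetical. It only shows why the client ends up needing a direct path to each broker.

```python
# Toy model: hypothetical brokers and which broker leads each partition.
BROKERS = {
    0: "b0.example.com:9092",
    1: "b1.example.com:9092",
    2: "b2.example.com:9092",
}
PARTITION_LEADERS = {("orders", 0): 1, ("orders", 1): 2, ("orders", 2): 0}

def fetch_metadata():
    """Stand-in for the metadata a client receives over its first connection."""
    return {"brokers": dict(BROKERS), "leaders": dict(PARTITION_LEADERS)}

def endpoint_for(topic, partition, metadata):
    """Pick the one broker the client must connect to for this partition."""
    leader_id = metadata["leaders"][(topic, partition)]
    return metadata["brokers"][leader_id]

metadata = fetch_metadata()
for partition in range(3):
    print(partition, endpoint_for("orders", partition, metadata))
# 0 b1.example.com:9092
# 1 b2.example.com:9092
# 2 b0.example.com:9092
```

Each partition maps to one specific broker, so a load balancer that hands the client an arbitrary backend would often send requests to a broker that is not the leader for that partition.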
Given the above, for Kafka to work properly:
In addition to the above, the server may require the client to authenticate. Confluent Cloud uses SASL_SSL with the SASL mechanism PLAIN, which is effectively API key and secret authentication, all wrapped in TLS encryption (referred to as SSL in Kafka configuration).
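For illustration, these settings could look like the following in a Python client. This is a minimal sketch that assumes the `confluent-kafka` Python client, which takes librdkafka-style property names. The helper function, the bootstrap address, and the credentials are placeholders, not real values.

```python
def confluent_cloud_config(bootstrap_server, api_key, api_secret):
    """Build client properties for SASL_SSL with the PLAIN mechanism."""
    return {
        "bootstrap.servers": bootstrap_server,  # first broker address to contact
        "security.protocol": "SASL_SSL",        # SASL authentication over TLS
        "sasl.mechanisms": "PLAIN",             # API key and secret
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }

# Placeholder values; substitute your own cluster endpoint and credentials.
config = confluent_cloud_config(
    "broker.example.com:9092", "<API_KEY>", "<API_SECRET>"
)
print(config["security.protocol"], config["sasl.mechanisms"])  # SASL_SSL PLAIN
```

With the `confluent-kafka` package installed, a dictionary like this can be passed to `confluent_kafka.Producer(config)`. The block above builds only the configuration, so it runs without the package.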
Throughout this course, we’ll introduce you to Confluent Cloud Networking through hands-on exercises. If you haven’t already signed up for Confluent Cloud, sign up now so when your first exercise asks you to log in, you are ready to do so.
Review your selections and give your cluster a name, then click Launch cluster. This might take a few minutes.
While you’re waiting for your cluster to be provisioned, be sure to add the promo code NETWORKING101 to get an additional $25 of free usage. From the menu in the top right corner, choose Administration | Billing & Payments, then click on the Payment details tab. From there click on the +Promo code link, and enter the code.
You’re now ready to complete the upcoming exercises as well as take advantage of all that Confluent Cloud has to offer!