December 22, 2020 | Episode 135

Mastering DevOps with Apache Kafka, Kubernetes, and Confluent Cloud ft. Rick Spurgeon and Allison Walther

Tim Berglund:

Allison Walther and Rick Spurgeon spend their time actually using Apache Kafka, Confluent Platform, and Confluent Cloud in the ways that other developers and operators use them. They create all kinds of great demos and white papers that a lot of people find useful. Today I got to talk to them about Kafka DevOps, their experiences with it and tooling they've found useful around it, including one really interesting bit of custom tooling they have to share with us. It's all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund:

Hello, and welcome to another episode of Streaming Audio. I am, as per the ush, your host, Tim Berglund, and I'm joined here in the virtual studio by a couple of colleagues today, Allison Walther and Rick Spurgeon. Allison and Rick both work very near me in the developer relations team in a group called the integration architects. Allison and Rick, welcome to the show.

Allison Walther:

Thanks Tim.

Rick Spurgeon:

Happy to be here Tim.

Tim Berglund:

You guys, the name of your team, obviously I know quite well what it does, but it is not necessarily a standard function. If I say developer advocate or community manager or technical writer or software engineer or something like that, those all make sense. What would you say you do here, to coin a phrase? It's a jump ball to whichever one of you wants to answer. Go.

Allison Walther:

I can take this one, Tim. So integration architecture team is a really fancy title and I understand that not every company has a team of us but I think every company should. What we do is pretty neat. We look at the latest and greatest products that Confluent is coming up with. We also try to stay up to date on related technologies in the space and we create content that highlights how all of these pieces work together. So that content can manifest itself as a blog, a white paper, it can be an extremely long demo or it can be a short tutorial, but it's all pretty much aimed at engineers to get their hands dirty with Confluent products.

Tim Berglund:

I love it. I informally like to describe what you guys do as being the same as developer advocates but not necessarily being addicted to the attention of other people. Coding demos, teaching people how things work but you don't need all the eyes on you all the time. So maybe that's a character win for you, I'm not sure.

Rick Spurgeon:

It sounds like a fair explanation to me.

Tim Berglund:

Yeah, I think so. So we want to talk about DevOps today, and I think there's three of us here, so there should be at least four definitions of what DevOps means but you guys are the guests, so my opinion matters a lot less. Allison what is DevOps and why are you interested in this?

Allison Walther:

So, sort of riffing off of the number of definitions we will have about DevOps, I agree that DevOps is a very ambiguous term and I think that your experience with DevOps really shapes your definition of it. My experience with DevOps is working in a company that had a development business area and an operations business area and then they decided to merge the two and start doing DevOps. So for me DevOps meant everyone on the team is actively developing, configuring services or pieces of hardware, deploying services, monitoring the hardware and services, doing all the alerting and all of the auditing. Yet in that same company, with the same definition of DevOps, I've been on a team that has done a quarter of those things. And why I'm interested in DevOps is because it is so multifaceted, you can pretty much never get bored.

Tim Berglund:

Nice. Rick how about you?

Rick Spurgeon:

Yeah. I mean, Allison summarized it really well. For me it means kind of the functions of the small engineering group that's out there. So teams of a certain size with engineers that have to wear multiple hats. Bigger companies may not require individuals to know all the details of how an application service gets from build to deployment, but teams of a certain size will need that knowledge. I also think it's interesting, there's kind of a movement. I think it maybe came about from a well-known Amazon document back in the day about service-oriented architecture, maybe you remember that. They talked a lot about teams owning their service. And I think that the DevOps movement may have come out of that idea.

Rick Spurgeon:

So if you are on call and you were asked to support a service that facilitated the shopping cart, maybe it would be good for you to know not only what the application does and the way it's built, but how it was configured, how it was deployed. I know when I've been on call having that information was certainly important to me. And so I think the DevOps movement kind of came out of that, and I think it's also just interesting for engineers to know everything they can about a particular thing that they build and support. It could really improve not only how you build things, but it can also improve what you actually build. Quality can go up if you know more about what it is you're building.

Tim Berglund:

I love it. And I said, and I meant it, that as the host, I think my opinion on questions like this matters less, at least in this episode, because you guys are the ones talking about DevOps. I have a third definition that is not inconsistent with anything either of you have said, but I wonder what you think of it. I think of DevOps as a specialized kind of software development discipline. If you imagine, say, a Kafka core committer, that's a person who specializes in distributed data storage or stream processing or something like that. It's sort of a distributed systems developer kind of domain specialty. There's even a series of interviews on this podcast about what it's like to be a distributed systems developer, how to become one. And those were all Kafka committers talking about that.

Tim Berglund:

Early in my career I wrote firmware for telecom devices, for satellite modems, at this satellite communications startup. I had to know a bunch of things about data communications protocols and modems and RF things and everything involved in satellite communications. So that was like a specialized domain in which I wrote code. I think of DevOps as a software development discipline where the domain is systems, right? Like you're programming the computers and the infrastructure and the deployment tools that cause the other program to go and be running, which is consistent, Rick, with what you were saying about that kind of cross-training sort of thing where you've got the team of the "developers" who also have to know how to operate their thing. Well, if they do that, then they're probably doing some of the automating of those operations.

Tim Berglund:

And Allison, if you're responsible for configuration, deployment, monitoring, alerting, again, if any of that is automated, then now you're a software developer in the specialized domain. And really I guess I'm just throwing that out there, blatantly wondering what you guys think of that. And we can move on from there, but what do you think? DevOps is a specialized domain of software development where what you program is systems and not some engineering or application sort of discipline.

Rick Spurgeon:

Yeah. I would agree with that. Maybe it's worthwhile to think about it as developing glue, kind of the things that hold it all together. I think we're going to talk a little bit about monitoring and alerting and observing systems, and that's certainly a development practice, but a lot of times maybe what you're doing is stitching or gluing together, and so that kind of ties into our job title a little bit-

Tim Berglund:

Integrating.

Rick Spurgeon:

Yes. So I think you're onto something there. I like to think about it as glue.

Tim Berglund:

Nice. Allison you mentioned monitoring, observing and things like that. So talk about that. That's a big part of what it means to operate software is to be able to observe it. And I always think of observability as kind of the super set of monitoring and alerting and control and all that kind of stuff but talk to us about it. Just kind of riff on what is the view of those things that you have developed?

Allison Walther:

Sure thing. So I have been a DevOps engineer for quite a few years, but I often found myself gravitating towards observability. Like you said, that is sort of a broader term for monitoring and alerting. So I found myself often in this space where I needed to create monitoring systems, enhance alerting for various different distributed systems. And so I think that observability within Confluent and within Kafka is a pretty interesting topic because there's so many different routes you can go and it really depends on if you're deploying in Confluent Cloud or if you're deploying Kafka on your own. There are lots of different routes you can take and I think the amount of options that you have there is what I find particularly interesting in this space.

Tim Berglund:

Talk to me about that with Confluent Cloud. And I always like talking about cloud ops because the kind of marketing narrative about cloud ops is, "Oh well we're in the cloud and that way you don't have any operational concerns, that's all taken away from you." Which is like deeply not true. They're different and they're smaller and like there's value in using a cloud service, a huge value in some cases, but you still have to operate the thing. I know there's no one answer, but what are typical ways in which one observes what is happening in one's cloud service?

Allison Walther:

So I think the big answer to that question is folks should be looking into the Confluent Cloud Metrics API. So Dustin [Korte 00:10:33] came actually on this very podcast back in December of 2018, and talked specifically about the Metrics API and Confluent Cloud-

Tim Berglund:

Which was quite new back then, as I recall.

Allison Walther:

Yeah. It definitely was, I think it was just entering preview. But what I found particularly interesting about that talk is with Confluent Cloud, you still have to monitor your applications. You're not free of observability, but you're going to be asking yourself different questions. You don't need to ask yourself about JVM metrics or garbage collection because that's handled for you in Confluent Cloud but what you need to ask are things that impact your business directly like the throughput and the latency of your streaming applications.

Tim Berglund:

Got it. And I mean this isn't necessarily a product discussion at least the product under discussion today is not Confluent Cloud but what are some of the metrics that you can get in the metrics API these days? Either of you guys and that could be a jump ball, what's the current state of things there?

Allison Walther:

So I think this ball is still in my court. So you can get a wide range of metrics on various different levels. So either about your cluster, a topic, or a partition, and the metrics are about bytes, records, and requests. One neat thing I found about the Metrics API is that it is queryable. So you can more or less send this API a question, like what's the max retained bytes per hour over 10 hours for my topic or my cluster? Or how many bytes were sent to consumers per minute, grouped by topic? And then we'll do the computations for you and just send you the answer. So not your typical API, but definitely a really neat feature.
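As a rough sketch of what such a question could look like on the wire, the Python below builds a request body for "max retained bytes per hour over 10 hours, grouped by topic." Nothing here comes from the episode: the metric name, the filter and group-by field names, the `agg` key, and the cluster ID `lkc-example` are all assumptions for illustration, so check them against the current Metrics API reference before relying on them.

```python
import json
from datetime import datetime, timedelta, timezone


def build_retained_bytes_query(cluster_id, end, hours=10):
    """Build a hypothetical query body: max retained bytes per hour, by topic."""
    start = end - timedelta(hours=hours)
    # ISO-8601 interval written as "start/end"
    interval = f"{start.isoformat()}/{end.isoformat()}"
    return {
        # Assumed metric name and aggregation key; verify against the API docs.
        "aggregations": [
            {"agg": "MAX", "metric": "io.confluent.kafka.server/retained_bytes"}
        ],
        # Assumed filter field identifying the cluster.
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": "PT1H",          # one data point per hour
        "group_by": ["metric.topic"],   # assumed label name for the topic
        "intervals": [interval],
    }


end_time = datetime(2020, 12, 22, 10, tzinfo=timezone.utc)
query = build_retained_bytes_query("lkc-example", end_time)
print(json.dumps(query, indent=2))
```

The point of the sketch is the shape of the request: the caller states the aggregation, the time window, and the grouping, and the service does the computation.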

Tim Berglund:

Yeah. I could not agree more, that is so cool, and you just get little pieces of JSON streamed to you the way they ought to be, in answer to an actual dynamic query that you send it. So pretty cool stuff. And it just totally sounds like I'm being a product promoter right here, but I can, it's my podcast. One of the other cool things is that it's not just a layer on top of JMX. Like, you'd normally JMX things at brokers and get data out that way, but there's actually the query layer and roll-ups that are aggregates of things going on in the whole cluster, which you would have to build out significant infrastructure to do yourself. So, it's a cool thing.

Tim Berglund:

And I know, like I said, this is just such a giant question, but for non Confluent Cloud things, just what are some other ways that one normally does this, kind of speaking to the listener who unlike you is not a person who has done lots of ... Who's been responsible for looking after lots of cloud services, what are the options I can expect to see for how to observe what's going on in my cloud thing?

Allison Walther:

So I will say you're in good hands if you're using Confluent and Kafka because they offer you an insane amount of metrics out of the gate with JMX. And so what you can do is you can tune how many metrics you want to receive and particularly about what. So say if you want metrics about brokers, their networking, ZooKeepers, your audit logs, an authorizer, how your RBAC and LDAP health is doing, all of those you can configure how loud you are willing to let them be, and then you can use something like Prometheus, which is a time series database, totally open source.

Allison Walther:

So set up that service, set up some exporters, JMX exporters, and basically take all of those logs from your brokers, your ZooKeepers, yada, send them to Prometheus, and then deploy yet another service, Grafana, which is also open source, and create the visualizations to have that observability. Both of those services, Prometheus and Grafana, do have alerting built into them, so you can have it send you a Slack message or page you in the middle of the night. But that's a pretty typical way of monitoring Kafka if you are running it on your own.
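To make the alerting idea a little more concrete, here is a minimal sketch in Python. It parses a few lines in the Prometheus text exposition format and reports which series are above a threshold. The metric name `kafka_under_replicated_partitions` and the threshold are hypothetical; real metric names depend on how the JMX exporter is configured, and in practice this check would be written as an alert rule in Prometheus or Grafana rather than in Python.

```python
# Hypothetical scrape output; real names depend on the JMX exporter config.
SAMPLE_SCRAPE = """\
# HELP kafka_under_replicated_partitions Hypothetical gauge exported via JMX
# TYPE kafka_under_replicated_partitions gauge
kafka_under_replicated_partitions{broker="1"} 0
kafka_under_replicated_partitions{broker="2"} 3
"""


def parse_samples(text):
    """Return (series, value) pairs, skipping comments and blank lines.

    Assumes each sample line is 'series value' with no trailing timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples.append((series, float(value)))
    return samples


def firing_alerts(samples, metric, threshold):
    """Return the series of the given metric whose value exceeds the threshold."""
    return [
        series
        for series, value in samples
        if series.split("{", 1)[0] == metric and value > threshold
    ]


alerts = firing_alerts(
    parse_samples(SAMPLE_SCRAPE), "kafka_under_replicated_partitions", 0
)
print(alerts)
```

Whatever comes back from a check like this is what would be routed to a Slack message or a page.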

Tim Berglund:

Got it. Got it. I mean, there's a lot of moving parts there, but this goes to, I think, what I said before about the marketing copy I see on cloud service webpages, and maybe I'm exaggerating here, but it seems like people say things like, eliminate your operational concerns, or ops is a thing of the past, it's in the cloud. Yes, to some degree, because all that stuff you just described is what's behind a typical cloud service or even a really nice metrics API, like the one in Confluent Cloud. You get all that stuff, and you have to roll your own if you run on-prem, and there are lots of valid reasons to run on-prem, and there are other tools. I mean, there's things like Control Center. If you're an actual Confluent customer and you're not just running open source Kafka, that can make that more pleasant, but in your mind, Allison, where does Control Center fit into things?

Allison Walther:

So I think Control Center is at an interesting point in its product life right now, where it's not offering your production Kafka user all of the garbage heap metrics that you absolutely need, but it does provide you all of the data that you need to be monitoring your streaming applications, to be knowledgeable about how your connectors are doing, how your ksqlDB queries are doing, and gives a very high level overview of how your cluster health is doing. So I'm really interested to see where Control Center goes, if it's going to start showing off these lower level metrics that are actually pretty necessary to run a Kafka cluster, so I guess we'll just have to wait and see.

Tim Berglund:

There you go. Now Rick, talk to us more about this, just general DevOps for the cloud. It's a thing. Many concerns are taken away, like Allison was talking about, JVM tuning and JVM heap management, garbage collection. And I know that's gotten easier over the years anyway, but you don't do that if you're running a Java service in the cloud. If you're running Kafka in the cloud, you don't ideally tune inter-broker replication settings and sizes of thread pools and all that stuff, which are valid concerns. You've got people operating the service that do that for you, but you still have to turn knobs and you still have to look after stuff and you still have to automate stuff. So what have you seen in that world?

Rick Spurgeon:

Yeah, so kind of stepping back from the monitoring side of things and thinking more about deployments, because if you're going to use a cloud service like Confluent Cloud, your focus is going to shift over to the application side. So you've got application workloads that you want to deploy to actually do the business, which is the nice part. You can now kind of shift your focus that way, but you may be deploying things differently depending on what kind of service you're using. So let's take, for example, Kubernetes, a popular service for running microservices in particular. I don't know if you've heard about it. Maybe you have, Tim.

Tim Berglund:

It seems like I've heard the word, and probably most people listening know basically what it is, but why don't you, in case there's anybody who's really new to the field, give a quick definition. What's Kubernetes?

Rick Spurgeon:

Oh, geez. So let's see.

Tim Berglund:

Right. It starts with containers.

Rick Spurgeon:

Right, container orchestration service, a system I should say. And what's nice about it is that, in my opinion, it has this declarative deployment model. So you can tell Kubernetes what it is you want versus how you want it. I find that to be extremely powerful. What can be tricky is, let's say you have cloud services like Confluent Cloud, and you want to actually provision some of those services or resources. It can be tricky because they may not, and Confluent Cloud does not yet at this point, have a matching declarative pattern for the resources you're deploying. So whether it's a cluster or an ACL or a topic, you're currently doing that using some sort of imperative method, like calling an API or a CLI command. So there's a bit of a disconnect there, but there's some interesting work being done in this area that I think is interesting to talk about, and it's specifically kind of targeting Kubernetes workloads, but it's around this idea of the operator pattern.

Rick Spurgeon:

And so with Kubernetes, the operator pattern, not to be confused with a well-known operator product called Confluent Operator, which is specifically about running Confluent Platform on Kubernetes. What this pattern is really about is how to declare things that you want and then operationalize them. And so if you're running applications in Kubernetes in particular, there's some really cool projects out there to help you operationalize your cloud resources in the same manner you do for your applications. So if you're deploying an application, a streaming application, let's say Kafka Streams, and it's a Java application, then you're going to be deploying a container on Kubernetes. Let's say that's going to consume data from a particular cluster and a particular topic. There's some interesting products out there you can use to declare those things as well, and with a little bit of that glue code, you can put your cloud resource declarations next to the actual application resources.

Rick Spurgeon:

So you can kind of deal with them in the same manner that you do. Let me say, you can deal with them in the same form, right? So you can have a YAML file with the container definition for the Java application, and next to it is a YAML file with a declaration for a cluster and a topic. And it does currently require you to write some code, but there are some tools out there that can help you do that. There's a tool called ... Go ahead.

Tim Berglund:

I want to back you up there to make sure, again for the total Kubernetes newbie, that we're covering those a little bit. First of all, I appreciate that you said YAML, because you were saying declarative, which sounds very respectable, and we do need to make sure that we are talking about YAML. Just so that's not false advertising. So you write a description of what you want reality to be in YAML rather than instructions for how to make reality be that way, which would be the imperative way to do it, which would be like Chef or Puppet, where you're actually writing in some domain specific language, a programming language really, about what to do to these servers. So you're declaring what the state of affairs ought to be, not how to get there.

Tim Berglund:

And so then the tooling, Kubernetes, one of these operators, can take the difference between your description of what you would like the world to be and what the world actually is and modify the world until it looks like what you want it to be. Wouldn't it be nice if there was one for just regular life, just me describing that I'm like, wow, I want that. I would even describe things in YAML if I could have that, but anyway, go on. You were talking about it, as long as you don't disagree with mine.

Rick Spurgeon:

No, you're absolutely right. And that's called the control loop process, which you might have been familiar with in your former days, popular in robotics, for instance to have a definition, do a delta and then a control loop process to drive the state of the thing to that desired state. That's exactly what Kubernetes is doing.

Tim Berglund:

I'd never thought of this in controls terms, but I'm now going to refer to all Kubernetes clusters as the plant, just to use proper controls terminology. So go on, you were talking about just starting to get warmed up and talking about tools and I interrupted you.

Rick Spurgeon:

Yeah. Sorry. So there was a handful of projects out there that I wanted to highlight that could help you do this kind of work. One of them is called Crossplane, it's really cool. KUDO Dev is an operator product that ... These are all like I would call operator SDKs, so-

Tim Berglund:

Yeah. Tell me about what each one does, I want to know.

Rick Spurgeon:

Well, they all do similar types of things. The strategies of which I think are pretty similar, but they just do them with slightly different technologies. So there's another one called Shell-operator, which I actually use in a project we're going to talk about a little bit, but it allows you to declare something in, let's just say, some basic YAML, have it posted to the Kubernetes cluster, and it will actually detect that declaration and invoke a container for you with some basic code. Let's say even in Bash or Python, this allows you to write a basic operator. And this was one of the most basic ones I found, but it was really sleek.

Rick Spurgeon:

So that way I could take a YAML file with a structure that says, here's my cluster, here's my topic, here's my ACL, and I could write some basic Bash scripts to read that declaration and call the proper APIs and, let's say, the Confluent Cloud CLI to make those things a reality. First query, determine the current state, calculate the difference, and then execute the appropriate commands to apply the difference. So you get to write your own little control loop process, and you can do it in Bash or Python, basic tooling. And this is something Amazon is picking up on too. They're building a controller to control their own services, like Lambda, for instance, using Kubernetes declarations. So there's pretty cool tools coming out of this space.
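The query, diff, and apply steps described above can be sketched in a few lines of Python. This is only an illustration of the control loop idea, not the project's actual code: the declaration is a plain dict of topic name to partition count, the "commands" are hypothetical tuples rather than real CLI invocations, and applying them just mutates an in-memory dict standing in for the cloud.

```python
def plan_changes(desired, current):
    """Calculate the difference between desired and current state.

    Both arguments map topic name -> partition count. Returns a list of
    hypothetical (action, topic, partitions) commands.
    """
    commands = []
    for name, partitions in sorted(desired.items()):
        if name not in current:
            commands.append(("create-topic", name, partitions))
        elif current[name] != partitions:
            commands.append(("update-topic", name, partitions))
    # Anything that exists but is no longer declared gets removed.
    for name in sorted(current.keys() - desired.keys()):
        commands.append(("delete-topic", name, current[name]))
    return commands


def reconcile(desired, current):
    """Apply the planned commands to `current` (a stand-in for real API calls)."""
    for action, name, partitions in plan_changes(desired, current):
        if action == "delete-topic":
            current.pop(name)
        else:
            current[name] = partitions
    return current


desired_state = {"orders": 6, "payments": 3}   # what the declaration says
actual_state = {"orders": 3, "legacy": 1}      # what a query returned

plan = plan_changes(desired_state, actual_state)
print(plan)
reconcile(desired_state, actual_state)
print(plan_changes(desired_state, actual_state))  # nothing left to do
```

Running the plan a second time after reconciling yields no commands, which is the property that lets a loop like this run repeatedly without side effects.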

Tim Berglund:

Nice. And I guess to make it a little bit more concrete, again, if you just don't know what Kubernetes is, that's a perfectly respectable state for a human being to be in, to not know Kubernetes well, but the resources you're talking about are really like compute and storage resources. So you have programs that you want to run out in this Kubernetes cluster and those are Docker images, and it goes and puts them on computers that can run them, but they might also need a certain amount of disk space. And a particular program, I don't know, like a Kafka broker, might always want to have the same chunk of storage associated with it because you've got log files in there and you need to make sure, if the node that your container is running on happens to get shifted around, that you don't lose access to your storage.

Tim Berglund:

And that sounds simple, but like actually doing that in an efficient way for a real application, like Kafka, or like whatever else, you need this program in the middle to help mediate between your declaration of beautiful YAML with perfect indentation and the state of affairs of the cloud.

Rick Spurgeon:

Yeah. That's a great explanation, Tim, and then this idea of operators just takes it to the next level. So deploying your application is one thing, but then there's what people have coined day two operations. So let's say your application needs to be versioned or upgraded or some other, let's say a table on a database if you're deploying a database application, or a topic in a cluster. That would be considered a next level kind of operation that typically you might code in some other way. These operators are aiming to allow you to code them using the same method as you do for the more base concepts that you're describing.

Tim Berglund:

Nice. And those things, Crossplane, KUDO, Shell-operator, those are all tools for making it easier to write operators.

Rick Spurgeon:

You got it.

Tim Berglund:

Because it's not trivial for a system like Kafka. I mean, that's actually a product Confluent has, it's Confluent Operator, that runs Kafka and other components of Confluent Platform in Kubernetes. It's a meaningful piece of engineering work to go build this but-

Rick Spurgeon:

I don't recommend building a Confluent or Kafka operator.

Tim Berglund:

Super bad idea. It's been done [crosstalk 00:27:33].

Rick Spurgeon:

Yeah.

Tim Berglund:

That's all we need.

Rick Spurgeon:

Definitely. The idea is interesting though, just to use the concept to operationalize your business not our business, right?

Tim Berglund:

Right. The stuff that's domain specific about the way your things get deployed, well the DevOps of it, this is converting what you might do into automation and software. Allison, how about Ansible? That's the thing that you guys also work with and is a part of our lives in dealing with cloud service management. So how does that plug in?

Allison Walther:

So in a lot of ways Ansible is a declarative resource management tool. In the past I've used it to set up some sort of cloud resource, whether that be an instance or a managed service in the cloud, and then also deploy my service on top of that resource and configure that service, all done with Ansible in a declarative manner. So I think it's a neat thing to mention when we're talking about operators and how to manage your cloud resources in a DevOps fashion. It allows you to write playbooks in YAML with the nice pretty indentation and co-locate that wherever your other code is for whatever services that you have, if you would like to do that. So I think it does fit pretty nicely into this idea of an operator. I don't think it's nearly as new as some of these other systems that Rick may have mentioned, but I think it's definitely worth mentioning in relation to DevOps.

Tim Berglund:

Absolutely, because, I mean, if you look at Ansible and the operator stuff Rick was just talking about, operators are, well, given Kubernetes. So Kubernetes is present and it's probably going to be present. This is the way of encoding a lot of your domain specific operational ... I'm trying not to use the word operations twice. Your operational game, the stuff that you do to make your software go live somewhere and keep running when pieces of it die. Stuff that in the old days everybody just did by hand, sitting at a keyboard and a mouse on a little tray, pulled out in a data center ... That was actually pretty dark. It didn't have to be that; you could always SSH in.

Tim Berglund:

But the stuff that you used to do by hand, and then people wrote Perl scripts to do it, and now you've got operators that are these specifically designed plugins that tell Kubernetes what to do, which is great inside that Kubernetes world. But you've got all these other services, like you were talking about a minute ago, and we were talking, Allison, a little bit ago about how to monitor those things and how to observe those things. But what about what in your life isn't Kafka? It's sad, but there are systems that you have to deal with that aren't Kafka, and there are other cloud services that you need to spin up and you need to manage. And Ansible is that kind of layer on top of the Kubernetes operator layer that lets you orchestrate all that. Or maybe even spin up your managed Kubernetes service. That could be a thing that you do in Ansible. Tell me if I'm wrong about that. I mean, I've got Ansible right, right?

Allison Walther:

Yeah. You've definitely got it right. I've seen Ansible used as a means to expedite work, whether it's launching instances, deploying services, or upgrading services, so that we can focus on migrating to Kubernetes. The thinking is that maybe migrating to Kubernetes will simplify our lives, but we use Ansible as a means to get there and free up some time for ourselves. So there are definitely some services that won't be ready to move over to Kubernetes and you still have to manage them in some way, shape or form, and maybe Ansible is the tool for that.

Tim Berglund:

Yeah. Ansible is one of them, at least, and in the spirit of introducing people to the world of DevOps and things or tools they might not know, what are other tools that Ansible competes with? I mean, I know it's an open source thing but what are you probably not using because you're using Ansible?

Rick Spurgeon:

I'll take that one. So I was thinking just as we were speaking how you were talking about manual management of these ideas, so what did people used to do? Well, they used to write a bunch of Bash scripts tucked away, hard to read, hard to understand, but-

Tim Berglund:

Imperative as all get out.

Rick Spurgeon:

Yeah. And we get to move away from that, but we're not going to tell Allison's nice boss who happens to be a Bash aficionado. So-

Tim Berglund:

She doesn't need to know, I'm pretty sure [crosstalk 00:32:29] to the podcast, so it's okay. And we'll find out if she does.

Rick Spurgeon:

We will.

Tim Berglund:

Hi Eva ... Please go on Rick.

Rick Spurgeon:

That was my answer. We get to not use Bash anymore. And there's also just the idea of declarative versus imperative, which I just think is really powerful because you get to go away from having to define state transitions all over the place. So, you know, you get to say, this is what I want, not this is how I do it. And when you say this is how I do it, you have to handle all of the scenarios of the various A to Bs, Bs to Cs, or As to Cs, right?

Tim Berglund:

Right. And you sort of ought not to worry yourself about how. That is kind of like a function that gets applied to your input parameter. And there should be, per cloud provider, just kind of one way to do those things, and you let the tool worry about the how and you tell it the what.

Rick Spurgeon:

Yeah.

Tim Berglund:

Ansible compared to Terraform, how do those two things relate?

Allison Walther:

So I think that those products compete pretty nicely against one another, and why I say that is because in some instances I wish I had used Terraform and in other instances I'm kicking myself in the butt for not using Ansible. Terraform does allow you to define your resources over various different cloud providers, and so does Ansible. There are minor differences, which you are better off just Googling rather than having me attempt to rattle them off to you. I think it's interesting, Tim, that you didn't ask us to compare CloudFormation to any of the technologies mentioned here.

Tim Berglund:

Yes. So please, Allison, talk to me about CloudFormation.

Allison Walther:

And so CloudFormation is another one of these declarative resource management, infrastructure-as-code things that you can use, except, if I remember right, it's just for AWS. So you're really locked into which cloud provider you can use.

Tim Berglund:

Right. The reality is everybody does that. I like to sing this song about how great it is to be cloud agnostic and you don't want to start coding against vendor-specific APIs. I mean, there's always lock-in. Every line of code is locking into something. Data infrastructure selection: I'm using Kafka, that's lock-in, right? There's a huge investment there and the transaction cost of switching to something that's not Kafka would be enormous. So yeah, I get it, there's always lock-in. But lock-in when it comes to the operational lifeblood of your stuff, which is your cloud provider, in the abstract, as a person who doesn't operate lots and lots of services and isn't running a thousand-person IT department and all that kind of thing, just sounds like a terrible idea. Like, gee, why wouldn't you want to keep your options open?

Tim Berglund:

Pretty much nobody keeps their options open. They signed with a new cloud provider and that's the new DB2 versus Oracle versus SQL Server sort of large vendor selection kind of thing that's fixed for a generation. So it would seem, and I'll keep telling people it's a bad idea, but I don't think they care. There are some other set of concerns that seems to be pushing rational decision-makers to accept an awful lot of lock-in; at least I'll give them the benefit of the doubt that it's rational till I can prove it's not. But yeah, which is actually why I didn't mention CloudFormation, because it's completely associated with one vendor. I've got this implicit, lightweight, well-mannered cognitive bias against all that sort of thing. This gets us to a tool, Rick, that you and Allison have been working on, called Kafka DevOps. Tell us about Kafka DevOps, Rick. Linked in the show notes, by the way. Everybody linked?

Rick Spurgeon:

Yeah.

Rick Spurgeon:

Definitely, please check it out. So as Allison was mentioning earlier, what our team does is we build these technical materials that we hope are helping our users. We had this idea to build a project that would operate, to use the term of the day, in a way that's maybe closer to the way our users and maybe Confluent customers actually operate. So what would it be like if we built a system that ran in production, meaning we didn't start it up and tear it down; we ran it, we monitored it, we really dealt with things like secrets, we dealt with configuring and deploying the applications, connecting to one of these cloud services? What would it be like if we actually built something just to kind of live the way that our users do? And as you're a developer relations expert, the idea is: let's feel what our developers feel. Can we get closer to them? And maybe it will build some empathy for us, maybe we'll build something useful for them and we can learn a lot about the way people use our systems.

Tim Berglund:

I like it. I like it. What are some of the trade-offs and kind of opinions you've needed to take as you go through things like this, because these are also very broad problems? So what are some interesting trade-offs you've come up against?

Rick Spurgeon:

Yeah, so the project is still pretty young. We're definitely interested in it being open to the community. We want people to look at it, help us if they're interested. It'd be great if everyone could gain something from its existence. We want to iterate over it. Like we want to do things like build monitoring deployment. This thing is actually running in real life in a Kubernetes cluster in Google Cloud. The code is out there for everyone to see, and we've tried to look at it like let's solve some real problems that people might have. So what's one of those problems? Let's say like managing secrets. This is a problem you'll see everywhere in the kind of DevOps space. Like what do I do with my secrets? The secret that my Kafka clients need to connect to the cloud service-

Tim Berglund:

API keys and things.

Rick Spurgeon:

API keys, credentials, database creds, all of those things. So we utilize a tool called Sealed Secrets. It's a product, it's an open source tool by a company called Bitnami. And so what it does is it allows you to actually take secrets and encrypt them using a key that's provided by an operator inside of Kubernetes. This allows that encrypted value to only be decrypted by the operator inside of the Kubernetes cluster. That's the only way it can get unenveloped. And so you can take a secret, like a Confluent Cloud cluster API key. You can use this key that's stored privately inside the cluster to seal it up and then you can actually just check it right into your Git repository. It allows you to deal with the secrets in exactly the same way you deal with the other declarations that you're dealing with.

Rick Spurgeon:

So I can deal with a secret as a YAML file with an encrypted payload and just check it right in, even into a public GitHub repository. This operator thing inside of the cluster will notice that it exists because we also use GitOps; we can talk a little bit more about that. But it determines that this manifest exists, pulls it down into the cluster, and then will decrypt it, making it available for the applications that need it. It allows that unencrypted value to just be isolated inside of the cluster entirely.
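
As a rough illustration of the kind of file Rick describes checking into Git, the Python sketch below builds a manifest shaped like a `SealedSecret` resource. It assumes the resource shape used by the Sealed Secrets project (`bitnami.com/v1alpha1`, kind `SealedSecret`, ciphertext under `spec.encryptedData`), which is worth checking against the version you run. The secret name, namespace, and ciphertext are hypothetical placeholders, and nothing is encrypted here: the ciphertext is assumed to come from a separate sealing step such as the `kubeseal` CLI.

```python
import json


def sealed_secret_manifest(name: str, namespace: str, encrypted_data: dict) -> dict:
    """Build a SealedSecret-shaped manifest from values that are ALREADY encrypted.

    `encrypted_data` maps secret keys to ciphertext produced outside this
    script. This function only arranges the fields; it performs no crypto.
    """
    return {
        "apiVersion": "bitnami.com/v1alpha1",
        "kind": "SealedSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"encryptedData": dict(encrypted_data)},
    }


if __name__ == "__main__":
    manifest = sealed_secret_manifest(
        name="cc-api-key",  # hypothetical secret name
        namespace="default",  # hypothetical namespace
        encrypted_data={"api-key": "PLACEHOLDER_CIPHERTEXT"},  # not real output
    )
    # JSON is also valid YAML, so this output could be saved as a manifest file.
    print(json.dumps(manifest, indent=2))
```

Only the encrypted form appears in the file, which is what makes it reasonable to commit alongside the other declarations.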

Tim Berglund:

So I didn't quite pick up on when you first described what Kafka DevOps does, the association with Kubernetes, but you're describing the secret management as being tied to it. So back me up again and tell me ... Sketch out the block diagram with words.

Rick Spurgeon:

Yeah, definitely. Sorry, I probably did skip ahead there, but Kafka DevOps is basically a suite of microservices applications. It mimics an order system; it's actually based on some code originally written by our colleague, Ben Stopford, in his book that he wrote about microservices event streaming. And so what we did is we took those applications and we deployed them into Kubernetes as containers and we orchestrated them together. The way we did that was using a GitOps methodology. But there are use cases, there are some microservices in there: there's a REST service, you post an order to it, it streams it out. There's a validation service. These kinds of things work in coordination to ship off to you your trousers, as Ben put in the original code.

Tim Berglund:

Speaking as a man of the Commonwealth.

Rick Spurgeon:

Yes.

Tim Berglund:

Ben, that is. And Allison, you're involved in this too. So what is exciting about what's coming up next with it in terms of the things you guys are working on?

Allison Walther:

So Rick has done 100% of the heavy lifting of this project thus far. I actually just joined Confluent about two months ago but I'm really excited to get involved in this project because I do have DevOps experience and me and Rick are in this really exciting space where we get to provide opinionated solutions about Kafka DevOps. So what I'm excited to see this project grow into is building out that observability piece that we touched on earlier and sort of implementing these various different solutions for folks to be able to go and look at how our production Kafka cluster is doing. Rick had talked about Sealed Secrets and there are various different ways to deal with Sealed Secrets so I'm kind of curious if we will revisit that at any point in time, specifically for connectors. And I'm just hoping to see other folks come along and possibly contribute or point out problems that they've had with their production clusters and ask for how we would solve them.

Rick Spurgeon:

Yes. I'm going to try to recruit Allison to help me build out these use cases if she will. And that is, let's take various use cases that our users are feeling and experiencing every day and try to implement them. And implement them in code, implement them in the open, so people can see them. How do you upgrade things while they're flying? So zero-downtime, no-maintenance-window upgrades of a Kafka Streams application, for instance. We want to solve these problems and so Allison and I are going to look to solve some of them in this project as we go forward. We'd love community help to not only do them, of course, but to also tell us what problems they're having so that we can help find the answer and codify a solution.

Tim Berglund:

Open for PRs?

Rick Spurgeon:

Absolutely.

Tim Berglund:

My guests today have been Allison Walther and Rick Spurgeon. Allison and Rick, thank you very much for being a part of Streaming Audio.

Allison Walther:

Thank you for having us, Tim.

Rick Spurgeon:

Thanks, Tim.

Tim Berglund:

Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST. That's 6-0-P-D-C-A-S-T to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31, 2021 and use it within 90 days after activation. And any unused promo value on the expiration date will be forfeit and there are a limited number of codes available so don't miss out. Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack signup link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.

How do you use Apache Kafka®, Confluent Platform, and Confluent Cloud for DevOps? Integration Architects Rick Spurgeon and Allison Walther share how, including a custom tool they’ve developed for this very purpose.

First, Rick and Allison share their perspective of what it means to be a DevOps engineer: mixing development and operations skills to deploy, manage, monitor, audit, and maintain distributed systems. DevOps is multifaceted and can be compared to glue, in which you’re stitching software, services, databases, Kafka, and more, together to integrate end-to-end solutions.

Using the Confluent Cloud Metrics API (actionable operational metrics), you can pull a wide range of metrics about your cluster, a topic, or a partition, covering bytes, records, and requests. The Metrics API is unique in that it is queryable. You can send this API a question such as “What's the max retained bytes per hour over 10 hours for my topic or my cluster?” and find out just like that.
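
To make the "queryable" idea concrete, here is a minimal Python sketch that builds the kind of JSON query body such a question might translate into, and then picks the maximum from a list of hourly data points. Everything specific here is an assumption recalled from memory rather than taken from the episode: the metric name, the filter field names, the overall payload layout, and the `value` key in the response points should all be checked against the current Metrics API documentation. The cluster ID and topic are hypothetical, and the sketch deliberately makes no network call.

```python
import json
from datetime import datetime, timedelta, timezone


def build_retained_bytes_query(cluster_id: str, topic: str, start: datetime, hours: int) -> dict:
    """Build an hourly retained-bytes query body (field names are assumptions)."""
    end = start + timedelta(hours=hours)
    return {
        "aggregations": [{"metric": "io.confluent.kafka.server/retained_bytes"}],
        "filter": {
            "op": "AND",
            "filters": [
                {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
                {"field": "metric.topic", "op": "EQ", "value": topic},
            ],
        },
        "granularity": "PT1H",  # one data point per hour
        "intervals": [f"{start.isoformat()}/{end.isoformat()}"],
    }


def max_value(points: list) -> float:
    """Return the largest `value` among response-like data points."""
    return max(p["value"] for p in points)


if __name__ == "__main__":
    query = build_retained_bytes_query(
        cluster_id="lkc-example",  # hypothetical cluster ID
        topic="orders",  # hypothetical topic name
        start=datetime(2020, 12, 1, tzinfo=timezone.utc),
        hours=10,
    )
    print(json.dumps(query, indent=2))
    # Hypothetical hourly points, standing in for an API response.
    print(max_value([{"value": 1024.0}, {"value": 4096.0}, {"value": 2048.0}]))
```

In this sketch the "max" is computed on the client from hourly points; whether the service can compute it directly is not something the episode notes say.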

To make writing operators much easier, Rick and Allison also share about Crossplane, KUDO, Shell-operator, and how to use these tools.

