February 24, 2022 | Episode 201

The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps

Kris Jenkins:

Hello, you're listening to the Streaming Audio Podcast. And my guest today is Jay Kreps, one of the original creators of Apache Kafka and through a long and interesting journey, the CEO of Confluent. We talk about the natural evolution of companies from those Wild West pioneers of internet-scale to today's service providers that let you nearly forget about scale entirely and just worry about your business. And we also talk about the natural ups and downs of life, and I'm still not sure how I feel about that part of the conversation, but you'll find out when we get there.

Kris Jenkins:

Before we start, let me tell you the Streaming Audio Podcast is brought to you by Confluent Developer, which is our site that teaches you everything you need to know about Kafka, from how to start it running and write your first app, to architectural patterns, performance tuning, maintenance, and more. Check it out at developer.confluent.io. And if you want to take one of our hands-on courses that you'll find there, you can easily get Kafka running using Confluent Cloud. Sign up with the code PODCAST100, and we'll give you an extra $100 of free credit to get you started. And with that, I'm your host, Kris Jenkins. This is Streaming Audio, let's get started.

Kris Jenkins:

My guest today is Jay Kreps, who is one of the original co-creators of Apache Kafka, former cutter of code on that project, and now the CEO of Confluent. I've got to say, there's something a bit strange. I feel like inviting the Confluent CEO into the Confluent Podcast is a bit like I've broken into your house and I'm inviting you into your kitchen.

Jay Kreps:

Easy guest to get.

Kris Jenkins:

Yeah, hopefully. But nevertheless, welcome to the show, Jay. Thanks.

Jay Kreps:

Yeah. Excited to be talking, Kris.

Kris Jenkins:

Cool. So I wanted to frame this by saying, years ago I had this contract. I used to be a contractor and I had this contract where the company did social media analytics. But that's what they did in theory. In practice, they spent half their time keeping a Hadoop cluster running.

Jay Kreps:

Yeah.

Kris Jenkins:

Right. And it's just the care and feeding of that system. So I've always wondered what was it like back in the day in LinkedIn when you're trying to build this thing and run it and actually build apps on top of it? What was that like?

Jay Kreps:

Yeah. I mean, I think it was a slightly different world. LinkedIn at that time was one of these early social networks, which were scaling like crazy. There was a burst of consumer tech companies and there wasn't really good solutions for how to build that kind of thing. And the examples of success were like Google, where they did everything in house. And the examples of failure were like Yahoo, where they tried to do everything in house and they didn't do it good enough.

Kris Jenkins:

And no one was quite sure what the magic sauce was.

Jay Kreps:

Yeah. That's right. So I think for a lot of the Silicon Valley companies, the takeaway was really building out excellent general purpose in house infrastructure is the key to success. And I think at the time that may have actually been right. To actually scale and build rich products, you needed to have these capabilities. You had to have them and you couldn't go out and buy it. So that really was what you were going to do. And for LinkedIn, that was a combination of adopting open source technology and then making some open source technologies. You had a little bit of both. And so some of the database technologies, we built Kafka, Hadoop we borrowed. But the interesting thing was even the things that we were getting as open source, it was about the same amount of work to really operationalize it at scale as it was to build it. That sounds counterintuitive. But the problem of running big data systems at scale is so hard. We would actually joke internally that getting to write the system was the reward for having to run it.

Kris Jenkins:

Yeah, I can believe that.

Jay Kreps:

And it's one of those problems that gets harder the more successful you are. The more the system is used, the harder the problem is. And the more usage, the more applications, the more scale, et cetera. So that was a pretty significant focus for LinkedIn. And with a high failure rate. You had to attract these very high-end distributed systems engineers, you had to conceptualize and build the right thing. It has all the risk of a startup to get one of those projects to really be successful. But to some extent, the end result is, well, you just have a database. And I think it's a mark of how much the world has changed that now there are all these cloud services, which are amazing, which do a lot of this stuff. They do it really well.

Kris Jenkins:

Yeah. We've gone past those Wild West days, haven't we?

Jay Kreps:

Yeah. I mean, I think if anything, the mindset is still shifting off the, call it the Google mentality. But I think that's pretty far along now. We work with a lot of Silicon Valley companies and they're going through this shift where they often have these very high-end infrastructure teams. But increasingly the desire is to get them focused on something unique to that company, which is typically not the Kafka clusters or the analytics infrastructure, or running the databases or writing new databases. It's typically that, something unique, the special sauce for that business. [crosstalk 00:06:00].

Kris Jenkins:

Sorry. You want to spend 100% of your time on that social media analytics and 0% on maintaining Hadoop.

Jay Kreps:

Yeah. To me, it's certainly possible. I mean, there are not great cloud services for everything, so there are things where you just need them. But I think increasingly that's changing. The cloud providers have done a great job. There's a whole swath of companies like Confluent that are putting huge amounts of resources that are some orders of magnitude more investment than most internal teams can do. And I think it's a big improvement, honestly. Even though I ran a significant infrastructure effort at LinkedIn, to some extent, we were a little bit like... this may only make sense to the Americans, but it was a little bit like AT&T the phone company in the seventies. So they had a monopoly business. And so you could get any phone you wanted, as long as it was a black rotary telephone. And if you didn't like that phone, you could not have a phone. And customer service was available 10:00 to 4:00.

Jay Kreps:

And it wasn't quite like that. Obviously, we were working hard, but to some extent, the amount of choice of what you had as an application developer was pretty limited because of the difficulty of running the stuff at scale. So we were limited to two flavors of ice cream. And our ability to really produce that on-demand was always limited just because it was always a struggle to scale, keep it up, ensure everything was secure. Just the very basics were so much work that it was very much not on-demand as it were. So I think in that respect, it's a much better world for application development where you're no longer as constrained by basic infrastructure.

Kris Jenkins:

Yeah. I can see that. How'd that evolve though? There must have been, at some point, you said at LinkedIn, "We need to spin this out into a company." And then at some point in Confluent life, you said it's not enough to just be providing the software.

Jay Kreps:

Yeah. Well, the first one happened... I mean, when we were starting Kafka, we actually were thoughtful about it. We'd done some other open-source projects. We felt like, hey, there's a really big gap. Everybody's focused, at that time on Hadoop and these key-value stores. And it was these two extreme ends of the spectrum. One was doing very rich things with data, like once a day. And the other was looking up little keys and values in half a millisecond. And a lot of our challenges of data weren't really bad. It wasn't like, hey, we need another storage system. It was really about the flow and interconnection. And I felt like that was very different where a lot of the distributed systems work was taking an idea that already existed, a database, a file system, whatever it is, and trying to make it scalable.

Jay Kreps:

This area of connectivity, I didn't feel like it had really gotten first-class attention. There were older message queues and bits and bobs that moved data around. But if you think about just the level of intellectual energy that went into databases, the research efforts in any university with a computer science department, the amount of commercial investment. A lot of really smart people tried to come up with a really principled, thoughtful abstraction for managing data at rest. And nobody really put that thought into data in motion, and to how data flows and how you can connect and tap in. And that was where we were spending a lot of time. And it was one of those things where you're either going to hire a team of people who do these random one-off integrations and spend all their time maintaining it, which is actually a pretty big challenge, especially if you're operating at scale. And even a relatively simple integration could be relatively complicated to make work well.

Jay Kreps:

So you're either going to do that or you're going to try and come up with some better way. And so we wanted to come up with some better way and we felt like, hey, this is how companies ought to work. And so from early on, we felt like, hey, this should be very successful. Now I would say at first it was very unsuccessful. We launched Kafka and we were expecting a little more attention for it. Our key value store had been popular. Some of our other open source stuff had been popular. And I think at first people were like, eh, who cares? What is it? They didn't even know what kind of animal it was. Is it a [crosstalk 00:10:39].

Kris Jenkins:

Yeah. It's so much harder to get people excited about something when it's not entirely familiar.

Jay Kreps:

Yeah. That's right. It's not like we were product marketing experts. So it took probably about a year of talking to people to get a better conception of, hey, why are we excited about this internally and why externally? And then it did start to catch on in Silicon Valley. And I think a lot of the other pure digital companies were working through similar challenges of just, hey, how does it all connect together?

Kris Jenkins:

Was this while you were still at LinkedIn? Were you thinking [crosstalk 00:11:14].

Jay Kreps:

Yeah, that's right.

Kris Jenkins:

Okay.

Jay Kreps:

I think the question for the viability of a business was really like, okay, there's a handful of these very large tech companies that are doing this. It's pretty hard for them. They're hiring these big teams to go build this into all their systems. Is this an idea that would work outside of that kind of tech company and become something more mainstream? Well, smaller tech companies and mainstream enterprises, is this applicable to the larger world or not? And just in talking to people increasingly we became convinced it was. It was interesting, for, I would call it a more traditional enterprise. If anything, they're more complex, they're more diverse, there are older technologies that have to come together.

Jay Kreps:

Like to some extent, the importance of that connectivity layer, it's dependent on how big the organization, how diverse the business is, how many older systems there are. That's what you're abstracting away. And to some extent, the Silicon Valley companies that are these very young companies that turn over the technology stack relatively quickly, they're actually the least complex internally. They may be operating at scale. There may be technical challenges, but it's not like they have technology from 30 years ago that is still running big parts of the business. And so as we saw that, we were like, "Actually, this makes sense for everyone. And the competitive pressures on these more mainstream companies are going to be exactly the same." They all need to have a significant digital portion of the business. So thought, okay, this could work.

Jay Kreps:

And then it was just a question of, hey, what's the business model? How do we get going? How does one even start a company? Is there any hope for this? That became the concern once we felt like, hey, this is going to turn into something, or this has the potential to turn into something big. And then maybe we were as much motivated by fear, which we were like, "Hey, this could be a big thing. How sad would it be if we started on this and then somebody else came and did it?" So our fear of some of those things [crosstalk 00:13:28]-

Kris Jenkins:

Yeah. The fear of missing out is a powerful motivator.

Jay Kreps:

... to go try it out.

Kris Jenkins:

Yeah. This also explains why Connect came in so early.

Jay Kreps:

Yeah. It was one of the dilemmas for us. When we were thinking about how to do the company, we were late enough in the game that it was clear you could start with just a managed service in the cloud, or you could start with a software product. For us, we felt like for what we did, it was very important to span all these environments, but you can't do both at the same time. It's just not practical. You have to pick one. So the big choice for Confluent early on was do we want to try and make operations easier by offering a cloud product, but still having just the bare core of Kafka, which is this commit log, which is a pretty low-level abstraction?

Jay Kreps:

Or do we want to try and make it a little bit easier to use for complete use cases, which was like the connectors and stream processing capabilities, which makes it a lot... I think that shows a lot more of the power of the model of what you can do. What's actually different from a message queue or something like that. And that was the dilemma is do you want to make life better for development or for operations? What's going to help you get going in the world better? And it was actually a big dilemma because the reality is, you need both. But if you have a company and you're starting out with three people, you can't do it all at once.
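To make the "commit log" idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Kafka's API: the `CommitLog` class and its `append` and `read` methods are hypothetical names chosen for illustration. What it tries to show is the part that differs from a traditional message queue: records are appended at increasing offsets, reading does not remove anything, and each reader keeps track of its own position.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy in-memory append-only log (hypothetical; not Kafka's API)."""

    records: list = field(default_factory=list)

    def append(self, value) -> int:
        """Append a record to the end of the log and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records starting at offset; nothing is removed."""
        return self.records[offset:offset + max_records]


log = CommitLog()
for event in ["signup", "page_view", "purchase"]:
    log.append(event)

# Each reader tracks its own offset, so two readers consume the same log
# independently and neither one takes records away from the other.
analytics_offset = 0  # hypothetical reader that starts from the beginning
billing_offset = 2    # hypothetical reader that only needs the latest record

analytics_batch = log.read(analytics_offset)
billing_batch = log.read(billing_offset)
```

Because the log only stores and replays records, everything else (pulling data from other systems, transforming it, querying it) has to be built on top, which is roughly the gap that connectors and stream processing are described as filling.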

Kris Jenkins:

If you try and build everything, you end up building nothing. Right?

Jay Kreps:

Yeah. That's right. So we were pretty aggressive, I think through the early days of the company of just trying to get to that more complete offering really on both sides. First I'm trying to complete what the technology could do. And then secondly, on trying to get a managed service out there that would make this available without people having to build some internal team. And that combination I think is what makes it something that can be a no-brainer decision. If it's like, hey, you have something powerful. It's a good abstraction for data, but you have to go hire some team of experts and spend years baking it into all your systems and figuring out how to run it. That's a big trade off. It's valuable, but it's really hard.

Jay Kreps:

And I didn't think that was inherent. I think it's possible to make something that's a better abstraction and is easy to use and consume and get the value out of. And I wouldn't declare mission accomplished on that ever. The goal is always to try and make it easier to consume, make it easier to abstract away more of the operations. But we've certainly come a long way in that respect, I think.

Kris Jenkins:

Yeah. It's one of those things often in like systems design, if you start on a better bedrock, you just keep building forever, but at least the tower stays stable as you go up.

Jay Kreps:

Yeah. That's certainly the hope. I think there's always an element of that in these kinds of infrastructure systems because they're not quite as agile as some application layer stuff, so you have to get the basic bones, the architecture of the thing sort of right. And then of course you can evolve from there. But it's harder to make big right and left turns.

Kris Jenkins:

Yeah. And as it evolves, you find out how right you were on your initial guess.

Jay Kreps:

Yeah, that's right.

Kris Jenkins:

So I didn't know in the early days of Confluent that you had the gleam of Confluent Cloud in your eye.

Jay Kreps:

Yeah. It was a big dilemma. At the time we were trying to start the company, there were really no examples of successful offerings on top of the cloud. But we had come out of a world of running Kafka and we knew that that was a big challenge that would be a great product offering. And so in many ways that was just very attractive. I would say we actually almost started that way. And then the concern on our side was like, well, to build a good cloud service, you're going to spend the first couple of years just building out billing and compliance and operational stuff. And to some extent, the user experience on the other side doesn't really get any better. You're just offering the same thing, but you're taking away the operational burden.

Jay Kreps:

And so that was why we didn't... It was a combination of those two things. It would probably be a little harder to get started on the business side. And it'd limit us to just the Silicon Valley crowd that was all in the cloud and probably the smaller, more risk-tolerant part of that since there weren't a lot of examples of people using this kind of infrastructure from other companies other than the cloud provider. So as a result, we started on premise. So then we started internally this cloud effort relatively early after that. It was in the first couple of years that we started the development of that first version. And then of course it takes significant time to really make it good.

Kris Jenkins:

So what was the tipping point? When did you say, okay, we've got enough now that it's time to not just do this one arm, but launch into the cloud?

Jay Kreps:

Yeah. I mean, for us we just felt... I think there was never any doubt that that was where the puck was going over time. Just in terms of where IT dollars were spent, still, the majority was on-premise at that time. But if you look at the first derivative, the cloud was the place that was taking on more new use cases more quickly. And then just from the first principles, we felt very strongly for all the reasons I described that that was going to be much more successful. Because I'd run one of these infrastructure teams, one thing I understood was everybody hates their infrastructure team. Because you're always the bottleneck. You're always slow. It's never good enough.

Jay Kreps:

If you do everything perfectly, then nobody knows you exist. And the one time you make a mistake, that's when you get to meet the CEO. I was very familiar with the dynamic and it just made a lot of sense that it's a hard problem, it's a problem that's not the core competency of a lot of these companies, and they're not doing it that well internally. And the resulting product internally is not that good. So if there's an option, people are going to move to that pretty aggressively over time. And so that pushed us to start early. I mean, that was a pretty significant challenge in its own right, just because what they say in any startup is don't try and have two products. You don't want to divide the army up and fight two wars at the same time. In many ways, cloud infrastructure offering and a software offering are actually much more than just two products. Two software products or two cloud products, it's actually not that bad. You're doing two instances of the same thing. In many ways, it's like a whole different business model, development, methodology, everything.

Kris Jenkins:

Yeah. Two business models at once is a big thing to take on. Right?

Jay Kreps:

Yeah, right. We thought we were eyes wide open on the difficulty of that, but it was actually probably more difficult than we thought to make that successful. I still think it's the right approach for our domain. In most domains, if I was starting an infrastructure company, I would probably just do the managed service. But for us, since we're ultimately about connecting stuff together, connecting the applications, connecting the data systems, connecting all the parts of the company, unlocking the data, and letting it flow between things. To do that, you have to be in all the environments. And that's like a significant value proposition for us. And even to an extreme where if there's a region we don't cover or back when we didn't have good coverage of the cloud providers, there would be significant pressure from customers.

Jay Kreps:

They would say, "Hey, we want to use your thing. But the whole point was to connect all the stuff and you don't have regions in APAC we're going to work with you." And so we're like, "Okay, we better get regions in APAC." And it's the same thing where it's like, okay, you have AWS, but you don't have Google or Microsoft or whatever the other cloud provider is. We'd love to use your solution and it doesn't cover it. And so for us, we had this idea of just making it available everywhere and making it possible to connect all that up into one fabric for data in motion. If you do that, that'll be difficult, but it'll be really valuable. And most people are probably too smart to take on all that pain. For any company, you're trying to figure out, hey, what are these competitive moats that we can do really well, that are going to be hard to copy? And so you want these things that are painful, but valuable that you can offer to customers and that are hard for competitors to offer to customers.

Kris Jenkins:

Yeah. And you've got to get the right level of pain where it's achievable without killing you.

Jay Kreps:

Yeah, that's right. Well, you have to solve the problem. That's the challenge.

Kris Jenkins:

Yeah. So I've got to pick you up on trying to get rid of the infrastructure team, which leads me to something I want to know. So you get this thing, now there is a well-managed service available and companies start moving towards a managed service. What happens to that generation of Kafka experts who are deploying in house? What happens to their jobs?

Jay Kreps:

The good news is there are no layoffs in software. So the actual movement you tend to see in companies is moving up the stack. And for people who've been working in software engineering for let's call it more than a decade, you've probably already done this. There's a set of things at a much lower level that we used to do. And then the tooling gets a little better and you're up one level in the stack. And every year it's always like, oh, hey, the new tools are going to write all the code automatically and all the problems will go away and the problems never go away. They actually tend to get harder. They're just at a slightly higher level of abstraction. They're a little closer to what the business is trying to do.

Jay Kreps:

I think it's actually really, really valuable. And you see this just in the raw economics of it. If you look at what's happened to software engineering salaries in the last, call it 15 years, it's amazing. And so what caused that? Well, partially it's scarcity. It's a hard skill set. But why is there so much scarcity, it's because there's so much demand and because the value one person can create as a software engineer is now actually quite high. And that comes from working with higher-level abstractions, it comes from not being in the weeds. It comes from moving the needle for what the business is trying to do. That's the only way you can produce enough value to pay these very large cash and equity compensation packages is if that pencils out in the end, and it is creating in aggregate enough value for the business.

Jay Kreps:

And so that trend of moving up the stack has actually been immensely valuable. We see it in our customers. We're just like, there's some team, they're struggling to get the next security patch rolled out in their own self-managed thing. There's often some kind of internal debate about, hey, do we want to do this and own it ourselves? Or do we want a managed service? I think for those who move, they immediately move on to a set of more interesting problems of how to really use data in motion. How does it plug into all the systems we have? How do we react and trace what's going on with it? How can we feed this into the machine learning algorithms that are going to make decisions? In many ways, the more interesting problems that are a little bit closer to what that company is doing.

Jay Kreps:

And it's usually more interesting. I think a lot of the work we do is very interesting when done at scale. If you're managing thousands of clusters, if you're very focused on efficiency and security, et cetera, that's great. But to do it in one company over and over again, there's not enough time to really fully automate everything. And there's always just this long tail of toil and drudgery. And so I think if anything, it ends up being very positive for a lot of these infrastructure teams and you see the same thing, which is those teams never really... They always seem to have a lot to do. It's just the level of abstraction changes.

Jay Kreps:

I had a funny story from LinkedIn that was a little bit like this. So when we were building out the Hadoop infrastructure, we came up with some pretty successful analytics applications. So we were always getting these larger and larger Hadoop clusters to be able to crank more data. And so the progress was very dependent on this and it would impact how many people would join the site and the level of connectivity and the social graph, all these things if we could get our scoring to be better. And so there was one point where we even built out this new Hadoop cluster and it was delayed. And what was delaying it? Well, we had all the computers, but it turned out that the power cord for the hard drive was too close to this raw metal. I don't know what you would call it. Guardrails.

Jay Kreps:

And it would scrape the rubber on this little wire. And there was a concern that if you scrape off all the protective rubber, someone's going to get electrocuted. And so we're in there taping the wire to the side so that when the drives come in and out, it doesn't scrape. And I was thinking about this at the time, I was like, "I'm not operating at the right level of abstraction." We weren't thinking about the wires. And now we're thinking about the wires and we're taping the wires and how many drives can a person tape down? We just want to process more data. This is not right.

Jay Kreps:

And it was interesting. At the time there was early AWS stuff. And I was like, "Man, we should look at that and see if it's cost..." At the time, it was not cost-effective compared to what we were doing. But you could see the appeal of just having some of these problems be somebody else's problem.

Kris Jenkins:

Yeah. It reminds me, have you ever read The Hitchhiker’s Guide to the Galaxy?

Jay Kreps:

No. It's one of the science fiction classics I've never gotten around to.

Kris Jenkins:

There's this bit where he talks about exactly this, you never run out of problems. But an early culture has the problem, how can we eat to survive? A more advanced culture still has a problem, but it's where should we go for lunch?

Jay Kreps:

Yeah. That's right. Well, they have this concept of the hedonic treadmill. Have you ever heard of this?

Kris Jenkins:

No.

Jay Kreps:

Okay. So it turns out people are more or less equally happy. And so when something good happens to you, maybe you get a nice house and you get happier for a short period of time. But then you go back to the same amount of happiness. And if something bad happens to you, you're like, "Oh, this is terrible." You're really upset. And then you go back to the same. And so my friend is a video gamer. And in these multiplayer video games, they have a similar thing which is if you win, they just drop you into games with better people. So you're never really progressing, you're just playing with better people who kill you the same amount. And apparently, humans are wired a little bit the same, which is your level of happiness is more or less relatively constant. I don't know if that's good news or bad news, but I thought it was really fascinating psychology [crosstalk 00:30:01].

Kris Jenkins:

Yeah. I really don't know if that's good news or very depressing. We must imagine [inaudible 00:30:08] happy.

Jay Kreps:

Yeah. That's right. Well, it's like, we're in a pandemic. At first, everybody's like, this is terrible. Then people adjust and they're like, ah, it's bad but [crosstalk 00:30:17] getting used to it.

Kris Jenkins:

And then we're allowed out again and we're like, yay, for about 10 minutes.

Jay Kreps:

That's fair.

Kris Jenkins:

To get us back on the topic of computing. I want to hear more of the... you must have some stories from inside customers of how, not just the people on the ground, their job changes. But if you're a company that moves from having to worry about all this stuff in house to getting all that off your plate, how does life change for the company? Does the company get a different focus?

Jay Kreps:

Yeah. I mean, I think it's a big deal. So at one point at LinkedIn, we were just trying to figure out resourcing and we were discussing, hey, what's the right level of resourcing for internal infrastructure, and what's the right level of resourcing for product features? And the decision at the time, given all the scaling challenges and how much we were slowed down by things was that about 70% of the engineers should be working on internal infrastructure and about 30% should be working on the product. Now, if you think about, for a lot of companies that are product-centric and where software is a significant component of that product, that velocity is going to determine over a longer arc, a lot of the success of the company.

Jay Kreps:

And so if 70% of your resources go into something that's a means to an end, then you only got the 30% that's left. And so, I think that's a very clear illustration of it in a really quantitative way. And so you can see that now where certainly if you look at this newer generation of smaller startups that was born in the cloud, they're very aggressive about not taking on... They almost think of these operations projects as a debt of like, oh, now we're stuck running this thing forever. And they're very aggressive about trying to avoid that. And I think there's some wisdom to that. Probably.
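The arithmetic behind that 70/30 split can be sketched in a few lines of Python. The 70% infrastructure share is the figure from the conversation; the team size of 100 and the alternative 30% share are hypothetical numbers picked only to make the comparison visible, and `product_engineers` is an illustrative helper, not anything LinkedIn or Confluent used.

```python
def product_engineers(total: int, infra_percent: int) -> float:
    """Engineers left for product work after infrastructure takes its share."""
    if not 0 <= infra_percent <= 100:
        raise ValueError("infra_percent must be between 0 and 100")
    return total * (100 - infra_percent) / 100


TEAM_SIZE = 100  # hypothetical headcount, for illustration only

# Split described in the conversation: 70% infrastructure, 30% product.
in_house = product_engineers(TEAM_SIZE, 70)

# Hypothetical alternative in which managed services cut the share to 30%.
managed = product_engineers(TEAM_SIZE, 30)

print(f"In-house infrastructure: {in_house:.0f} engineers on product")
print(f"With managed services:   {managed:.0f} engineers on product")
```

Under these assumed numbers the same headcount puts more than twice as many engineers on product features, which is the velocity argument being made here in quantitative form.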

Jay Kreps:

I mean, it's interesting, it's a different skill set, I think for tech companies to be good at buying managed services, getting the value out of those products and vendors, and making sure that they're evaluating them appropriately. It's a little different, I think than operationalizing open source or building stuff in house, but still important.

Kris Jenkins:

Okay. What's your completely unbiased guide to choosing a managed service?

Jay Kreps:

Yeah. I mean, I think the question is how much of the problem can it take on? So when we've thought about what we're trying to do, there have been three pillars we've tried to focus on. So one is, really try and make something that's cloud-native. You shouldn't be picking out how much disk space there is per machine or how many CPU cores or whatever. It's just not the right level of abstraction. You want to be buying capability to some extent, and you want someone to properly take over the operation of the system and be accountable for that with a strong SLA. If that part doesn't happen, you don't get the cloud-native capabilities. I think it's a big loss. People from far away, it's easy to miss this, but the difference between a Teradata, Redshift, and a Snowflake is really the extent to which that system is designed to capitalize on the capabilities of the cloud and the degree to which it's priced as a service versus something you stand up and pay ahead on.

Jay Kreps:

And I think that's actually a very big deal in enabling the agility in getting people abstracted away from it, and in actually handing over the responsibility. I think these things where it's partially managed, it's a big problem. It's a very split responsibility between the team using it and the team that's partially running it. And so if it doesn't work, whose problem is that at the end of the day? So I think that's critical. For us, the other thing was really trying to get the full ecosystem around Kafka. So if that first pillar is cloud-native, the second one is to be a complete offering, really get the ecosystem, the connectors, the SQL layer, how you govern streaming data. Really bring that together because that's ultimately what people need. And so if you manage one component, but they have to go piece together some ecosystem of other things to make it work, you're not really taking on that much of the management. And then make it work [crosstalk 00:35:02] and make it work across all the environments. And so those were the three: cloud-native, complete, and everywhere. Really try to nail those three.

Jay Kreps:

And then the other side of this is even more basic, which is just, is this service trustworthy? I think one of the amazing things about something like S3 is you can just treat it like a utility you can depend on. And that's what you want in an infrastructure layer. If it's a little finicky or it doesn't always work, it's actually much harder to just consume it. And that kind of trust is very important. And then the cost-effectiveness, ultimately you need something, again, like a utility that you can have a cost structure you feel is healthy, that you want to build around.

Jay Kreps:

I think for a lot of these services, it's possible to end up with something that's very cost-competitive with the do-it-yourself route. In part, because that tends to be pretty expensive. The engineers aren't cheap. You typically end up with a big pile of semi-utilized cloud resources. So I think if you can check those boxes, that's what makes it compelling, something that's really a first-class cloud service that's a good deal. And we do that analysis with customers often of like, hey, what are the economics of running the open-source Kafka yourself and trying to piece this together and hiring a team? What are the economics of a pure software offering? What are the economics of our cloud offering? And it's interesting, typically the cloud offering's a much better deal than the other two, just in part because of the utilization of resources and then the people and what the people can go do.

Kris Jenkins:

Yeah. If you're spending that kind of money on expensive people, you really want them to be focused on just stuff that makes your business unique, I think.

Jay Kreps:

Yeah. I think that's right. Ultimately the economics for engineers is how much value you can create. And so one way or the other, you want to be part of something that's adding a lot of value to the business. I think it's one of the things that it's often easy to lose track of. If you're early in your career, you often have a limited domain that you're working on. It can be hard to trace that back. But to some extent, the ability to spend on engineering is determined by the value the business is creating. And so in a very direct way, that's what matters in the end.

Kris Jenkins:

Yeah. I often think as programmers, we don't get paid these great salaries to work. We get paid to teach machines to do the work, right?

Jay Kreps:

Yeah, that's right. There's a really funny or interesting or clever riff by the guy who founded O'Reilly books.

Kris Jenkins:

Yeah.

Jay Kreps:

And that's one of the things he said is that in these tech companies, it's actually the workers are the computers and the managers are the software engineers. And what you're doing is creating some system of work to try and get everybody to do their part. And I thought it was an interesting way of viewing this next generation of company that to some extent the execution of the company is actually happening in large part in software and the building of that system and the tuning of that system, that's what is actually being done. It's a slightly more meta problem, and I thought that was an interesting way. And he was contrasting it to maybe a more traditional, whatever you want to call it, industrial economy type organization that is a very command-and-control people structure where you have X workers at Y dollars per hour and you're trying to utilize them. And so it's a very different way of thinking about the structure of a company. And then of course it has a whole bunch of implications in terms of how people work together and culturally what's important to make that successful.

Kris Jenkins:

Yeah. So that leads into one other thing I wanted to ask you and you mentioned stream governance. Which seems to me like one of those things that you want, but you can't even think about it when your main concern is keeping the cluster running. It's like a second-order thing. You can only want it once you are free enough to want it. Is there more stuff like that coming up in the pipeline? These things that only become demands once you've got the bedrock in place?

Jay Kreps:

Yeah. Absolutely. So our goal is really to make something that makes it possible to open up a lot of the data across the company, have applications be able to tap into that and respond in real-time, and do smart stuff with it. And there's a lot to that problem beyond just the raw transport or transformation of bytes from one place to another or one format to another. You obviously need that capability. That was [crosstalk 00:40:24].

Kris Jenkins:

And it was hard enough to start with.

Jay Kreps:

Yeah. But if you look at that wider world of what does that enable or what's required to make that successful? It's actually a really interesting space and it's, I think, really underdeveloped. A lot of the ecosystem around databases has grown for a long time. So if you think about the ecosystem around data warehousing, there are all kinds of governance and analytics tools and graphing layers and GUI layers and layers for mapping out all your tables and managing all your queries. And a lot of that is just coming into being in the world of streaming. And for us, we feel like there's a really significant platform emerging in companies around data streaming, that can really connect all these things together. But a lot of what needs to grow up there to make that as productive and usable is just coming into being now. And so it's an exciting time for us, and I think for the whole community of people who think about this and work around it, because the white space is not all filled in. So there's an opportunity to add to that picture.

Kris Jenkins:

Yeah. Gradually building up on that. So the last question, I think, on the same topic is to go back to the start, you said you saw the way Confluent would evolve from the days in LinkedIn. But where is it at now? Is the future roadmap still set by that early vision? Or is it more talking to customers and getting their feedback and them saying, hey, we need stream governance now?

Jay Kreps:

Yeah. It's a little bit of both. I think you often hear this debate on product teams, from engineers, which is like, hey, are you more customer-driven or are you more vision-driven?

Kris Jenkins:

Yeah.

Jay Kreps:

And our product vision comes in different forms. I think there's an aspect of like, hey, what's the high-level thing we want to be in the world? I think there's also an aspect that's more bottoms up. What does the technology want to be? Which is a funny thing to say. But to some extent with these infrastructure layers, you're trying to make them more complete, more general. There's a natural way of generalizing or expanding capabilities, which is often good. And it's often not something that customers will ask for. Sometimes they will, but sometimes they won't. And yet it often opens up very new use cases. And so to me, it's usually not one or the other. Typically customers are very good at assessing their own problems. And companies that don't listen to the problems that customers have usually aren't successful for very long.

Jay Kreps:

And really understanding the environments and challenges people are facing is incredibly important. I think customers are often really good at pointing out gaps. Typically with a product infrastructure layer company like us, if you make something, it's not perfect. There are gaps, there are holes, there are things you haven't thought of. And those become really clear to folks when they try and use it and put it into practice for something. And so that feedback is incredibly valuable and you have to act on it.

Jay Kreps:

Typically, I don't think you can depend on customers for that bigger picture of like, hey, what are we trying to do in the world? They're just not thinking about you that much. Maybe they are, but I wouldn't depend on it. And then that, hey, what does the technology want to be? That I think needs to come out of an engineering team that has a strong view of, hey, this is what's happening in the space. This is what makes sense. This is how this can be simpler, more powerful, more elegant, better over time. And the hard problem is trying to marry all those things together so that you're solving for all of that over time. That's the challenge [crosstalk 00:44:27].

Kris Jenkins:

Yeah. Seeing those patterns and juggling them all.

Jay Kreps:

Yeah. That's right.

Kris Jenkins:

Yeah. Well, I better let you get back to it. Jay, thank you very much. This has been a high point on my treadmill.

Jay Kreps:

Yeah. That's right. Well, good news is it comes back down quickly.

Kris Jenkins:

Oh, great. I'll look forward to my evening.

Jay Kreps:

All right. Take it easy, Kris.

Kris Jenkins:

Thanks for joining us, Jay. See you again.

Jay Kreps:

Bye.

Kris Jenkins:

Bye.

Kris Jenkins:

And that was Jay Kreps off to continue the grand balancing act of customer requirements and software needs. That's actually a really complex path to happiness. If you want a simpler path to happiness, you could experience the fast dopamine rush of getting in touch with us here at Streaming Audio. Leave us a review on your podcasting app or a thumbs up or a comment on YouTube, or just drop me a tweet. My handle is in the show notes, and we'd love to hear from you.

Kris Jenkins:

Before we go, let me remind you that if you want to learn more about Kafka, we'll teach you everything we know at Confluent Developer, that's developer.confluent.io. If you're a beginner, we have getting started guides. And if you want something more in-depth, there are blog posts, recipes, and in-depth courses. And if you've come away from all that wanting to use Kafka without maintaining it, then sign up for an account at Confluent Cloud and use the code PODCAST100, which will give you $100 of extra free credit so you can run a little faster or a little further. And with that, it just remains for me to thank Jay Kreps for joining us and you for listening. I've been your host, Kris Jenkins, and I'll catch you next time.

When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload. 

Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the challenge was to address scalability for real-time data feeds. The social media platform’s initial data system was built on Apache™ Hadoop®, but the team later realized that operationalizing and scaling the system required a considerable amount of work.

When they started re-engineering the infrastructure, Jay observed a big gap in data streaming—on one end, data was being looked at constantly for analytics, while on the other end, data was being looked at once a day—missing real-time data interconnection. This ushered in efforts to build a distributed system that connects applications, data systems, and organizations for real-time data. That goal led to the birth of Kafka and eventually a company around it—Confluent.

Over time, Confluent progressed from focussing solely on Kafka as a software product to a more holistic view—Kafka as a complete central nervous system for data, integrating connectors and stream processing with a fully-managed cloud service.

Now as organizations make a similar shift from in-house infrastructure to fully-managed services, Jay outlines five guiding points to keep in mind: 

  1. Cloud-native systems abstract away operational effort, so you don't have to deal with infrastructure concerns
  2. It’s important to have a complete ecosystem for Kafka, including connectors, a SQL layer, and data governance
  3. A distributed system should allow data to be accessible everywhere and across organizations
  4. Identifying a dependable storage infrastructure layer, such as Amazon S3, is critical
  5. Cost-effective models mean sustainability and systems that are easy to build around


Continue Listening

Episode 203 | March 10, 2022 | 44 min

Why Data Mesh? ft. Ben Stopford

With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh.

Episode 204 | March 15, 2022 | 41 min

Handling 2 Million Apache Kafka Messages Per Second at Honeycomb

In this episode, you’ll get a taste of how Apache Kafka is used at Honeycomb! Liz Fong-Jones (Principal Developer Advocate, Honeycomb) explains how Honeycomb manages Kafka-based telemetry ingestion pipelines and scales Kafka clusters. Honeycomb is an observability platform that helps you visualize, analyze, and improve cloud application quality and performance. Their data volume has grown by a factor of 10 throughout the pandemic, while the total cost of ownership has only gone up by 20%.

Episode 205 | March 22, 2022 | 42 min

Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole

Data availability, usability, integrity, and security are words that we sometimes hear a lot. But what do they actually look like when put into practice? That’s where data governance comes in. This becomes especially tricky when working with real-time data architectures.
