September 12, 2019 | Episode 53

Jay Kreps on the Last 10 Years of Apache Kafka and Event Streaming


Tim Berglund (0:00)

It's been five years since Confluent was born, and since then there's been a lot of development and growth around Apache Kafka and the adoption of event streaming in general. Today I had the privilege of celebrating Confluent's birthday with CEO and co-founder Jay Kreps, who obviously has been there since the very start. Since he is one of the co-creators of Apache Kafka, I asked Jay to share the story of how Kafka came to be, the journey of starting Confluent, and how his vision for event streaming has changed over the years. You'll get to hear that and more on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund (0:43)

Hello and welcome back to another episode of Streaming Audio. I am delighted to have with me today in the virtual studio Jay Kreps, co-founder and CEO of Confluent. Jay, welcome to the show. I think a lot of people who listen to this know who you are so I wasn't gonna do the normal, like, build the drama of here's this person and I'll let him tell you his title. Like, we know who you are, and we're really glad, I'm really glad, to have you on. Yes, now, Confluent is turning five, and that's a decent age for a startup and I think for a startup of our size, it's a pretty darn good age. But at birthdays, people are looking forward and looking back and evaluating things, and it seemed like a fun time really to get you on the show and talk about some of the history that I think you have unique access to, of where we've been as a company and where Kafka has been as a project, and so that's really what I wanted to talk about. With that in mind, when I first started thinking about this, I thought there's this legend of you rage-coding the original Kafka code, sort of primordial Kafka, over Christmas break one time. I know the story of Kafka is clearly a lot bigger than the story of you, but number one, is that legend true? And what were you working on at the time that made that seem like the thing to do?

Jay Kreps (2:23)

Yeah, I think the nature of these stories is they're always a little bit true, right? But it's not quite like you got hit in the head with the apple and then you just invented it on the spot. The reality was, at LinkedIn, there were all these different data sources and we were thinking about how to try and integrate them, and I'd run the team that owned the Hadoop infrastructure and owned a lot of the realtime data-driven product offerings and we were always struggling with the problem set and just marinating in it. And we had a team that collected some of the event data, and that team had a system, I think it was called WebTrack or something like that, and it was this system that was just always breaking and losing all the data and it was the worst combination of ad hoc technology and backdoor-oriented stuff, and it didn't really cover any other data source. It didn't cover any of the things in the databases. And so, I think it was more just having that pain for a long time that was the real origin, and then eventually, yeah, I did get worked up and wanted to just do something about it and that was the initial coding, which wasn't entirely over the Christmas break. I think it was a combination of the Christmas break and a trip out to Pittsburgh for my grandmother's birthday, or something like that. Wait, no, maybe it was my cousin's wedding. I'm not sure. So it was these long chunks of time where you're either stuck on a plane or you're kinda in the house and it's cold and that was enough time to get something prototyped. And of course it was a pretty simple system at that point, so all the good code came later.

Jay Kreps (4:09)

I think they got rid of all that bad code by now.

Tim Berglund (4:11)

Yeah, I would expect that to have developed out.

Jay Kreps (4:14)

But that was the origin. It was actually easy enough to write the early code. It was hard to get the project funded at LinkedIn because this failing system was actually owned by another team that felt like we were criticizing their thing and it turned into this big organizational mess as we eventually took it over. But we did eventually get it off the ground, and it became really important in the architecture of LinkedIn over time.

Tim Berglund (4:39)

Yeah, yeah, as history has borne out, and we've had guests on the podcast who've talked about that very thing: the centrality of contemporary Kafka to LinkedIn's architecture and how that evolution took place. A good insight there, I guess, about software engineering in general. When you say, hey, I want a distributed log, and you know four or five distributed systems concepts, which you did at the time (this was not your first rodeo in terms of data infrastructure projects), you can do that prototype rage-coding and build a thing that works, but then maturity as a product and a piece of infrastructure is a horse of a radically different color.

Jay Kreps (5:28)

Yeah, that's right. I think I pitched the project by saying it would only take three months and solve all of our problems. And maybe it did solve a lot of the problems, but it took a little more than three months since we're still basically working on the same problem 10 years later. So the moral of the story is never believe software engineers when they tell you a timeline. If something works well then people just want you to do it better and better.

Tim Berglund (5:51)

Exactly, and if you're listening to this podcast and you don't know that somehow, then, well, I guess now you do. But yeah, no, that's how these things go. So then you built that and you started using it and you said you had this little bit of an organizational struggle to get it funded but you did get it funded and a team formed that did that ongoing maintenance. And it always grows hair, and needs to do more at some point. So I always point out to listeners, sometimes I ask questions because I know the answer but I want everybody else to know the answer and sometimes I'm asking questions 'cause I don't know and I actually don't quite know. I guess I could do the math, but how long between that event and the project getting funded and you starting to use it and everything to when you started the company, is that about five years?

Jay Kreps (6:52)

Yeah, it was probably about that. I think it was kind of the end of 2009 to early 2010 when we started working on it, and it took a while to get it into production, and then there was a whole sequence of ramping it up and open sourcing it. The first two people on the team were Jun Rao and Neha Narkhede, who ended up co-founding Confluent, so I guess it was a good early team with the three of us.

Tim Berglund (7:18)

One might say dream team.

Jay Kreps (7:21)

Yeah, so, we had big aspirations for the technology early on. It was designed to be something that could be open source from early on and it was designed to be something that would be broadly useful. You know, not something that would just solve the two or three problems that LinkedIn immediately had but something that would go solve this everywhere. And so, we had high hopes for it. And then as we saw it succeed in the rest of the world as it started to get open source adoption, we thought hey, this is really a big deal, but it's not really gonna go mainstream on its own just as a GitHub project and it really needs a much higher level of investment than just three or four people at LinkedIn.

Jay Kreps (8:04)

And it also needs just that push to take off in the world, and so that was why we left to start a company.

Tim Berglund (8:13)

Awesome point there. And a little bit of a hobby horse of mine. This is a distraction. We won't talk about this for long, but there needs to be some proper business scholarship, somebody to really gather some data and look into this. We have the myth of the basement coder, the heroic basement coder, which you happen to fit perfectly, right? That's the actual story of the initial code. And we think, oh yeah, here's this wildly successful open source project, and it's all fueled by these heroic basement coders at night or something like that when, in reality, what you just said is quite true. My hypothesis is that this is always the case for broadly successful open source projects: they do need that investment really to go from, hey, I hacked this and it does a thing if you treat it very carefully and feed it and water it and keep it between 71-and-a-half and 72.1 degrees Fahrenheit, to, here's a robust product that can power the world.

Jay Kreps (9:09)

Well, I see it a little differently. I think there's a whole range of things that are open source and some of these are actually very small in scope. They're typically libraries. They're easy to build and test. They have a pretty limited scope of what they do. And I think a lot of those work pretty well as the kind of evening and weekend volunteer projects. In a sense, trying to add a real company behind it or around it would make it worse, not better.

Tim Berglund (9:44)

It would be weird, yeah. Those have been my open source contributions. Like, little Gradle plugins that do this or that. They're cool and they help people, but it's not a thing.

Jay Kreps (9:52)

I think there's another set of things where there's just a really massive chunk of R&D to be done. And there it is important to figure out how does that get funded? I think it's even harder if the end state you're building towards isn't totally known. I think one of the advantages that some of the early open source like Linux and MySQL had was, in a sense, it was kind of a better implementation of a known interface, and that makes it easier to collaborate with little bits contributed here and there. But I think anything where you're trying to figure out what the end state is while you build it and there's just a really material amount of work to be done on it, you do have to figure out how to fund it. We were originally doing that primarily with contributions from within LinkedIn, and it was okay, but it was ultimately a very small team for a really big problem. I think maybe while we were there, the team probably peaked at five or six people. And it just wasn't really enough to do nearly as much as we hoped to, to kinda complete the vision. So that was one of the exciting things about Confluent: being able to really align the contribution with what the rest of the world wanted and not just be solving the problems of one company, and being able to take up the level of investment in the project as well.

Tim Berglund (11:23)

Yeah, which, I guess, which you need, if you wanna do a thing that is of service broadly to potentially everybody, otherwise it's gonna be pegged to what that one company wants.

Jay Kreps (11:33)

Yeah, yeah, I think that's definitely true. I think the areas where it's hard from within one of those tech companies to make an open source project successful are all the stuff that helps you get started and helps integrate it into the environments people would have. You tend to do a good job on the question of does it run at scale in our particular environment. And everything else outside of that is a little bit hard to find time for.

Tim Berglund (12:01)

Yeah, well, and that is rational on the part of that sponsor. So when you started the company, you got from there and it's four or five years later and you're like, okay, this thing has legs. Let's start a company. Back then, again, at birthday time, reflecting back on the past, what did you think would be the hard part? I mean, I know you come at this as not just an engineer but one of the key engineers on the project. Did you think engineering problems would be hard, or was it hiring, or where do you get revenue? What seemed difficult back then?

Jay Kreps (12:40)

Yeah, I think probably early on, the big concern we had was would this technology and vision catch on outside of Silicon Valley? And at that point we'd had really good adoption from the big, large-scale tech companies and then a smattering of adoption outside of Silicon Valley. Or, you know, it's not really Silicon Valley, the geographical location, but really the large-scale tech companies. So that was an area of concern, and then beyond that, the three founders were all engineers that knew this technology and product space well, but had no background in enterprise, go to market, or anything like that. Yeah, a lot of the initial challenge was okay, great, like, is this gonna catch on elsewhere and are we gonna be able to build a viable business in this area? And I think one of the exciting things about it was it became clear pretty quickly that it was definitely gonna catch on and be, if anything, an even bigger deal outside of the core tech companies. Those companies had more diversity and older systems, and just were larger, and had more lines of business and acquisitions and things that had to integrate. And so, in some sense, for something that could act as the central nervous system across all of that, its value is proportional to the complexity of all the digital investments, right? And so, a lot of these other companies are basically just much more complicated, and so the value of what we're doing was actually higher. I think people don't realize that, but tech companies are actually surprisingly simple internally, in large part 'cause they're new. But also because they compulsively rebuild the technology on the new stack and hence, they're more disciplined about their investments. And I think Kafka is definitely useful in those environments, but it's even more useful inside of a company that has much more complexity and sprawl because you really then need this integration layer. 
You're not talking about integrating the big three data systems 'cause the company may have hundreds of them.

Tim Berglund (14:51)

That's such a good point. 'Cause in a tech company, you have a website and a mobile app experience, and it could be fiendishly complex, with all kinds of horrible problems of scale and blah, blah, blah, but it's one thing. And at an insurance company that's been using computers for 50 or 60 years, there's a few systems in there. And, of course, Kafka shows its value in getting those things to talk.

Jay Kreps (15:19)

Yeah, that's right. One of the questions that we got early on from customers was hey, how do you integrate this with mainframes? And we were like, what? Yeah, yeah, and sure enough, it turns out really important parts of the economy are run off of mainframes and that's actually a really valuable area where data is. And in a lot of companies, being able to open that to a modern stack is a really valuable problem to solve. And the technology does it well, but I don't think we really appreciated the full diversity of what the internals of large enterprises outside of Silicon Valley look like.

Tim Berglund (15:56)

Absolutely, absolutely. They don't do 'em in Sonoma, but they do them in the world.

Jay Kreps (16:03)

Yeah, that's right, that's right. It's an interesting thing that I actually think a lot of companies miss if the founders are techies is that all this new stuff, whether it's cloud or the new programming layers and abstractions, it's all coming in addition to a larger running business and that stuff gets rewritten but the cycle is slower, and it takes more time. And so, understanding the story of how your technology lives with all those things is really important.

Tim Berglund (16:33)

Yeah. You said something, you said it a couple times, actually, just a couple minutes ago you said when you started the company, one of the areas of uncertainty for you was whether this vision of stream processing would catch on. There's this broader vision, right? And you also, when you were talking about other successful open source projects like MySQL being an early massive success, you made the point that there's this standard interface, like we get this, there's this data model, and there's this API that just means database to the whole world and everybody knows what that is and let's build one of those and we don't have to persuade anybody about how to use a relational database. We just have to say, hey, look, this one is as good or better than the one you would have paid a bazillion dollars for, and so that kind of open source victory is there. And in contrast, you were saying we have this stream processing argument to make. So talk to me about that. Like, has that vision changed over the past five years? And maybe, if I might ask you, just kinda lay that out. What is stream processing?

Jay Kreps (17:42)

Yeah, I think the big idea was that you could build around these streams of events. We at LinkedIn had Hadoop and we would suck in these big dumps of data and we would run big batch processing jobs at the end of the day, but as we thought about it, it didn't really make sense. Like, there was no part of LinkedIn that generated data in a batch manner. Like, the business was a continuous process and I guess everything in real life is a continuous process, so as we thought about it, it seemed like this weird almost legacy of mainframes or big punch card batch processing jobs or something that you would align things to this 24-hour cycle where at night some big job kicks off and processes all the data from the previous day. So then the question is, well, what has held this back? If this is a more natural way of solving the problem, why don't people do it? And I think it was two things. One, to really do a lot of this stuff at the scale of a company, like across all the different systems, kind of required modern distributed systems techniques, and the other part of it was the abstractions just weren't there. You have to be thinking in terms of events. You have to be thinking in terms of this real-time processing, and that just wasn't where databases came from. I think the early applications for databases were very much these human-driven things. It's like, how do I build a UI that does some basic CRUD operations? It looks up records, and it helps me do some data entry and makes me more effective. And I think the role for software now is really changing 'cause it's not like you just have these individual applications that are aimed at presenting a UI to humans. The software in a company is this much more integrated thing that is operating large parts of the business and so the software is much more likely to talk to other software rather than just show things to humans. 
And so in that world, it's less about the database being this reactive thing that sits there and does look-ups when you ask it. It's more about how things trigger and react and respond to other parts of the business, to things happening in the world. And so, yeah, I think we understood parts of that as we started. I think we were motivated originally by this kind of academic literature on stream processing and the fact that LinkedIn seemed like it needed stream processing, and by the fact that there was no stream processing that you could go download in open source, and then as we got into that problem we realized, okay, if you wanna do stream processing you first need a stream, right?

Tim Berglund (20:27)

That would be nice.

Jay Kreps (20:29)

Yeah, and so, that was how we started with Kafka. We thought, well, just capture these streams across the company. If you can do that, the processing could happen in your handwritten code, or you could build on top of that some kind of layer that did this. And so, we started then with something lower level than stream processing, which was just stream storage, basically, which is kind of what the core of Kafka does: just read and write and store events and do it in a way that allows you to build these scalable applications around it. And then we started almost immediately playing with the possibility of stream processing abstractions on top of it, and eventually LinkedIn created a system called Samza, which isn't heavily used outside but it's still around, and then eventually it started integrating some of that into Kafka with the Kafka Streams API, and then at Confluent we've created KSQL, which, you know, we think is super exciting, which is attempting to really provide SQL capabilities and really act much more like a database for streams of data. If you think about it, Kafka is almost like a file system for streams of data in that what it does is store this big log. But the way people build applications, you usually don't build directly on top of the file system where you're kind of poking around with individual bytes in a manual way. You usually build with a higher level interface like a relational database, and we think KSQL has the possibility to be something like that where you can work with these streams of events in a much higher level way, in a much more productive way. It's basically a lot less code.
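
As a rough illustration of the "stream storage" idea Jay describes here (read, write, and store events in a log), the following is a minimal in-memory sketch in Python. It is not Kafka's API: the class name `EventLog`, its `append` and `read` methods, and the sample events are all hypothetical, and a real Kafka topic adds partitioning, replication, and durable storage that this toy leaves out.

```python
from typing import Any, List, Tuple


class EventLog:
    """Toy append-only log (hypothetical; loosely analogous to one topic partition)."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, event: Any) -> int:
        """Store the event at the end of the log and return its offset."""
        self._records.append(event)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, Any]]:
        """Return up to max_records (offset, event) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = EventLog()
log.append({"type": "page_view", "member": "alice"})
log.append({"type": "profile_edit", "member": "bob"})
log.append({"type": "page_view", "member": "bob"})

# Each consumer keeps its own position, so reading never removes data
# and two applications can consume the same stream independently.
analytics_offset = 0
search_offset = 2
analytics_batch = log.read(analytics_offset)
search_batch = log.read(search_offset)
```

The point of the sketch is only that the log stores events in order and lets each reader choose where to start; anything beyond that is left to the application code, which is the gap the higher-level layers mentioned above aim to fill.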

Tim Berglund (22:06)

All right, I wanna come back to KSQL. That's definitely an interesting thing to talk about but the vision you just laid out for stream processing, I recognize as the contemporary one. I mean, that sounds like exactly what I talk about to people and what I know you talk about. How much has that evolved? Do you feel like that was in place? And I know it can be hard to remember some things like this 'cause you can retrofit your memory of an idea to your contemporary understanding of it, but were you pretty much there five years ago?

Jay Kreps (22:44)

A lot of it was there. I think the two major things that evolved were, one, I think we got better at explaining the vision to other people. Early on, we really didn't know how to talk about this stuff and it made it difficult, actually, as we were trying to explain what we'd done. We weren't sure, like, should we call it a messaging system? Like, what was the exact reasoning or use case that it was for? Because we felt like, look, this is broadly applicable. We don't wanna limit it. We don't feel like it's just another message queue. That's not really the goal. And so we really struggled early on. I think the first few years, one of the reasons there was much less open source adoption after it was open sourced was really just people didn't have an easy category to put it into. So that was one, was like, hey, how do you talk about this stuff? And one of the big evolutions there was really talking about events, which is actually the important thing. Like, hey, there's some event. It's something that happened in the company. People get that concept. We had that internally, and actually, all the message names in LinkedIn were modeled around events, and that was a big part of the idea, but we weren't sure if we could really communicate that outwards, and I think that's actually just caught on where for a bunch of reasons now, events are more central in software engineering. And so the ideas are just much more natural when you're thinking about data and micro-service communication and so on. So that evolved significantly. I think the other big thing was we really didn't understand the full breadth of use cases. So, for example, we had no idea about the Internet of Things use cases, which is actually a really big area where there's things happening in the world and you need this software stack that reacts and responds to that as it happens. And that's kind of the pattern of those Internet of Things use cases, I think, by and large. 
And a traditional relational database is actually a pretty poor fit because it works the other way around. It's actually a great fit for a web app where the UI kinda drives the action on the data side. It's a really poor fit for something that's trying to model changes in the world and react to them all the time. And so we had no idea about that. We knew a lot about the internals of a social network, we knew a lot about these monitoring and real-time analytics use cases, and services that would feed off of and respond to this data, but we had no idea about these domains outside that we just hadn't interacted with.

Tim Berglund (25:10)

Nice, nice. Yeah, and you were implicitly describing things that we identify by the buzzwords. Reactive micro-services, for example, you were talking about them without saying those words, which I appreciate. And it's interesting that that was a part of the vision initially, even though that buzzword I think followed by a few years.

Jay Kreps (25:34)

Yeah, yeah, at least when I was at LinkedIn, I don't think people were talking as much about micro-services. I think that came later. We internally had all these services and the goal was to build these services that would feed off of event streams. But there wasn't a good word for what to call that. We weren't even sure if you would really call it a service because that, in our mind, was so tied to HTTP and request response. We just weren't even sure if it qualified as a service anymore if it was taking a stream of events. So I think the world has kind of evolved a little bit around this type of thinking, which just kinda makes our job easier. But it is one of the harder things: if you're coming with something relatively new, then just figuring out how to explain it to people is a big part of the challenge.

Tim Berglund (26:23)

Right. So we started having customers, and we certainly do now, and I wonder, what are some of them that you've been a part of? I know as CEO you're certainly not involved in every customer, but who are some that you've been a part of that really, I guess, encourage you in terms of that vision becoming real, or that make you think, all right, what I'm doing here is really gonna make a big impact and can help people at scale and can be useful at scale, or can transform the way software is built? Any way you wanna slice that. What are some customer stories that get you excited?

Jay Kreps (27:04)

There's a ton of 'em, actually. Probably my favorite part of my job is actually getting to go and hear how people apply this technology in these really different business domains. It's like, okay, how does a retailer use events in Kafka and stream processing and how do you model an insurance claim as a series of events? I mean, I think it's totally fascinating, and it's also your goal. If you build something like this, your goal is to see it get used in the world. And if you start a company, your goal is to help these other companies use your product and be successful. So it's one of the most fun parts, I think, of being an entrepreneur or running a business or creating an infrastructure product: actually seeing that used in the world and helping to make these other businesses better. There's more of these than we could probably get into on this podcast, but a couple that I think are really cool: I think the ride sharing area generally is, you know, all these companies use Kafka. Lyft, I think, is a public customer reference of ours. The use cases are just amazing where they're tracking all these drivers and they're computing supply and demand and they're gathering all the data to make good routing decisions. I think that integration of things happening in the real world with software systems and all the dynamic logistics and pricing and operationalization and real-time analytics, I think that's just the prototype of the direction the rest of the world is going. If you look at other businesses, I think integrating some of that into the operation of other businesses is gonna be a big thing that happens over the next five or 10 years. And so I think that's a phenomenally cool use case. Another use case I think is cool, that I like just because, as we were starting with Kafka, we started with these almost big data use cases. The kind of high volume, low value data. 
And over time we had to really build both the trust and the fault tolerance capabilities to be able to take on more and more core use cases. And so now, even now there's a number of databases and core storage systems that run off of Kafka, and one of the things I think is most validating about that is there's a really cool use case with Euronext, which is one of the major stock exchanges in Europe. And they're using Kafka as this core persistence layer for each trade and then modeling all the downstream processing and analytics and compliance work that happens on a trade as stream processing that happens in reaction to that event. And I think it's cool just because it's so mission critical and just core to the functioning of the economy if you're at the heart of a stock exchange. That's about as important a use case as you could have, and so I was really excited to see that one. They did a really cool video with us on some of the stuff that they did that's on our website.

Tim Berglund (30:17)

We'll be sure to put a link to that video in the show notes. It was pretty fun. It's nice and fast-paced, and just gives you the idea of, like you said, they are actually moving trades in a continent-wide stock exchange through Kafka, which is a pretty big deal. Like you said, if that's not validation that the technology can be used for things, I don't know what is. Talk to me briefly about KSQL. You mentioned it a minute ago, but that's a thing that we built. You know, really Kafka, apart from Confluent even, started executing on a stream processing strategy, and briefly, if you're new to the domain, if you're new to the podcast, the basics there are that just pulling messages out of a log is fine but you're gonna have to do things with them, and there is this small set of the usual suspects of those things, right? You'll compute aggregations, you'll deal with time windows, you'll enrich one stream of messages with another or do something akin to a join. This is just a small number of things that you actually do when you're consuming messages, and Kafka started to get good at those, and KSQL basically addresses that same set of things. That's a super hard thing to build. Why did we do that?
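
To make the "usual suspects" Tim lists a little more concrete (aggregations, time windows, and enriching one stream with other data), here is a small plain-Python sketch. It is not Kafka Streams or KSQL code: the function names `count_per_window` and `enrich`, the 60-second window, and the sample page-view events are assumptions made up for the illustration, and it processes a finished list rather than an unbounded stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60  # assumed window size for this illustration


def count_per_window(events: Iterable[dict]) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, member) using fixed 60-second windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - (event["ts"] % WINDOW_SECONDS)
        counts[(window_start, event["member"])] += 1
    return dict(counts)


def enrich(events: Iterable[dict], profiles: Dict[str, dict]) -> List[dict]:
    """Join each event with a lookup table keyed by member, akin to a stream-table join."""
    return [{**event, **profiles.get(event["member"], {})} for event in events]


# Hypothetical page-view events with timestamps in seconds.
page_views = [
    {"ts": 5, "member": "alice"},
    {"ts": 42, "member": "alice"},
    {"ts": 61, "member": "alice"},
    {"ts": 70, "member": "bob"},
]
profiles = {"alice": {"country": "US"}, "bob": {"country": "DE"}}

window_counts = count_per_window(page_views)
enriched = enrich(page_views, profiles)
```

Writing even these two operations by hand for a continuous, distributed stream (with late data, state, and failures) is considerably more work than this sketch suggests, which is part of the motivation for a higher-level layer.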

Jay Kreps (31:47)

Yeah, I guess our view was to really become a mainstream technology paradigm you have to really become simple and easy to use. It's like in economics, I think they have this idea of kind of the elasticity of price and supply and demand, so if the price goes down, people wanna buy more, right? And in software engineering, I think it's a little bit similar where instead of price you don't really have the price mechanism exactly. It's more like how difficult is it to use the tool? How much of a pain in the butt is it gonna be? So for something that's a big pain in the butt, it has to be really the only possible solution or the software engineer has gotta stay away from it, right? And so if you wanna make something really mainstream, you have to make it easy, and the progress for us was really starting with something that could work across a company that would scale, that would handle the different data types from high volume, low value data to really critical data, and just make that available, and we always tried to make Kafka's APIs relatively simple but it's still pretty low level. It's giving you this log abstraction. It's letting you build applications around that, but a lot is left to your code. And you have to code up these applications and test them and from early on we felt like it's gotta be possible to move up the stack, move up the ladder of abstraction and make it really easy. And we also felt like, look, if you have all these streams of data, then the whole purpose is to do stuff with 'em. The easier you can make that, the better. And so, the goal with KSQL was really to marry something new, which is this whole world of event streams, with something old, which is databases and SQL that a lot of people really know. And I think our goal is to really make it easy to work with this, make it approachable, make it build on the techniques and stuff that you already understand. 
And yeah, I think that's a powerful thing, and our hope, as a company, is to continue moving up that ladder of abstraction over time and try to make event streams just really the most natural way of solving a problem and that's really core to what we're trying to achieve. If it isn't the easiest thing, then it won't be the thing that people reach for, and we don't want this to be a technology that's only applicable for the largest possible problems where it's like you would avoid it except when the problem is so large in scale that you have no choice but to use this. We want this to be something that's useful across that whole domain, from super critical, low volume use cases to these really high volume things. And so, scalability is just one dimension, and not even really the most important dimension for the solution that we're trying to build. I think it's a view that I think was maybe a little bit lost in the heyday of big data where there was so much focus on scalability that I think people lost sight a little bit of usability. And just is this a good platform? And I think that's coming more back to the forefront and I think that's a really healthy thing.

Tim Berglund (35:14)

Very, very true. I was in the NoSQL heyday trying to teach people how to use certain of these things and the sales pitch was scale. And you had to, for most applications, you have to tell yourself, well, yeah, I mean, I could need to be as big as Amazon. Hardly anybody is. Now, scale's not not a thing. We need to be able to scale, but there's this, I think, larger range of scales at which the technology must be economical. It can't be go big or go home.

Jay Kreps (35:51)

I think that's exactly right. Like I think, fundamentally, if we want to make these event streams something that's like the central nervous system across the company, it's gonna have to scale. Even if the individual applications are small, you add 'em all up, they start to get pretty big, and you wanna have that central hub of what's going on. But yeah, one of the things that annoys me most, like, one of my pet peeves is when infrastructure engineers, the people building a database or some piece of infrastructure, when they start telling people that everybody has to be an infrastructure engineer. That you have to really understand distributed systems and you'll hear this lecture of like, well, you know, everybody needs to go really understand distributed systems in this modern world, and it's like, well, guys, that's our job. We're supposed to understand how to build distributed systems and then make it easy for other people. We can't make it trivial, but we can always make it easier, and if the answer is you have to understand everything about the whole problem domain and how everything works, we kinda haven't done our jobs as well as we can. Now, look, the cracks always show through in these abstractions. It's never perfect. But whenever we can, our goal should be to take that into our domain, into the things the infrastructure makes easy for you, and I think we should look at things like TCP. It doesn't always work. It doesn't always hide the abstraction of the network, but it does a pretty good job of doing that so that you don't have to be thinking about packets and bytes and all of that. It just takes that away for you, and I think these infrastructure layers higher up the stack should have that same aspiration to just become transparent. Become something that people can just build against without having to know how it works.

Tim Berglund (37:35)

Right. It is almost certain that you, for most values of you who are enterprise software developers, almost certain that you should not be thinking about distributed systems infrastructure. If we're asking you, ah, you're gonna have to write your own Raft implementation, is that okay? And you're gonna have to manage distributed state. And all those things cannot be a part of your life as an application developer.

Jay Kreps (38:02)

Yeah, I totally agree. I mean, we're at an intermediate stage where it's not like everything in the world of stream processing and event streams and cloud is like a solved problem where you can just learn the interface and not have to know anything about how it works. But that should be the goal for the people working on it, you know, should always be to make it easier and simpler and make the abstractions better so that it can become more prolifically used out in the world.

Tim Berglund (38:30)

Love it. How 'bout cloud? Now, of course, we've built a cloud product, are building a cloud product, and what has that experience been like from the CEO's perspective? And how does it fit into the overall plan? It seems like a nonnegotiable part of a company like us, but talk to us about cloud.

Jay Kreps (38:54)

Yeah, it's been really interesting. So at LinkedIn, of course, we ran the software and we built the software, and there's actually wonderful things about that. You can put improvements in and ship them the next day and then there's hard parts about it like when it breaks, they call you in the middle of the night to fix it. But I actually think it's a wonderful thing for a company. I think it's just really recently that this lowest level of data infrastructure has been available as a service. It's something that really I think Amazon brought around and I think it's opening up this whole world of new data services that you can just get ahold of without having to learn how to operate them and run them yourself. So I think it's just a much better experience for customers, and I think it's also an incredibly deep technology problem to actually make a data system that's cloud native. I mean, it's a buzzword, but something where you just pay for what you use, where you can expand it elastically, where it can span data centers. Like, all of these things are magical, and so it's a great thing for a company to put R&D into and to try and make possible for their customers. So as we started Confluent, we knew we would wanna have a cloud offering. We debated back and forth whether to just start with that or start with the software. We ended up starting with a software offering because we felt like, well, that way we could build up the stack a little more and finish out the client ecosystem and finish out some of the stream processing interfaces and I'm really glad we did that just because otherwise I do think we would be pigeonholed a little bit as a dumb queue. Just a pipeline.

Tim Berglund (40:46)

Ah, right.

Jay Kreps (40:47)

But what that meant was we needed to really pretty young in the life of the company start building out this cloud offering, and it's definitely hard to get a second product cadence going in a young company 'cause you're early on, and so, but we were really lucky to have a lot of DNA from people who'd done it before. And that kinda helped us to know that this was gonna be important. It was gonna be ultimately the future of the business, so we really, over the course of a year, pivoted the full engineering team to be really working in this cloud-first manner where we're shipping features into our cloud product and then we're kinda running them for customers and then shipping them on premise. That's actually a great thing. Going back to the LinkedIn experience, it aligns the incentive, right? It makes you make your stuff work first before you hand it out to other people to run. It allows you to move a lot faster 'cause you can ship things more quickly. It makes you think upfront about the experience of running it, and for these big distributed systems, that operational problem is a huge chunk of what's hard in the domain. So you wanna have that experience. And then it also gives just a great experience to customers. One of the unfortunate things about new data systems is depending on the company, it can take a long time to build the capability to really run a new data system as a totally reliable, dependable technology even if the software is perfect. You have to build that capacity and hire people and really master it. And so I think it's really a powerful thing if you can just get that instantaneously. You can just instantaneously be world class at this new thing. And so we've just seen the companies that start with that are able to go much faster in their adoption and usage of it than they would be if they were trying to do it all themselves.

Tim Berglund (42:43)

And you said a moment ago, you were talking about moving up the stack of abstraction. Grady Booch has said a lot of quotable things, but the most memorable thing to me that he ever said was, "The history of software engineering is one of increasing levels of abstraction." We bring, as developers, essentially the same kind of noetic equipment, the same minds to the problem. Everybody is basically as smart as everybody was 10 years ago and you've got programmers, but we're trying to do more, and so we build these more and more abstract tools, which is super obvious in stream processing. Less so in cloud, but you said it, and I think it's not just a matter of hey, let's give you a Kafka cluster that you don't have to manage. But as I've seen the evolution of the product, having a little bit of a front row seat as an insider and getting to see decisions that are made and so forth, it is a little bit of that same abstraction story. It's not just here's a cloud-based cluster that we will occasionally upgrade for you, but it is a little bit more abstract. Here are topics that you have that are hosted, and I think cloud, as we've done it, encourages developers and architects to think at a still more appropriate, that is higher level of abstraction.

Jay Kreps (44:11)

Yeah, I couldn't agree with that more. I think you see a lot of products that are kind of, you know, I often call it fake cloud products, but they're kind of intermediate where they're not really taking over fully managing the service, so you're still thinking about, oh, how many servers of this do I want to allocate? I do think the future of this stuff is getting away from servers as a unit in a data system and getting to just, like, okay, how do I pay for it? And then you use as much as you use, and you pay the bill. Much more like a utility, which is kinda the original vision for cloud computing, and I think it's just so technically hard at the data system level that of course we're all struggling to build the systems that can really fulfill that vision. But yeah, definitely for Confluent in our experience of our product we're trying to get as close to that as we can and just always pushing closer and closer.

Tim Berglund (45:08)

Final question. Imagine there's some other talented team of developers right now that are building something that could have the potential to turn into a great company. This is incredibly hypothetical, because, you know, it's hard to identify who those people are as any venture capitalist could tell you, but imagine you had access to them or they had access to you. What advice do you have for them?

Jay Kreps (45:28)

Yeah, I mean, it's definitely a really exciting time both for enterprise companies, which you can kinda take away just from the valuation those companies are getting or the number of IPOs that are happening in that space, but also I think particularly for these developer facing infrastructure layers, abstractions that are being offered as a service, I think the world of cloud opens up a whole new ability to do this just much easier as a much better business than would have been possible before. And so yeah, I think it's a great time for it. I think whenever a new platform like this opens up, I think there's a number of companies that are built around it, and so yeah, my advice would be just go do it. I think it's hard to fully prepare for something that's new. You can try and learn ahead of time, but there's no better way than to just jump in and start swimming to really pick something up, and it's certainly a great time, both in terms of the money that's available to fund the startups and the enthusiasm with which customers are actually investing in and embracing new technologies. I think it's just a great time for this type of company to be born and we'll see if that bears out in the years to come. But yeah, my advice is go for it.

Tim Berglund (46:50)

My guest today has been Jay Kreps. Jay, thanks for being a part of Streaming Audio.

Jay Kreps (46:54)

Thanks so much for having me.

Tim Berglund (46:55)

Hey, you know what you get for listening to the end? A Kafka Summit discount code. Kafka Summit is coming up on September 30th and October 1st in downtown San Francisco and you can get 30% off if you go to and use the discount code "audio19" during checkout. Just enter "audio19" while registering at, and that 30% off is all yours. I'd love to see you there. But hey, I hope this podcast was helpful to you. If you wanna discuss it or ask a question, you can always reach out to me at @tlberglund on Twitter. That's tlberglund, or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack sign-up link in the show notes if you wanna register there. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through iTunes, be sure to leave us a review there. That helps other people discover the podcast, which is a good thing. Thanks for your support and we'll see you next time.

As Confluent turns five years old, special guest Jay Kreps (Co-founder and CEO, Confluent) brings us back to his early development days of coding Apache Kafka® over a Christmas holiday while working at LinkedIn. Kafka has become a breakthrough open source distributed streaming platform based on an abstraction of the distributed commit log, and his involvement in the project eventually led him to start Confluent with Jun Rao and Neha Narkhede. 

In this episode, Jay shares about all the highs and lows along the way, including some of his favorite customer success stories with companies like Lyft and Euronext, which empower their real-time businesses through event streaming with Confluent Cloud.

Starting a company certainly comes with more than the technology, and Jay also reflects on some of the challenges around funding, support, and introducing Confluent to the rest of the world. The journey from the beginning to now yields some wise words from Jay for any developer who is interested in establishing their own startup.

