February 15, 2023 | Episode 258

What is the Future of Streaming Data?


Kris Jenkins (00:00):

There's a question that developers often raise when we're excited about a new technology. I've heard variants of this question for just about every new technology that's starting to seem production-ready. And it's very simple: how do I persuade my boss to let me use this? You can put it more formally if you like. You can express it as: how do I explain the value of this technology to the business? Same question. It's a case we often want to make as devs. I can tell it's useful, I can tell it's cool, but more than cool, it's valuable. But how do I transmit that to the CEO, to my business manager, to the other departments? How do you pin it down when you're, at heart, a technical person?

Kris Jenkins (00:44):

So I've been on the lookout for a while for a guest who can help us with that, who kind of sits at the intersection of software development and, dare I say it, dare I use the word, marketing. That talking-about-software-to-non-software-people world. Someone technical who can speak to a CEO and explain what's in it for them. And I think I may have found the perfect guest in the form of Greg DeMichillie. Greg has an interesting career path. He started out as a software engineer, and he took a route that eventually led him through roles as director of product management for EC2 and then GCP, and he eventually ended up in our world of event streaming, in a job that actually gets the words product, technical, and marketing into just one job title.

Kris Jenkins (01:33):

I met him late last year at the Current Conference we held in Austin. We hung out in the bar afterwards, and I really enjoyed hearing his thoughts about the value of event streaming and where he thinks our industry is going and what's stopping us from getting there faster. And I thought, "This is the man we need to get on the record. This is a perspective we need to understand if we're going to understand what our technology means to the wider business." Before we begin, as ever, this podcast is brought to you by Confluent Developer. More about that at the end. But for now, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.

Kris Jenkins (02:18):

I'm joined today by Greg DeMichillie. Greg, how are you over there?

Greg DeMichillie (02:22):

I'm doing great, Kris. Great to be with you.

Kris Jenkins (02:24):

Good to have you. You're entering a vaunted, hallowed hall for Streaming Audio, because we don't normally invite people in marketing onto Streaming Audio. We go a bit more technical.

Greg DeMichillie (02:37):

Yeah, I can understand that.

Kris Jenkins (02:39):

You are officially the vice president of product and technical marketing. It's a mouthful.

Greg DeMichillie (02:45):

Yes.

Kris Jenkins (02:46):

Vice president of product and technical marketing. But you get a special pass, partly because we really like you, but mostly because you've got deep technical credentials that can back it up, right?

Greg DeMichillie (02:57):

Well, thank you for that. I feel like I'm in vaunted company here. Yeah, I think I came to marketing through the back door, so to speak. Really, the vast majority of my career has been as an engineer and a product person. I'm a software engineer by training. I spent 10 years at Microsoft working on developer tools, of all things.

Kris Jenkins (03:20):

Hello.

Greg DeMichillie (03:21):

So: the first version of Visual C++, IDEs and compilers and debuggers. You don't get much deeper into the basement of the engine room than-

Kris Jenkins (03:31):

Yeah. I love... Not to date you, but I think I can remember the box that that software came in, when software came in boxes.

Greg DeMichillie (03:38):

Yes. Well, the fact that I go back to the days when software came on disks alone probably dates me. But yeah. And the reason is that, because I was a developer, the only products I know how to relate to are products that developers use. Because to me it's really important that you be able to put yourself in the shoes of your user, and I know how to do that for developers because I was one. And as a hobbyist, I still am. I think we've talked before about programming languages. I go through phases of learning new programming languages. I'm rolling up my sleeves on SwiftUI for Mac and iOS right now, of all things.

Kris Jenkins (04:21):

That's a fun one.

Greg DeMichillie (04:23):

Yeah. So the bulk of my career has been spent building developer tools and developer platforms. But the thing I really enjoy is I love that intersection of where products that are used by millions of developers then actually make a real difference for their business. And that, I think is what always leads me to this intersection of engineering and product and marketing. Because I think when all that comes together, developers are happy because they get to use cool tools and cool technologies, and their businesses are happy because it's something that actually makes a difference. So I'm in marketing, but to be honest, I could only do marketing at a company like Confluent where our core audience are developers and architects. If you dropped me into marketing at Procter & Gamble, no offense to folks who work at Procter & Gamble, I would be miserable.

Kris Jenkins (05:19):

Yeah. Yeah. I feel that on the developer advocate side. Tools that people use are great, but working in the world of tools that developers use, you get to talk about the stuff you'd want to use yourself.

Greg DeMichillie (05:34):

Exactly.

Kris Jenkins (05:34):

Does that make [inaudible 00:05:38] communicators, we can only relate to people like us? [inaudible 00:05:42].

Greg DeMichillie (05:44):

I don't know. I think to be successful in any of these roles, whether it's developer advocacy, product management, or marketing, you have to start with a real understanding of who your customer is. I was at Adobe for about four years. And I remember there, we wouldn't have thought of hiring somebody onto the Photoshop team if they didn't own a digital camera just because they thought it was cool.

Kris Jenkins (06:09):

Yes. Yes.

Greg DeMichillie (06:09):

I love marketers who have side projects of software development of some kind, right?

Kris Jenkins (06:18):

Yeah.

Greg DeMichillie (06:18):

You got to keep your hands dirty. You know what I mean?

Kris Jenkins (06:20):

Yeah. It's got to be the kind of thing... I mean, ideally, you find a job where you would do it for free, but you get paid to prioritize their to-do list, right?

Greg DeMichillie (06:31):

Yeah.

Kris Jenkins (06:32):

It's like I'd be programming no matter what I did, and I'd be talking to developers no matter what I did.

Greg DeMichillie (06:36):

Exactly.

Kris Jenkins (06:37):

But for Confluent, I will focus on this area more than that area.

Greg DeMichillie (06:42):

Well, exactly. So to sort of finish off that, I spent 10 years at Microsoft, and the other big chunk of my career was about nine years at Google, sort of building up Google Cloud Platform. And again, same sort of thing, super technical product, used by developers of all stripes. But then the value really is to the business. And when you connect those two, you really see cool stuff happen.

Kris Jenkins (07:11):

Yeah. Because GCP is a big, high-profile project, what made you think, "Here I am at the perfect intersection of developers, the tools they use, and the businesses that need them; I'm going to jump ship into event streaming"?

Greg DeMichillie (07:27):

Well, it's really funny you mention that, because I hadn't thought about the space a lot. I was aware of Kafka, but we were getting ready to do... At one point, I was working on keynote demos for GCP Next.

Kris Jenkins (07:41):

All right.

Greg DeMichillie (07:42):

And I think it was the year we were getting ready to launch TensorFlow, our AI platform. And at the same time we were announcing a data streaming product called Dataflow, which uses the Apache Beam SDK to do both batch and realtime processing. And I hadn't paid much attention before. And it occurred to me as we were doing this launch that although AI was going to get all the attention, the value to a company of making the jump from batch data to real time was probably more transformative than AI.

Kris Jenkins (08:18):

You thought at that time?

Greg DeMichillie (08:19):

Well, it just leapt out at me as I dug into it more, this idea of moving from... So many companies were trapped in the era of, "Well, we get our quarterly retro." And maybe they have a dashboard, but the data in the dashboard's never up to date. And I realized that there are great uses of AI, but so many companies are trapped in this data-at-rest, batch mode of working. And so it occurred to me at the time, "Wow, this is really underappreciated." I didn't immediately go, "Oh, so I should go check out this company called Confluent."

Kris Jenkins (08:59):

Life rarely goes that linearly.

Greg DeMichillie (09:01):

Mine certainly doesn't. Beware anybody who says they have a total plan for their career; I think you have to be open to what comes up. And then some folks at Confluent reached out, and I... My boss is Stephanie Buscemi, and my friends knew her from Salesforce. Do you find this? Tech is a massive industry, yet when you get into the developer side of it, it's a small world that we keep circling and running around. Do you know what I mean?

Kris Jenkins (09:36):

Yeah. I mean, there are always so many more new people to meet, and yet some of the same faces seem to circle around you like fate.

Greg DeMichillie (09:42):

Yes, exactly.

Kris Jenkins (09:43):

And I often remind myself of this: when you meet someone at a conference, you don't know if you're going to be working with them in five years' time, so pay attention.

Greg DeMichillie (09:52):

Yes. Yeah, completely. So yeah, it occurred to me then that data streaming was under hyped, underappreciated. And so when the opportunity came up to join the company that invented all this, it just was like, "Why wouldn't I?"

Kris Jenkins (10:11):

So it was mostly a real time move rather than the many, many other great features [inaudible 00:10:19]?

Greg DeMichillie (10:18):

Well, that's exactly right. And that's one of the things we hope to change. A lot of people immediately think Kafka, Confluent, and they immediately gravitate to the realtime part. And that's certainly valuable, but when you think about everything streaming data can do, it's much more than just being realtime. I think you and I talked about this when we were at Current. Shameless plug: Current, it's a great event, and we're doing it again in 2023. I have to live up to my marketing reputation. I've got to be selling something here. So let me-

Kris Jenkins (10:57):

That's how we met, doing the keynote together, right?

Greg DeMichillie (10:58):

That's right.

Kris Jenkins (10:59):

Working on that [inaudible 00:11:00].

Greg DeMichillie (11:00):

So let me put in a plug for Kafka Summit London in May and Current in August. Anyway, I have this theory that a lot of really radical technologies are underappreciated at first, because we look at them through the lens of what we already know. So we underappreciate them. Again, I'm old enough to have survived the first rollout of the PC. Airlines bought millions of them, but they literally used them simply as dumb mainframe terminals.

Greg DeMichillie (11:42):

So they had this machine with its own processing power, its own storage. They ignored all that, and they made a 3270 terminal emulator out of it. They treated it like just a different terminal. As an aside, if you ever watch your check-in agent at an airline typing a Russian novel on their keyboard, it's pretty clear they're still stuck in the era of terminal emulators.

Kris Jenkins (12:05):

Yeah. They don't let you peer around at the monitor, not for security, but because it would look dated.

Greg DeMichillie (12:09):

It would be horrifying. Yeah. They put it in a browser now, but they're still just typing terminal commands. Anyway, over the course of time, we realized that having your own local processing and your own local storage let you do amazing things, and out of that comes Lotus 1-2-3 and Microsoft Excel and the Mac and Photoshop and all these things. We all carry these amazing devices in our pockets.

Kris Jenkins (12:34):

Greg is holding up a phone for the people listening on the podcast.

Greg DeMichillie (12:37):

Oh, yes. This is great for radio, isn't it?

Kris Jenkins (12:39):

Yeah. Yeah.

Greg DeMichillie (12:42):

We all carry these phones in our pockets, and when the iPhone first came out, even Steve Jobs was saying, "Oh yeah, you just use the web." We just used them as web browsers. Only when we stopped and said, "Wait, there's local processing and a camera and GPS and accelerometers," did we realize there were whole new things we could do. And I suspect that data streaming is the same way: we're looking at it through a lens of, "Well, I already do ETL-type stuff, so this is kind of ETL, so I'll just do ETL-type stuff."

Kris Jenkins (13:20):

But faster.

Greg DeMichillie (13:21):

But faster. And I think there's much more potential for data streaming, adopted broadly, to really make a profound change in the way you think about your data infrastructure. I'll give you an example. If you're at a bank or a manufacturer, Kris, and I ask you to draw your data landscape, chances are you're going to draw a bunch of boxes and say, "This box is a SQL Server, and this box is HANA." You're focused on the boxes, and then maybe you'll draw some lines connecting them.

Greg DeMichillie (13:56):

But fundamentally, your mental model is the boxes. The problem is that the boxes by themselves aren't where the value is. The data sitting in my inventory system in and of itself isn't valuable until it shows up on my e-commerce site, until my weather data impacts my inventory data, which impacts my ordering data. The value is actually in the lines between the boxes, not in the boxes themselves.

Kris Jenkins (14:31):

Being able to create the relationship between different systems.

Greg DeMichillie (14:34):

The connections, the movement, and then once you realize that, you realize, wow, when I look at my estate, it's not 20 boxes. It's actually 800 or 8,000 connections between the boxes, and that's when I think the light bulb goes on. You go, whoa, I need to think about how do I manage those connections? How do I build new connections? How do I take a stream of data that connects my inventory to my e-shopping portal and connect another data stream into it?
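Greg's "it's not 20 boxes, it's 8,000 connections" point can be made with quick arithmetic. Here is a back-of-envelope sketch in Python (the numbers are made up for illustration): point-to-point integration grows quadratically with the number of systems, while routing everything through a shared log grows only linearly.

```python
# Back-of-envelope for "it's not 20 boxes, it's thousands of lines":
# with point-to-point integration, every ordered pair of systems may need
# its own connection, which grows quadratically; with a shared log in the
# middle, each system needs only one connection in and one connection out.

def point_to_point_links(n_systems):
    return n_systems * (n_systems - 1)   # directed A-to-B links

def hub_links(n_systems):
    return 2 * n_systems                 # one in, one out per system

print(point_to_point_links(20))  # 380
print(hub_links(20))             # 40
```

The gap only widens as the estate grows, which is one way to read the "spaghetti versus consistency" trade-off Greg describes next.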

Greg DeMichillie (15:07):

It inverts the way you think about it, and at that point, the value of Kafka and Confluent just comes screaming out at you. Do I want to manage 8,000 strands of spaghetti with five different ETL tools and eight different systems, or do I want to have a consistent way to think about all the data that's moving around my company? Because data that's not moving isn't actually doing anything for you. You're paying money to store it and archive it and back it up, but what's it doing? It literally has no value until you query it or do something to pull it out and put it into another system. And I think-

Kris Jenkins (15:48):

It goes somewhere and triggers a reaction.

Greg DeMichillie (15:50):

Exactly. And so that to me is why I think data streaming and Kafka are... Jay, our founder, has talked about it as the fourth estate: the fourth big piece of your data architecture, alongside the databases and transactional systems we're used to thinking about. I'm bad at predicting the future, so I'll say in N years; I don't know if N is one, two, or five. But I think companies will fundamentally think of their data differently. They will think about it in terms of how the data is moving around. The places where it's stored are important, for sure, but they're not where the value is created; they're not how you fundamentally think about your data architecture.

Kris Jenkins (16:42):

Yeah. There's a physical analog to this, and it strikes me that if we could automate the physical analog, we'd be a lot happier. If you've been in a large bank and you've seen two teams that theoretically could communicate their data to each other, you try to wind up the "department A talks to department B" project, and 18 months later you want to throw yourself off a cliff, because getting departments to talk to each other in a large organization is somewhere between hard and impossible. We can automate that.

Greg DeMichillie (17:17):

Yeah. And we've seen this happen in other parts of the compute world. Think about pre-cloud: the way you provisioned infrastructure was with a ticket. You put a ticket in the ticketing system, then somebody in the infrastructure department finds a spare server and fires it up; you collaborate via tickets. What cloud did was let you collaborate via APIs. Here's a programmatic API: you want infrastructure, call the API. I may quota you, I may throttle you, I may put budgetary constraints on you, but fundamentally it's an API-driven process.

Greg DeMichillie (17:55):

I think it's the same way with data collaboration. Instead of "let me file a ticket with you, let me schedule a meeting, let's figure out how my data relates to your data," let's have a catalog of all our data streams, subject to security and visibility and all that. Why can't I just see your data schema, augment it with my department's data, and produce a new data stream that's the combination of your inventory data and my sales data? Why does that have to be a meeting-driven, ticketing-driven process as opposed to an automated one? So I completely agree. And again, that's back to the value being in the lines, not the boxes.
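The stream combination Greg describes, taking another team's stream, augmenting it with your own data, and publishing the result as a new stream, can be sketched without any Kafka at all. This is a toy illustration of the idea, not real Kafka or ksqlDB code; the record fields and SKUs are invented:

```python
# A minimal, Kafka-free sketch of "combine your inventory stream with my
# sales stream and produce the result as a new stream". Record shapes and
# SKUs are invented for illustration.

def join_streams(inventory_events, sales_events):
    """Key both streams by SKU; emit a derived event for each sale that
    can be matched against the latest known inventory state."""
    latest_inventory = {}   # materialized view built from the inventory stream
    derived = []            # the new stream we "produce"
    for event in inventory_events:
        latest_inventory[event["sku"]] = event["on_hand"]
    for sale in sales_events:
        on_hand = latest_inventory.get(sale["sku"])
        if on_hand is not None:
            derived.append({
                "sku": sale["sku"],
                "sold": sale["qty"],
                "remaining": on_hand - sale["qty"],
            })
    return derived

inventory = [{"sku": "tire-17", "on_hand": 40}, {"sku": "tire-19", "on_hand": 12}]
sales = [{"sku": "tire-17", "qty": 3}]
print(join_streams(inventory, sales))
# [{'sku': 'tire-17', 'sold': 3, 'remaining': 37}]
```

In a real deployment this join would be a continuously running stream processor rather than two passes over finished lists, but the shape of the data product is the same.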

Kris Jenkins (18:45):

So this is reminding me of a conversation I had today at lunchtime with my brother-in-law. We spent about 60 seconds talking about the kids, and then we moved on to technology, of course.

Greg DeMichillie (18:54):

You have very different in-law conversations than I do, my friend.

Kris Jenkins (18:59):

Yeah. Maybe that relationship is quite particular. So he's all excited about communicating between departments with REST-based microservices, and I have my own answers to that, but I'm going to pick your brains. Why do this with this new thing that's an event log, rather than just setting up REST APIs to everything?

Greg DeMichillie (19:21):

Yeah. I think there are a couple of reasons. One is that because we're talking about data, things like schema really matter. We have really well-defined ways of expressing data schema. We have really well-defined ways of querying it with things like SQL, and I think fighting against SQL is fighting gravity. There have been lots of attempts to say we're going to replace SQL with something else, and I don't know of many other 1960s and 1970s technologies that are still largely around in the same way.

Kris Jenkins (20:01):

Yeah.

Greg DeMichillie (20:03):

So I think, sure, you can put REST APIs and custom code on top of all this stuff, but why not go directly to the source? Why not go directly to the fact that this is a data representation? I'm just not certain what putting a REST layer on top of it adds beyond the core data itself. I'm curious, what was your response?

Kris Jenkins (20:30):

Well, I'm going to pick up on what you said, just quickly. One of the things that's always been a problem with SQL-based interfaces, with giving someone access to the SQL, is that it's very hard to lock down mutation. You can do that with role-based access control, sure. But because most SQL engines are expecting updates to happen, you're often limited technologically in how many SQL connections you can have.

Kris Jenkins (21:02):

I remember when I was using Oracle, something like 120 or 200 connections was your absolute maximum, because not all of them are going to write, but all of them have to be prepared to write. That mutation mismatch means we can't open up SQL to the rest of the department, so we put a facade in front of it. And I think it's an interesting thing about Kafka that, because it doesn't have updates, it kind of does away with that problem. Everything's a read until you need something else, so you can open up your data to other people.
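The append-only point can be made concrete with a toy sketch: if the shared structure is a log that consumers can only read, each consumer just tracks its own offset, and granting a new team access can never mutate anyone else's data. This is an illustration of the idea, not Kafka's actual API:

```python
# Toy append-only log: producers append, consumers only read, and each
# consumer tracks its own offset. Illustrative sketch, not Kafka's API.

class Log:
    def __init__(self):
        self._records = []             # append-only; no update, no delete

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset):
        return self._records[offset:]  # reading never changes the log

log = Log()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

# Two independent consumers, each with its own offset:
analytics_offset, billing_offset = 0, 2
print(log.read(analytics_offset))  # ['order-1', 'order-2', 'order-3']
print(log.read(billing_offset))    # ['order-3']
```

Because reads are side-effect free, adding a hundredth consumer is no riskier than adding the first, which is the contrast with the write-capable SQL connections Kris describes.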

Greg DeMichillie (21:35):

Yeah. I think that's a great point. I'm curious, have you seen this? I think companies and people go through stages of adoption of Kafka, and the first one is sort of the dumb pipe stage: I take the data out, I do some simple renaming and rekeying, I drop some fields of the schema, and I just pipe it to the other end.

Kris Jenkins (22:03):

Yeah.

Greg DeMichillie (22:03):

I don't store it. I don't process it except sort of minimally like cleaning it up along the way.

Kris Jenkins (22:08):

Yeah, we could have used any queue technology, but we went with this one.

Greg DeMichillie (22:11):

Yeah, exactly. Kafka's got lots of advantages even in that scenario. Again, back to the point of if you just think about it as a simple ETL layer, you'll get some value. Then when you start to realize, "Oh wait, I can store the data. It's a log. I can actually store that data in the stream. I can query the data in the stream. I can combine streams together." That's where I think you start to open up more interesting possibilities.

Greg DeMichillie (22:44):

I think folks go through these phases. The first phase is, "It's just a dumb pipe," and then the second is, "Boy, I've got a lot of pipes. Maybe I can be smarter about discovering and reusing existing pipes. Maybe I can mix and match data and create new data as a product." When you start thinking about a stream as a product I produce, as opposed to just plumbing that connects system A to system B, that's when companies start to really see that sort of second-order value. Again, this is human nature. What's the first phase we all did with EC2? We did a lift and shift. Back to my analogy of the smartphone and the terminal: cloud went through the same thing. The first phase of cloud adoption was literally taking a VM and putting it in the cloud. There were stickers on the backs of laptops: "The cloud is just a computer in somebody else's data center." Right?

Kris Jenkins (23:47):

Yeah.

Greg DeMichillie (23:48):

Then what happens is we realize that's actually a far too limited way of thinking about it. If you think about the cloud not as a collection of computers but as one big computer, then you start to see the development of things like Kubernetes and containers. A lot of the services we take for granted, whether that's Amazon S3 or Google's Cloud Storage, wouldn't be possible to build in an environment where you thought of each server as an individual VM. It took a mental shift. The fact that a lot of Kafka usage today is pretty simple pipes is normal, but once people really start to see the possibilities, a lot more opens up.

Kris Jenkins (24:44):

Yeah. The way I see that is we didn't really shift in our use of things like AWS until we stopped seeing it as individual machines, as you say, and started to see it as like a faucet, a tap.

Greg DeMichillie (24:54):

Exactly.

Kris Jenkins (24:55):

Just a utility. You can turn on or off as much as you like.

Greg DeMichillie (24:58):

Apply that to data streaming. You stop thinking about it as just a pipeline from A to B, and instead see it as a data product in and of itself. Then you start to think, "Oh, how do I get more value out of each data pipeline I'm using?" Again, because that's where the value is.

Kris Jenkins (25:19):

Also you get this thing where, once you realize you can create as many lines between the boxes as you like quite easily, you also hit the realization that you only have to write the outbound line once, and then lots of people can consume from that same produced line of data.

Greg DeMichillie (25:41):

What I'm seeing that's really interesting is I'm starting to see companies say, "Hey, I'm using Kafka internally. You're using Kafka internally. We need to collaborate between the two companies. Why are we thinking about building some other system to do the company to company connection? Why can't we use Kafka for that? Why can't my internal data stream output be the input to you, even though we're in different organizations?" Now, it raises lots of issues, not the least of which of course is security and data governance and all that, but we're literally seeing companies start to say, "Why are we going through this expensive translation layer between my internal Kafka system and your internal Kafka system?"

Kris Jenkins (26:30):

Yeah. You are actually seeing that in the field?

Greg DeMichillie (26:33):

Yes. We're seeing companies, and they're asking us, "How do we do this? What are the best practices for this? Can you help us with this?" Right now, I'll be honest, it's an advanced use case for sure, but I think it's an interesting next step. Once you've adopted Kafka internally and made streaming data the foundation of how you think about data movement in your company, it's only natural to say, "Well, what about my close partners?"

Greg DeMichillie (27:09):

Back to our example: I've got an inventory management system, and my supply chain has a system to track the creation and shipping of products. Why are these two things that update each other via email? Or why do I have to write a custom REST API that just sits in front of a Kafka topic when you're doing the same thing on your side? Now, as I said, it's an advanced use case because it raises lots of issues around governance and data security and data policy and PII.

Kris Jenkins (27:39):

Yeah, yeah.

Greg DeMichillie (27:39):

There's lots of things that make it not as easy as it could be or should be, but I think you're going to see that as one of the next big frontiers as companies more and more adopt Kafka internally.

Kris Jenkins (27:56):

I can believe that, actually, because one thing I've noticed is that when you move a company to microservices-based systems, you quickly realize you almost have to treat other departments as external, in that you need to validate their input the same way you'd validate a user's input. You need to provide uptime guarantees the way you provide them to your customers. Then once you're treating your own internal departments as semi-external, you start to think, "Well, why am I treating my completely external departments as not semi-internal?" There's a middle boundary between the two, right?

Greg DeMichillie (28:39):

Yeah, no, I think that's a great analogy because again, the whole point here of these loosely coupled organizational models is to allow sharing and collaboration, but also allow independent evolution, independent development. You're right, you make a great point about if we're collaborating via an API, then all of a sudden I've got to worry about accidentally being DDoSed by my internal departments. It happens.

Kris Jenkins (29:11):

Yeah, it happens. It almost never happens deliberately, and it often happens accidentally.

Greg DeMichillie (29:16):

Absolutely. Absolutely. Suddenly you realize, "Oh, I kind of need API throttling internally as well; otherwise I'm going to get taken down by an errant fat-finger mistake from a colleague."
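The internal throttling Greg mentions is often implemented as a token bucket: each client gets a budget of requests that refills over time, so one errant caller can't take a shared service down. A minimal sketch with made-up capacity and refill rates (a real Kafka deployment would use broker quotas or an API gateway instead):

```python
# Minimal token-bucket rate limiter, the usual shape of "API throttling
# internally". Capacity and refill rate here are invented for illustration.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = capacity          # start with a full budget
        self.refill_per_sec = refill_per_sec
        self.last = 0.0                 # timestamp of the last refill

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                 # request admitted
        return False                    # request throttled

bucket = TokenBucket(capacity=2, refill_per_sec=1)
# Three requests arrive at t=0: the third exceeds the budget.
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True (one token refilled)
```

The same mechanism works for the cross-company case discussed above: a partner's stream gets a quota just like an internal department's does.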

Kris Jenkins (29:31):

Yeah.

Greg DeMichillie (29:32):

You're right. I hadn't thought about it that way, but once you treat internal teams as semi-external, then why not treat external teams as semi-internal? Again, there's a lot of frontier that has to be solved here, relating to the thing we were just talking about. How do I prevent a partner from flooding me with topics faster than I'm prepared to accept? Or how do I make sure that I'm not... Security and governance just immediately come to mind.

Kris Jenkins (30:02):

Yeah. Yeah. Those are kind of external constraints. There's a difference between the constraints we have as a business and the constraints we're stuck with because of technology. As technologists, we should at least be able to solve the technology problems, right?

Greg DeMichillie (30:19):

Exactly. Exactly. Yeah. All those are reasons why I think Kafka and data streaming are at a super interesting place right now. It's humbling to see the Kafka community. Like you, I love developer events. Kafka Summit is currently the highlight of my year. This gets back to what drew me to developer platforms in the first place, which is that it's amazing to watch people build things you never in a million years envisioned were possible. Our customers, in most ways, are way more creative than I am. I love that one week you might be talking to a financial startup; the next week you're talking to Michelin, a huge conglomerate in Europe; the next week it's BMW; the next week it's a tech company. That's what makes something like data streaming so fun. As we say on the marketing side, the great news is Kafka has infinite use cases. The challenge is Kafka has infinite use cases. Right?

Kris Jenkins (31:41):

Yeah. Lord knows our industry can suffer sometimes from solutions looking for problems.

Greg DeMichillie (31:49):

We're not in one of those.

Kris Jenkins (31:51):

No. No.

Greg DeMichillie (31:54):

I love seeing what developers build that we never thought was even possible.

Kris Jenkins (32:00):

Yeah. One of my favorites, if I can interject: there's an online grocery store in Europe called Picnic, and they are heavy users of Kafka. I got to go to their robotic factory last year. They have topics with events streaming through constantly, millions of events during the day. Meanwhile, they have groceries streaming through conveyor belts, millions of those. There's just this wonderful physical analog to what we're doing in data.

Greg DeMichillie (32:33):

That's amazing.

Kris Jenkins (32:34):

It was like being inside a Kafka server, seeing [inaudible 00:32:38] with everything going around. I'd never have imagined this and I'd never have imagined how cool logistics could be.

Greg DeMichillie (32:44):

How nerdy is it, Kris, that you and I are laughing at the idea of climbing inside a Kafka server? I mean, I'm very happy.

Kris Jenkins (32:51):

If any listeners are feeling the same way, then you're in a warm and welcoming club.

Greg DeMichillie (32:56):

Absolutely.

Kris Jenkins (33:01):

I would like to know, and you should have answers to this. It's part of your job to have answers to this. How do we get there a little faster? What's missing? What do we need?

Greg DeMichillie (33:11):

Yeah, that's a great question. I'm going to answer it two ways: what does the technology need, and what does the market need? And they're related. Technology-wise, Kafka's proven itself many times over. But stream processing, the idea of doing the processing of data in flight, the transformation, the analysis, I think is still a little too difficult. It's one of the reasons I'm really excited about our acquisition of a company called Immerok and some really talented Flink folks, because I think taking Flink and bringing it even closer to the Confluent Cloud product is going to open up a lot of possibilities to make moving beyond dumb pipes even easier and even more powerful. So technically, that's one area: we should really make stream processing even easier for more and more developers. You shouldn't have to be a 10-year Kafka expert. I've got to do the math. Did I just jump farther back than Kafka exists? No, I think I'm okay.
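As a rough illustration of the "processing data in flight" that stream processors like Flink make easy, here is a toy tumbling-window count in plain Python. Flink or Kafka Streams would express this declaratively and run it continuously; the event shapes and window size here are invented:

```python
# Toy tumbling-window aggregation: count events per key in fixed windows.
# A sketch of what stream processors do in flight, not real Flink code.

from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """events: iterable of (timestamp_seconds, key).
    Returns {(window_start, key): count}."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # bucket the timestamp
        counts[(window_start, key)] += 1
    return dict(counts)

clicks = [(5, "home"), (42, "home"), (61, "checkout"), (119, "checkout")]
print(tumbling_window_counts(clicks))
# {(0, 'home'): 2, (60, 'checkout'): 2}
```

The hard parts a real stream processor handles, and that this sketch ignores, are exactly what makes the space difficult today: out-of-order events, state that outlives a single batch, and emitting results while data is still arriving.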

Kris Jenkins (34:32):

Oh, very close to it. I think it's-

Greg DeMichillie (34:33):

Very close.

Kris Jenkins (34:34):

You can only hire Jay for this role, man.

Greg DeMichillie (34:35):

That's right. Exactly. It's a very small market. Yeah. The second one is, and again, we hear this from large enterprise customers, it comes back to governance. If I've pitched you, "Stop thinking about the boxes, start thinking about the lines," then anybody with data that matters to their business is going to say, "Great. How do I govern that?" How do I make this not a plate full of spaghetti, but something that I can actually reason about and control and manage and scale and version, and all those sorts of things? So yeah, on the Confluent side-

Kris Jenkins (35:11):

You see that at the business level, but you see it at the developer level as well, when we're saying, "How do we monitor this? How do we observe it? How do we..."

Greg DeMichillie (35:19):

Exactly. We've taken away sort of the first-order pain of Kafka with Confluent Cloud, where we've really re-architected it from the ground up to be a cloud native service, to use cloud native primitives, to make it easier to automate and scale. But I think we're putting a lot of focus and emphasis around governance and governance tools. Because again, if you buy that the lines are where the value is, then you've got to give me some way to reason about all the lines in my organization. And then I guess on the market side, and this relates to the first point, we need to make Kafka more approachable, easier to get started with for more and more developers. The Kafka community's great. I want it to be 2, 3, 4, 5, 10 times as big as it is. But we're going to need to make it more approachable and easier to use, because the opportunities are out there for folks with Kafka expertise.
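
As a toy illustration of what "reasoning about the lines" can mean in practice, here's a stdlib-only Python sketch. This is not Confluent's Stream Governance or Schema Registry API, just the basic idea: every topic has a declared schema, and an event that doesn't conform is rejected before it flows downstream. All names are hypothetical.

```python
# Hypothetical in-memory "registry": topic name -> required fields and their types.
SCHEMAS = {
    "orders": {"order_id": str, "amount": float},
}

def validate(topic, event):
    """Return True only if the event matches the schema registered for the topic."""
    schema = SCHEMAS.get(topic)
    if schema is None:
        return False  # ungoverned topic: nothing we can reason about
    return (set(event) == set(schema)
            and all(isinstance(event[k], t) for k, t in schema.items()))

print(validate("orders", {"order_id": "o-1", "amount": 9.99}))  # → True
print(validate("orders", {"order_id": "o-2"}))                  # → False (missing field)
```

With a check like this enforced at the boundary, each "line" between systems carries a known, versionable contract instead of arbitrary bytes, which is what makes the spaghetti manageable.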

Greg DeMichillie (36:21):

If you do the job searches, a lot of companies are out there looking for Kafka talent, and we need to help continue to grow that: to get new developers interested in Kafka and what it can do, to show them the possibilities, and to make it easier for more and more developers to get started. So on the technology side, I think stream processing and stream governance are sort of the two areas that are really ripe for us to invest in. And we are, as a company, Confluent, putting my Confluent hat on. And then, putting my broader community hat on, I think we've just got to continue to help grow and nurture the Kafka community.

Kris Jenkins (37:06):

Yeah. Yeah, I'd agree. I think one of the big things that holds it back is that it's such a different way of thinking, going from object orientation with a relational database at the backend to a streaming API. It's like how WebSockets should have taken off more than they did, but we're kind of used to request-response HTTP. There's a similar thing here.

Greg DeMichillie (37:30):

And we have decades of experience with data at rest. They're very mature systems. And to say suddenly that data in motion is a peer of data at rest, or, as I try to make the case, in some ways even more valuable than data at rest, that's a big mental shift.

Kris Jenkins (37:56):

We have to do a good job of explaining it in a way that people can get comfortable with.

Greg DeMichillie (38:03):

And again, make it easy. One of the things I learned from 20 years of working on developer tools is that the more you can clear away the repetitive crap for developers, the more you unlock their potential, whether that's IDEs and code editors and debuggers. I think that's what we're trying to do with Kafka and with Confluent Cloud: clear away all the stuff about how do I install, and how do I provision, and how do I scale, and how do I manage ZooKeeper, and put all that behind a simple cloud service. It lets the developer do the fun bits. I know of no developer who gets up every day and says, "Oh boy, I get to re-shard a database server." You know what I mean? That's just not the fun part. What gets me excited is helping developers get to the fun part of their job. And the more we can take away the undifferentiated part, the better.

Kris Jenkins (39:04):

Yeah. A friend of mine had a really pithy way of putting this. He said, if you can make something twice as easy to use, people will find it 10 times more useful.

Greg DeMichillie (39:13):

Yeah.

Kris Jenkins (39:14):

That'd be a nice-

Greg DeMichillie (39:15):

Absolutely.

Kris Jenkins (39:15):

To get. So, as we come to wrap up, are we expecting a big keynote announcement from you at Current that you can give away right now?

Greg DeMichillie (39:25):

Can you expect announcements? Yes. Can I give them away to you now? No. I like my job.

Kris Jenkins (39:30):

Oh, well, in that case, I'll have to see you at Current in August '23.

Greg DeMichillie (39:34):

And Kafka Summit, London.

Kris Jenkins (39:36):

Yes. That's coming.

Greg DeMichillie (39:37):

May.

Kris Jenkins (39:37):

In May.

Greg DeMichillie (39:38):

So we'll be there, looking forward to talking to all the European Kafka community. And then at Current in August, we'll be back in the US.

Kris Jenkins (39:50):

Austin, Texas. Yeah. Yeah. That'll be fun. I'll see you there. Thanks very much, Greg.

Greg DeMichillie (39:54):

Yeah, thanks Kris. It was a pleasure.

Kris Jenkins (39:56):

Thank you, Greg. And I will pick up on Greg's shameless plug from the start of that conversation, if I may. We are organizing two event streaming conferences this year. There's Kafka Summit London in May, and that will cover all things Apache Kafka. And then there's Current in Austin, Texas in August, and that's all things Apache Kafka, plus all things relevant to the real-time data streaming world in general. Much like this podcast, it aims to be Kafka at the core, but it's a big old world out there. I'll be at both, and I hope to see a lot of you there and meet you in person. In the meantime, probably the best source of Kafka information I can point you to is Confluent Developer, which is our library of knowledge about writing, architecting, and maintaining Kafka applications.

Kris Jenkins (40:45):

If you want an article, a tutorial, a guide, or some documentation, check it out at developer.confluent.io and I promise you you'll learn something new. But until next week, it just remains for me to thank Greg DeMichillie for joining us and you for listening. I've been your host, Kris Jenkins, and I'll catch you next time.

What’s the next big thing in the future of streaming data? In this episode, Greg DeMichillie (VP of Product and Solutions Marketing, Confluent) talks to Kris about the future of stream processing in environments where the value of data lies in the ability to intercept and interpret it as it moves.

Greg explains that organizations typically focus on the infrastructure containers themselves, and not on the thousands of data connections that form between them. When they finally realize that they don't have a way to manage the complexity of these connections, a new problem arises: how do they approach managing such complexity? That’s where Confluent and Apache Kafka® come into play: they offer a consistent way to organize this seemingly endless web of data, so companies don't have to face the daunting task of figuring out how to connect their shopping portals or jump through hoops trying different ETL tools on various systems.

As more companies seek ways to manage this data, they are asking some basic questions:

  • How do we do it?
  • Do best practices exist?
  • How can we get help?

The next question for companies who have already adopted Kafka is a bit more complex: “What about my partners?” For example, companies with inventory management systems use supply chain systems to track product creation and shipping. As a result, they need to decide which emails to update, whether they need to write custom REST APIs to sit in front of Kafka topics, and so on. Advanced use cases like this raise additional questions about data governance, security, data policy, and PII, forcing companies to think differently about data.

Greg predicts this is the next big frontier as more companies adopt Kafka internally. And because they will have to think less about where the data is stored and more about how data moves, they will have to solve problems to make managing all that data easier. If you're an enthusiast of real-time data streaming, Greg invites you to attend the Kafka Summit (London) in May and Current (Austin, TX) for a deeper dive into the world of Apache Kafka-related topics now and beyond.

Continue Listening

Episode 259 | February 22, 2023 | 43 min

Real-Time Data Transformation and Analytics with dbt Labs

dbt is known as being part of the Modern Data Stack for ELT processes. Being in the MDS, dbt Labs believes in having the best of breed for every part of the stack. Oftentimes folks are using an EL tool like Fivetran to pull data from the database into the warehouse, then using dbt to manage the transformations in the warehouse. Analysts can then build dashboards on top of that data, or execute tests. It’s possible for an analyst to adapt this process for use with a microservice application using Apache Kafka and the same method to pull batch data out of each and every database; however, in this episode, Amy Chen (Partner Engineering Manager, dbt Labs) tells Kris about a better way forward for analysts willing to adopt the streaming mindset.

Episode 260 | March 1, 2023 | 61 min

Migrate Your Kafka Cluster with Minimal Downtime

Migrating Apache Kafka® clusters can be challenging, especially when moving large amounts of data while minimizing downtime. Michael Dunn (Solutions Architect, Confluent) has worked in the data space for many years, designing and managing systems to support high-volume applications. He has helped many organizations strategize, design, and implement successful Kafka cluster migrations between different environments. In this episode, Michael shares some tips about Kafka cluster migration with Kris, including the pros and cons of the different tools he recommends.

Episode 261 | March 7, 2023 | 55 min

Next-Gen Data Modeling, Integrity, and Governance with YODA

In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.
