March 8, 2021 | Episode 147

The Human Side of Apache Kafka and Microservices ft. SPOUD

Transcript

Tim Berglund:

Sam Benz and Patrick Bönzli have helped more than a few enterprises adopt Kafka, and they have seen plenty of challenges and plenty of successes along the way. In this episode, I got to talk to them both about their experiences doing just that. This kind of experience really is solid gold, so I invite you to listen carefully to today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund:

Hello, and welcome to another episode of Streaming Audio. I am, as always, your host, Tim Berglund. And I'm joined today by two people, Patrick Bönzli and Sam Benz, of SPOUD. SPOUD is a Swiss startup in the domain of real-time analytics, which sounds like something that could have some Kafka involved, I don't know. Patrick and Sam, welcome to the show.

Patrick Bönzli:

Thanks very much, nice to be here.

Samuel Benz:

Hi, Tim. Hi.

Tim Berglund:

Maybe tell us a little bit about yourselves. Sam, you go first. What is your role there at SPOUD? And if you want to talk a little bit more about the company, I'd love to hear more about it.

Samuel Benz:

Yeah, sure. I'm the CTO here. As you already mentioned, we're in the domain of real-time analytics.

Tim Berglund:

Which could mean anything.

Samuel Benz:

Which could be anything. We have different customers. Some of them are in logistics, so it's analyzing how the trucks are moving and so on. We have autonomous buses, which we track to see whether they have more incidents on, for example, cobblestone streets, things like that. But we are also helping with data architecture, mainly Kafka-based. Sometimes other pub/sub systems pop up, but most of them are really Kafka use cases. That's pretty much the big use case, and we also have a product that helps with Kafka, but to explain that, I think Patrick has better words.

Patrick Bönzli:

I'm happy to jump in here; you built it up really nicely. I'm Patrick, co-founder and CEO. I'm actually an engineer myself, so we are an engineering company, of course. We have been using Kafka for, like, eternity, I mean, our whole existence since 2016, which is of course not the beginning of Kafka, as we know, but a fairly long time here in Switzerland.

Patrick Bönzli:

And we have been collecting a lot of experience around Kafka and how our customers are using it. What led us here is that during these project engagements, analytics projects, et cetera, we saw a lot of the challenges that our customers have, and we came up with the idea to build our own product, which helps companies navigate, find, and explore data as products on top of Kafka.

Tim Berglund:

All right. Tell me, I wonder how you got to that product. Based on what you guys do, it sounds like you see some interesting use cases. I mean, just: do trucks have more incidents on cobblestone streets? That's kind of a cool analytics question. It's definitely a European one; that just doesn't come up very much in the United States. You can find cobblestone every once in a while, but there's not a lot of it here. So what are some of the cool use cases that you've seen in your history that got you to where you are?

Samuel Benz:

I mean, in one case we were monitoring a microservice infrastructure for a big insurance company here in Switzerland. I don't know how many microservices they have, thousands, and we have all the real-time telemetry, distributed tracing, everything, and we work on that data, doing anomaly detection on top of it. That's pretty interesting. Then you really see the value of all this data; that's full observability. This is maybe one of the bigger use cases we have, and also one of the bigger Kafka instances we're working with, so that's interesting.

Patrick Bönzli:

I think from an emotional standpoint, the project I loved most was one we did for the European Space Agency. They're building telescopes to find planets outside of our solar system, and a lot of the measurement devices are being produced by universities. One of these universities is here in Switzerland, and they're famous for building these cameras. We helped them set up the data analytics for their testing environment, because when they build such a sensor, they test it under realistic environmental conditions.

Patrick Bönzli:

That means vacuum, radiation, and really, really low temperatures. They built these vacuum tubes with radiation and temperature control. It's awesome to have real data. I mean, any project where we have data from real-world objects is, I think, kind of emotional and fascinating.

Tim Berglund:

Especially if they're spaceships.

Patrick Bönzli:

Yes, yes.

Tim Berglund:

If you get to visit the site.

Patrick Bönzli:

I was very surprised that, although a lot of us engineers love space stuff, only half of our company has actually seen Star Wars. I was really, really disappointed.

Tim Berglund:

Really. Well, that certainly helps you make decisions about who's going to be retained.

Patrick Bönzli:

Absolutely.

Tim Berglund:

Who's in and who's out. Difficult decisions have to be made there I'm afraid.

Patrick Bönzli:

Yes, and I hope that during this COVID period, the ratio has changed.

Tim Berglund:

Maybe it's remedied itself a little bit, with a little more time at home. Now, if my coworkers can't understand my Mandalorian memes, I don't know how to relate. How are you even supposed to talk? This is, after all, the way. But that kind of thing is seriously cool, especially when you get to make a site visit and see physical objects, because so much of what we do, even with Kafka, is like the microservice monitoring you talked about: an absolutely key, bread-and-butter use case, very important, a huge amount of value.

Tim Berglund:

And it's all information, right? It's all ephemeral. None of it is material, and then when you go look at a physical object that's not just a physical object, but a super cool one, that's a telescope, that's going to fly in space, I mean, that's rewarding. It sounds like you cover a lot of different things?

Patrick Bönzli:

Yeah. I mean, it's not that you can find these projects when you go looking for them. Mostly, these kinds of projects come to you when you're not expecting them. In our case it was a little bit like that: a lot of the fancier projects we've had came to us under some really strange circumstances, and we always enjoy them. And of course, as a startup you're a little more flexible about how you work together, the conditions, and so on, and we tend to do that for such awesome projects.

Patrick Bönzli:

But there's one thing that's always the case in any of these projects that we do, and that's maybe the second part of your question. If you're trying to find answers or solutions in data, at the beginning there's this huge hassle to get to the data, to actually understand it, and to do something with it. Afterwards comes the fun part.

Patrick Bönzli:

And I think this is the general pattern we have been observing in all these analytics projects. For anybody working in an enterprise it's not too surprising, but it's really surprising the first time: it takes months to get some data. And then the next team that wants to take that data and use it needs months again. This is the pattern we observed, and it's actually why we came to Kafka, a lot from the integration perspective: how you can actually connect data and make it available in your enterprise.

Patrick Bönzli:

I think this is one of the most awesome, maybe not as sexy as spaceships and stuff like that, but one of the most valuable assets a Kafka bus brings to, let's say, normal enterprises in our world today.

Tim Berglund:

Take us through some of those enterprise use cases, because again, with the spaceships, you need that to be inspired and to feel like a little kid, and all the wonderful things that are part of that experience. But what are the different enterprise use cases that you see? These are the things that touch most of our lives in terms of our daily work.

Samuel Benz:

Usually when we come to companies, at least this year and last year, and it's growing, we see there is already a Kafka instance running somewhere. Sometimes it's still in the lab phase; sometimes there are just two teams that adopted it for their needs or something. But there is still a step to put it into production, or let's say, to escape the lab.

Samuel Benz:

And when we come in at that point, that's already kind of good. They understand that there is something valuable in that technology, but often they start asking, "So now we have production data on Kafka, shouldn't we start securing this new tool?" And that's what we observe many times being almost the killer for adopting such a data integration platform company-wide, as we normally see Kafka, because you completely lose transparency.

Samuel Benz:

It's always the same when we talk to the engineers directly. We tell them, "You should start writing data to Kafka; it's valuable data, and somebody else can use it." But they say nobody is using it, and on the other side, people say they're not adopting Kafka because nobody's publishing. It sounds trivial, but these are usually the problems you have. They have their microservices, and everything there has super perfect service interfaces.

Samuel Benz:

But when it comes to the data architecture, they're just lost. They don't want analytics workloads, or data API things, on top of their microservice infrastructure, which is clear. But they also don't have the tools to send data to Kafka. That's why we talk about it like this: it should be like Twitter for data. It should be really simple to just send the data there.

Samuel Benz:

You need some tooling around it. Obviously there is Kafka Connect, which helps a lot with getting the data on board. But once Kafka is secured, you again need some kind of, I would call it, controlled transparency. You have to give visibility into what's happening inside your cluster. And once you manage this step, we really see multiple teams starting to produce and consume, and they really help each other. So that's the great point, the moment you see that it's also a cultural change within the company.

Tim Berglund:

What's the first thing that breaks the chicken-and-egg paradox there? I don't want to produce it because nobody's reading it. Well, I can't read it because nobody's writing it. You mentioned controlled transparency, and I love that phrase, by the way, because there needs to be data, and it needs to be available. Governance and regulation are still things, so you solve all those problems, and Connect is a strategy for getting data in. But what have you seen as a way to... Who breaks first? Do people start writing? People have to start writing first, but how do you get them to?

Samuel Benz:

Normally we start just by really talking to the engineers and telling them, "Listen, something you never want to have in your application is state. Look, there is a great tool that you can put everything inside, and you don't have to care about keeping the state. If you have to recover, because somebody killed your container or it crashed, you can just start re-consuming the data and rebuild it like that."
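The recovery argument can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Kafka client: an in-memory list stands in for a single topic partition, and the names `Log`, `append`, `read_from`, and `rebuild_balances` are hypothetical. It shows that when the log keeps the events, a restarted instance can rebuild its state by re-consuming from the first offset instead of storing that state itself.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """In-memory stand-in for one topic partition (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        # Like a producer write: the record gets the next sequential offset.
        self.records.append((key, value))
        return len(self.records) - 1

    def read_from(self, offset):
        # Like a consumer positioned at `offset` reading forward.
        for i in range(offset, len(self.records)):
            yield i, self.records[i]


def rebuild_balances(log):
    """Rebuild the application's state by replaying every event from offset 0."""
    balances = {}
    for _offset, (account, delta) in log.read_from(0):
        balances[account] = balances.get(account, 0) + delta
    return balances


if __name__ == "__main__":
    log = Log()
    log.append("alice", 100)
    log.append("bob", 50)
    log.append("alice", -30)

    # Simulate a crashed container: the in-memory state is gone,
    # but the log still holds the events, so we re-consume them.
    state = rebuild_balances(log)
    print(state)  # {'alice': 70, 'bob': 50}
```

With a real cluster, the rough equivalent is a consumer that starts from the earliest retained offset (for example, `auto.offset.reset=earliest` for a consumer group with no committed offsets). How far back that replay can go depends on the topic's retention settings, which fits the point that the log is usually long rather than infinite.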

Samuel Benz:

It's more a matter of education, so that they start publishing the data. If it's just software they are writing themselves, they can use the Java clients, so that's easy to work with. But for all the other teams, the ones with databases and things like that, you really need a team that pushes things forward: some help, let's say repackaging Kafka Connect for your company in a way that they can simply use it, or you have to give them support.

Tim Berglund:

Gotcha. Support, and it sounds like you were selling some of the benefits of event logging there. Hey, it's good to have a history of what you've done, of what the state of your application has been. That's explaining event streaming to people.

Samuel Benz:

Yeah, you basically get the best of both sides. You have the real-time data, but you also have, usually not an infinite log, but a usefully long log, so you can have development data, re-consume something, retrain a model, or whatever. So in my opinion it's really the best of both worlds.

Tim Berglund:

You mentioned microservices before, and monitoring a large microservices system of potentially thousands of services. Could you talk about what you guys see? Because I know what I see when I get to talk to actual customers, which is occasionally, and then there's the sort of recommended... What do you want to call it? Whatever. What we see as the proper architecture: microservices being event-driven, communicating through topics. Is that what you see, is that what you recommend, and is there again another monitoring and observability layer on top of that, where you're tracking the activity of the services through Kafka?

Samuel Benz:

In that particular case, the communication between the microservices themselves doesn't go through Kafka topics. They still have the traditional direct calls; you could call it a service mesh, something like that. What we have is just all the distributed tracing information, which we then send to Kafka and forward to different backends for visualization and everything.

Samuel Benz:

I have to say, I've never seen the type of architecture where Kafka is really used as the main communication bus for that. I would always call those the service interfaces. I see Kafka as the data interface for everything else, but not for the direct, business-critical calls, not-

Tim Berglund:

That is a thing I have seen in the wild, and I ask because it's new, and on a theoretical basis I'm convinced it's the right way to go. I know there are people who are enjoying success with it, but you know how this kind of thing goes: you have to test it in the field for five or 10 years to really know whether it was a good idea. And it definitely seems like the right idea to me. It gets away from some of the...

Tim Berglund:

There are costs to synchronous interfaces: service discovery is harder, and cascading failures are harder to deal with. And then these other little pieces of infrastructure grow up to mitigate the problems of the synchronous interface, problems that don't exist with an asynchronous interface. So I'm always interested in experience reports on that.

Samuel Benz:

Personally, I would like to see more of those infrastructures, as long as we're not going back to the old enterprise service buses, where you handle distributed transactions and everything, which are a nightmare and super complicated.

Tim Berglund:

And just wrong.

Samuel Benz:

Then it's fine.

Patrick Bönzli:

From our experience, we essentially see four different fields in which our customers engage with Kafka, often combined or in a certain order. There are the general event processing cases, which are obvious: you have events, and you act on them and process them. A little bit close to that are the microservices, which, as we just talked about, are focused more on the communication part itself.

Patrick Bönzli:

And then there is a lot of event sourcing and data lake work going on lately. There's this idea spreading around that we like: using Kafka as the central store of events, where you can source your data at any time. I always have this image of a bar in mind, not only because I haven't been in a bar for a long time, but also because I like beer. So there's this thing, I don't know what it's called in English, where you put, like-

Samuel Benz:

The tap.

Patrick Bönzli:

Yeah, the tap, exactly. And then you just take the data that you like, in the flavors you like, and you mix it. This is the idea that a lot of the enterprises here in our environment are getting, and they like it. And this brings us very close to the fourth reason, which is integration architecture, data integration in general. It's close to, or related to, event sourcing. I think most of our customers, or companies that we observe, start somewhere with event processing and some traditional use cases, somewhere around marketing, something like that, or obviously microservice monitoring and logs.

Patrick Bönzli:

And then they very quickly move on to this integration or event sourcing point. I think it's a very natural thing to occur: once you have the data in Kafka, you tap into it and use it. And it quickly and naturally grows into this integration thing. I don't know any company or enterprise team that hasn't started to think about, or actually started working on, a Kafka integration layer.

Patrick Bönzli:

And this is actually where we find most enterprises, and I believe it's one of the most awesome use cases from my perspective, because it's not only action and reaction on the world. It's actually the central nervous system, in the image that I think Jay Kreps often uses, where everything comes together, and these are like the veins pulsing the data through the whole company. I love this, it's science fiction, it's spaceships, and this is my [inaudible 00:20:49].

Samuel Benz:

On the other hand, I mean, this is an extremely technical view. It solves so many technical problems, and what we like is this sharing of data, so that somebody else can do something better with it. But on the cultural side, between teams, this is not always welcome. We often see that for people adopting Kafka, the technical problem isn't the one they have to solve. Maybe in the beginning there are some technical skills, as we talked about, and you can help them with specialized teams.

Samuel Benz:

But the harder part is the shift in mindset: you're publishing valuable data from your team, it's open, and somebody else can do something better with it. I've seen companies where some teams really tried almost to hide their data, because they were so scared that somebody else could be faster at providing a great algorithm for finding some anomalies.

Tim Berglund:

Oh, wow. It is a cultural problem.

Samuel Benz:

That's a cultural problem, but that's the reality. We're seeing now that this can happen in bigger companies. It's nice to have an integration architecture, and technically it resolves many, many things, but a couple more problems arise.

Patrick Bönzli:

I think the one thing that I love most about Kafka's characteristics once it hits the enterprise is that Kafka has this kind of anarchy-like nature on some level. It's one central point: you put something in it, and anybody can listen to it. There are, of course, topic protection and permissions and stuff, but in principle it breaks down all the silos that enterprises have been building for such a long time.

Patrick Bönzli:

And I think this principle is very, very, very powerful, and some of these companies are very afraid of it, and it's been funny to see how they react. There are companies trying to isolate Kafka, so every application has its own Kafka instances, and then separate Kafka clusters for test, dev, and prod. So you multiply the number of Kafka clusters inside an enterprise to dozens and dozens, which on some level totally defeats the purpose of what we think of as an integration layer.

Patrick Bönzli:

But it's how a new, science-fiction technology gets applied to old thinking, and this is kind of fun to observe. And it really happens. Now that there's Confluent Cloud, it's really easy to spin up your own clusters with a simple click of a button. It's really tempting for a lot of these companies. And then of course, we try to bring them to a more general and more wholesome view of how to handle data and share it. These seem like trivial problems, which of course are not trivial, but they are not technical. They're organizational, human problems that Kafka creates as well. And I think these are good discussions, really good discussions; Kafka is a catalyst for a lot more than technology.

Tim Berglund:

And you said something in there that I want to underscore, because earlier we had said that microservices talking through Kafka sounds great as long as it doesn't become a warmed-over enterprise service bus, and you're right. Then you said just now that there is a certain anarchy to Kafka. Of course, there's data governance; we make sure that risk is managed and laws are obeyed in terms of who can see what.

Tim Berglund:

But given that the default is for data to be available, that anarchy, in the positive sense, is really a liberty for new services to do new, value-creating computations on existing data, without having to ask anything more than, am I allowed to see this? You don't have to go into some brittle piece of infrastructure where you write XML config to change routing, implement some Java interface, and deploy a JAR to some brittle piece of infrastructure that an angry team doesn't want to let you touch.

Tim Berglund:

That's what happened with the ESB: it was all centralized and organizationally controlled by small teams. And it had to be, because that was the nature of the thing. There really wasn't another way to do that. With Kafka, there's no way to do that; there isn't a central place for that to happen. There are logs where data lives, and you can access that data and compute things on it, but that's your application, and you get to decide that. It's really a developer-first kind of thing, and anarchy, not in the sense of throwing bombs and gangs roaming the streets, but in the sense of an emergent marketplace figuring out new ways to serve its customers in value-creating ways.

Samuel Benz:

Yeah, definitely. Thanks for putting this into the right perspective. Of course, for us as Europeans, anarchy is not the worst thing that could happen, but obviously there are some side effects that are really bad.

Tim Berglund:

There've been some experiments with it where maybe it was not the best.

Samuel Benz:

Exactly. But I mean, this liberty, as you said really well, this liberty and freedom of choosing your data source and being able to think for yourself, I guess it's more like a democratization of data, if we want to stay in political terms. And this is so powerful, and so scary for a lot of enterprises.

Tim Berglund:

Like so many federated cantons working together toward a common goal, one might say.

Samuel Benz:

You've done your homework.

Tim Berglund:

That's how you love Swiss people, right there.

Samuel Benz:

Bring something with chocolate and then we really have [inaudible 00:27:05].

Tim Berglund:

I thought that chocolate was a stereotype, but okay, I'll do it. We talked a little bit about this, and I think we were sort of dancing around it: what about the process developers go through to adopt Kafka? People start with something in a controlled lab, and then it goes from there. We've been talking organizationally, but zoom in on how people learn stuff, if that's a thing you've seen. Help me understand how you think the person on the street, or the person at the keyboard, goes through this process.

Samuel Benz:

What we observe depends on how you use Kafka, of course, for example in a small environment with a data processing use case. But the general thing we observe, especially once Kafka escapes the lab, as you say, is that Kafka users run through certain levels of obstacles that are always a little bit similar. And for us, they resemble Maslow's pyramid, which everybody refers to when talking about human needs.

Samuel Benz:

There is a hierarchy. First, obviously, there's something like physical protection: the idea that you have to be able to run Kafka in production, whether you run it yourself or get it from Confluent as a service, however you want to run it. This security, which you need in order to rely on the data for mission-critical applications, is obviously the first level. Afterwards comes protection of the data itself: who can see it. If it's very sensitive or personal data, obviously not everybody can and should see it.

Samuel Benz:

This is the second need that enterprises normally tend to solve once they have a working Kafka cluster. And once you have that, you are actually ready to roll Kafka out inside your enterprise. Once it's secured and runs smoothly, you're at the stage where you forget that this was actually hard, and you go out and tell everybody, "Hey, there's this awesome Kafka. I'm sure you've heard of it, but we have it too now, so please use it." And this is often met with a lot of sympathy. They like it a lot.

Samuel Benz:

But at this point it's mostly the early adopters, in a way the people who just joined the party. These are people who love Kafka, have heard of it, and are finally able to use it. A certain number of teams normally join. We have already talked about the problem of getting data in and getting data out, and who starts. In the end, it mostly starts with the people and teams that have some liking for, and some bias towards, new technology. They also don't mind if it's a little bit clumsy at the beginning, because these are still the early stages of Kafka inside an enterprise. They start out, develop the use cases, and it starts to really grow.

Patrick Bönzli:

If I can add: once you reach that point, you rely on data stored in a Kafka topic, which is usually produced by a different team. For the developers, it's no longer the technical problems. It's the standard operational things: Is this really the production stream? How are the schemas changed on it? What is the quality of the stream?

Patrick Bönzli:

It's not really the SLAs, it's more the OLAs, the operational-level agreements, that are interesting for these people, in the operational sense. Once you really start working with it, that's, I would say, the first level of problems. I've also seen companies really start losing control because they created circular dependencies across Kafka streams: one team publishes data, others use it and republish it, and then it gets re-consumed again.
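The circular dependencies described here are easiest to spot once you write down which service reads which topic and where it publishes its results. The sketch below assumes such an inventory exists (the `services` map, the service names, and the topic names are all hypothetical; Kafka does not record this lineage for you) and runs a depth-first search over the resulting topic graph to report one cycle, if any.

```python
def build_topic_graph(services):
    """Build a topic-to-topic graph from a hypothetical service inventory.

    `services` maps a service name to (input_topics, output_topics).
    An edge A -> B means some service consumes topic A and publishes topic B.
    """
    graph = {}
    for inputs, outputs in services.values():
        for src in inputs:
            graph.setdefault(src, set()).update(outputs)
        for dst in outputs:
            graph.setdefault(dst, set())
    return graph


def find_cycle(graph):
    """Return one cycle as a list of topics (first topic repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):  # sorted only to make the result deterministic
            if color[nxt] == GREY:
                # Back edge: the cycle is the part of the current path starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    services = {
        "orders-enricher": (["orders.raw"], ["orders.enriched"]),
        "pricing": (["orders.enriched"], ["prices.adjusted"]),
        "order-writer": (["prices.adjusted"], ["orders.raw"]),
    }
    print(find_cycle(build_topic_graph(services)))
    # ['orders.enriched', 'prices.adjusted', 'orders.raw', 'orders.enriched']
```

The recursive search is fine for a modest number of topics; for very large graphs, an iterative version would avoid Python's recursion limit.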

Patrick Bönzli:

You can do messy things that aren't obvious from the topic name, which is often all you have; there isn't much more. That's where transparency comes in: it's not only seeing what kind of data is inside. You also want basic knowledge about the throughput you have to handle. Are there multiple different schemas inside? Is there really one schema? Do they really use the schema registry? Is it up-to-date?
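A quick way to ask "is there really one schema?" is to sample a topic and compare the shape of the records. This sketch assumes the message values are JSON (not Avro or Protobuf registered in a schema registry) and that you already have a sample of raw values; `record_shape` and `schema_report` are hypothetical helpers, and the heuristic only looks at top-level field names and JSON value types. It is a rough diagnostic, not a substitute for schema compatibility checks.

```python
import json
from collections import Counter


def record_shape(raw_value):
    """Describe a JSON record by its sorted top-level (field name, value type) pairs."""
    obj = json.loads(raw_value)
    if not isinstance(obj, dict):
        # Non-object values (lists, numbers, ...) are described by their type alone.
        return (type(obj).__name__,)
    return tuple(sorted((name, type(value).__name__) for name, value in obj.items()))


def schema_report(sample):
    """Count how many distinct record shapes appear in a sample of raw JSON values."""
    return Counter(record_shape(v) for v in sample)


if __name__ == "__main__":
    sample = [
        '{"order_id": 1, "amount": 9.5}',
        '{"order_id": 2, "amount": 12.0}',
        '{"order_id": "3", "amount": 4.25, "currency": "CHF"}',
    ]
    report = schema_report(sample)
    print(f"{len(report)} distinct shapes in {len(sample)} records")
    for shape, count in report.most_common():
        print(count, shape)
```

More than one shape in a sample does not prove the topic is broken (optional fields are common), but it is a cheap signal that producers may not agree on one schema.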

Patrick Bönzli:

All these operational things also come down to trust: is this other team a reliable source or not? These are just operational things. And I think at this point, there are really two things happening, and these are the two things I find most fascinating. One is, as Sam said, once multiple people join in on a shared resource like Kafka, fun stuff happens.

Patrick Bönzli:

Somebody changes the schema and some other people get angry, things like that. On the other hand, at that point you often observe that the growth of Kafka inside a company starts to slow down, if not pause, because at some point a lot of the early adopters have onboarded, have done something, or at least would like to do something.

Patrick Bönzli:

What we believe we see here, in many cases, is a difference between the early adopters, let's say people with a bias for technology, complexity, and detail, and the next segment, which is more like the early majority. They need a certain level of convenience, perceive Kafka from a different perspective, more from the value perspective, and don't want to dive into all the details that are sometimes messy and complex.

Patrick Bönzli:

The next thing that often happens is that these enterprises start building convenience layers around Kafka, making it easier and safer to publish data, because otherwise you have to use different tools to debug it, and making the whole thing much more sound. This is a huge undertaking, because at that point you obviously have some first projects that are already using it, and you start rebuilding a lot of the infrastructure for how they publish their data through your own connectors, adapters, or whatever.

Patrick Bönzli:

A lot of things start happening around convenience. What we often see is that a lot of these companies are working toward the idea of Twitter for data, as we like to call it. Everybody knows Twitter: anybody can publish something, anybody can subscribe to something, it's really trivial. Anybody can do it.

Patrick Bönzli:

And this is the same way we think about Kafka at that stage. Kafka has a lot of complexity in it. It's a beast, but on some level you publish something and you get something, and that's it. This is what all these companies are starting to converge toward. They're building something that basically mimics this pattern of Twitter for data. And I think these infrastructures around Kafka are one of the most awesome things you can build.

Samuel Benz:

Which is trivial on a Kafka cluster without ACLs and everything. But once you have the whole security layer on top of it, having a truly self-service Kafka, where topics are correctly provisioned and created, and where you can subscribe to a topic at, I would say, a business level and the ACLs are configured automatically in the backend, is no longer that trivial. But that should be the goal: the security layer is enabled, so again, controlled transparency, and on top of that you have self-service.
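Self-service subscription at a business level ultimately has to be translated into ACLs in the backend. The sketch below is a hypothetical model of that translation: `AclBinding`, `acls_for_subscription`, and `acls_for_publication` are made-up names, not a client-library API. It encodes the basic Kafka rule that a consumer needs READ on the topic and on its consumer group, and a plain (non-transactional) producer needs WRITE on the topic; in Kafka's ACL model, READ and WRITE also imply DESCRIBE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclBinding:
    """Simplified model of one Kafka ACL entry (hypothetical, not a client-library class)."""
    principal: str      # e.g. "User:billing"
    resource_type: str  # "TOPIC" or "GROUP"
    resource_name: str
    operation: str      # "READ", "WRITE", ...
    permission: str = "ALLOW"


def acls_for_subscription(team, topics, consumer_group):
    """Translate a business-level 'subscribe' request into the ACLs a consumer needs.

    A Kafka consumer needs READ on each topic it reads and READ on its consumer group.
    """
    principal = f"User:{team}"
    bindings = [AclBinding(principal, "TOPIC", t, "READ") for t in topics]
    bindings.append(AclBinding(principal, "GROUP", consumer_group, "READ"))
    return bindings


def acls_for_publication(team, topic):
    """A plain, non-transactional producer needs WRITE on the topic it publishes to."""
    return [AclBinding(f"User:{team}", "TOPIC", topic, "WRITE")]


if __name__ == "__main__":
    for binding in acls_for_subscription("billing", ["sales.orders.v1"], "billing-orders-reader"):
        print(binding)
    for binding in acls_for_publication("sales", "sales.orders.v1"):
        print(binding)
```

In a real platform, the resulting bindings would be applied through the Kafka Admin API's `createAcls` or the `kafka-acls` command-line tool; how a business-level data product maps to topic and group names is an assumption you would define for your own organization.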

Patrick Bönzli:

We've done that a lot. And the crazy thing is, you might think that once you've done it, you can just apply it everywhere else, but enterprises are so complex: their technology stack, their history, how they built their systems, and how these systems depend on each other. Every time you go to an enterprise, the probability is really high that you find something new that doesn't fit the puzzle you had previously.

Patrick Bönzli:

Decoupling the self-service Kafka infrastructure from the actual enterprise infrastructure is really, really key. And that's also the magic of it: it's always new, and you never know what to expect. Well, there's one thing you can expect, and that's a lot of talking. But other than that, it's always new.

Tim Berglund:

When you said Twitter for data, my gut reaction was negative. I thought that sounds terrible, but then you explained, and it makes a lot of sense. It's a service, it's trivial to access, it's ubiquitous. Anybody can put things into it, anybody can get things out of it, asterisk, subject to data governance. It's like Twitter, just without the screaming and the yelling, and without the things that affect that platform from a social perspective. But from the ubiquity and trivial-API angle, I get it. I think that sounds great.

Patrick Bönzli:

There are actually a lot of, well, "a lot" is a bit much, but there are some really interesting talks from past Kafka Summits by companies like Microsoft and, what was it again, I think even Yahoo and Ancestry, something like that. They explained how they built their own platforms around Kafka that fulfill aspects of what we have been discussing now, like making it a little easier to come up with a meaningful topic name. That's something very basic, but you wouldn't believe how many companies start talking a lot about how they should name their topics, because at some point it's just really hard to tell them apart.
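To make the topic-naming point concrete, here is a minimal Python sketch of a naming check a platform team might enforce. The `<domain>.<dataset>.v<N>` convention and the example names are hypothetical house rules, not something the guests prescribed. The underlying Kafka rule it checks first is real: topic names may contain only ASCII letters, digits, `.`, `_` and `-`, may be at most 249 characters long, and may not be `.` or `..`.

```python
import re

# Kafka's own constraint on topic names.
KAFKA_LEGAL = re.compile(r"[a-zA-Z0-9._-]{1,249}")

# Hypothetical, stricter house convention: <domain>.<dataset>.v<version>
CONVENTION = re.compile(
    r"(?P<domain>[a-z][a-z0-9-]*)\."
    r"(?P<dataset>[a-z][a-z0-9-]*)\."
    r"v(?P<version>[1-9][0-9]*)"
)


def parse_topic_name(name: str) -> dict:
    """Validate a topic name and split it into its convention parts."""
    if name in (".", "..") or not KAFKA_LEGAL.fullmatch(name):
        raise ValueError(f"not a legal Kafka topic name: {name!r}")
    m = CONVENTION.fullmatch(name)
    if m is None:
        raise ValueError(f"does not follow <domain>.<dataset>.v<N>: {name!r}")
    return {
        "domain": m["domain"],
        "dataset": m["dataset"],
        "version": int(m["version"]),
    }


if __name__ == "__main__":
    print(parse_topic_name("logistics.truck-positions.v2"))
```

Avoiding `_` in the convention also sidesteps Kafka's warning about topic names that differ only in `.` versus `_`, which collide in metric names.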

Patrick Bönzli:

There are some really good ideas out there already for how you can tackle these things. And what I find really fascinating is exactly the point you made: "Well, Twitter without the yelling and all the social awkwardness." On the other hand, there is actually a lot of communication going on around it, which is not yelling, obviously, because we're engineers, we're all very cultivated, but-

Tim Berglund:

Oh, yes, yes. Of course-

Patrick Bönzli:

Not everybody has such a nice podcast voice as you do. But I mean, we do speak normally. And this is really interesting. As Sam said earlier, a schema changes, so what do you do? There are consumers, and you start talking to them, and you go back and forth to find a way to mitigate these changes. And once you have this kind of Twitter thing, there is actually an overlay of communication going on between the people responsible for these topics, the producers, and the consumers, of course, which is really interesting. We've only just started to tap into these challenges, and I believe there is a lot to be done on that front.
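One small piece of that "overlay of communication" can be sketched in code: before a schema change, work out which consumer teams need to be contacted. The Python below is a hypothetical illustration. Kafka does not record which fields a consumer reads, and Schema Registry compatibility checks compare schema versions rather than individual consumers, so the registry of owners, consumers, and fields used here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TopicEntry:
    owner: str        # team responsible for the topic
    fields: set[str]  # current top-level schema fields
    # Hypothetical: consumer team -> fields that team declares it reads
    consumers: dict[str, set[str]] = field(default_factory=dict)


def affected_consumers(entry: TopicEntry, new_fields: set[str]) -> dict[str, set[str]]:
    """Return, per consumer team, the fields it reads that the new schema drops."""
    removed = entry.fields - new_fields
    impact = {}
    for team, used in entry.consumers.items():
        lost = used & removed
        if lost:
            impact[team] = lost
    return impact


if __name__ == "__main__":
    customers = TopicEntry(
        owner="crm-team",
        fields={"id", "name", "address", "phone"},
        consumers={"billing": {"id", "address"}, "marketing": {"id", "name"}},
    )
    # Dropping "address" and "phone": only billing needs a conversation.
    print(affected_consumers(customers, {"id", "name"}))
```

The output of such a check is only a starting point for the conversation between the topic owner and the affected teams; it does not replace it.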

Samuel Benz:

Connecting the people, and not just the technical systems.

Patrick Bönzli:

Exactly, exactly.

Samuel Benz:

So you want to get away from brokers and topic names and such; you want to address people and data streams. That should be the goal.

Patrick Bönzli:

Yeah, I mean, what I like about Kafka as an integration layer is this decoupling: you put something in between, you lose a bit of coupling, and you become a bit more flexible. I think if you follow that idea further, you end up with late binding. In software development there's already a lot of late-binding stuff you can do, and it has its pros and cons.

Patrick Bönzli:

But if you go to data, I mean, what speaks against addressing data not as a topic, but more generally as a description of the data you need? Say, any customer data from 2019 that has names and addresses in it, and that's it. And then the system itself somehow gives you the right topic that actually provides that data. And that's a whole other level of abstraction, of course.
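The idea of asking for "any customer data from 2019 with names and addresses" and letting the system pick the topic can be sketched as a lookup over a catalog of dataset descriptions. This Python sketch is hypothetical throughout: the catalog entries, topic names, and the three description attributes (entity, covered years, fields) are illustrative assumptions, not SPOUD's product or any Kafka API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescription:
    topic: str              # physical Kafka topic holding the data
    entity: str             # business entity the data describes
    years: range            # years covered, end-exclusive
    fields: frozenset[str]  # fields available in each record


# Hypothetical catalog that a platform team would maintain.
DATA_CATALOG = [
    DatasetDescription("crm.customers.v1", "customer", range(2015, 2019), frozenset({"id", "name"})),
    DatasetDescription("crm.customers.v2", "customer", range(2018, 2022), frozenset({"id", "name", "address"})),
    DatasetDescription("logistics.truck-positions.v1", "truck", range(2016, 2022), frozenset({"truck_id", "lat", "lon"})),
]


def resolve(entity: str, year: int, needed: set[str]) -> list[str]:
    """Late binding: return the topics whose description satisfies the request."""
    return [
        d.topic
        for d in DATA_CATALOG
        if d.entity == entity and year in d.years and needed <= d.fields
    ]


if __name__ == "__main__":
    # "Any customer data from 2019 which has names and addresses in it"
    print(resolve("customer", 2019, {"name", "address"}))
```

The consumer binds to a description rather than a topic name, so the platform can move or version topics without every consumer having to know.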

Patrick Bönzli:

But I mean, if you're talking about decoupling, that's the trend we are heading towards: decoupling on many levels. And therefore you need some kind of transparency to understand what data is there, how you should describe it, and what people are behind it. It's kind of similar... I always think software development has already gone through a lot of the challenges that data in general is going through today.

Patrick Bönzli:

There's this DevOps thing, and people are starting to talk about DataOps. There's GitHub, which is obviously a collection of software packages that you can choose from. And I love this idea: you just browse for a software package, and you get a list of, say, open-source packages. At that point, you just try to find out which one is best for you. Software solved this problem, and it solved it with GitHub.

Patrick Bönzli:

You see who the committers are. How active is it? What's in it? What language are we talking about? How much trust do I have in this software? You get all that from looking at GitHub. And this is exactly what I think is missing in data. You have exactly the same problems. And this is what we try to help companies solve: finding the right data for what they're looking for.

Patrick Bönzli:

And of course it's slightly more complex, because data is really abstract. In software engineering, code follows certain patterns depending on the language, which you don't have in data, or which are not as obvious there. But otherwise you can have some ideas: there is a schema, and there are certain aspects that you can describe.

Samuel Benz:

So you want a GitHub for data, but as simple as Twitter.

Patrick Bönzli:

Exactly. And this is actually what we built, Tim.

Tim Berglund:

That is a compelling vision. My guests today have been Patrick Bönzli, and Samuel Benz. Patrick and Sam, thanks for being a part of Streaming Audio.

Patrick Bönzli:

Thank you, it was really awesome.

Samuel Benz:

Thank you, it was nice to talk.

Tim Berglund:

Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST, that's 6-0-P-D-C-A-S-T, to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeited, and there are a limited number of codes available, so don't miss out.

Tim Berglund:

Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D, or you can leave a comment on a YouTube video, or reach out in our community Slack. There's a Slack sign-up link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel, and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there; that helps other people discover us, which we think is a good thing. So thanks for your support, and we'll see you next time.

Many industries depend on real-time data and face a range of challenges that Apache Kafka® can help solve. Samuel Benz (CTO) and Patrick Bönzli (Product Owner) explain how their company, SPOUD, has fully embraced Kafka for data delivery, which has proven to be successful for SPOUD since 2016 across various industries and use cases.

The four Kafka use cases that Sam and Patrick see most often are microservices, event processing, event sourcing/the data lake, and integration architecture. But implementing streaming software for each of these areas is not without its challenges. It’s easy to become frustrated by trivial problems that arise when integrating Kafka into the enterprise, because it’s not just about technology but also people and how they react to a new technology that they are not yet familiar with. Should enterprises be scared of Kafka? Why can it be hard to adopt Kafka? How do you drive Kafka adoption internally? All good questions.

When adopting Kafka into a new data service, there will be challenges from a data sharing perspective, but with the right architecture, the possibilities are endless. Kafka enables collaboration on previously siloed data in a controlled and layered way. Sam and Patrick’s goal today is to educate others on Kafka and show what success looks like from a data-driven point of view. It’s not always easy, but in the end, event streaming is more than worth it.


