April 7, 2022 | Episode 207

Scaling an Apache Kafka Based Architecture at Therapie Clinic


Kris Jenkins: (00:00)

Picture a company with a bunch of different revenue streams. Some of their sales are online, but some are in actual physical shops around the country. Some of the things they sell will get bought today, but some get bought weeks in advance so that complicates the picture. All the data's in different third party systems plus a few spreadsheets that get updated at the end of the month by hand. And your job is to get from that hodgepodge to a tech-centric business that has a real time and accurate view of everything that's happening across the company. You've got to do it while the company is still operating and you've got to do it during a pandemic. And if that weren't enough, you'll be starting completely from scratch. There's no in-house IT at all.

Kris Jenkins: (00:46)

Part of me thinks I'd relish the challenge, but I'd also be absolutely terrified. And today I'm talking to Domenico Fioravanti, who has spent the past 18 months doing exactly that, transforming the Therapie Medical Group with a bit of agile thinking and some Confluent Cloud. And I got the impression as we talked that he relished the journey the whole way. So before we start, let me tell you that the Streaming Audio podcast is brought to you by Confluent Developer, which is our site that teaches you everything you need to know about Kafka, from how to start it running and write your first app, to architectural patterns, performance tuning, maintenance, and more.

Kris Jenkins: (01:27)

Take a look at developer.confluent.io. And if you want to take one of the hands-on courses that are there to teach you more, you can easily get Kafka running using Confluent Cloud. Just sign up with the code PODCAST100, and we'll give you an extra $100 of free credit to get you started. And with that said, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it. My guest today is Domenico Fioravanti. Hi, Domenico.

Domenico Fioravanti: (02:02)

All good. All good, thanks, Kris. Thanks for having me here.

Kris Jenkins: (02:05)

Pleasure. So Dom, you are variously an entrepreneur, developer, tech manager, and now you're director of engineering.

Domenico Fioravanti: (02:17)

Yeah, yeah. I have a weird story, let's say, a weird trajectory, because after my computer science degree I started as an entrepreneur, a small entrepreneur in Italy, with my own company building custom software, web applications in Java, something that was very common in the 2000s. And I had that role for 11 years, and then I decided to go back to being an employee instead. So probably the opposite of what many people do, but I'm pretty happy with the choice, I would say.

Kris Jenkins: (02:47)

Yeah. But eventually you were pulled back into at least the high end of management, right?

Domenico Fioravanti: (02:53)

Yes. I mean, management was always my core, probably. Yeah. But I went back to being a software engineer, and then I did the trajectory: I became a tech lead naturally, then senior tech lead, engineering manager, and now I'm an engineering director.

Kris Jenkins: (03:08)

Yeah. And so that brings us to today's story, if you will. In October 2020, peak pandemic time, with the world beginning to close down, you made the leap to something called Therapie Clinic.

Domenico Fioravanti: (03:26)

Yes, it was definitely a crazy period, like it still is today, but at that time we were in full COVID times and lockdowns, and I decided to make the move because there was an interesting project for me. So I was hired as engineering director at Therapie Medical Group. Therapie was a well-known company, working in the medical and aesthetic area for almost 20 years, so not a startup at all. And because of the trajectory of growth that the company was having in different directions, different businesses, and also because of the growth plan for the future, the CEO decided at that time to invest in technology, and it was definitely a good choice. He wanted to transform the company from tech-last to tech-first across the whole company. I was hired as engineering director, together with a CTO and a product director.

Domenico Fioravanti: (04:17)

And we were given the goal of building an internal tech organization from scratch, from the ground up. So for me it was definitely an amazing opportunity, because I could put into practice all the learnings of my 20-year career in a real use case, which is something that doesn't happen very often. Normally you join a company which is already there. Yeah, you help the company grow, but in this case we were really starting from zero. We started from three people, and today we're fifty, almost fifty people, in the tech organization. So today I would like to share that journey with all your audience.

Kris Jenkins: (04:54)

So that's a growth of what, 40-odd people in less than two years?

Domenico Fioravanti: (05:00)

Yeah. One year.

Kris Jenkins: (05:01)

I can see why that would appeal, especially given that it's almost a greenfield project, in both IT and management, right?

Domenico Fioravanti: (05:07)

Yes.

Kris Jenkins: (05:07)

It's entirely your playground.

Domenico Fioravanti: (05:12)

Yes.

Kris Jenkins: (05:12)

But before we get into that, give me an idea, because I don't know what Therapie Clinic does, what the Therapie Group does. So if I'm a customer, what am I buying from you? Give me an idea of the business on the ground.

Domenico Fioravanti: (05:24)

Yes. Therapie has different businesses. The best known is our aesthetic clinic business. We offer services in the UK and Ireland, and today we have 75-plus clinics. We offer services like body sculpting, skin treatments, laser hair removal, Botox, fillers, all things that are definitely non-tech and non-appealing for engineers.

Kris Jenkins: (05:51)

I know everyone needs to wax from time to time, right?

Domenico Fioravanti: (05:58)

Yes, yes. That's true. That's true. I did get some face treatments for free, which helps at a certain age. But yeah, definitely one of the struggles was that this is not a field that is appealing, let's say, for tech people. So that was one of the struggles, but we managed to overcome it by offering a way of working that was very appealing for engineers. So going back to your question: Therapie has this area which is called Therapie Clinic, but there is much more. We have more businesses. One is Therapie Smile, where we offer teeth braces.

Kris Jenkins: (06:30)

Okay.

Domenico Fioravanti: (06:30)

One is called Optilase, where we do laser eye surgery, and we have 15 clinics or more, and growing, between Ireland and Northern Ireland. And there is a brand new business that we started in 2021, which is called Therapie Fertility. So in 2021 we opened the biggest Irish fertility clinic. So the IVF sector, egg freezing, all these things that, again, I didn't know anything about at the time. I mean, what was very appealing was definitely the company's trajectory of growth. And in fact, the size of the company has roughly doubled since I joined. We've hired 800 people since I joined.

Kris Jenkins: (07:09)

Okay.

Domenico Fioravanti: (07:09)

We opened more ... as I told you, several clinics on top, and part of that trajectory of growth was the tech organization. So definitely a very nice situation to be in if you're a tech person, let's say. You see scale, right?

Kris Jenkins: (07:26)

Yeah. So you're in the situation where you've got lots of physical locations around the UK, somebody who presumably on the other side of the business is worrying about no end of medical training and legal issues, and presumably they're far too busy to think about the IT. So you come into what kind of situation technically?

Domenico Fioravanti: (07:50)

So the situation was the typical situation that you've probably seen many times if you've been a consultant, right? A company which was non-tech. Every piece of software was outsourced: third-party software, software built for us by third-party companies, software as a service, and the worst-case scenario was no software at all, so manual processes. Especially in the reporting area it's very common that when you have software you don't control, it's hard to have automated reporting, because you have different sources, right? So especially in the reporting area there was a lot missing. There was no software. There were manual processes, manual aggregation, spreadsheets. And this was all done manually, obviously. And all the reporting was coming at the end of the week, the end of the month, and in a world which is definitely data-driven, we know that data loses value the longer you wait to read it. So that's definitely one of the key areas where we needed to intervene, let's say. And that's what happened, actually.

Kris Jenkins: (08:50)

Yeah. I mean, we often treat batch processes as a dirty word here, but we're usually thinking of automated batch processes. If people are actually manually compiling reports, that makes me shudder.

Domenico Fioravanti: (09:05)

Yes. That's definitely the sensation I had when I joined. But that's also cool. I mean, as engineers, as tech people, again, when we see a possibility, that's the moment when we get excited, right? Because it's too easy when you arrive and everything is there and you make a small improvement. This was really a greenfield, and you could see so many things that you could improve, because in this situation your work has a huge impact, which is not always the case when you do a job, right?

Kris Jenkins: (09:36)

Yeah, yeah. It's a highly visible, big challenge. And it feels like nearly everything you could do from that point would be an improvement, right?

Domenico Fioravanti: (09:45)

Yes. Yes. And I mean, you cannot fail, because whatever you do is an improvement. I think the complexity there was more the communication, right? If you're working at a tech company, it's easy to communicate with the board, with the leaders, because more or less they're all tech or they have a tech background. So it's easier to explain what you're doing and why it takes so long, or why there are delays, and what the complexity is in what you're doing. In this case the communication channel has to be different, because the audience is definitely non-technical. So the communication channel is delivering value, I believe. One of the key factors, and we knew this from the beginning, was not building the perfect system in the world, the ideal one, which is always in our minds, because that's what we would like to do.

Domenico Fioravanti: (10:29)

I mean, but you always have to ... that's the role of an engineer: weighing the ideal scenario against the reality. What are the parameters of the environment, right? So we knew from the beginning that the key factor would be to try to deliver value quickly to the company, to demonstrate that they made a good choice in investing in us, right?

Kris Jenkins: (10:51)

Right.

Domenico Fioravanti: (10:52)

And that's why the way you implement the software, the technology you choose, the architecture you choose, cannot be the ideal one. It has to be the right one at the time. So the right one at the beginning was simple: go simple, go with the skills you already have in the team you're building, because obviously the other complexity was that we were building software while building the tech organization. So we were hiring teams, putting them together, and obviously you cannot have all the skills you need as soon as you hire people; you need to upskill.

Domenico Fioravanti: (11:23)

So even at that time, the idea of introducing Kafka was in our minds, just to connect to why we're here, right? Kafka, and having a partner like Confluent, was in my mind. I've worked with Kafka in the past, and I definitely believe that when you build a tech organization, you need to keep in mind the famous Conway's Law, which I always mention, [crosstalk 00:11:47] because as most of the audience probably knows, Conway's Law says that if you're an organization trying to design a system, the final design you end up with will match the organization's communication structure and team structure. I've been there before. It's always like that.

Kris Jenkins: (12:08)

Yeah, so-

Domenico Fioravanti: (12:08)

So if you're in a company with a monolith for software, the company organization is a monolith: a lot of dependencies, a lot of communication channels. Because that's how it is: when you want to deploy code into a monolith, there are tons of communications that have to happen. There's a separate QA team. There's a separate deployment team. Because that's the way it is. And then I worked in a company doing microservices in the pure way, nothing like the monolith, and the company was super agile; the teams were autonomous. They were able to deliver value to production super quickly. That law never fails. So that's why this was always in my mind, in our minds. We had in mind the ideal architecture, the ideal scenario, the fact that Kafka could be the backbone of our event-driven architecture.

Domenico Fioravanti: (13:01)

But we also knew that you cannot introduce such a big complexity at the very beginning, especially because you need to upskill the people first. That's why we went very easy at the beginning, and we were able to build teams, and the teams were able to deliver value to production in one or two months. And then there was a moment where we naturally understood that we had reached the critical size of tech organization, and we were able to introduce Kafka and partner, as it happened, with Confluent, to have a managed Confluent Cloud cluster.

Kris Jenkins: (13:34)

Cool. So you're there, you're thinking you want to end up with a real time reporting system. You're going to use Kafka, but if you arrive with an announcement of a two-year project, then you're not going to last very long, right?

Domenico Fioravanti: (13:49)

Yes.

Kris Jenkins: (13:50)

So what was the first thing you built and how did you put it together? Actually, did you start with, "I'm going to deliver something in two months and then figure it out?"

Domenico Fioravanti: (14:00)

Yes.

Kris Jenkins: (14:00)

Or did you start with, "I'm going to build something small and how long will it take?"

Domenico Fioravanti: (14:04)

No. So first of all we started by identifying the first focus. The first focus, as you've probably understood, was the reporting area. Because again, if you don't have numbers, it's hard to plan anything else, right?

Kris Jenkins: (14:15)

Yeah.

Domenico Fioravanti: (14:15)

So that also decides the priorities, even in terms of which software you're going to replace first, right? So definitely the first team was built in the reporting and data analytics area. And yes, the idea was already to be able to deliver something quickly. While ideally I would've introduced Kafka so that, I mean, we would have ... I think Kafka is an enabler, right?

Kris Jenkins: (14:38)

Yeah.

Domenico Fioravanti: (14:38)

Because Kafka, versus the standard request-response paradigm, helps you build a system which is very decoupled, right? So if you free up the data from the third-party software that I mentioned and publish it on Kafka, you know that it will enable many other use cases, right? But it would've taken too long. So what we did is we started building a team in this data and reporting area, and we went with an easy solution. This easy solution was quite simple: we started building ETL pipelines on AWS. We used serverless technologies; everything was managed. So we used AWS AppSync, a managed GraphQL layer, to communicate with the third-party API and abstract it away.

Kris Jenkins: (15:28)

That's the thing where you can ... it kind of reminds me of Kafka Connect in a way.

Domenico Fioravanti: (15:33)

Yes.

Kris Jenkins: (15:33)

It's like we can connect your third party and present it as a GraphQL API easily.

Domenico Fioravanti: (15:39)

Correct. Correct.

Kris Jenkins: (15:40)

Consuming from that. Yeah.

Domenico Fioravanti: (15:41)

Yeah. The idea is always, obviously, to try to build something that is reusable in the future. So GraphQL at that time gave us the ability to abstract away from the third-party API, its naming and its domain model, because we didn't want to rebuild something identical to what we had. That was the reason. And we also went serverless, so managed GraphQL on AWS, so that we didn't have the burden of managing it. And then the next step was introducing, obviously, a scheduled AWS Lambda to batch-call the third-party API and mine the data, producing the data onto an S3 bucket. And then another AWS Lambda reacted to the S3 bucket's file-creation event to do the transform and load. And the load was done into a Redshift instance, still on AWS, with Looker ...

Kris Jenkins: (16:32)

Right.

Domenico Fioravanti: (16:32)

... as a BI tool on top of this. So what we did is that within a month we had this pipeline, which was mining transaction and sales data from the external system we were using for transactions and bookings. This system was used both in our call center and at the front of house in all our clinics. So all the transactions were there, and really in less than a month we were able to start building our first dashboards in Looker. Redshift was our warehouse database; we built the star schema on Redshift.
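
To make the pipeline Domenico describes concrete, here is a minimal sketch of the two Lambdas: a scheduled extract that batch-calls the third-party API and drops raw JSON into S3, and an S3-triggered transform/load into Redshift. All names here (the API URL, bucket, cluster, and table) are hypothetical placeholders; this illustrates the pattern, not the team's actual code.

```python
import json
import urllib.request

import boto3

s3 = boto3.client("s3")

def extract_handler(event, context):
    """Scheduled (e.g. hourly EventBridge) Lambda: batch-call the
    third-party API and drop the raw JSON into an S3 bucket."""
    req = urllib.request.Request(
        "https://thirdparty.example.com/api/transactions",  # hypothetical endpoint
        headers={"Authorization": "Bearer <token>"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    s3.put_object(Bucket="raw-transactions",
                  Key=f"{context.aws_request_id}.json", Body=raw)

def transform_load_handler(event, context):
    """S3-triggered Lambda: react to the file-creation event, transform
    the rows, and load them into Redshift via the Data API."""
    redshift = boto3.client("redshift-data")
    for record in event["Records"]:
        obj = s3.get_object(Bucket=record["s3"]["bucket"]["name"],
                            Key=record["s3"]["object"]["key"])
        for row in json.loads(obj["Body"].read()):
            redshift.execute_statement(
                ClusterIdentifier="analytics",  # hypothetical cluster and table
                Database="warehouse",
                DbUser="etl",
                Sql="INSERT INTO fact_sales (id, amount) VALUES (:id, :amount)",
                Parameters=[{"name": "id", "value": str(row["id"])},
                            {"name": "amount", "value": str(row["amount"])}],
            )
```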

Domenico Fioravanti: (17:05)

And the data analysts we had hired in the meantime were already able to build the first dashboards. I cannot say those dashboards were real time, because we still depended on a third-party system, so our Lambdas were scheduled once an hour. But compared with the situation we found when we arrived, which meant reporting at the end of the week and the end of the month to get an aggregation of all the clinics' numbers, we were able to deliver dashboards that were one hour late, worst-case scenario. Which, again, talking about big results, was an amazing change for the company, and the company immediately saw, "Okay, these guys know what they're doing." And they started using those numbers; I mean, the managers of the company started using those numbers to make decisions, which is really amazing. And it happened in [crosstalk 00:17:59]-

Kris Jenkins: (17:59)

Especially at this kind of time, when you need to adapt more than ever, right? The past couple of years, business-wise, you've needed to make those decisions faster and bigger than I think we've ever had to.

Domenico Fioravanti: (18:11)

Correct. I mean, COVID is an amazing example, right? I think the complexity for us tech managers is not building software at all. We know how to build software, right? You hire good people and they will build software. The complexity is building an organization that is able to build software at scale and is very agile in changing direction if needed. Again, now we go back to the two-year plan: if you make a two-year plan but the world changes in two months, that can happen.

Kris Jenkins: (18:42)

Yeah.

Domenico Fioravanti: (18:42)

So with COVID, the companies that were able to move fast and change direction fast survived. For example, companies without e-commerce, right? They had to close their shops and start selling online quickly. And the companies that weren't able to do this very fast, they just closed, right?

Kris Jenkins: (18:57)

Yeah.

Domenico Fioravanti: (18:57)

And that's why, again, in this kind of world, the ideal is that you build an organization which is able to build and deliver software quickly at scale, and eventually also able to change direction pretty quickly if needed. And this is feasible if you do, again, continuous delivery, pure CI/CD if possible, small batches, a quick feedback loop from the business to validate what you delivered. That's the secret sauce. I don't think the problem is building e-commerce; we've done it many times. It's not building the data pipeline again; we've been there. The problem is building the organization that is able to do that well, let's say, at scale.

Kris Jenkins: (19:38)

Yeah. I think it's sometimes that ability to step back and see beyond the deliverable to the slightly bigger picture, getting that organization right.

Domenico Fioravanti: (19:48)

Yes. I think what we should always have in mind is, again, both the delivery in the short term and the big picture, right? That's the role of a tech manager; I'll say it again, there is always a balance, a trade-off that you have to make. You need to deliver value; that's what you have to do. I mean, I'm not a big fan of delivery dates, as everyone who works with me knows, but I know that a real business needs dates. They need to know what's happening, right? So what you always need to have in mind is the big picture, the plan, the long-term plan. But you also know that this long-term plan will probably change, and delivering quickly in small batches makes you able to change the plan if needed, because the business will come to you and tell you, "Yeah, we said this was the priority, but now we have another priority, so switch."

Kris Jenkins: (20:42)

Yeah.

Domenico Fioravanti: (20:43)

So we need to be able to do that. We cannot say, "Oh no, guys, we said that. Now we need to finish this." No, that doesn't make sense. That's why whatever you build has to have this in mind: be abstract enough to keep the final goal, the long-term goal, in view, but also be realistic enough to be able to change the plan if the plan changes, if the business plan changes.

Kris Jenkins: (21:07)

Yeah. Yeah. I think it's that kind of balance between, I suppose at the extremes, Waterfall and Agile, right? With Waterfall you've got the big, long-term goal, you try to bite it all off at once, and you fail. And I think Agile, if it has a fault, is sometimes so focused on delivering something quickly that it doesn't have a long-term vision.

Domenico Fioravanti: (21:30)

Yes. Correct.

Kris Jenkins: (21:31)

Doesn't know where it wants to go, right?

Domenico Fioravanti: (21:33)

Correct. You nailed it, definitely. So Waterfall: we've been there, it doesn't work, we all know, right? It's impossible to decide now and have all the details of what we're going to do in the next year. And especially, whenever you get to delivery, whatever you've delivered is not what's needed, because everything has changed, right? So the Agile approach definitely makes sense. But again, going back to Agile, you need to know that sometimes the choices you make are building up tech debt, right? And that's what we needed, actually. Going back to our example: we knew that we were building tech debt, but we knew it was the best thing to do at that moment. So: building a dedicated pipeline in Python on serverless, specifically to deliver value and quickly have reporting dashboards, almost real time.

Domenico Fioravanti: (22:21)

That was the main goal. But in our backlog, the backlog of the team, there was actually already an epic saying, "Okay, we need to migrate this to Kafka." It was already there, and we all knew it would happen. It was in our minds, because while the pros of the decision we made were that we improved quickly, we used the knowledge the team already had, and we quickly had feedback from the business that it was the right thing to do, the con was that while we were making an effort to free up data, meaning mining data from the third-party system, we were using the data for only one use case, right? Because you have a dedicated pipeline. So say there was another scenario, another use case that wanted the same transaction data: we would have needed to build another dedicated pipeline. So we were not really freeing up the data, because every use case needed another dedicated pipeline to do something, right?

Kris Jenkins: (23:18)

So you've got a dedicated, but quick pipeline for every single thing you're doing.

Domenico Fioravanti: (23:23)

Yes. Which means-

Kris Jenkins: (23:24)

Yeah, I see what you mean.

Domenico Fioravanti: (23:26)

I mean, you have to maintain 50 pipelines, and again, if there is a change, you need to change all of them. Obviously we knew this was tech debt; it was pretty clear. But sometimes, again, you have to do the right thing, not the perfect thing, and that was the right thing to do at that moment, while keeping in mind that we knew it was not the final solution, because otherwise we'd have to maintain 50 pipelines, right? Just to give an idea, one of the scenarios, the next use case, was this: we have an offline business, which is the clinics, and we also have an online business, because we obviously have e-commerce, where customers can buy products, buy services, and also book appointments, right?

Domenico Fioravanti: (24:05)

But most of our business is offline, 85% at the moment, probably less with COVID. Which means that, for example, for our Facebook marketing campaigns, we weren't using the offline data, right? We were using only the online data, because we obviously have a Facebook pixel on our website, which sends data to Facebook, and then the marketing people were using that to target customers with marketing campaigns. But we weren't using the offline data, right, which is most of our customers-

Kris Jenkins: (24:36)

Most of your data. Yes.

Domenico Fioravanti: (24:37)

So what we were doing was spending a lot of money on marketing campaigns, because we were targeting everyone, everywhere, right? Because we were not sending the behaviors of 85% of our customers. So: lots of money with small results. So one of the needs, another of the many needs we had, was to use transaction data coming from offline behaviors to feed Facebook's Conversions API, so that the marketing people could then use that information to target 100% of our customers, because we'd have 100% of the behaviors, or maybe not target them, right, exclude them from the targeting. So this was another use case.

Domenico Fioravanti: (25:13)

So in this use case, what would we have had to do? Build another pipeline mining the data and then sending it to Facebook, right? So that was the moment where we understood ... because by then, in the meantime, we had already grown to three teams, three development teams.

Kris Jenkins: (25:31)

Okay.

Domenico Fioravanti: (25:31)

And so we said, okay, that's probably the moment. Because if we had the transaction data in a Kafka topic, instead of building another pipeline dedicated to Facebook, we could have just added a Kafka consumer on that topic, which would then have spoken to Facebook's Conversions API. Again, Kafka is an enabler. Once you have data in a topic, it enables different teams to consume it. There will probably be one team owning the data, but then other teams can consume it. And we know, right, in Kafka the producer doesn't know about the consumers, doesn't need to know, and Kafka scales if you need it to. That's the beauty of it. And again, this was definitely the scenario where we started understanding, "Okay, this is probably the moment when we need to introduce Kafka and use it as an enabler," right?

Kris Jenkins: (26:19)

Yeah. To get a unified picture that you can then read from for all these different purposes.

Domenico Fioravanti: (26:25)

Correct.

Kris Jenkins: (26:26)

Do you know, I like the idea that you're sitting there in the office and the business people are thinking, "This is great. We've gone from a week or a month to an hour." And you're thinking, "This isn't nearly good enough."

Domenico Fioravanti: (26:39)

Correct. Yeah, but again, it's hard to explain, right? But that's our role; the tech manager is the one who has to translate complexity into non-technical language, so that the business can understand, because the business has to understand. And I think another key factor here is communication, again: the short feedback loop allows you to communicate quickly the status of where you're at, right? Because if you say, "Okay, we're going to be ready in a year," you'll never be ready in a year; you'll probably fail and be two or three months late. If you deliver continuously, you're also able to continuously communicate where you are.

Kris Jenkins: (27:22)

Yeah.

Domenico Fioravanti: (27:22)

And I think that's important, while we know that dates are just dates, or guesstimates, because there are so many unknowns when you build software. But if you're able to communicate quickly and give the business a status update very often, you can tell the business early if you're late, so the business can adapt to that situation. You cannot arrive one month or one week before the deadline and say, "Okay, sorry, we're two months late." So that's another part of this approach which is very important.

Kris Jenkins: (27:54)

So, yeah. So you're basically doing it on both sides, right? You're worrying about real time data, but also real time person-to-person feedback within the business.

Domenico Fioravanti: (28:03)

Correct. Correct. Correct.

Kris Jenkins: (28:04)

Ah, you've soaked this right into your bones, I can tell.

Domenico Fioravanti: (28:09)

Yes. I mean, again, when you start from zero, it becomes like your baby, right. And personally again, like probably many of us, I really like what I'm doing and this is probably the secret sauce of-

Kris Jenkins: (28:21)

That goes a long way, yeah.

Domenico Fioravanti: (28:23)

Yeah. I mean, someone said that you'll never work a day in your life if you like what you're doing. And I definitely like what I'm doing a lot. It's also good to see that all you've learned in the past works when you put it into practice. I mean, you still make some errors, but you see it working, because the journey we had was definitely the right one. That's why I'm very happy to share this with you and your audience, because I think other people will probably be in similar situations and will make similar errors and have similar learnings.

Kris Jenkins: (28:59)

Yeah. I can imagine a lot of people want to be in that situation too. It's quite a journey. So-

Domenico Fioravanti: (29:01)

Yes.

Kris Jenkins: (29:03)

So here you are, you've got a number of custom-built pipelines. You want to get to Kafka. What's the point ... What do you do to prepare for the move over? And when do you know it's the right time to say, "Okay, we're biting the bullet"?

Domenico Fioravanti: (29:21)

Good question. So, let me tell you a bit more about what we did in the beginning. At the beginning we started building those teams which are so-called stream-aligned. I like to mention the book Team Topologies many times. I mean, I'm not getting any money from them, but it's an amazing book.

Kris Jenkins: (29:36)

I've heard a few people [crosstalk 00:29:38] I have to read it.

Domenico Fioravanti: (29:40)

Yeah, it says a lot of things that I probably already knew, but it's good to read them on paper from people who've had experiences similar to mine. So we started building so-called stream-aligned teams: teams focused on delivering value. The stream-aligned teams, first of all, are long-living, because software teams have to be long-living. A team has to own an area or a domain of the business. They need to be the masters of that domain to be able to deliver in that area, because knowledge of the domain is part of what you have to know to be able to build software, right? So at the beginning we built only that kind of team. Those teams have inside them all the skills needed to build software from beginning to end.

Domenico Fioravanti: (30:19)

So a product manager, an engineer who is the tech lead, other engineers, back-end or front-end based on what they're doing, a designer if there is front-end work, and also a data analyst, because amongst all this we are also trying to implement a distributed data mesh paradigm shift. And Kafka comes back again, because it was in our minds that Kafka would enable us to do that. Okay. So that's what we did. The first three teams were stream-aligned, delivering value, going easy, trying to minimize the DevOps and platform work needed, because that's something that obviously adds a burden if you want to deliver software. Once we reached the three-team size, we started understanding that there were a lot of things that were cross-team, a lot of ownership that was cross-team. So we understood it was the right moment to invest in building an internal platform team, or DevOps team, let's call it; there are many names.

Kris Jenkins: (31:20)

Right.

Domenico Fioravanti: (31:21)

So we started hiring the first platform/DevOps engineer. And now we started having someone who could be an owner, an entity that could own something that was cross-team, like the Kafka cluster, or the Kubernetes cluster, just to give another example which is not Kafka-related. So that was really the moment, because we understood that every team was doing a lot of DevOps work. Now, I believe that every engineer needs to have a DevOps background as well, because that's how you're able to deploy to production without having to depend on a separate deployment team. But when this DevOps burden becomes too big compared with the development part, I think that's the right moment when you need a team that can remove the burden, take ownership, and be an enabler for the stream-aligned teams: owning the clusters, working on infrastructure as code, putting best practices in place, and so on and so forth.

Domenico Fioravanti: (32:17)

So that was the moment when we started building that team. And that was also the moment when we did a small spike to understand what the best choice was for us between a Kafka cluster entirely managed by us, going with Amazon MSK, or going with Confluent Cloud, right? So we did a spike: the senior engineers and the platform engineer did a deep investigation comparing the pluses and minuses of the different solutions. We definitely discarded the self-managed Kafka cluster from the beginning, because again, that's something we couldn't afford.

Domenico Fioravanti: (32:53)

I mean, we had a one-person platform team. You cannot manage a Kubernetes cluster and a Kafka cluster on your own with such a small platform team, right? Otherwise the alternative would've been, "Okay, guys, wait, we'll build a four- or five-person platform team and then in six months we'll start again." We had arrived at a critical mass of development teams, so we had three teams, and we started feeling the need, as expected let me say, for a dedicated platform team, an entity that could take ownership of the increasing DevOps and infrastructure work that was needed and remove it from the teams to make them even faster at delivering, right?

Domenico Fioravanti: (33:32)

So we started investing in that, but again, we couldn't invest in a fully fledged platform team, because again, you go back to cost and time and all those variables you need to take into consideration. So we started by hiring the first senior platform engineer, and with his help we started investigating, doing a spike on the best solution for getting a Kafka cluster up and running quickly. We discarded from the beginning a Kafka cluster fully managed by us, for the reason I gave above: we didn't have a fully fledged platform team. So we started investigating the best choice between Amazon MSK and Confluent Cloud, standard or dedicated cluster. We did a very deep investigation, with a long list of criteria, involving senior engineers in the company as well, because obviously this would be a choice that could influence our next year or two, probably, so-

Kris Jenkins: (34:28)

Everyone has to buy into it, right?

Domenico Fioravanti: (34:30)

Yes. Correct. Correct. And at that point we also involved Confluent. So we started meeting with some Confluent people to help us with this decision, by giving us more information and a test cluster, just to be able to go and test and see what the limitations were or weren't. So the list of criteria was: setup time and cost, maintenance cost, limitations, security, availability, migration costs, Kafka version update frequency, observability, and monitoring in general. So definitely a lot of different criteria, plus obviously the cost. And the cost was not only the cost of running the cluster based on our throughput and how much data we were putting on Kafka, but also the cost in terms of people, which is probably the highest cost in IT, right?

Kris Jenkins: (35:29)

Yeah. Nothing's more expensive than people, right?

Domenico Fioravanti: (35:31)

Yes. That's something I always ... I mean, trying to save money like 1,000 a month and something, and then you know that engineering costs, resources, are the one that costs more, right. So I think it's very important there when you try to go with the super cool decision of managing everything on your own, which is fancy, right, idealistic let me say, because it's, yeah, you can have control, looks like great, but then if you need four or five people, I mean, that's amazing amount of money that you're spending every year. And certainly you're spending half of a million just to have a team that then can manage something for you while instead, I would say that in our situation was perfect, because we did have that ability. So it was pretty an easy choice to go for something managed again.

Domenico Fioravanti: (36:14)

So again, after doing this investigation, I would say that in the end the cost factor mainly helped us decide, because the two solutions were similar in a sense, but on the cost side, especially talking about people, Confluent Cloud was the best choice for us. And the dedicated cluster, because of some of the limitations of the standard clusters; we wanted a dedicated cluster. So that's why we went for a Confluent Cloud dedicated cluster. And within minutes of deciding, we had a production environment and a test environment, the clusters were up and running, and we were able to start producing and consuming pretty quickly. So again, we invested a lot, I have to say, probably a month, in this decision-

Kris Jenkins: (37:00)

Okay.

Domenico Fioravanti: (37:00)

... because of the importance of it. So it was a lot. I mean, again, you can imagine, platform people will maybe push for something they want to manage themselves. So you always try to find a balance and, again, try to make the right choice. I have a technical background, and I have to say I'm not a fan of any one technology, even though I like Kafka, clearly; that's probably very clear. But I think this shouldn't influence our decisions. Our decisions have to take all the factors into consideration, and we should always make the right choice for the people who are paying us, not for ourselves, or because we like the technology.

Domenico Fioravanti: (37:35)

I think this is fundamental. So that's why we invested a lot in this decision. And yeah, I think we started with Confluent Cloud mid-2021, and now we're definitely pretty happy with the choice, because the cluster is there, up and running. We never have a problem. We don't have to worry about updates and all the rest. What we do is just produce and consume. Now we have our Kubernetes cluster where we run our Kafka Streams and Kafka Connect applications. So the cluster is ... I have to say that I'm very happy with the choice.

Kris Jenkins: (38:11)

Good, good. Infrastructure's one of those things that you want to just not trouble you, right? It's like, you don't want to manage your own water pipes. You don't want to have to think about how water gets into your house. You just want to use the water.

Domenico Fioravanti: (38:27)

Yes. Correct. So sometimes, again, and I've been there in the past, we try to do the things that are more interesting, right? And in this case, probably, running our own cluster would've been more interesting, but the reality is something different, right? In this case, why reinvent the wheel? Why do we need to build in-house knowledge and hire people who are just dedicated to running something, when the most complex stuff is understanding our domain and building in that domain, delivering value in that domain? Our focus is not learning how to run a Kafka cluster. Our focus is how to build front-of-house software, call center software, how to do dedicated reporting, real-time reporting, how to apply machine learning to our data in order to get insights.

Domenico Fioravanti: (39:15)

So that's the focus we should have. And in a sense, I would say we were lucky that in our situation we were kind of forced to look at the reality of things. And also, even though Therapie Medical Group is a very successful company, we are not Facebook. We're not Google. We don't have huge amounts of money that we can spend, which makes you a better engineer, I would say, because then you need to take money into consideration in every equation you do. And I think that's important.

Kris Jenkins: (39:51)

Yeah. Weirdly some of the worst companies I've worked for have too much money and it results in a complete lack of focus. I mean, it sounds like a nice problem to have, but sometimes it's not. Sometimes a few constraints actually make you do more valuable things because that's all you can do. You only have the choice to do something useful. You don't have the luxury of doing everything.

Domenico Fioravanti: (40:17)

Exactly. Exactly.

Kris Jenkins: (40:20)

Not that I ever complain about them having too much money to give me a raise, but-

Domenico Fioravanti: (40:23)

It's good to have the right amount of money, let's say.

Kris Jenkins: (40:28)

Yeah, the right amount and the right technology and just, it's all about fine-tuning those decisions. So there you are. You've got your Kafka cluster up and running. I'm assuming, knowing you by now, you didn't do a big bang migration?

Domenico Fioravanti: (40:46)

No, obviously.

Kris Jenkins: (40:48)

What did you do?

Domenico Fioravanti: (40:49)

No. You know me very well. No, no, no. We proceeded step by step, as usual, right? Because again, in the meantime we were building, hiring more people, upskilling people a lot. Something we didn't mention: please invest in learning and development all the time. In Therapie we have a budget of, I can say it, it's not fancy, 1,500 per person per year. And I definitely believe this is another key factor. Make sure that not only do you create an environment that's good to work in, but make sure your engineers, or any of your people, are always upskilled. My mantra is that all the engineers who work for you should be able to go out on the market tomorrow and find three or four jobs in a week. But then you have to make sure you've built a workplace where they don't go out and look for something else, because they like it. So that's definitely important.

Domenico Fioravanti: (41:37)

So again, we continued investing in upskilling people to onboard, in this case, Kafka knowledge and knowledge of the whole Kafka environment, let's say. So going back to that example, which is only one of the things we did: the data pipeline. Okay, we started from that pipeline. We knew that in the backlog we had this tech debt to remove, and we wanted to move the pipeline onto Kafka. So we still continued using GraphQL to interact with the third-party API, and a scheduled Lambda to do the mining of the data, because that's definitely an easy choice. But then, instead of using S3 and another Lambda, the Lambda mining the data now publishes into a Kafka topic.

Kris Jenkins: (42:19)

Right.

Domenico Fioravanti: (42:19)

The raw data into a Kafka topic. So this is the extract part, right? And then we added a Kafka Streams application to do the transform: consuming the raw data, in this case transaction data, transforming it, and publishing the transformed data into other topics. And then we had a Kafka Redshift sink connector to sink it into Redshift, to do the load part of the thing. So we removed most of the dedicated pipeline and instead published all the data into topics, and then we used a sink connector for the end goal, which is syncing into Redshift, which is one of the use cases. But by introducing Kafka, the data can finally be mined only once from the external system, yet be made available through topics to be consumed by any other team for any other use case, right?
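
Here is a hedged sketch of that reworked extract step: the same scheduled Lambda, but now producing the raw records straight onto a Kafka topic instead of S3, with the Kafka Streams transform and the Redshift sink connector running downstream of that topic. The broker address, credentials, topic name, and API endpoint are all assumptions for illustration.

```python
import json
import urllib.request

from confluent_kafka import Producer

# Confluent Cloud connection details are placeholders.
producer = Producer({
    "bootstrap.servers": "<confluent-cloud-broker>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<api-key>",
    "sasl.password": "<api-secret>",
})

def handler(event, context):
    """Scheduled Lambda: mine the third-party API and publish each raw
    record to the topic that replaces the old S3 hand-off."""
    with urllib.request.urlopen("https://thirdparty.example.com/api/transactions") as resp:
        for row in json.loads(resp.read()):
            # Keying by transaction id keeps updates to the same
            # transaction in order on one partition.
            producer.produce("transactions.raw", key=str(row["id"]),
                             value=json.dumps(row))
    producer.flush()  # deliver everything before the Lambda is frozen
```

From here, a Kafka Streams application would consume the raw topic and publish cleaned records to a transformed topic, and the Redshift sink connector would handle the load, so the warehouse-specific half of the old dedicated pipeline disappears.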

Kris Jenkins: (43:19)

Right.

Domenico Fioravanti: (43:19)

Again, without depending on a certain team or-

Kris Jenkins: (43:23)

You've kept the same two ends, but now you've got this Kafka pipeline in the middle that opens up all these other readers.

Domenico Fioravanti: (43:30)

Correct. And here we come back to the example I was telling you about before, and it's only one of the examples: one team, a different one from the team that built the pipeline, was able to consume the transaction data to feed Facebook's Conversions API, again close to real time, let's say one hour at worst after the transaction happened. And at this point we were able to start building focused, custom marketing campaigns on Facebook, again saving lots of money, because we were able to target better.
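
This is where the decoupling pays off: a second consumer group on the same topic, owned by a different team. The sketch below assumes a `transactions` topic carrying the transformed events; the offline-event-set ID, token, and field names are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<confluent-cloud-broker>:9092",  # auth config omitted
    "group.id": "facebook-offline-events",  # independent of the Redshift path
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["transactions"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    tx = json.loads(msg.value())
    # Reshape the in-clinic sale into Facebook's offline-event upload format.
    payload = urllib.parse.urlencode({
        "upload_tag": "in-clinic-sales",
        "data": json.dumps([{"event_name": "Purchase",
                             "event_time": tx["timestamp"],
                             "value": tx["amount"],
                             "currency": "EUR"}]),
        "access_token": "<token>",
    }).encode()
    urllib.request.urlopen(
        "https://graph.facebook.com/v12.0/<offline-event-set-id>/events",
        data=payload,
    )
```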

Domenico Fioravanti: (44:01)

And this is only one of the use cases, right? I can tell you more than one, because that was the moment when Kafka really demonstrated itself to be a real enabler. For example, there was another team working on the booking system; they were able to consume appointments almost in real time. While the appointments and the booking system were still in an external, third-party system, which we are replacing at the moment, we still don't own it, by mining appointments and putting them on Kafka, that team was able to consume them and take almost-real-time actions, trigger actions, which could be a confirmation message, or scheduling an appointment reminder on different channels, to drastically reduce the no-show rate, for example.

Domenico Fioravanti: (44:50)

So again, even though we still don't own the booking system, which we are rewriting in-house, by mining the data and making it free in Kafka we're already able to add value. Because in a company like ours, where we have appointments in the clinics, no-shows are ... I don't know the numbers, but I know it's a lot. And we were dependent on the third-party booking system, which has limited functionality, to reduce no-shows. But because the appointments are now on Kafka, we can define any kind of behavior. So again, we can schedule WhatsApp messages, we can send email, we can send SMS. We can trigger a scheduled message one day before the appointment to make sure there is a reminder, and all the rest. This is another scenario, for example.
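
The reminder use case is the same consumer pattern again. A minimal sketch, assuming an `appointments` topic of JSON records with `phone` and `starts_at` fields, and a hypothetical `schedule_sms` helper standing in for whatever messaging provider is used:

```python
import json
from datetime import datetime, timedelta

from confluent_kafka import Consumer

def schedule_sms(phone: str, text: str, send_at: datetime) -> None:
    """Hypothetical stand-in for the SMS/WhatsApp/email provider."""
    print(f"SMS to {phone} at {send_at:%Y-%m-%d %H:%M}: {text}")

consumer = Consumer({
    "bootstrap.servers": "<confluent-cloud-broker>:9092",  # auth config omitted
    "group.id": "appointment-reminders",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["appointments"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    appt = json.loads(msg.value())
    starts_at = datetime.fromisoformat(appt["starts_at"])
    # One reminder the day before; confirmations or other channels would
    # just be more branches here, or more consumers on the same topic.
    schedule_sms(appt["phone"], "Reminder: your appointment is tomorrow.",
                 send_at=starts_at - timedelta(days=1))
```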

Kris Jenkins: (45:44)

Yeah.

Domenico Fioravanti: (45:44)

And I'll tell you another scenario, one that came up in the meantime, which we didn't even know about at the beginning. The company, because of the scale it was reaching, decided to introduce an enterprise ERP system; the existing setup was definitely a blocker for our growth, with lots of problems. If you have an ERP, either you feed it transactions manually, or you feed it automatically, right? And in our case, obviously, the tech team was involved, to build something that could feed the ERP with real-time transactions without any manual effort. So there was an additional use case that was not known when we started, because it was not in the plan.

Domenico Fioravanti: (46:27)

But again, going back to what we said before: the plan changed, the decision was made, and the tech team was there. And not only the team, the data was there too, because by already having the sales transactions on Kafka, the team working on the ERP integration is able to consume them and transform them into the format the ERP needs. Obviously it's a traditional ERP, so you can imagine: SOAP protocol. So we definitely need to do a transformation. But the good thing is that this team can consume transactions almost in real time and feed the ERP automatically.
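
And the ERP feed is one more consumer group on the same transactions topic, this time reshaping each event into a SOAP envelope. The endpoint and XML schema below are invented placeholders that only show the shape of the transform:

```python
import json
import urllib.request

from confluent_kafka import Consumer

SOAP_TEMPLATE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <PostSalesTransaction>
      <Id>{id}</Id><Amount>{amount}</Amount><Clinic>{clinic}</Clinic>
    </PostSalesTransaction>
  </soap:Body>
</soap:Envelope>"""

consumer = Consumer({
    "bootstrap.servers": "<confluent-cloud-broker>:9092",  # auth config omitted
    "group.id": "erp-feed",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["transactions"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    tx = json.loads(msg.value())  # assumes id, amount, clinic fields
    req = urllib.request.Request(
        "https://erp.example.com/soap",  # hypothetical endpoint
        data=SOAP_TEMPLATE.format(**tx).encode(),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    urllib.request.urlopen(req)
```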

Kris Jenkins: (47:05)

Right.

Domenico Fioravanti: (47:05)

So again, this was a scenario that wasn't known at the beginning, but all the work we did, without knowing it, though we definitely expected something like it, was enabling that use case, and we didn't have to implement another dedicated pipeline in order to work on this new scenario.

Kris Jenkins: (47:25)

Yeah. So it sounds like the unifying thing for both of those is that you've got the people able to build these features, but there's no way of doing that until you actually own the data.

Domenico Fioravanti: (47:38)

Yes.

Kris Jenkins: (47:39)

Yeah.

Domenico Fioravanti: (47:39)

I think that's the key point of this, right? We are on a journey of replacing external software by rebuilding it in-house, right? And again, going back to what we said at the beginning, one option would've been, "Okay, guys, leave us alone for a year or two and then we'll come back with the software delivered," but that's nonsense, right? So what we're doing instead is, first of all, taking ownership of the data, which is, I believe, the secret, right? If you own the data, even if you don't produce it, and at the moment we don't produce it, we are freeing it.

Domenico Fioravanti: (48:14)

We can already deliver value and unblock the company to do different things. In the meantime, we have five teams now, five development teams, and we are rebuilding the software in-house, right? And let me say something else: Kafka in this case, and Confluent, is not only enabling different use cases for us. You've probably been there before, right? Building software from scratch is easier than migrating from one existing piece of software to another and rebuilding in-house.

Kris Jenkins: (48:43)

Yeah.

Domenico Fioravanti: (48:43)

Right. Because there is the work of having those two pieces of software in production at the same time and migrating the data, and all this has to happen without any impact on customers, internal and external customers, we know, right?

Kris Jenkins: (48:56)

Yeah. It's like a Hollywood film where someone's trying to jump from one bus to another as they're both moving along the highway.

Domenico Fioravanti: (49:08)

Yes. That's a good metaphor. So it's easier to build software from scratch, from zero, from the ground up, as a startup, than to replace something, especially if you don't own it. We're not replacing an internal monolith that we own, where we have control of the monolith and can do some tricks, dual writes and things like that. We are replacing software we don't own. So Kafka, and again, this was in our minds from the beginning, the fact that Kafka is there, and even though we don't produce the data, we have freed it and we can consume it. I know Kafka will help us a lot in the phase of migrating from one system to the other, right? Because what we are doing now is building the new system.

Domenico Fioravanti: (49:47)

So we're replacing the booking system, the front-of-house software, the e-commerce platform. And we know there will be a moment when we'll have to do a kind of dual write, right? Write to the old system, intercept it, write to the new system, do a validation, and then there will be a moment when we shift the master of record from the old system to the new system we've built. And Kafka in the middle, I mean Kafka topics and connectors, will help us in this process, 100%, right? Because you can intercept events and have them feed your system in real time. And vice versa: you can be the one master of record and still feed the old system for a certain period of time. So again, not only an enabler of features; having an architecture like this will definitely help us in the job of replacing those external systems.

Kris Jenkins: (50:40)

Yeah. And you've massively broken down the size of migration by freeing the data first and then worrying about what to do with it.

Domenico Fioravanti: (50:49)

Correct. Correct. And there is also another step that we need after that. So again, as I said, we need to replace those systems, so there is a moment when you need to become the master of record, right? We need to become the source of truth, right? If you read books about software architecture, about migrating from monolith to microservices, there is always this kind of dual-write moment where you write into both systems. And then there is a moment where you switch which of the two is the master of record, where you read from. So you do dual writes and you read from the old system, and when everything is fine and validated, your internal system becomes the source of truth, right? And that's a key moment.
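
A toy illustration of that dual-write-then-switch moment, with hypothetical `old_system` and `new_system` clients: write to both, log discrepancies while the new side is being validated, and flip a single flag when the master of record changes.

```python
MASTER = "old"  # flipped to "new" once validation of the new system passes

def create_booking(booking, old_system, new_system):
    """Dual write: both systems receive the booking, but only the current
    master of record's answer is trusted and returned."""
    old_result = old_system.create(booking)
    new_result = new_system.create(booking)
    if old_result != new_result:
        # Log rather than fail: the shadow system is being
        # validated, not yet trusted.
        print(f"dual-write mismatch for {booking['id']}: "
              f"{old_result!r} vs {new_result!r}")
    return old_result if MASTER == "old" else new_result
```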

Domenico Fioravanti: (51:30)

That's why the next step, again in this example that I'm using, one of many, was that we added another component to the pipeline, which in this case is DynamoDB. I mean, it could have been any other database; in our case, obviously, we are on AWS, so clearly we used Dynamo. So we had two problems. One is that we are mining data from an external system, and by mining data, we call APIs that give us information, right? That information does not track deletes, right? Because you just say, "Oh, give me all the appointments from this date to this date," and you get the appointments. What happens when an appointment gets deleted?

Domenico Fioravanti: (52:12)

And again, going back to the fact that you have to schedule a reminder and all that, you need to know if an appointment is deleted. These APIs were not giving us deletions, right? So what we did was introduce Dynamo, so that we have an internal master of record we can compare against. So what we do now is that the scheduled Lambda mining the data is no longer publishing directly to Kafka, but is writing into DynamoDB.

Kris Jenkins: (52:37)

Okay.

Domenico Fioravanti: (52:39)

And we chose DynamoDB because there is DynamoDB Streams, which is part of Dynamo, one of the Dynamo features you can enable, and it will do a kind of change data capture for you, right?

Kris Jenkins: (52:53)

Oh, okay.

Domenico Fioravanti: (52:53)

Publishing to the stream. So what happens is that as soon as we mine data, we compare it with what we have in Dynamo, and we can understand updates and deletions as well. So that on Kafka, instead of publishing the raw data, we can publish events. So what happened? A new appointment was created, an appointment was updated, an appointment was deleted, and so on and so forth. So that's why we needed to introduce this middle layer, which is definitely very interesting, because now we have the ability to understand what's happening, right? Through using a change data capture system, which is a very common approach when you use Kafka.

Domenico Fioravanti: (53:32)

I mean, instead of just keeping the state ... doing event sourcing is essentially recognizing that having only a state means losing a lot of information, and moving instead to the events that will bring you back to that state. So that's why we made this decision. And again, I think it's another very successful pattern, because first of all we now have much more information that we can react to, like deletion events, for example. But we also now have a master of record inside the company, even though we are not the producer. So we have a DB where we have all the appointments, for example. And again, this will help us on the path of replacing the booking system.
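
Here is a hedged sketch of that Dynamo-in-the-middle step. The mining Lambda upserts the latest API snapshot into a DynamoDB table and deletes rows that disappeared (that is how the deletes the third-party API never reports are recovered), and a second Lambda triggered by DynamoDB Streams turns each INSERT/MODIFY/REMOVE record into an explicit created/updated/deleted event on Kafka. The table, topic, field names, and fetch helper are all hypothetical, and the stream is assumed to be configured with new-and-old images.

```python
import json

import boto3
from confluent_kafka import Producer

table = boto3.resource("dynamodb").Table("appointments")
producer = Producer({"bootstrap.servers": "<confluent-cloud-broker>:9092"})  # auth omitted

def fetch_appointments_from_third_party():
    """Hypothetical stand-in for the GraphQL/AppSync call described earlier."""
    return [{"id": "a-1", "starts_at": "2022-05-01T10:00:00", "status": "booked"}]

def mining_handler(event, context):
    """Scheduled Lambda: upsert the latest snapshot, and delete rows that
    vanished from the API response -- that deletion is how we recover the
    'delete' events the third-party API never reports."""
    fetched = {a["id"]: a for a in fetch_appointments_from_third_party()}
    known = {item["id"] for item in table.scan()["Items"]}  # pagination omitted
    for appt in fetched.values():
        table.put_item(Item=appt)                  # INSERT or MODIFY in the stream
    for missing_id in known - set(fetched):
        table.delete_item(Key={"id": missing_id})  # REMOVE in the stream

EVENT_NAMES = {"INSERT": "appointment_created",
               "MODIFY": "appointment_updated",
               "REMOVE": "appointment_deleted"}

def stream_handler(event, context):
    """DynamoDB Streams-triggered Lambda: publish one explicit event per
    change. Images arrive in DynamoDB's attribute-value format."""
    for record in event["Records"]:
        image = record["dynamodb"].get("NewImage") or record["dynamodb"]["OldImage"]
        producer.produce(
            "appointments",
            key=record["dynamodb"]["Keys"]["id"]["S"],
            value=json.dumps({"event_type": EVENT_NAMES[record["eventName"]],
                              "data": image}),
        )
    producer.flush()
```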

Kris Jenkins: (54:18)

You end up knowing more than your third party provider knows about your business, because you've captured more of the information, right?

Domenico Fioravanti: (54:25)

Correct. Correct. Exactly. Yeah. I mean, the third-party system, for some reason, for historical reasons, doesn't track deletions. So they don't have this information, while we now do. We actually have more data than they have, which I think is pretty-

Kris Jenkins: (54:37)

It's funny. It's one of my favorite use cases of event sourcing where you capture what the customer tried to do and failed. Instead of just throwing that information away, it's an absolute goldmine of where your business is almost making the sale, right?

Domenico Fioravanti: (54:54)

Yes. Exactly. 100%. But imagine, I'll give you another use case, which is very important. An appointment is definitely a typical scenario, but also a transaction, right? We are mining transactions from this system, and you expect that a transaction is immutable, right?

Kris Jenkins: (55:10)

Yeah.

Domenico Fioravanti: (55:10)

A sales transaction. So if you have to give back money to your customer, you do a refund. But the system we were using has a special feature, which can be unlocked by a manager, to modify a sales transaction, something by definition immutable. So say we have a transaction, we read that Kris spent $100, and then a manager goes in and changes the 100 to 80. Something that by definition is immutable, and that you use to feed the ERP, could mutate, because we don't have control, it's not our software.

Kris Jenkins: (55:46)

Yeah.

Domenico Fioravanti: (55:47)

That's where the first implementation with DynamoDB came from, because while mining the data, we discovered that there was a discrepancy between our reporting and the third party's reporting. And then we discovered that some managers were, for some reason, I mean, I'm not saying there wasn't a good reason, but they were making an error. Instead of doing a refund, they were modifying the transaction, just to be quicker maybe, right?

Kris Jenkins: (56:10)

Yeah, yeah.

Domenico Fioravanti: (56:11)

But what then happened is that instead of having the right transaction in our system, we had the wrong one, right? So that's why we needed a master of record to be able to compare against. And as soon as we introduced Dynamo, we were able to track an updated-transaction event and produce it through Dynamo Streams into Kafka. So then we were able to update whatever the end consumer of this information was. So definitely, yes. This is a pattern which I think is invaluable. It gives you so much ... as you were saying, you have more insight than the place where the data is generated.
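
A sketch of what that reconciliation could look like, with table and field names assumed for illustration; writing the changed record back is what makes DynamoDB Streams surface the mutation as an event:

```python
import boto3

table = boto3.resource("dynamodb").Table("transactions")  # name is illustrative

def reconcile_transaction(mined: dict) -> None:
    """Compare a freshly mined sales transaction against the master record.
    If an 'immutable' transaction has silently changed (the $100 edited
    down to $80), rewriting it makes DynamoDB Streams emit a MODIFY
    record, which downstream becomes an updated-transaction event.
    Amounts are assumed to be Decimal or str, since DynamoDB rejects floats."""
    stored = table.get_item(Key={"transaction_id": mined["id"]}).get("Item")
    if stored is not None and stored["amount"] == mined["amount"]:
        return  # unchanged: nothing new to publish
    table.put_item(Item={
        "transaction_id": mined["id"],
        "amount": mined["amount"],
    })
```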

Kris Jenkins: (56:52)

Yeah. Yeah. That kind of insight is something you only realize is valuable once you can actually see it, right? Going back to Conway's Law, there was one thing I wanted to know. You've got the system now where you're splitting up producers and consumers, and you can separately worry about getting the data into Kafka and how you read it later. Has that kind of Conway's Law effect changed the way you structure your teams at all?

Domenico Fioravanti: (57:21)

So I think Conway's Law has always been in my mind, going back to what I was saying before. In order to have teams that can move quickly and deliver quickly, they need to have few dependencies, right? They cannot depend on another team when they have to test something, or deploy to production, or consume something. They cannot wait or ask, "Oh, can you expose an API for me, because I need to do that?" And that's very common even in the old approach to microservice architecture, where request-response was the pattern. Synchronously consuming APIs was a thing, right? It was very common. But I've been there, and I know that a team ends up waiting for another team to expose some information if it isn't there, right?

Domenico Fioravanti: (58:19)

And not only that, but the team that exposes and owns the API needs to know that you're consuming it, because then they have to scale. If you have 10 teams consuming the same API, or you're hammering the API, they need to know that and be able to scale, right? So this creates dependencies between teams. And again, as Conway's Law says, you need more communication channels, you need more meetings, you have teams waiting for other teams, and everything moves slower. Kafka instead is an example where the design of your architecture really matches the communication channels and the dependencies between teams. Because in this case, if you expose some information, you can be the producer, or in this case mine it and produce it, but the teams that have to consume it don't have to go back to the owner team to say, "Oh, I'm consuming this, be aware."

Domenico Fioravanti: (59:16)

I mean, there are teams that don't even know other teams are consuming their data, because all the information is in the topic, and everyone consumes what they need and transforms it into their own view of the data, right? They don't produce it; they create their own view of the data, of what they're interested in. So definitely, Conway's Law has always been in my mind, since day zero. Because again, I've been in places where it was so hard to deliver because of all those blockers, and that creates an environment that, first of all, is not good for the company at all, because you're very, very slow in delivering, but it's also not good for the people.
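
As a rough sketch of consuming independently and building your own view, assuming the confluent-kafka Python client; the group ID, topic, and view logic are illustrative:

```python
import json
from confluent_kafka import Consumer

# Each team consumes under its own group ID; the producing team never
# needs to know this consumer exists.
consumer = Consumer({
    "bootstrap.servers": "...",    # Confluent Cloud settings go here
    "group.id": "reminders-team",  # illustrative
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["appointments"])

upcoming: dict[str, dict] = {}  # this team's own view of the data

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # Keep only what this team cares about; deletions drop out of the view.
    if event["type"] == "AppointmentDeleted":
        upcoming.pop(event["appointment_id"], None)
    else:
        upcoming[event["appointment_id"]] = event
```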

Domenico Fioravanti: (01:00:04)

It's not a place where you like to work, right? And then you have a retention problem, because engineers, if they don't like what they do, they leave. It's like that. We are lucky because we're in a field where we have a lot of options, a lot of choices. So I think it's important to make sure, first of all, that we do things which are good for the company, but the way of working that is good for the company is also a good way of working for the engineer. Because if you're very close to production, if you're able to deliver quickly, if you're able to have an impact and you can see that impact on the company, that's the way most engineers want to work, in my opinion. I want to work like that, so I know that most engineers want to work this way.

Kris Jenkins: (01:00:49)

Yes. One of the great joys of programming is when you actually put what you've built into the hands of someone who isn't technical and they're using it, and they're like, "This is changing the way I live and work."

Domenico Fioravanti: (01:01:00)

Yes. Yes.

Kris Jenkins: (01:01:03)

That's something you want as a programmer, which you can't define in technology.

Domenico Fioravanti: (01:01:07)

Yes. Correct. Correct. Yeah. So going back to my idea, that was like a mantra for me: be able to build something like that, because it would also make people happier. Again, going back to what I was saying before, we are not Google, we're not Microsoft. We need to invest in being able to retain people, first of all by offering them an environment that is nice to work in, which in my opinion is the secret. Money is important, for sure, you have to pay people the right amount, but offering them a place where they can learn, evolve, and have an impact is fundamental, together with salary, to be able to hire and retain people. And again, in this case, you don't depend on three or four teams whenever you have to go to production, right?

Domenico Fioravanti: (01:01:56)

You don't have to wait or hand over your work to someone else who will deploy it for you at nighttime, maybe, where perhaps the only moment you get pinged again is when there's a problem in production, right? And you get pinged for something that was deployed together with other changes from someone else, and you don't know what to do. What you want is to be the one deploying to production, the one monitoring what's happening and, if necessary, fixing the work you've done. That's the best way of working for an engineer. And again, it matches the way of working which gives the best results to the company.

Kris Jenkins: (01:02:33)

Yeah. Yeah. No one wants to be woken up at three in the morning, but if you're deploying the software that risks waking you up at three in the morning, you've got that feedback loop where you are making sure it's high quality software.

Domenico Fioravanti: (01:02:46)

Correct. Correct. That's exactly the point, right? I believe that if you are the one deploying, the quality of your work will be much higher, because you know. I've been there. If you have to hand over your work to someone else, you say, what's the point? My commit will go to production bundled with probably another 20 or 30 commits, and they'll probably break my code. Who knows? That's what was happening with monolith work, right? Two or three, four or five teams writing in the same repo, then a QA team testing it and the platform team deploying it. And I was there. I remember my work was not quality work at that point, because I didn't care. I really didn't care, right? Whereas when I worked in places... yep.

Kris Jenkins: (01:03:33)

Yeah. I mean, you can get lazy when the problems are somebody else's. But also, even if you don't get lazy and you want to keep your software high quality, you simply don't have the feedback you need to know what high-quality software is.

Domenico Fioravanti: (01:03:48)

That's true.

Kris Jenkins: (01:03:49)

And again, it boils down to ... so much of this boils down to tightening our feedback loops.

Domenico Fioravanti: (01:03:55)

Yes. Correct. So in this case, without an idea of what's happening, of the impact of your work in production, it's very hard to understand if you're doing well or not. In our case, every team has ownership of the feature toggles and A/B testing, so everything is inside the team. Whenever you go to prod, part of the team is monitoring what's happening: are conversion rates changing, what's the impact of your change. And then there is a quick feedback loop inside the team saying, "Okay, this change looks like it has a good impact. Let's open the feature toggle to the entire traffic."
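
Therapie's actual toggle tooling isn't named in the conversation, but a minimal percentage-rollout sketch of the idea might look like this:

```python
import hashlib

# Minimal percentage rollout; real teams typically use a feature-flag
# service, but the mechanics look like this.
ROLLOUT_PERCENT = {"new_booking_flow": 10}  # start with 10% of traffic

def is_enabled(feature: str, user_id: str) -> bool:
    """Deterministically bucket users so each person's experience is
    stable while the team watches conversion rates. Opening the toggle
    to all traffic is just raising the percentage to 100."""
    percent = ROLLOUT_PERCENT.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return digest[0] * 100 // 256 < percent  # maps to a 0..99 bucket
```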

Domenico Fioravanti: (01:04:31)

This has to happen inside the same team. That's the only way of doing this quickly; the knowledge is all there. Why should you delegate to someone outside? And that's why I think it's fundamental, in this case with Kafka, to build an architecture that allows you to build teams in this format, right? And vice versa. Again, Conway's Law is bidirectional, right? If you move something around on one side, you will have an impact on the other, and vice versa.

Kris Jenkins: (01:04:57)

Yeah. Yeah. Do you know, I could talk about this all day, but maybe we should try and pull this together and say what's next on your roadmap? Where do you take the story from here?

Domenico Fioravanti: (01:05:13)

I mentioned it before. Yeah. So the distributed data mesh paradigm shift is one of the things that is always in my mind. I know it's been mentioned many times. Zhamak Dehghani, I follow her work, and I did some trainings as well. I mean, I'm not a data person, I don't have a background in data, but I know that's the key to the future. I always say software is important, but without data it's nothing, while data alone is already amazing value. So that's definitely the area where you should invest. And again, I've been there before. I've been working in places where data was a separate silo.

Domenico Fioravanti: (01:05:59)

All the software teams were producing software, producing data, and then throwing it into a data warehouse or, later, a data lake, there were different evolutions. And there was a separate team asynchronously analyzing, manipulating, and transforming the data and then giving back feedback, right? And again, this is too slow. It's not enough. It's not enough.

Kris Jenkins: (01:06:21)

Yeah.

Domenico Fioravanti: (01:06:22)

Especially now. I mean, even machine learning, when I studied it 25 years ago, was something batch, very slow, very separate. Machine learning models now can be trained and used almost in real time, some of them, right?

Kris Jenkins: (01:06:37)

Yeah.

Domenico Fioravanti: (01:06:38)

So saying that this is all a separate silo, I think, is just wrong, right? What we're trying to do here is involve data people in the stream-aligned teams. At the moment, data analysts, but the next step is to start hiring again. Now that we own and control the data, we can start hiring data engineers and data scientists, but put them inside the teams. Because if you're able to start investigating and using the data as soon as it is produced, you'll be quicker than your competitors. That's a no-brainer.

Kris Jenkins: (01:07:14)

Yeah.

Domenico Fioravanti: (01:07:14)

And also, if part of the data team is inside the stream-aligned team, this means they will know the domain already. Whereas what happened in the past is that every team was producing data, throwing it to the data team, and then they had to pass on the knowledge of the domain all the time. If there is a new table, if there are new fields, new entities, you need to pass that knowledge, every time, to a team that has to have knowledge of everything, which is impossible, right? Cognitive load is limited. Instead, if you have a distributed data mesh in your architecture, backed maybe by Kafka, and your teams, coming back to Conway's Law, match this design by having data people inside the delivery teams, you'll be able to have a distributed data mesh. And not only that, you'll be able to get value from the data almost in real time instead of having to wait, right? So this is definitely the next step. And Kafka and Confluent will be a partner and part of it.

Kris Jenkins: (01:08:18)

Well, I'm glad we can be. And I hope we have you back on the show in a year or two, and you can give us an update.

Domenico Fioravanti: (01:08:25)

Oh, for sure. For sure. We can plan it. In a year I'm sure we'll have a lot more to say.

Kris Jenkins: (01:08:31)

I get the feeling you'll accomplish a lot in the coming year.

Domenico Fioravanti: (01:08:35)

Thanks.

Kris Jenkins: (01:08:36)

Domenico, thank you so much for talking to us today. It's been a real pleasure.

Domenico Fioravanti: (01:08:40)

Thank you very much, Kris. It's been a pleasure for me. Thanks. Thanks for inviting me.

Kris Jenkins: (01:08:44)

Cheers. Well, there you are. You know, that kind of business transformation is a hell of a magic trick. I can see how it's done, but I'm still impressed. You have to somehow free up your data, and then it ends up serving you in ways you don't even expect when you get started, all while you've got this balancing act of having a long-term plan, because unless you're delivering something worthwhile today, there won't be a long term to plan for. I've got to admit, I kind of envy Dom. I think that's a heck of a journey to be on. And I really hope he comes back in a year or so and gives us an update.

Kris Jenkins: (01:09:23)

But before we go, let me remind you that if you want to learn more about event-driven architectures, we'll teach you everything we know at Confluent Developer. That's developer.confluent.io. If you're a beginner, we have getting-started guides, and if you want to learn more, there are blog posts, recipes, and in-depth courses. And if you take one of those courses, you can follow along easily by setting up a Confluent Cloud account. If you do, make sure you register with the code PODCAST100, and we'll give you $100 of extra free credit.

Kris Jenkins: (01:09:54)

If you liked today's episode then please give it a like, or a star, or a share, or whatever buttons you have on your user interface. And if you have a comment or want to get in touch, please do. If you're listening to this, you'll find our contact details in the show notes. And if you're watching, there are links in the description and the comment box down there so you can use that. Find me on Twitter if you want to talk to me directly. And with that, it just remains for me to give huge thanks to Domenico Fioravanti for joining us and you for listening. I've been your host, Kris Jenkins. I'll catch you next time.

Scaling Apache Kafka® can be tricky, let alone scaling a team. When he was first hired, Domenico Fioravanti of Therapie Clinic was given the challenging task of assembling a sizable tech team from scratch, while simultaneously building a scalable and decoupled architecture from the ground up. In addition, he wanted to deliver value to the company from day one. One way that Domenico ultimately accomplished these goals was by focusing on managed solutions in order to avoid large investments in engineering know-how. Another way was to deliver quickly to production by using the existing knowledge of his team.

Domenico's biggest initial priority was to make a real-time reporting dashboard that collated data generated by third-party systems, such as call centers and front-of-house software solutions that managed bookings and transactions. (Before Domenico's arrival, all reporting had been done by aggregating data from different sources through an expensive, manual, error-prone, and slow process—which tended to result in late and incomplete insights.)

Establishing an initial stack with AWS and a BI/analytics tool only took a month and required minimal DevOps resources, but Domenico's team ended up wanting to leverage their efforts to free up third-party data for more than just the reporting/data insights use case.

So they began considering Apache Kafka® as a central repository for their data. For Kafka itself, they investigated Amazon MSK vs. Confluent, carefully weighing setup and time costs, maintenance costs, limitations, security, availability, risks, migration costs, Kafka updates frequency, observability, and errors and troubleshooting needs.

Domenico's team settled on Confluent Cloud and built the following stack:

  • AWS AppSync, a managed GraphQL layer to interact with and abstract third-party APIs (data sources)
  • AWS Lambdas for extracting data and producing to Kafka topics
  • Kafka topics for the raw as well as transformed data
  • Kafka Streams for data transformation
  • Kafka Redshift sink connector for loading data
  • AWS Redshift as the destination cloud data warehouse
  • Looker for business intelligence and big data analytics 

This stack allowed the company's data to be consumed by multiple teams in a scalable way. Eventually, DynamoDB was added and by the end of a year, along with a scalable architecture, Domenico had successfully grown his staff to 45 members on six teams.
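
To make the loading step concrete, here is a hedged sketch of registering a Redshift sink connector through a self-managed Kafka Connect worker's REST API; the connector class and property names are recalled from the Confluent connector and should be checked against the current docs, and a fully managed Confluent Cloud connector would be configured through the Cloud UI/API instead:

```python
import json
import requests

# Hedged sketch: hosts, credentials, and topic names are placeholders,
# and the property names below are assumptions to verify against the
# Confluent Redshift sink connector documentation.
connector = {
    "name": "redshift-sink",
    "config": {
        "connector.class": "io.confluent.connect.aws.redshift.RedshiftSinkConnector",
        "topics": "appointments,transactions",
        "aws.redshift.domain": "example.abc123.eu-west-1.redshift.amazonaws.com",
        "aws.redshift.port": "5439",
        "aws.redshift.database": "analytics",
        "aws.redshift.user": "etl_user",
        "aws.redshift.password": "********",
        "auto.create": "true",  # create destination tables automatically
    },
}

resp = requests.post(
    "http://connect:8083/connectors",  # Connect worker REST API
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```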
