You don't have to work very long in this industry before you encounter a system you wish you could rewrite, or at the very least overhaul significantly. It's not usually a bad system when that happens, the bad ones don't survive long enough to reach that point. More often than not, it's a system that solved a real problem. And so the problem grows and the need grows and the solution space grows, and eventually, it all starts to outgrow implementation number one. And when that happens, you've got some hard questions to answer, you've got some hard decisions to make.
Now, I hope our industry is getting wise enough to realize that big bang rewrites usually fail. But if you don't do the big bang rewrite, then how do you evolve piece by piece? There is no definitive answer to that. But there's a heck of a lot to be learned from the people who are in the process of answering it, especially if they're doing so successfully.
So today on Streaming Audio, we are joined by Jean-Francois Garet, who's a technical architect at a company called Symphony, and he's going to be telling us about his approach to the growing pains of a successful but straining monolith, his thoughts on systems architecture, and how you communicate those thoughts and architecture to the rest of the company. Because sometimes that's the hardest part, how do you get people to come along for the ride when you've got good ideas? That can make or break a project. Before we get into the details, Streaming Audio is brought to you by Confluent Developer, which is our education site for event systems and Apache Kafka. It's a treasure trove of blog entries, code samples, hands-on courses, and more. So check it out at developer.confluent.io. And if you end up taking one of those hands-on courses, you're going to need a Kafka cluster. The easiest way to spin up one of those is with Confluent Cloud. You can be up and running in minutes, and if you add the promo code PODCAST100 to your account, you'll get $100 of extra free credit.
So with that said, let's take a peek into the realities of being a systems architect. One with solid ideas and genuine problems. My guest today is Jean-Francois Garet, welcome to the show.
Hello, a pleasure to be here with you.
Good to have you. So I must admit, I've woken up feeling reflective today. So before we get into the meat of the podcast, how did you get into programming?
So actually, I started programming as I started my career. During my education, during college, what I did was mostly focused on networking and, I would say, computer security. And through that, I was introduced to programming, but more to build tools around what we were studying and to do some, I would say, modeling of... I don't know, protocols and things like that. And in the end, I got into programming doing my master's thesis. In France we have a degree where we go to [inaudible 00:03:16], we do a six-month internship in a company where we do the master's thesis.
And that's where I did more programming, doing some stuff inside the Linux kernel and things like that. [inaudible 00:03:27] Yeah. Well, it was simple things. You just start getting knowledge into how you build it, how you modify just very tiny things here and there. And then I started my first job, which was for a company that was doing software development. I started working in 2009, so it was just after the 2008 crisis, and the job market was not very great at that point in time.
Yeah, yeah.
So I took a job position which was about programming, and I actually happen to like it a lot then. What I was in during my studies was more formal mathematical stuff, and I was not so much into that when it came to pure programming. And then when I started doing actual programming and discovering that it's more a little bit like solving small puzzles every day, that's what I liked in the end.
Yeah. I think a lot of us come to it with a "kids who solved puzzle books" mindset. Yeah, that is where a lot of the fun is. I can see now the seeds of where what you're going to tell us about today comes in, if you've got a background in networking and kernels and infrastructure things. But so you currently work at a company called Symphony, right?
Correct.
And they are something in technology for finance, tell us about that.
Yes. So we are a collaboration platform for the financial services industry. So what we build is a chat platform that has, well, real-time communication and collaboration capabilities, but one that we want to make sure we can build workflows on top of. We are not just here to help people chat as they would do with other chat platforms; we want to make sure that we can help them solve their daily problems, and their daily problems are most of the time related to workflows. So how can we streamline the efficiency of those workflows, how can we bring automation into those workflows through bots or other types of integration? That's really what we try to do.
Give me an example. So someone presumably on a trading floor at a bank?
There can be many ways that we can do it. One example I can take is more in the post-trade scenario. So you have two traders, they agree on the price, they were on a phone call, they say, "Yeah, I want to buy this amount of this stock," for instance, at this price. Then comes the time where these instructions are given to the teams that would actually carry out the trade. There was a formal agreement that it is going to happen, and now it needs to happen: there need to be assets exchanged, there need to be verifications about the quantity, the price, and so on. And so that's when you can start seeing instances where there can be exceptions or breaks in the execution of the trades.
And you have different people that may be from the same company as the trader or not, because there you may have other parties that get involved to actually execute the trade, which may happen in a different region from where the trader is located, on a different stock exchange, and so on. And so you need to make sure that you can both convey the context of that conversation, while also connecting the people that may not know each other, and may not have the full context, so that in the end they're able to provide a satisfactory answer quickly to this problem.
Okay. Yeah, I can see why that would be something where you want an interactive chat that was fast. But also, Slack isn't going to quite cut it for something like that.
Well, you need something that is more specialized for the financial industry to do that. And you need to start thinking of modeling what a trade operation is, how you can convey that context, and how you deal with the fact that this is also potentially very sensitive information that you have in the trade, and that's not necessarily public knowledge, so you need to have strong guarantees in terms of security and encryption of the data. This data should not be shared with us even, for instance; that's why we set up end-to-end encryption.
And also you need to have compliance on top of it, because any trade operation that's happening may be subject to investigations later on, so you need to make sure that you have archiving of those conversations and so on. And a lot of checks about who is actually able to talk with anyone, things like that.
I also see hints of that, of why you want an immutable log of everything that happened. I may be getting ahead of myself. In internet years it's a fairly old company, right? It's about eight years old.
Yeah, it's about eight years old. It was founded in 2014. At its inception it was [inaudible 00:08:07] through a conglomerate of banks, actors from the financial industry that got together and recognized that they had a need for which there was no solution in the market for them at that point in time, and that's how they started Symphony.
Okay. That already seems like something of a miracle, my experience of banks is they can barely agree with themselves some days.
Yeah. I guess there were some doubts at the beginning, about whether it was going to be successful or not. But well, we're here, we have managed to grow outside of this core group of banks that funded Symphony. We now have more than 1,000 customers worldwide, and we have about 500,000 users. A bit more than 500,000 users on our platform.
Oh wow, this is some very serious scale then, in terms of users. Cool. Okay, so this is... I know you have a transformation story for me. So what was the original version of the software like when you got there?
Yeah. So as I mentioned, it was formed by a conglomerate of banks. One of the banks did hand out a piece of software that they had built internally to do the communications for them, that we inherited. So it means that the software we started operating on is even older than 2014, probably 2010 or 2012, I don't know exactly. Well, because we were a startup at that point in time (now I think we're more in the scale-up stage), the main focus was how can we get the return on our investment in this asset? How can we make sure that we can deploy it to more and more customers? And so, this thing that was built to be single-tenant, so meant to be used by a single company, we took it, we enhanced it, we added more capabilities, but we also kept this model of having a single-tenant architecture, where we are deploying dedicated instances for each of our customers. Which also fits nicely with their concerns around security and compliance.
And well, we reached the point in time where we have 500 different instances for those, and it starts being hard to manage, operationally speaking, having the right level of service for all of those customers. And knowing that we have customers of varying sizes, we have very big customers which will bring tens of thousands of users on the platform, and we will have smaller ones that will bring 50 or 100 users on the platform, and they don't necessarily have the same needs when it comes to operations, to scale, and so on.
That makes sense.
The model that was coming from a big bank was more tailored for our tier one customers and proved to be costly for our tier two or tier three customers, so that's when we started thinking about how we can do things in a multi-tenant way. And how can we break this component that we had, which was a monolith in the end, that was doing multiple functions at the same time? How can we break it down into microservices so that we can better manage those services, and better adapt what we need to adapt when it comes to scale, resiliency, or patterns of usage, based on the need and based on the actual load for the function that is handled by the microservice?
Right. I can see the motivation to start refactoring that, but where do you even begin? That's nebulous.
Yes, that's nebulous. And that's one of the reasons why there were several trials and errors, and not everything went smoothly. So well, the first thing was to try to say, "Well, can we do multi-tenancy from a pure infrastructure level?" Are there some things that we can co-host or not? Can we co-host, for instance, the data stores only, while keeping the computing more in the single-tenant fashion?
There were different groups that were brought in; this project has been going on for something like four years now. So there were, as you can assume, different iterations. We didn't get the outcomes that we were expecting in the timelines that we were expecting, so it's been going on. There was an intent to stop and build everything as microservices, and then people realized that we can't stop delivering functionality.
You can't shut down development while you refactor the whole thing, right?
Yeah. Yeah. Definitely. And so that's where, I would say, different ways to try to go through this. And now we think that we have a good handle on how we are going to complete all of that, hopefully before next year.
So what was your approach that's finally starting to work?
So one of the things that we came to realize is, to build microservices you also need to start understanding what are the level of isolation in which you want to put your microservices.
You don't want to end up having a distributed monolith where everything is coupled and you cannot do independent [inaudible 00:13:05] of your microservices, you cannot easily limit the impact when there is an incident happening on a service, and all of those things.
Yeah, yeah. The classic if you've got... If every microservice depends on several others entirely, and one can bring the whole thing crashing down, it isn't really a microservices architecture.
It definitely is not. And you get... the worst things coming from the monolith, and the worst things coming from the microservices, and you are just [inaudible 00:13:35] overall. So we do want to make sure that we have proper isolation of our microservices, and so we also want to define what the interfaces for our microservices are, and how they communicate with one another and exchange data. And that's where we came to think, well, doing everything synchronously is probably not the best way to do it; even though you can use a service mesh, there's still significant complexity to build like that. And that's where we thought, well, we need to focus on asynchronous communications. And most of the data that is being exchanged is actually exchanged in an asynchronous way, so that should be fine.
And so we really want to start thinking about how we do event streaming, how we choreograph all those microservices together, how we start building and documenting the event streams that we have built, and what data is available within them.
How do we start to think about integrating governance there as well, so that we can ensure that we are creating services that are going to be reusable, and creating data sets and event streams that are going to be reusable by other parties? A lot of these things.
So how much of the foundational thinking was event analysis? How much of it was data modeling before you got really stuck in?
I think in the original phase we were thinking, well we know we have these big categories of functions, and so we just start creating microservices for them. Then at some point, we started realizing that well, to be able to model, for instance, the... So as I said, we're a chat platform so we have streams of messages that represent conversations, and we need to make sure that we can archive all of those. If you just say, "Well, we have a retention which we... For every message that comes in, we'll make sure that they are in the table, and we have a service that basically reads the content of this table, as in after the fact, to be able to start archiving those data in a different format, somewhere in the file." We are failing at understanding what is the core of what we're trying to do, which is building the archive.
We are not here to build the service, we are not really here to build the retention itself, we are just here to make sure that we have the right pipeline that goes into building the archive. And having this level of understanding, it's not just event analysis. It's a bit more, I would say, a high-level understanding of what the product domains are, and what value they add. And from there on, what are their areas of responsibility?
And once we started doing that, then that's when I think we started realizing that the events were a way for us to convey the information between the different domains, but they were not the... Well, and that's then afterward when we start thinking, well how do we improve our interfaces so that we capture the right development of events? And we start entering really into the event analysis per se.
Okay. So it was more driven by the value you think you can get from a particular architecture, and then you got into the event modeling analysis?
Yes.
Yes, okay. That makes sense. So you say that building the archive was a core concern.
Well, it's one of the concerns. I think it's just one example of where we can... one that properly stresses that there is a dichotomy between the operations, making sure that you can have a chat that is responsive, and so on, and a secondary function, which is making sure that we can build the archive. Now we have other types of use cases, and the one I'm describing is also one of the most complex that we are handling, so it's not the first one that we tackled. And also, because we are in a transformation phase, we need to make sure that we demonstrate that what we want to achieve is able to evolve, that it allows us to cover different patterns of integration and patterns of usage of event-driven architecture. And so we started more with those simpler use cases where we can show, "Okay, we are going to use Kafka for doing this. Why do we use Kafka? Do we have the right provider for it? Do we get what we want in terms of operations and the automation that we want out of it?"
And then we build on top of it and say, "Well, now we are not doing..." We start with a simple publish-subscribe model, then we go, "Okay, maybe we have a security risk type of integration," and we show that we can do it and that it works fine. And then we start going more and more into where we want to integrate multiple services, so we start defining the governance for the event schemas. And on top of that, we come and say, "Well, now we have enough foundations that we know we can handle this compliance archiving use case." And we start working on it.
Okay. Make that a bit more concrete for me, what was the actual first thing you built?
Yeah. The very first thing we built was something to create webhooks on top of the platform. So as I said, we want to make sure that you can have more data flowing into your chats. And one of the use cases we had for our customers is, we have systems... I don't know, you can think of GitHub or Jira or whatever, they have the ability to identify that there are events happening on their side and call an endpoint to say, "Well, this thing happened."
And then we wanted to make sure that we can map what we would receive on this endpoint, and define a form of abstraction to say, you can... Our customers can configure endpoints so that whatever happens in those systems will be transformed into a message in a specific chatroom.
Okay.
So to do that, we were in the stage where we have to expose a webhook API, and we want to decorrelate the actual ingestion into the right chat room from just receiving the calls from the third-party systems that are calling the API, so that we can make sure that we accept the messages as fast as we can, and we take care of all the work that's related to ingesting and identifying the right chat room, doing the encryption that we may have to do, and things like that, asynchronously.
Okay. That makes a lot of sense to me because it's one of those... The thing I was struggling to understand is, how do you refactor a chat thing when chat is the absolute heart of it? But yeah, getting third-party data hooked into the system, seems like a manageable chunk that you can chew off all at once.
Yes.
Yeah.
Definitely.
How did it go?
Well, it went very well. We also knew that we were not taking a high risk doing that, so it was a fairly well-bounded problem with a simple interface when it came to doing events there, because we are just decoupling... It's the same service that in the end exposes the API and then does the ingestion of the message. So that's the same service talking to itself; there aren't too many problems there. We just put a Kafka topic in the middle to decorrelate the pure API response from the actual processing of the messages.
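To make the decoupling described here a little more concrete, here is a minimal sketch rather than Symphony's actual implementation. It assumes an in-memory queue as a stand-in for the Kafka topic, and the names `handle_webhook_call`, `ingestion_worker`, and the endpoint-to-room mapping are all hypothetical. The API side only enqueues and answers; resolving the chat room (and, in a real system, encryption) happens later on the consumer side.

```python
import queue
import threading

# Hypothetical mapping from a customer-configured endpoint to a chat room.
ROOM_BY_ENDPOINT = {"hook-123": "room-ops"}


def handle_webhook_call(topic, endpoint_id, payload):
    """API side: accept the call quickly and defer all real work."""
    topic.put({"endpoint_id": endpoint_id, "payload": payload})
    return 202  # accepted; processing happens asynchronously


def ingestion_worker(topic, delivered):
    """Consumer side: resolve the chat room and deliver the message."""
    while True:
        event = topic.get()
        if event is None:  # sentinel used to stop the worker in this sketch
            break
        room = ROOM_BY_ENDPOINT.get(event["endpoint_id"])
        if room is not None:
            # Encryption and posting into the chat would happen here.
            delivered.append((room, event["payload"]))


def run_demo():
    topic = queue.Queue()  # assumption: stands in for a Kafka topic
    delivered = []
    worker = threading.Thread(target=ingestion_worker, args=(topic, delivered))
    worker.start()
    status = handle_webhook_call(topic, "hook-123", "Build failed")
    topic.put(None)
    worker.join()
    return status, delivered


if __name__ == "__main__":
    print(run_demo())
```

The point of the design is that the response time of the API no longer depends on how long ingestion takes, which matches the "accept the messages as fast as we can" goal mentioned earlier.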
And did you hook... So you put that data into Kafka, did you hook Kafka into your existing monolith, or start feeding the monolith into Kafka?
So we didn't do that. What we did is that the monolith still exposes the REST APIs, so we are going through the REST APIs to effectively push the data into the monolith for the time being. As we're in the middle of this single-tenant to multi-tenant transformation, it's one of the things where... Given we're using Confluent, and Confluent Cloud in particular, and we have lots of instances, we didn't want to hit limitations on the number of connections to the Kafka cluster, or have to handle hundreds of potential service accounts to model each of those different groups of instances. So we are keeping this integration with the monolith for when we'll be fully multi-tenant for this monolith, or, what we're considering doing now, which may happen earlier, is to build an adapter.
The monolith already has some events that it exposes that are going to SNS/SQS today on then... We're on the AWS or GCP, [inaudible 00:22:11] we're on GCP. And what we're planning to do, because the topics where they're publishing are single-tenant as well, is build an adapter that would consume all that single-tenant data, aggregate it, and then publish it into a multi-tenant Kafka topic afterward.
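As an illustration of the adapter idea, the following sketch merges per-tenant event lists into one multi-tenant stream. It is only a sketch under stated assumptions: the envelope shape, the tenant names, and the functions `to_multi_tenant` and `aggregate` are hypothetical, and plain Python lists stand in for the single-tenant queues and the Kafka topic.

```python
def to_multi_tenant(tenant_id, event):
    """Wrap a single-tenant event in an envelope that carries its tenant."""
    return {"key": tenant_id, "value": {"tenant_id": tenant_id, **event}}


def aggregate(single_tenant_streams):
    """Merge per-tenant event lists into one multi-tenant stream.

    Each record is keyed by tenant. With Kafka's default partitioner,
    records sharing a key go to the same partition, which keeps the
    relative order of one tenant's events.
    """
    merged = []
    for tenant_id, events in single_tenant_streams.items():
        for event in events:
            merged.append(to_multi_tenant(tenant_id, event))
    return merged


if __name__ == "__main__":
    streams = {
        "bank-a": [{"type": "message_sent", "id": 1}],
        "bank-b": [{"type": "message_sent", "id": 1}],
    }
    for record in aggregate(streams):
        print(record)
```

Carrying the tenant identifier in both the key and the payload is one possible choice; it lets consumers filter by tenant without relying on topic names.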
Okay. So you're gradually moving the data over that way.
So that... Yeah. We can get the data out of the monolith that way. And then to push the data into the monolith, well then that's just something we haven't fully figured out.
Are you hoping to? You're hoping to gradually pull away from...
Yes.
Okay. So there will be two-way operation for a while?
Yeah. What we're intending to do is understand also what are the pieces that we can take out of the monolith, and when we take them out, at this point in time we say, "Well, we'll use Kafka to do the intercommunication with other components that may already be pulled out of the monolith." And that's how we are moving as we go step by step.
How long has that been going on for, that transformation?
So purely event streaming. We started thinking about it, and designing it, by the last quarter of 2020. So we came up with the actual... what it is that we want to do, beginning of 2021. We handled three first use cases during Q1 and Q2 of 2021. Out of the three use cases, two went up to production: this universal webhook, and another one that's a little bit similar, for integration with third-party partner systems, but that has dedicated APIs to fly to our system. And the third one did not go to production for other business reasons, nothing related to Kafka.
Okay.
And so, from the end of 2021 up to now, what we are doing is more finding those new use cases where we can add the usage of Kafka. So that's more opportunistic. And we are also starting to... So part of those use cases, there is for instance the business intelligence. Currently, for everything that's related to data analytics in Symphony, we do have separate pipelines and a dedicated team with their dedicated platform to do everything that's related to the actual hard work on the data and then feed it into a visualization tool.
So we are now also starting to say, "Well, anything new that you need to feed into this BI complex, we are modeling so that you go through Kafka." And that's one of the things where we are hoping that we'll start building our catalog of events, so that whatever we feed into the BI through this channel would be reusable also by other components if need be.
Okay. You're already capturing that information in one place, and you're wondering if it would be useful in multiple other places, right?
Yes.
That sort of relates to... Because your job is technical architect, right?
Yes.
So how do you go to the BI team and say, "I've got this new idea of how we should do architecture and I want you to follow along with me."?
So actually, I had some discussions with the BI team for some time. And we know that there are pain points today with their architecture because to ingest the data they have something like seven or eight different pipelines, which have different technologies, which require their own operating model, and so on. And we know it's a pain for them today.
And so, what we do is go to them and say, "Well, we understand you have pain and you have problems, and we have a solution to help you fix it. And we also understand that you have concrete deliverables, and concrete production use cases today that you cannot break." So we are not going to come to you and say, "You're going to replace everything." We're going to come to you and say, "Well, we have this new thing that you are going to have to do, so we propose to you that we follow this new approach that we have." We discuss it together, we agree on what the principles are, what benefits you're going to get, and we validate that it works with this use case. And if you think that's fine and you want to go further, then we'll have these other use cases that we have already started conceptualizing, that we can add to your roadmap and integrate, and in the end we build collaboratively how we go there.
There must be a certain amount of trust if they've got seven pipelines and you need to add an eighth temporarily, right?
Well, the point is, that we are showing them that this pipeline we're adding is able to replace all of their seven pipelines. And that's, in the end, what they're interested in. So we are giving them a way forward to project into saying, "Well, this is the one we're going to consolidate upon, and we are going to be able to migrate from the others progressively."
Okay. So it's a convincing way out of the pain.
Yeah. It's two things, first showing that we know what we are doing and being able to help them, and giving them tooling in the conversations with the other teams as well. It's not only about tools, it's also about people in the end. And yeah, and it's being able to give them a way to project.
Yeah. Yeah. You raise an interesting point, it's not just about tools, it's about people. Which I always think architects must encounter a lot more or fail. What's been your experience dealing with other teams, dealing with the team you are working closest to, and implementing it with management? From the blue-sky thinking to the actual implementation is a long journey. How do you interact with all those different teams to get it to work?
So that's where... What I try to do is to show up all through their life cycle to get from idea to production. So it's at the very beginning when there is a new idea, a new concept, be there to help them design it. And the thing is, we're here to help, we are not here to tell them, "You must do this, absolutely."
So we are here to help them define the objectives of what they want to build, identify the properties that are necessary to meet those objectives, and discuss them with them. We'll have different ways of working, depending on the team. Personally, I try to adapt to the teams and their level of maturity. If there is a team where there is a tech lead that has a lot of experience in building systems and knows their stuff very well, I'll say, "Well, you do the design, I'm here to ask questions. If you need anything, if you need any help, if you need some insight about this tech that you have never used before, you can come to me and we'll work on it together."
What we do want to make sure is that we understand what it is that we want to achieve. We will make trade-offs, and I'll ask that person to make the trade-off, and I will make the trade-off as well so that we can meet and have a consensus on what we can do. There are other use cases with other teams where they have less maturity, and where they will... The engineering manager will ask, "Well, Jeff can you do the design?" And yes I'll do it, and then I'll hand it over to the team, I'll do probably a POC, I'll do some concrete code that I can show them how this will work. And that's usually... And then it's a balance, depending on the maturity and what the team needs in the end.
Yeah. Yeah. You make it sound like a large part of the job is clarifying the problem.
Oh yeah, definitely. But I guess that's true for any development, no?
Well, that's great-
I would think to say... And that's one of the things, whether it's architecture or development or anything, the first thing is do I understand properly the problem, do I understand properly the solution that I want to bring to the problem? And once this is done, I feel that the code part is the easy part, because you don't have too many questions left, [inaudible 00:30:04]
Overflow, right?
Yeah. Anyway, you can just use copy-paste somewhere. Oh well, that's what we use open source as. Most of the problems we're facing have already been faced by someone else. Either there's Stack Overflow, or there is an open source library somewhere that you can make yourself-
Open source, the sophisticated version of copy and paste. How does that translate into dealing with management? Because that's not quite the same thing and they must have high expectations of you, right?
Well, when it comes to management, I don't think it's really different. Because in the end... Well, management is concerned about the results, that's one thing. And as long as the results show up, they are just fine. And they can also understand the difference between... As an architect, I provided some guidelines, there were some trade-offs. So as long as we have clear documentation and clear explanations for the different phases and why it is that we proposed this original design, what was the reality check that did not pass or did pass? And if there are reality checks. And when there are failures, as long as we're continuously refining, integrating the feedback, and making sure that in the end there will be a successful outcome, management is fine.
Okay. Okay. I can see that, plus a bit of patience could work out.
Yeah. Well, then it depends on the culture, really. I can say here that in Symphony, I have very good relationships with the different layers of management, whether it's the top management or the engineering managers. They understand what we're trying to do, they understand that we're not here to force them into doing things and that we are just here to try to get into production with the least amount of cost and build on them, but also making sure that when they are into production they won't be loaded with production issues, or they won't be facing a customer that's not happy because the system is not doing what they thought they would be doing and things like that.
Okay. Yeah. So clarity and trade-offs, let's take that into the future. If that's your core philosophy of technical architecture, you must have some clarity on where you want to get to, and what you think the trade-offs will be. Tell me about where you are going.
So really, what we want to adopt is data mesh overall for all our event streaming. So we are currently focusing on building what we call the event mesh, which I would describe as the subset of data mesh that's focused on real-time event streams. And from there on, we would like to bridge the gap to build... to integrate other types of data sets, other types of technologies, so that we can be more agile in the way we handle the data inside the company.
When you say event mesh, let me just pause you there. Would you include in that the notion of publishing data as a product?
Yes, definitely. That's really where we want to go. So when we started this journey on event streaming, we set ourselves some principles, and out of those principles there was the fact that we wanted to well isolate the product domains, making sure that we can reach eventual consistency with the event streaming, making sure that we have clear contracts onto the event streams for both the producers and the consumers.
And the last one, which is a bit longer term for us, is making sure that we can handle multi-region and global sharing of the data. And when we're going there actually, I think the first and the third thing that I mentioned are core pieces of data mesh. So we came to the realization that what we wanted to do was actually data mesh a bit later on, and from there on we tried to understand really what was meant by this concept, double-checked all the good documentation coming from [inaudible 00:34:15] on this, started getting more knowledge about other types of tools that are used in the context of data mesh, and tried to really build that thing so that we can go there fully afterward.
For me it's really more a generalization of what were considered before to be good practices in event-driven architectures, and we're just generalizing by making sure that... The events that you are building, the data that you are sharing, is actually a product so you want to have good quality on it, you want to make sure that it's accessible, that it's documented, discoverable, addressable, and all of those things.
Yeah. [inaudible 00:34:58]
That you can trust it.
Yes, absolutely. I sometimes think that, if you start with the core idea that event facts are worth capturing permanently, and when you do that you can use them in multiple different ways, you almost inevitably start discovering some of the data mesh ideas. Because you're going to say, "Well, this stream of facts is useful in different ways. We need to make that a primary product that we can republish to different people who have different ideas about how to consume it." Right?
And I think it's also very important from the operational standpoint, to understand what components are mission critical, and what are the ones that are not. How do you make sure that the components that are mission critical always have the data that they need available, and that they don't need to depend on a third-party system? So you really start going into generalizing the concept of domain-driven design, which was more focused on the analysis of the data that you have, and you can apply the same thing to your services and you start building those bounded contexts of services, which are independent, which have all the data they need, which have clear interfaces, which you can control, and from which you can also ensure that they will not be impacted if there is another context that fails, and also that they will not impact another context if they fail.
This is very important for mission-critical applications if you start going to microservices and having lots of services that you need to manage.
Yeah, yeah. That segregation is exactly the solution we're looking for to that microservices that depend on each other problem that is just a distributed monolith, right?
Yeah.
Yeah. So you say you're gradually coming along to the data mesh way of thinking, but I know... I think I read somewhere in one of your blogs perhaps, that you're doing something like the governance piece, in that you are keeping a lot of contextual information in event headers.
Yeah.
Have I got that right? And if I have, tell me more about it.
Yeah. So there are two things that we started doing, one is building an event catalog. So we're using Git to be able to do that, and we are applying the same practices as we would apply to our APIs or our code, to our event models. So we have this Git repository where we document all our events. So not only do we have the schemas for the events, but we also have additional information like on which topics this event is going to be available. Who are the producers, and who are the consumers that we know? What is the format, are we using Protobuf, JSON? This kind of thing.
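As an illustration of what one entry in such a Git-hosted catalog might hold, here is a minimal sketch. The field names, the two accepted encodings, and the validation rules are assumptions made for the example; the episode only says that the catalog records the schema, the topics, the known producers and consumers, and the format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    """One documented event in the catalog (hypothetical layout)."""
    event_type: str
    schema_path: str  # path of the schema file inside the repository
    encoding: str     # for example "protobuf" or "json"
    topics: List[str] = field(default_factory=list)
    producers: List[str] = field(default_factory=list)
    consumers: List[str] = field(default_factory=list)


def validate_entry(entry: CatalogEntry) -> List[str]:
    """Return a list of problems; an empty list means the entry looks complete."""
    problems = []
    if entry.encoding not in {"protobuf", "json"}:
        problems.append(f"unknown encoding: {entry.encoding}")
    if not entry.topics:
        problems.append("no topic listed")
    if not entry.producers:
        problems.append("no producer listed")
    return problems
```

A check like `validate_entry` could run on every pull request so that an event cannot be documented without a topic or a producer.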
We'll apply a pull request process on top of it so that we can get the producer and the consumers to agree on the contract that is represented by the schema. And then we'll start applying governance on top of it to do things like providing standardized names for common attributes, implementing linting and backward compatibility checks through the CI processes, having the [inaudible 00:38:10] with not only the participants, the producer and consumer teams, but also the architects, so they can provide their guidance as to how things are moving.
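The kind of CI check described here can be sketched in a few lines. This is a deliberately simplified illustration: schemas are modeled as plain field-name-to-type dictionaries, the compatibility rule (no field removed, no type changed) is much cruder than real Protobuf or schema registry compatibility rules, and the `STANDARD_NAMES` table is a hypothetical naming convention.

```python
from typing import Dict, List

# Hypothetical convention: discouraged attribute names mapped to the agreed name.
STANDARD_NAMES = {"tenant": "tenant_id", "tenantId": "tenant_id", "ts": "timestamp"}


def lint_names(fields: Dict[str, str]) -> List[str]:
    """Flag attributes that do not use the standardized name."""
    return [
        f"rename '{name}' to '{STANDARD_NAMES[name]}'"
        for name in fields
        if name in STANDARD_NAMES
    ]


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes that would break consumers of the old schema.

    Simplified rule: removing a field or changing its type is breaking,
    adding a field is not.
    """
    issues = []
    for name, old_type in old.items():
        if name not in new:
            issues.append(f"field removed: {name}")
        elif new[name] != old_type:
            issues.append(f"type changed: {name} ({old_type} -> {new[name]})")
    return issues
```

A pipeline step could fail the pull request whenever either function returns a non-empty list, which leaves the discussion between producers, consumers, and architects for the cases that tooling cannot decide.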
And in the end, we also want to set up a continuous deployment pipeline so that we can take those models that we have and push them into the schema registry so that we are sure that whatever we have in the live system matches what we have documented.
Okay. So you're using Git as a schema registry with pull requests and CI.
Well, the thing with schema registries, at least what we saw today, is that they are very bound to an environment, and make sure that what you have in a given environment matches what flows into the topics. However, the events that we're publishing, they're published by code. And so they will follow a similar lifecycle as what you have with your code, in the sense that they will evolve. They need to be created at some point in time, they will evolve. Sometimes there will be changes that are breaking your interface, so you need to think about having new versions and so on.
And because we have that, it felt natural to us to say, "Well, we're going to replicate what we have for our code." So to say, there are some events that are pushed into the schema registry of the development environment that represents what is known there, and when the software that publishes those events moves to the qualification stage, then we also push the schemas there, and when we reach production we'll push the schemas to production. For us, it felt natural that the events that are published by software follow the same life cycle as the software.
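To make the promotion idea concrete, here is a minimal sketch of a deployment step that prepares a schema registration for a given environment. It assumes a registry exposing the Confluent Schema Registry REST interface (`POST /subjects/{subject}/versions`), which the episode does not confirm; the environment names, URLs, and subject name are hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical registry endpoints, one per stage of the software life cycle.
REGISTRY_URLS = {
    "dev": "https://schema-registry.dev.example.com",
    "qa": "https://schema-registry.qa.example.com",
    "prod": "https://schema-registry.prod.example.com",
}


def build_registration(environment: str, subject: str, schema_text: str,
                       schema_type: str = "PROTOBUF") -> urllib.request.Request:
    """Build (but do not send) the request that registers a schema version."""
    base_url = REGISTRY_URLS[environment]
    body = json.dumps({"schema": schema_text, "schemaType": schema_type})
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

The same schema file from Git would be passed to `build_registration` once per environment, at the moment the software that publishes the event is deployed there.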
Yeah, that makes sense. I wonder if that would be... How many languages do you use? I wonder if that's connected.
We are using, I would say, three languages. So we mostly do Java, we have a bit of JavaScript, on the backend side there are a few components that are written with Node.js. And we have a bit of Clojure as well.
Oh, Clojure.
Yeah.
Oh, that's a favorite of mine.
Well, it will make some people from the company happy.
Yeah. It's a lovely language on the JVM if you are thinking in terms of immutable data structures. But we mustn't go down the Clojure rabbit hole, because I could do a whole podcast on that. Maybe we will one day.
Okay. Yeah, so I was wondering if the number of programming languages would affect how much benefit you get from centralizing your schemas in Git?
Well, the thing that we also try to do is to generate the code so that it's easy to use those schemas. And we also, in that way, make sure that the different teams are not somehow rewriting the schemas before they actually use them. And we can do the code generation in any language we want, the tools are available to do it in JavaScript, Java, or, if tomorrow we want to do it for our mobile applications in Swift or something else, we can do it as well.
At the risk of sidetracking, are there any plans to turn... To use that to generate TypeScript, any moves towards TypeScript in the company?
So we use TypeScript for the front end, but not for the back ends, as far as I know.
Okay. Okay.
So yeah, so that's the first thing that we do. The second thing, specifically for the headers, is that we do have a subset of common properties that we convey through our headers, and they are the things that can either be used to filter the data (maybe not all consumers are interested in all the events, so we do put some information inside the header so that they can easily discard an event if they're not interested), or that are necessary to be able to read the events.
So information about the type of the event, the tenant, the encoding that is used, is it Protobuf, or not? Did we use Base64 representation or did we represent in JSON? All of the things.
Also, if we do put encryption, and for some data that may be sensitive we are likely to put encryption, we'd have the encryption key ID for instance, so that the consumer could easily know which key to use to be able to decrypt the data.
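A consumer-side sketch of this header usage follows. Kafka record headers are key and byte-value pairs, so the example models them as a list of tuples; the header names (`event-type`, `tenant`, `encryption-key-id`) are hypothetical, since the episode does not give the actual names Symphony uses.

```python
from typing import Dict, List, Optional, Set, Tuple

Headers = List[Tuple[str, bytes]]  # Kafka record headers are key/bytes pairs


def header_map(headers: Headers) -> Dict[str, str]:
    """Decode header values so they can be inspected without touching the payload."""
    return {key: value.decode("utf-8") for key, value in headers}


def should_process(headers: Headers, wanted_types: Set[str], tenant: str) -> bool:
    """Decide from the headers alone whether this consumer cares about the event."""
    meta = header_map(headers)
    return meta.get("event-type") in wanted_types and meta.get("tenant") == tenant


def decryption_key_id(headers: Headers) -> Optional[str]:
    """Return the ID of the key needed to decrypt the payload, if one is declared."""
    return header_map(headers).get("encryption-key-id")
```

The point of the design is that `should_process` runs before any deserialization or decryption, so uninteresting events are discarded cheaply.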
Okay. Okay. So yeah, that makes perfect sense for header-based information. Okay. So where do you think... Let's see if we can summarize this. Where do you think the company will be on this project 12 months from now?
So I think 12 months from now we'll have broad adoption of the event mesh, in the sense that we'll have had enough event streams created that it's become a more natural way of thinking for the developers. So I would not say it would yet be their default way of thinking, but something most of our developers will have experience with and will know how to use.
I assume that we'll also start having a bigger share of our data that is actually flowing into the event mesh and being documented there. Yeah, I expect that probably 70 to 80% of our data would be there by this point in time.
Oh, wow. Would that include the core chat information do you think, by then?
Yes. Yes, I think we will do it. And one of the reasons is the archiving use case that I mentioned earlier, it's one of those use cases where we have more and more challenges into... So currently the way it's done is that we have a standalone batch that queries the monolith to retrieve all the data that was ingested between two points in time, then tries to summarize that into a file. That requires a lot of processing power, it has to be done in off-peak hours, it requires... And we have more and more requirements about having multiple different schedules for different business units in our customers, different formats because different business units will use different providers for the archiving and stuff like that.
And that's something that's not going to scale very well, and we know it. And so that's one of the things where we know we'll get a lot of benefits if we go by having this immutable log of data flowing in, and that we can build in the cloud more processing for this, doing this asynchronously, preparing the data to go into [inaudible 00:44:32] as it streams, literally. And because that use case will also entail coaching our data and many things related to it, I think that's when we would have progressed enough.
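The contrast with the batch job can be sketched as follows: instead of querying for everything ingested between two points in time, each event is appended to the right archive as it arrives. This is only an illustration of the idea; the `business_unit` field, the one-JSON-line-per-event format, and the in-memory archives are assumptions made for the example.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def archive_incrementally(events: Iterable[dict]) -> Dict[str, List[str]]:
    """Append each event to its business unit's archive as it streams in.

    Each archive is modeled as a list of JSON lines; a real system would
    write to files or to an archiving provider instead.
    """
    archives: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        archives[event["business_unit"]].append(json.dumps(event, sort_keys=True))
    return dict(archives)
```

Because the work is spread over the day as events flow in, there is no single off-peak window to protect, and a different output format or schedule per business unit becomes another consumer of the same log.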
Yeah. Yeah. I can see that. It's funny, I think if you want to find a good use case for something like Kafka, like an event streaming model, you start by asking what business processes have to run as a batch in the off hours. That's always where real-time ought to be happening, and totally isn't. Not to put a downer on it, what do you think the biggest risk is to getting there?
The biggest risk is... Well for me, it's this transformation journey that I mentioned, we have multiple transformations happening at the same time. So we mentioned single tenant to multi-tenant, we mentioned monolith to microservices, and we're also switching cloud providers. So there is a lot of technical stuff happening, and we just need to make sure that we are able to secure the time for the people to also focus on this thing, which is contrary to the others. The others are more about how we reduce our footprint to better control our costs, where this is more, "How can we open up ourselves so that we can handle more use cases in the future in a more agile way?"
So it's just making sure that we have enough focus put on this so that we carry it out. But from the technology point of view, the design, what we want to do, I think we have laid out almost all of the foundational pieces and it's more now just executing.
Okay. Well actually, I hope we have you back in a year to see how you've got on.
Yeah.
I know you've written up a blog post about some of this with more gory technical details. But the thing I'm going to walk away with is if you're a technical architect, clear awareness of problems, and have a focus on the people involved, that's what you seem to have to inspect.
Yeah. I didn't say it because I think it goes without saying that, but you need to know what you are doing technically speaking as well.
Hopefully. Hopefully. Cool. Well on that, I think we'll leave it. Thank you very much for joining us, Jean-Francois. It's an interesting business case.
Thank you for having me, it was a pleasure to discuss all of this with you, and I hope it will spring some conversations. And if anyone wants to reach out, feel free. You'll know where to find me.
Sure, we'll put your contact details on the show notes. Thanks very much.
Thank you. Bye. Bye.
Well, we'll have to get Jean-Francois back in a year to see how that story ends. Hopefully, it doesn't end, hopefully, it just keeps on growing. In the meantime though, check the show notes for... There's a good Symphony blog post that goes into some technical details and we'll link to that. And if you want to get in touch with Jean-Francois or with me, the show notes are also the place to look for our contact information, we'd love to hear from you.
While you're there, now is an excellent time to send us some feedback. So if you've enjoyed this episode, please leave us a thumbs up, or rating, a review, or whatever feedback your podcasting app offers you. Again, we'd love to hear from you. For more general information on Kafka and event systems, head to developer.confluent.io, where you'll find everything from getting started guides, to architectural patterns that help you build a successful event system.
And somewhere along that journey, you're going to want to spin up a Kafka cluster, so head to Confluent Cloud, which is our fully managed Kafka service. Sign up, add the promo code PODCAST100 to your account and you'll get $100 of extra free credit to use. And with that, it just remains for me to thank Jean-Francois Garet for joining us, and you for listening. I've been your host Kris Jenkins, and I'll catch you next time. (music)
Inheriting software in the banking sector can be challenging. Perhaps the only thing harder is inheriting software built by a committee of banks. How do you keep it running, while improving it, refactoring it, and planning a bigger future for it? In this episode, Jean-Francois Garet (Technical Architect, Symphony) shares his experience at Symphony as he helps it evolve from an inherited, monolithic, single-tenant architecture to an event mesh for seamless event-streaming microservices. He talks about the journey they’ve taken so far, and the foundations they’ve laid for a modern data mesh.
Symphony is the leading markets’ infrastructure and technology platform, which provides a full communication stack (chat, voice and video meetings, file and screen sharing) for the financial industry. Jean-Francois shares that its initial system was inherited from one of the founding institutions—and features the highest level of security to ensure confidentiality of business conversations, coupled with compliance with regulations covering financial transactions. However, its stacks are monolithic and single tenant.
To modernize Symphony's architecture for real-time data, Jean-Francois and team have been exploring various approaches over the last four years. They started breaking down the monolith into microservices, and also made a move towards multitenancy by setting up an event mesh. However, they experienced a mix of success and failure in both attempts.
To continue the evolution of the system, while maintaining business deliveries, the team started to focus on event streaming for asynchronous communications, as well as connecting the microservices for real-time data exchange. As they had prior Apache Kafka® usage in the company, the team decided to go with managed Kafka on the cloud as their streaming platform.
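The team has a set of principles in mind for the development of their event-streaming functionality: isolating the product domains, reaching eventual consistency with event streaming, having clear contracts on the event streams for both producers and consumers, and handling multi-region and global sharing of the data.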
Jean-Francois shares that data mesh is ultimately what they are hoping to achieve with their platform—to provide governance around data and make data available as a product for self service. As of now, though, their focus is achieving real-time event streams with event mesh.
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.