If I said behavioral IoT to you, would you be scared? Well, you shouldn't be, because what I'd really be talking about would be gamified fitness trackers. Yoni Lew and Nick Walker of the South African consulting firm Synthesis Technologies used Kafka to replace some legacy messaging technology and enable what amounted to a whole new business with Kafka at the core of it. They'll tell you what they did and how they did it on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.
Hello, and welcome to another episode of Streaming Audio. I am once again, predictably every week, your host, Tim Berglund. Glad you're joining us today, and I'm glad to be joined in the virtual studio by Yoni Lew and Nick Walker. They are both with a company called Synthesis Technologies. Yoni and Nick, welcome to the show.
Hello, welcome. Thank you.
Thank you very much, Tim. This is Nick.
Yeah, so that was Yoni and Nick, in order. Everybody, memorize those voices. This is an ensemble cast today. We don't have a lower third. We don't have a video component, Streaming Audio. So, I can't be reminding you who Yoni is and who Nick is. You just have to memorize the voices. Anyway, we're going to talk about behavioral IoT and Kafka today. And Yoni and Nick, you guys both work for a software development and consulting firm called Synthesis Technologies. You're based in South Africa, but I would love it if you would tell us a little bit more about what you do before we dive into the project and the cool things that you've done with your client. Yoni, you go first. What do you do there?
Cool. So, I started at the client in 2019, working mostly on the DevOps side, maintaining the Kafka and Confluent clusters that they have around the world. I believe we have five clusters, if I'm not mistaken, five production clusters around the world, all supporting the client's infrastructure and their software as a service. Then at the beginning, in around March or April of 2020, as COVID started hitting South Africa, actually, I started moving a bit more into the development side, building out consumers and producers for a specific area within this bigger structure at the client.
Awesome. Nick, how about you?
Cool. My background is really from a point of view of Synthesis. I'll give a little background of Synthesis as well, because Synthesis is really like a cloud consulting partner. It's probably about 20 years old, and it's basically an AWS advanced consulting partner, and obviously became a Confluent partner probably about two years ago, in 2018. When I first joined, I was involved with this company Vitality, Vitality Group. And initially I was involved a little bit with Kubernetes, because they were basically redeploying their regional stack on AWS using Kops and Kubernetes. And I noticed that there was actually Kafka in that particular product suite, and I knew Kafka from some previous IoT days in my career. And we got involved with it. It was quite exciting. And more recently, with Yoni, we tried to get from this initial service into VDP, which is more like an IoT sourcing product, which basically uses device data from clients.
Awesome. So, tell us about the project. I guess, I should say, you mentioned Confluent stack, which is great. We can talk about that as much as you want, but in the interest of full disclosure y'all are a partner, Synthesis Technologies is a partner of Confluent. Is that right?
Cool. You'd be amazed. Maybe you wouldn't be amazed. How seldom I know that just before we started recording today, I was talking to a colleague and it was about is so-and-so a partner? I'm like, "I don't know, but here's who you go ask." That stuff is never on my fingertips, but glad to know that. Glad to know that we're working together on this, but tell us about the thing that we're working together on. I love the title, behavioral IoT because honestly if you don't know what the project is it sounds incredibly creepy and something very, very bad. And instead it's actually something very ordinary that a lot of people do. So, whichever one of you, Yoni or Nick, whoever would like to get started, tell us about this project.
I'm happy to jump in and get us started on it and give you some background. So, Vitality Group: there's an organization in South Africa, a medical aid company called Discovery, that came up with a brilliant idea to encourage people to live healthier, and to reward them for living healthier. And that's where all of this started. As I say, Discovery started this in South Africa. They then created a company called Vitality Group, and they offer this as a service to international companies, medical aid companies, to encourage people to just live healthier, to exercise more, to eat healthier, to sleep better, all of these types of things.
And so what that then means is we need to find a way of getting data into the system, and that's the big thing. Nick mentioned VDP, which is very good at pulling fitness data and exercise data and routine stuff, walking around, how many steps you're doing a day, all of these types of things, to see that people are actually meeting their fitness goals. And that's exactly the way that Vitality sets it up, by creating these fitness goals. And that is where this behavioral IoT comes from: seeing all these people exercising. Are they eating healthy? Every time they eat, they'll be logging their food. When you go to sleep and you have a fitness device or a sleep device on your wrist, it's picking up that information and it's letting us know that you're sleeping better. You then get rewarded: points get calculated and awarded. And this obviously helps the medical insurance company, as well as helping the individual to live a lot healthier.
I like it. I like it. I especially like the sleep tracking. That sounds particularly timely for me. I don't think we need to get into that on the podcast, but I'm just saying I'm intrigued. Obviously, behavioral IoT means fitness trackers in this case. So this is not some creepy government thing where people are tracking you for nefarious purposes. It's because you want to be healthier and live longer and be stronger and have a more focused mind, and all the stuff that comes from just caring well for your body. So, that's pretty cool.
Also, it's pretty obviously, a good Kafka use case because this is a world, the domain we're talking about here, the Vitality Group's project is a world filled with events. So, take it from the top. I mean, I'm imagining a number of things that are difficult about this. If it's global in scope, there are probably interesting regulatory hurdles. And if you're covering... I don't think I heard you say any particular fitness tracker. So, if you're covering a broad array of fitness tracking products that sounds like there are some neat problems in there. So dive in. What was hard about this?
Good. Let me just give a bit of background. Tim, it's interesting. It's a very exciting area, actually. There's two sides to it. There's basically the story of getting information from devices, and there's lots of devices. At the moment we've got probably between 12 and 15 vendors which provide information. There are obviously well-known ones and some obscure ones, but they provide all the data that is required by the system. It's basically lots of stuff, and not common. They're not normalized. Each vendor's got their own idea of how they're going to report fitness and routine stuff and calorie intake and lots of stuff. So, we have to deal with all those interfaces. Each vendor's got a different API, and you have pull modes of operating, or you have a push notification. A couple of techniques appear, and we have to deal with each one of those.
And then we have to essentially normalize that. So, we've got a good bunch of people in Vitality we work with, and they know the system pretty well. And then we tried to get to a homegrown solution. The initial vendor, the initial system, was actually outsourced, in terms of history. But we decided to say, "Let's just try and take ownership of the system and try and create this sort of data pipeline, dealing with all these vendors, trying to normalize their data, and providing information into a compliance system, or database, which is needed to provide points for this stuff."
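The normalization step Nick describes, mapping each vendor's differently shaped payload onto one common schema, can be sketched roughly like this. Everything here is invented for illustration: the vendor names, field names, and common schema are assumptions, not the real VDP data model.

```python
# Hypothetical sketch: normalize two invented vendor payload shapes
# into one common activity record.

from datetime import date

def normalize(vendor: str, payload: dict) -> dict:
    """Map a vendor-specific payload onto a common activity schema."""
    if vendor == "acme_band":
        # This invented vendor reports steps flat, with ISO date strings.
        return {
            "member_id": payload["userId"],
            "activity_date": date.fromisoformat(payload["day"]),
            "steps": payload["stepCount"],
        }
    if vendor == "trackrco":
        # This invented vendor nests member details and metrics separately.
        return {
            "member_id": payload["member"]["id"],
            "activity_date": date.fromisoformat(payload["member"]["date"]),
            "steps": payload["metrics"]["steps"],
        }
    raise ValueError(f"unknown vendor: {vendor}")

record = normalize("acme_band", {"userId": "u1", "day": "2020-06-01", "stepCount": 8042})
```

In a real pipeline this function would sit in the third service Yoni mentions later, consuming raw vendor data from one topic and producing normalized records to another.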
So, there's two sides. There's vendors, and there's also partners who provide rewards, like insurance companies or even airlines, or whatever reward you can think of that somebody could get if they're fitter. And that's quite nice. If you're at a certain level, you can get a discount of, say, 10% on a flight. Although it's not easy to fly anywhere now, but that's the idea.
But it still feels nice to have those points there. I still have plenty of miles in my MileagePlus account. And I just like to look at them sometimes. It's probably something that does not speak well of me, but so two problems. One is the integration problem. You've got all the vendors, all the fitness trackers. And the other side of things is the application marketing business side of how to gamify it, how to create rewards. And I assume those detecting and distributing those rewards is also an automated part of the system.
Yes, there's essentially two components, really, and we've talked so far about VDP, which is really like a centralized personal behavior feed. It's almost like... I guess it's becoming like a data bureau. It's a service which can say, "Okay, I'm getting all this information from lots of people in different places in the world, in different AWS regions." And obviously these people belong to different countries, and they belong to certain tenants. So that's a lot of stuff that comes in there.
The second, the other product, is really the system that does the rewards and facilitates the user interaction with the system. So I come in and say, "Okay, I want to look at my rewards," or, "Am I at a certain level?" So, that's an initial system, which was called V1 initially. VDP is more interesting. Although Kafka is involved with V1, the Kafka side in VDP is [inaudible 00:12:18]. It's more like a streaming scenario. In reality, we are trying to lay the foundation now to put Kafka into the data pipeline. Currently, the existing system uses TIBCO streaming functionality. We try to say, "Let's build out a scaled, modern architecture with Kafka in there to start, and build up layers of functionality and pillars to provide all this functionality for the customer and for the reward partners."
All right. Interesting that you mentioned TIBCO. I obviously don't want to drag it or any other product in this conversation, but there's a migration away from it. So, with full respect to it and the vendor and all the people behind it, what are the motivators that are pushing you to move to Kafka from it?
[inaudible 00:13:12]. Yoni, you can take on that one.
Cool. So, the truth is there's two main features behind it, and I'm going to use your words of not putting down TIBCO, but there are some features in Kafka that are important, and I want to focus on those. So the first one is, obviously, we're getting all this data. It's coming in, as Nick said, in very different ways and very different formats. And once we get that data in, some of the manufacturers want responses within X amount of time, while it takes us some time to process that data, and you sign up to say you'll process this data. And Vitality Group are looking at having a lot of people on the system and obviously growing the system. So, this all comes into play.
And so the way we've built the system is that a first initial connection is made between any of these manufacturers and our system, whether it's done through a push or a pull or whatever it is. And we get that initial notification to say that there's data, and then we send that response. That notification is stored in Kafka. That is then obviously consumed in another service. That service will then go and get the actual data. And obviously this is where all the streaming comes in, and the data is retrieved and then stored in another topic on Kafka. And then it gets normalized in a third service and then sent on to the rest of the system through multiple topics and multiple different workflows.
This is important, and why it's nice using Kafka, because TIBCO is a queue. A message comes in, it gets read, and once that message gets taken, it's no longer on the TIBCO queue. Whereas with Kafka, obviously, you've got storage and you've got these offsets that you can reset to various times to replay data, whether that needs to be a replay of the notification that there is data, or a replay of the data itself. It just gives us the opportunity, even if something goes wrong downstream and someone doesn't process something correctly downstream, to process that in our time and at our rate and in the way that we want to, because we've got that data. As opposed to, previously, they've had to go and ask the manufacturers to re-send data if something's not processed correctly. So, that's the first idea of it.
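The queue-versus-log distinction Yoni draws can be illustrated with a toy in-memory log. This is not Kafka's API, just a minimal model of the replay property: reading a record does not remove it, so a consumer can rewind its offset and reprocess after a downstream failure.

```python
# Toy append-only log and offset-tracking consumer, illustrating replay.
# Records stay in the log after being read; only the consumer's offset moves.

class ToyLog:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

class ToyConsumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0  # position of the next record to read

    def poll(self):
        if self.offset < len(self.log.records):
            record = self.log.records[self.offset]
            self.offset += 1
            return record
        return None  # nothing new

    def seek(self, offset):
        # Resetting the offset makes earlier records readable again,
        # which a destructive queue cannot do.
        self.offset = offset

log = ToyLog()
for n in ["notification-1", "notification-2"]:
    log.append(n)

consumer = ToyConsumer(log)
first_pass = [consumer.poll(), consumer.poll()]
consumer.seek(0)  # something went wrong downstream: rewind and replay
second_pass = [consumer.poll(), consumer.poll()]
```

With real Kafka the same idea shows up as seeking a consumer to an earlier offset (or resetting a consumer group's offsets), constrained by the topic's retention.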
The second one, and I like to look at it this way, is that Kafka, as you mentioned, and the whole Confluent stack, is very much built for this: streaming of data, and this mass streaming and mass scaling of data coming through the system. And yeah, those are the two big reasons that we've got.
Queues are not logs. I think that's a very, very important point.
Queues have a reason. And every once in a while, somebody will ask, they'll have a legitimate queuing use case, and they'll be like, "Well, can I make a message go away after it's read once by a consumer?" And [inaudible 00:16:38], "No, you can't, actually, Kafka doesn't do that." It's not a thing, but I think the usefulness of a distributed log is probably bigger than even a potentially scalable queue. There's a lot that you can do with logs, and logs show themselves in lots of the systems that we build, lots of the subsystems that we build. They're just a ubiquitous data structure that underlies a lot of the components that we build systems out of. And so, it's not surprising that you'd need a log.
And scale also, I mean, the system that you're talking about sounds like honestly, a little bit of a larger scale than most. This sounds like it's potentially a big thing. And so scalability matters to you too. There's a few things that I've heard you guys say that are interesting. I want to pull on them a little bit. You've mentioned push and pull notifications. Now, I'm going to speculate for a minute and you tell me if I'm right about this, but there's an ecosystem of fitness trackers that you want support. You're not an Apple watch company. You're not a GoPro. Not GoPro, sorry, a Fitbit company. Does GoPro have a fitness tracker yet? They probably will soon.
There are all these fitness trackers in the world and you want to support them all. My guess is, I mean, obviously they all have APIs and those APIs are as utterly different from one another as probably human languages are. It's like you are randomly sampling 10 languages from all over the planet and you have to learn them. It sounds number one, difficult, and there's a living to be made there. Let me just stop there. Is that on the integration side of things, speaking all those languages is a problem, right?
Yeah, this is complex. And in reality, the pipeline is quite complex because of that, and also the fact that there's a requirement to derive rewards out of it. The other thing, Tim, is there's a compliance issue. You're dealing with people, and people do anything. They can pretend to do some exercise. They might put their Fitbit onto their dog, and we have to detect that stuff. We have to say, "Is this real? Is this person doing so many thousands of miles a day? Is it real? How can it possibly be like that?" So, we do some compliance testing, and a lot of that stuff could be done in streaming, particularly in Kafka Streams or ksqlDB. There's lots of potential.
We are still at the moment just really getting Kafka in. So it's quite exciting. We're replacing stuff that [inaudible 00:19:24] queues and putting Kafka in, and it's quite incremental. So we start with low-profile vendors initially to get it going, and then we start to add the bigger ones. Once the stuff is embedded in, we can start to say, "Okay, what else can we put on top of this? What sort of stuff that was tricky to do beforehand with existing technology, what can we do?" So we're quite excited about that stuff. We're quite excited about ksqlDB, and also obviously Kafka, using Spring Kafka as sort of like an interface to Kafka, into Confluent and whatever. So there's lots of potential there. We got the go-ahead to use Kafka and ksqlDB, so that is quite nice. It's still early stages, but Yoni and myself are very excited about what we can create as new functionality, to use this thing as a proper data service within Vitality Group, for people who want to use this data and the different types of data that this could possibly produce.
It could be AI stuff, for running models or inference, or it could be just derived data, enriched data, which should be beneficial not just for the consumer, but also for the partners. The rewards partners might get quite excited about that, because Vitality Group wants to recruit reward partners for each country or region. If it's appealing to sell to those people, it's quite nice to have that functionality.
Absolutely. I want to dig into the rewards partner side of things too, if there are technical issues there to talk about, because, I don't know, it sounds cool. But it sounds like on the integration side, you've got fraud detection, which I hadn't thought of. But yeah, if this is gamified and I'm earning points for all my steps or my running or whatever it is I do, put it on my dog. That's just the worst. But I have a childhood friend who spent some time in the employ of a company that made a fitness tracker, and he would talk about the things that their data scientists could infer from the data. It's a fairly broad array of kinds of behaviors that are detectable. And so, I imagine dog walking versus human running, in terms of the motion data, would be easy to see, or at least possible.
There's a wonderful little URL you can visit. I mean, if you search around, there are people who do amazing stuff to try and mimic the behavior. It's incredible what people have decided to focus on to try to outsmart the system.
Right. I don't know. I guess everybody needs a hobby, and some people need two of them, but if I find any of those, maybe I'll include them in the show notes, just for security research purposes. I probably won't find any though. On the less data science and more event-driven architecture side of things: I think, Yoni, it was you who said push and pull?
And all of these, I'm sure, bizarrely different APIs, where you go through and you work on them and you think, "Wow, how differently can people think of solving the same problem?" There are probably many different ways. I'm guessing these are not typically event-driven APIs. These are synchronous things where you have to go and ask. Could you talk us through how that works?
Yes, so some of them are, some of them aren't. And that's exactly this idea of push or pull. There are some APIs that you can connect to with some token, and that allows us to receive a push notification, which just says, and I'm trying to remember if I've got this right, "We've got data for you based on this user." And then we can say, "Okay, cool, perfect," and we'll go and get that data. And then a pull will work differently, in that we'll go and we'll say, "Do you have any messages or any data for this user?" The manufacturer will say, "Yes," and we'll say, "Cool, please send us that data." And that's just the difference in thinking of these different APIs.
Sure. And on the pull ones, I suppose you have some sort of agent or worker that's just doing that pulling, kind of like a Kafka Connect source connector doing its thing.
Yeah. That makes sense, yeah.
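A minimal sketch of that pull cycle, with an invented in-memory stand-in for a manufacturer's API. The real vendor endpoints, authentication, and payload shapes will differ; this only shows the ask-then-fetch pattern Yoni describes.

```python
# Hypothetical pull flow: ask the vendor whether it has data for a user,
# then fetch it. FakeVendorAPI stands in for a real manufacturer endpoint.

class FakeVendorAPI:
    def __init__(self, pending):
        self._pending = pending  # user_id -> list of waiting payloads

    def has_data(self, user_id) -> bool:
        return bool(self._pending.get(user_id))

    def fetch(self, user_id):
        # Hand over everything waiting for this user and clear the backlog.
        payloads, self._pending[user_id] = self._pending[user_id], []
        return payloads

def pull_for_user(api, user_id):
    """One polling cycle for one user, as a Connect-style worker might run it."""
    if not api.has_data(user_id):
        return []
    return api.fetch(user_id)

api = FakeVendorAPI({"u1": [{"steps": 5000}]})
fetched = pull_for_user(api, "u1")
```

In the real system, the worker would run this cycle on a schedule and produce each fetched payload to a Kafka topic instead of returning it, so the downstream normalization service can consume it.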
You asked about the story on the rewards side. The vendors on the other side, there are numerous quite well-known ones, and there are probably about 10 or 15. But in rewards it's very different. Each country, each region has a totally different way, different companies to deal with. So you might have an insurance company in one country that doesn't exist in another country. So when you try and move this from one region to another, or one country to another, you have totally different partners. So it's quite tricky to deal with that side. The reward side is dealing with different companies.
Yeah. And are those, on that side of the system, are those API integrations or are those primarily human driven business to business interaction?
Across the board. You can have manual stuff, files coming through, or even just an API. It's a cross-section of different technologies. It's quite a mess if we think about it, but you have to deal with it. If you go into a new region or a new country, you have to find partners initially. And usually the partner approaches Vitality Group. But you say, "Okay, what can you reward? And what information do you want to reward with?" So it's a complex business process to go through, with a little technology to pave the way.
Yeah. It's obvious. It really is obvious on both sides of this why there is a business here, because the integration with the fitness trackers, I mean, that's work that has to get done and maintained, and we all know that's just data integration, only more so. And then on the reward side, yeah, that's a boots-on-the-ground kind of thing where you actually have to have people in each region.
Yes. Actually, if you look at VDP, which we're talking about, which is really the behavioral stuff, there's another side, which is called VDX, which is basically the manual stuff, the files, the interaction with other partners. That's much more batch-oriented, as opposed to the real Kafka sweet spot, which is realtime data and feeds and stuff.
I suppose a lot of the interfaces which are automated, probably on the rewards side, are probably very batchy at this point in time. We are working on convincing everyone that their business should be fundamentally event driven and that there should be an event streaming platform at the heart of every company. This is part of the mission of this podcast. We're getting there. We're trying to help. And in another 10 years, that work will be, I think, pretty far along. What else is next? Where is this in terms of completion and deployment, and what's coming next for it?
You take that, Yoni.
Cool. So there's, I mean, there's so much. As Nick said, we're very much in the early stages of just getting Kafka in, getting that foot in the door and just doing a migration over onto Kafka. There's a lot of stuff that we're looking at at the moment, everything from... So, at the moment we're running Kafka self-managed. So looking at maybe moving that to Confluent Cloud is one option. We're also looking at, I remember watching quite a few times your announcement about tiered storage. And obviously, with the data that we're pulling, the historical data can be very useful and can be very important for many, many different reasons.
So, looking at doing something like that, where you've got maybe a week's worth or two weeks' worth that you're storing on the Kafka brokers, but then having that second tier of storage in AWS, or in some sort of storage system, and being able to look back at that data and replay it if that ever becomes necessary. We're really looking at, as Nick mentioned earlier, using ksqlDB to find different ways of streaming these things, because something we haven't quite touched on 100% is that obviously we're getting all of this data in, but then that data needs to be split out and sent to each different environment in each different market, for their users only. Because obviously we can't be sending user data for people around the world to everywhere. So using some sort of streaming system, and obviously ksqlDB will fall very much into that, to send things to the right places and to the right topics, to allow for markets to be consuming from there. And I mean, there's so, so much that we're looking at, and so much that we're looking forward to working on in the system.
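The per-market splitting Yoni describes, sending each record only to the topic for its own market so user data never crosses regions, might look something like this in outline. The market codes and topic naming convention here are invented for the sketch; in practice this routing could live in a ksqlDB query or a stream processor rather than application code.

```python
# Hypothetical routing rule: pick a destination topic from the record's
# market code, and refuse records whose market is unknown rather than
# risk sending user data to the wrong region.

ALLOWED_MARKETS = {"za", "us", "uk"}  # invented market codes

def route(record: dict) -> str:
    """Return the per-market topic name for a normalized activity record."""
    market = record.get("market")
    if market not in ALLOWED_MARKETS:
        raise ValueError(f"unknown market: {market!r}")
    return f"activity.{market}.normalized"  # invented naming convention

topic = route({"member_id": "u1", "market": "za", "steps": 8042})
```

Failing closed on an unknown market is the important design choice here: a record with a bad market code is held back for inspection instead of being delivered somewhere it should not go.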
Tim, the thing that always excites me is the reactive nature of this stuff. My background is IoT. [inaudible 00:29:50] manufacturing, we dealt with sensor data and whatever. The fact that we could actually start to create reactive applications, which are providing information, which is timely to the customer or the partner to say, "This is the right thing at the right time." And that architecture model is very appealing and I'd like to get that stuff into this data pipeline in the long-term.
That sounds powerful. I like where you guys are going: event-native Kafka as the system of record, doing things on an event basis, and stream processing applications in ksqlDB. This seems like a good, from-the-ground-up way of building a system that is itself fundamentally event driven and has a nice goal. You're just trying to help people be healthier. I don't wear a fitness tracker. You're making me want to, and I don't like that. I feel like it's just going somewhere that is not going to be good for me. But hey, you know what, maybe I'll have you back on and we'll talk about how I sleep better now, because I was earning frequent flyer miles by sleeping longer than seven hours. I mean, that might do it for me, actually, so we'll have to see.
Look, I can give you just one warning about it. I do have a fitness tracker, of course, but it kind of becomes addictive in just seeing, okay, how many steps have I got? Have I got enough steps, and am I getting my points for this week? It really does become addictive and it makes you want to live healthier. It really does.
Yeah. I think also it seems that some of these devices pick up oxygen levels in your blood, which is quite relevant now with COVID. So, interesting stuff can come out of it in terms of the current scenario we're living in at the moment.
My guests today have been Yoni Lew and Nick Walker. Yoni and Nick, thanks for being a part of Streaming Audio.
Thank you very much, Tim.
Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST, that's 6-0-P-D-C-A-S-T, to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeited, and there are a limited number of codes available, so don't miss out.
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on our YouTube video or reach out in our community Slack. There's a Slack sign-up link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.
Synthesis Software Technologies, a Confluent partner, is migrating an existing behavioral IoT framework onto Kafka to streamline and normalize vendor information. The legacy messaging technology that they currently use has altered the behavioral IoT data space, and now Apache Kafka® will allow them to take that to the next level. New ways of normalizing the data will allow for increased efficiency for vendors, users, and manufacturers. It will also enable scaling the IoT technology going forward.
Nick Walker (Principal of Streaming) and Yoni Lew (DevOps Developer) of Synthesis discuss how they utilize Confluent Platform in a personal behavior data pipeline provided by Vitality Group. Vitality Group promotes a shared-value insurance model, which sources behavioral change information and transforms it into personal incentives and rewards for members associated with its global partners.
Yoni shares the motivators for moving their data from an existing product over to Kafka. The decision was made for two reasons: taking the different forms and formats of existing data from vendors and streamlining them, and addressing how quickly users of the system want the processed data. Kafka is the best choice for Synthesis because it can stream messages through various topics and workflows while storing them appropriately. It is especially important for Synthesis to be able to replay data as needed without losing its integrity. Yoni explains how Kafka gives them the opportunity, even if something goes wrong downstream and someone doesn't process something correctly, to process the data on their own timeline and at their own rate, because they have the data.
The implementation of Kafka into Synthesis’ current workflow has allowed them to create new functionality for assisting various groups that use the data in different ways. This has furthermore opened up new options for the company to build up its framework using Kafka features that lead to creative reactive applications. With Kafka, Synthesis sees endless opportunities to integrate the data that they collect into usable, historical pipelines for long-term models.
If there's something you want to know about Apache Kafka, Confluent, or event streaming, please send us an email with your question, and we'll hope to answer it on the next episode of Ask Confluent.