Hello, you're listening to Streaming Audio. And this week's episode might have to come with a hurricane warning because the ideas fly pretty thick and fast out of this week's guest, Ben Gamble. I first met Ben a couple of months ago at a monthly hacking night I run here in London. And it was definitely the right place to meet him. He is a very capable hacker. He's a tinkerer, he's a builder of interesting things, and he's got a very active mind as you're about to find out. He has a background in things like event processing, stream processing, but also augmented reality systems and the world of online gaming, like multi-player games. And it's that last part that brings Ben specifically onto the show this week.
Because when you think about it, the online gaming world has been doing real-time event processing at massive scale since, I guess the days of dial-up modems. And there's a lot we can learn from them. Like to give you one example, we could learn strategies for dealing with 200 users or trying to pick their seats on a flight all at once. Because when you think about it, that's not actually that different from multi-player games of capture the flag, right? Everyone's trying to acquire a resource. There are conflicts. We have to deal with it in real time.
Ben gave a whole talk on these different strategies at Current last October. And we'll leave a link to his talk in the show notes. But for right now, join us for a more conversational run on what we event streamers can learn from online gaming. This podcast is brought to you as ever by Confluent Developer, our site to teach you about Kafka. More about that at the end. But for now, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.
With us today on Streaming Audio, it's Ben Gamble. Ben, how are you?
I'm very well, thank you. How are you Kris?
I'm very well. I'm glad to have you here. I want to pick your brains, and pick apart some of your history.
You're right now, you are, I forget the job title, but you work at Aiven, right?
That's correct. I'm a developer relations manager. I head up the team that focuses on the developer persona. Whereas we've got one team focused on data professionals and another team focused on DevOps professionals. Mine's the one in the middle with all the overlap.
Right. And you work with at least one former Streaming Audio guest, Francesco Tisiot, right?
Yes. Yes I do. He's one of our main leads on the data side. It's been great fun working with him, and should we say baiting him around pineapple pizza?
Okay. Okay, we'll stop for that. Where do you stand on pineapple pizza?
To be honest, I am not a fan of pineapple to begin with. So it was more of a thing of where it actually was a bit of a joke at a game studio I used to work at, that everything was the cherry pizza crew. So my Discord handle and my old socials actually have cherry pizza as the logo because that was the joke at the studio we worked at for a long time. But to be honest, fruit on pizza is just, it's strange to me. I'm not so viscerally angry about it, but to me it's just a, but why?
I must admit, I was converted, I'm team pineapple. But let's talk tech instead, and you've already hinted at it. Because you currently work developer relations, you've done a lot of work with Kafka, real-time systems, event-based systems. You're an inveterate hacker, I know this. And you have a background in the gaming industry, which is what we're really going to talk about.
Yeah. So it's one of those things where, as I found out over the years and everyone starts to realize is everything is sort of the same when you start stripping away layers. So of course, like a lot of people, I got into this thing in the first place because I wanted to build games. And as you did in the '90s, you went to a library and found a book that said, "How do you build games?" And it said Learn C++.
So I did. At age 11 was taking a course with a load of postdocs, because I lived in Cambridge, around C++. And was just trying to hack this stuff together, followed by years of Unreal Engine modification, as my way into actually building stuff. I got stuck at this and then realized it was what I wanted to do and then became an image processing person building augmented reality apps at a management consultancy.
What year are you building augmented reality?
So this is 2011 actually. So way back when, in at least that terms. But when you actually start looking back at history here, another thing that came out in the '90s because us for always nothing is really that new even in software. Like The AI went to was the '70s, augmented reality was really the '90s. SLAM wasn't invented back then. PTAM was from the mid-2000s. And then when you get to things like what I was doing, which was the iPad 2, the second iPad generation with multi-people building AR rooms for everything from submarine warfare simulations to... Oh yes, I do all kinds of nonsense.
And it was just this idea of saying how do I put multiple people into a simulation that they could actually experience and get something out? So this was everything from building manufacturing lines to see where machines would fit, to such things as just simulating designing products in 3D. And so I give it a look-and-feel on a shelf sort of thing.
Okay. So I begin to see roots of both gaming and multi-event collaboration stuff.
Yes. And the stream processing angle came from building inspection machines. So this was the idea of gigabit ethernet cameras, streaming actual frame by frame, or even line by line from high-spec cameras. And then having to do stream processing on these lines of frames to restitch them and then build up this kind of processing engine for that. And this is working with everything from medical, like actual tablets and capsules themselves. So on a production line, are they clean? Are they safe? And that was where the stream processing came in from all this angle.
Oh, right. Okay. So there's a lot to dig into there, but I have to start with the technical one. Why would they send it line by line rather than frame by frame?
So what happens is, this is a circular object you have to remember. So it's a three-dimensional circular object. And what happens is as it rolls by, you can basically unwrap the frame and if you control the CCD, which we did, you could actually pull each line of the CCD off the backend and stream each line of the camera as the ring rolled by. So you automatically unwrap this 3D shape into a 2D shape and you wouldn't have to do any extra clever deconvolutions.
Oh, I see. I see. Okay.
We actually had custom firmware on the camera to do this. It was quite a lot of... You're only looking back did I realize quite how novel all of these techniques were. But it's like wait, this makes sense if you think about it. But it's challenging as well.
God. Okay, so I'm going to struggle to draw the line from there to you being developer relations manager at Aiven.
It's actually not as hard as you might think. But let's keep going.
Okay. But yeah, where I want to get to is you gave a talk at Current, which sounded fantastic, but I wasn't able to attend. So I want to get the potted version of your talk, which marries together the idea of stream processing and gaming, right?
Yes. Well the funny thing here is due to the same thing at the Kafka summit in London where I saw your talk but couldn't attend. And then ended up watching it later, about yours, about building games with Kafka. I was like, "I like this." I have done a lot of this. Let's see what I can do, which looks a bit like this but is more in my area of expertise. And the short version of this is that, as we go more online focused and all the tools move from being these desktop applications, there's a growing need to have collaborative software from Google Docs to Miro boards, to just Slack. And all these things are just there to allow us to work together.
So the idea then is how do you make these things both more resilient as we've seen what happens when they all go down/ but also how do you add the features in, in a sane manner without having to re-engineer everything under the hood? And then how do you decouple it? How do you scale it? Because it's all well and good to do it once, but how do you do 20,000 sessions at once? And this kind of session-based scaling along with multi-user scaling are two different axes. And this is where Kafka actually becomes a really powerful tool to actually deal with this stuff.
So I can see that, but I need more detail because I think of gaming, right? Game as a slightly different world where I'm shooting at you, you're dodging me, we have to have this shared session that's reacting in real time for that kind of stuff. But it almost seems to me like it wants to be object-oriented, mutable state because you have to coordinate these state changes. But I don't see how you possibly scale that out to hundreds, thousands, millions of users all sharing a shared state. So I am lacking this mental model and that's what I want you to fill in.
So this is where it actually gets good fun. This is where we have to unzip the layers of abstraction that are going on here. So the core of everything is actually mutable. Your RAM, you do need to keep reusing that RAM somewhere, eventually. So what happens when you add a layer on is actually things become very functional and very immutable once we add at least one layer on top of that. It's far closer than you might think. There's a really good John Carmack blog all about using Haskell and modeling game state with Haskell, which I'm sure will appeal.
Which is really good fun, because if you think about it, what you actually do have is a lot of immutable state, which is there are various time slices. And when you start collaborating and putting more things, everything is done on a time slice. So these are the actions I've pulled it off the network in this time step. I then have to process that and emit a new state. And what I've done effectively is a monadic view of a world where I have a time slice of network events. I do something in a different space and I have to send it back over the wire, which is my encapsulation of a function here. And what happens...
Sorry, is that the way games tend to work is they just slice up the network, a time chunk?
Absolutely. Have to. And this is why actually it's really important to have very fast ticks in games is because then the network pulls off smaller. So what happens is you're pulling off a UDP buffer normally, but what happens is if your network step is more than a few milliseconds, your UDP buffer gets larger, therefore your application of that gets larger again. Therefore, you have to pull even more state. So you have a generative curve of how much data you're pulling.
And this is where it comes good and functional is because what happens is now you've got 10 events in a row, which is I've stepped left, step left, step left, step left. And you step right, step right, step right, step right. And now you do something that needs to interact with me. What happens is in the actual game world is in the server is you maintain this graph of all the events that happen in time, right? Which is now an immutable time stream, which is good fun. Because what happens is if you then are out of sync with me by an hour at a time, what we do is we rewind time down my graph, right? Jump back in time to say, would that have done something interesting? Is it sufficient enough to actually do a mutation or a merge and the kind of git philosophy, right?
And then we say, yay or nay on your event. And then if that happens, we diverge the graph, throw away the new stuff and keep going forwards. And reapply the immutable logic again, which is why it's good fun.
This is actually happening in real time on games?
Oh yes, this is 60 times a second.
Or 120 times a second. So there's a slide in my talk which is, now imagine doing a git merge with 100 users between eight and four milliseconds. Yes, with live conflict resolution in a way that doesn't annoy them.
And really angry users.
Oh yes, the most angry users. There's a great talk about GameDev where they actually, they alter the time. They basically fiddle around with the time in multi-play settings until the complaints stop being of the wrong type. They accept there are going to be complaints. And it's basically that is your user testing is you optimize for feel.
And what actually happens here is, you have a classic dichotomy of how do you do shared distributed state? Do you rely on consensus models? And the short version in gaming is yes, but. And the but is that you can't wait for eventual consistency, you just have to take your call immediately. So CRDTs is what I talked a bit about in my talk, and this is something where conflict-free replicated data types are great except eventually consistent is not bounded and humans are really picky about how long things take.
Right. So this is the idea where it's fine if you're editing a Google document that you can have a delay before the other person's edits come in, and you eventually get them.
Well, had. But you can't have a big delay. And this is where it gets actually quite fun is, so this is why when Google Docs started doing this, they started winning was the delays were short. There were previous versions. So a fun fact is, are you familiar with Mother of All Demos, this thing from the '60s in Menlo Park with the Xerox demos and everything?
Let's go through it because not everyone listening will anyway.
So back in 1968 there was this thing called The Mother of All Demos. So Xerox proceeded to demonstrate basically the future of computing. So they had the mouse, they had graphical user interfaces, they had networked users, but they also had collaboration software built in. Because it turns out they did everything. And it's kind of funny to look back at this. So this was famously done as a live demo as well, hence, The Mother of All Demos. And so many things we take for granted now, or as innovations in the last 15 years have just been things they demonstrated back in the '60s. And pretty well as well, which is the shocking thing.
And what happens is, as you start to see this thread going through is everyone wants this to happen. And it's a big core thing here is we keep trying to do it. But it keeps failing because these are experiences isn't good enough. And this is where it like challenge the point of, everything has to be time bounded when humans are involved. It doesn't matter if you want to be right, it has to be right now and good enough.
So we're getting into proper real time. Because we haven't talked about real time here on Streaming Audio, as faster is much, much better. But this is hard real time where if you can't do it in 10 milliseconds, don't bother doing it at all. You've failed.
Well so it's hard on the grounds if there is bounds, but it's not that tight, do you know what I mean? So it's like humans' reaction time is with a game, yes, you will feel something between 32 and 16 milliseconds when you cut that down. That's 30 frames a second versus 60. You will feel that change. And on a network you'll feel it even more because of the way the latency adds up. But when you start dealing with edits, you've actually got a bit longer. It's about 250 milliseconds is generally the bounded effect of the blink of a human eye. Because you're not actually viewing my screen and your screen at the same time, you can get away with a lot more behind the scenes.
It's like chat messages don't come inside 50 milliseconds. They often take longer, but you don't mind because the flow is there and it takes you more than 200 milliseconds to type your reply. And you get away with a lot simply because you can't see each other's screens, which is the best kind of cheating. If you want to see hellish game development use networks with split screen at the same time. And then what happens there is you have people who can literally see each other's stuff and they have to be 100% in sync combined with people who can also sometimes see their stuff but via a network connection.
Oh my God.
It gets very bad very quickly. And this is where you have to take these assumptions into play. Now, what's interesting is this is still true when you go to the actual just compute world when you're trying to let's say solve some distributed algorithm problem. Because of you don't want this to take too long, do you? You don't want to have a GC lock going on in the background, which tanks one server, therefore your order processing has failed. That can't happen. So the same logic applies, which is do I then discard this dead-letter queue? Do I want this dead-letter queue? Because that's what's happening in the game is, these events are late, got to discard them. But in the real world they can't be discarded. So we dead-letter them and hope.
You usually find that that dead-letter queue is full of 50 million messages because we weren't monitoring it.
Kind of part of why I really like the Kafka approach here is, that's not really very representative of the world to have dead-letter queues everywhere. Instead, it's just a lock. So we can just keep the offsets of the ones that didn't work and move on.
Okay, I've not heard about doing that technique. That's interesting. So you just record the offsets of broken messages instead of hopping to a dead-letter queue?
Yes. The problem ends up being that I can't... If they've broken once, the likelihood of them not breaking again is quite low. They're likely to have continuous problems. But also I don't want to have to maintain a separate topic because topics are heavy. I don't want to dump them in some third-party data, another database in a different system. And I'm now building more architecture. So instead, I already have them in disk and I can pull off offsets pretty easily. So why not just have a slow worker in the background, possibly even a stream that just pulls this list of records which are known to be strange and then move on.
Because the key thing here is, if they're logged, I know roughly what they are. I know if they're worth reprocessing. Or I know they're just things they want to check later, or throw a user like a human at them. Because we're often than not, if you start having broken records going through, either you're going to have so many thousands that it's impossible or you're going to have a few and you want to check what they are.
Yeah, okay. I see that. I feel like we're being a bit scatter-gun in this conversation, we're all over the place.
Yes, sorry. To bring it back, it's the ever classic thing of let's say in collaboration, I send a conflict through, right? You and I both edit the same word. You want to make it an America spelling, I want to make it British spelling, and someone else wants to make a typo. So this happens. More often though, it's me with the typo, I'll be honest. But the key thing here is that these things are three changes that have gone down the wire. Let's say they're all inside that one processing step.
So you're pulling records off Kafka for all these inputted messages. And you pull four of them at the same time. And there are three edits to this one word. I apply them in an order, often by timestamp. But the problem is there's going to be a hard conflict of this one is now editing something that no longer exists. So either you reconcile it with algorithms or you basically mark it as, sorry this edit won't happen. Put it back as a thing, pass it on, just send the event back to me and say, "Sorry, not edited."
And this is that kind of, is it worth retrying my edit? The answer is very rarely.
Yeah. So how does this translate? Because I see this core idea of we've got a whole bunch of potentially conflicting events trying to manipulate state for the sake of responsiveness. Some of them are going to drop on the floor, but is there an obvious R world, or a streaming businessy world parallel for that problem?
Yes. So the classic thing here is, think the distributed transaction problem. So you have end users changing the state of something like buying stuff. So classic one use is ticket booking because this is a massive problem. So how many tickets are available? Where are those seats in say the theater or equivalent? And I want to book them or cancel them or change them. So I can create state, I can modify my state, or I can delete my state. But you can do the same thing as well. But you need to see an accurate view of the world before you're allowed to book them, right?
Yeah. Yeah, yeah.
So what happens? That's exactly...
It's very much like a multi-player game, isn't it, to book?
It is. It really is.
We're all in different rooms trying to compete for a resource.
Exactly. And we're also trying to work out if I have say a group of 10, do I want five and five, three-three-three, three-three-four or something like this? Because I now have to seat them all together so I have the right people sitting together, which I now need to have an accurate view of the world to get. But I shouldn't be able to block you, pardon me, booking your party of 20 at the same time because that would suck. So I need to show your reservation of tickets before you can confirm your purchase. Or do I then invalidate that somehow?
Right. So if I wanted to build a theater ticket booking system at massive scale with lots and lots of theaters, you're saying I could do this with an event-based system, which feels good to me for the scaling part. And then what I got to do is set up a stream process that goes by theater, let's take a time slice of request. Call that a session. Try and resolve all that batch at once, and emit the, "You booked it. Sorry, you missed it."
Effectively, yes. So I actually built a demonstration of this on top of KSQL a while ago. But this was a slightly simplified one for just the actual just bookings, which is I have N tickets available. I then auto wait list by having a queue of people who are currently not there. So built up a KTable, series of KTables per event, and of each of the people. And then effectively had streams which would resolve in order of, these are my wait listed people. So if I cancel the ticket, I'd automatically resolve the next one into the list.
This works. But then the thing is like, as per always with streaming processing, these are when you have the nice polite abstractions. But when you start looking into proper state, which is where the seats are, then you need to start being clever again. Which is why Kstreams comes up really well for this. Take a slice, you resolve it, have a distributed state between things.
And you want to open up the power of a full programming language to resolve a complex state like that.
Yes. And this is always like my puzzle here is how much can we push into SQL land? Versus how much we actually have to really think about it? Well, the big best questions I got out of the talk was about this distributed state in things like KTables. I was using a Python tool called Faust, which is analogous to Kafka streams, but just a bit more friendly and quick for demo building, should we simply say?
But the key thing ends up being is things like global state in global KTables is very risky because you have multiple editors, multiple writers, and in general you shouldn't use it. Because let's say I have 40 or 50 venues, I can easily isolate each venue per local state in my stream or a local table, which is great. Because it means I can do my partitioning to shore up the load and everything makes sense. But what then happens is occasionally I will have to signal to another one, which is where the global KTables come in quite nicely.
Or I can simply say, I am full, I am running out of events, suggest someone else, something else. And this is where you have to build up quite a complicated stream topology to handle the edge cases. Because this is all great in the happy state of everyone doesn't want to sit center front, which they do, of course. This is the problem of hotspots on Cassandra tables or a famous one here is like how do you not have a hotspot is you use murmur hashing. But you can't murmur hash people on seats. But you can actually, and you do bizarrely enough, you suggest seats on a murmur hash and it works really well.
Hang on, what's a murmur hash?
So this is the idea of consistent hashing to distribute things. So the idea here is, let's say you have 20 servers and I have 100 million requests, how do I evenly distribute a 100 million requests? So murmur has takes a key, a high cardinality key and says this is what I'm going to use to assign these. Basically, it's basically a random number sort across a ring of things.
So what you do in let's say a seating problem is, I'm going to recommend you sit somewhere and I'm going to recommend you sit somewhere random, which is less likely be booked from somewhere else from my remaining seats. So if I suggest somewhere, a good percentage of people will not change it. So you have this kind of sarcastic method of basically putting people in random places to make sure they don't collide. The objective here is trying to reduce your resolution space.
It's all the same again. And this is part of why I thought it would be fun to talk about this at Current was because these are the same problems you end up in a game. Except the games, the two different things about games is if it goes wrong, it's less important. But also, the timeline is single-digit milliseconds, not minutes.
Right, yeah. We have higher stakes, a bit more time. These seem like proven techniques that we're not hearing about.
And this is the funny thing because when you start looking into what state resolutions actually happen in games. So Age of Empires from what, 1997?
That's a strategy, world-building game.
Yes. So think, multiple thousand agents walking around in a collaborative space on 28K modems. So this is the kind of level we're talking about here. And the idea is you just share a series of inputs and it modifies. And everyone has a local collected state. As long as we have concurrent, we have the same set of modifications as states would diverge. That's one way to do it.
But then, let's go further and faster. So I think it's 2001 Tribes 2 came out, and this is kind of the seminal piece of work in networking, in my opinion. But 128 players on dial-up to ISDN type modem. So slow end of broadband, think 256 kilobits per second. 128 players, live FPS. This is first-person shooter. There is no hiding anything. So this is one big world with that many people at a very high speed game at that.
Yeah. Bullet-level reaction to all of it.
Oh yes. And this is one where you can literally fly and jump at really high speeds. So you can traverse massive maps in single-digit seconds sort of thing. So this is really high speed, really high-fidelity accuracy. And no one's willing to tolerate things so you've got instant hit scan weapons. So there is no travel time for a bullet to fudge missing.
Yeah. It's a full on array cast to check. So you have [inaudible 00:26:42].
Real time, many times a second.
Geez. Okay, so seminal work. How did they [inaudible 00:26:48]?
So there's a paper on it. There's actually a paper they published. It's the tribes to networking paper is something I always recommend people read when they come into this field. The fun thing is if you actually look into how Kafka works, you start seeing the... There's a lot of similarities here because the idea here is I need to buffer stuff, pull it in chunks. And then do I pull in single-digit chunks, like my linger time being low, to try and keep latency low. Or do I keep my linger time slightly longer, have more throughput? And this is similar thinking again.
And they're tuning exactly the same kind of problem.
Exactly. Because fundamentally nothing is different, if you think about it. This is one of the things where if you look at EVE Online, EVE Online literally uses RabbitMQ behind the scenes for a lot of things. I think they could use some Kafka now, and I might actually poke some friends there to say, "Hey, have a look at this, it might be better."
EVE Online, giant space simulator, right?
Giant space simulator, multi-thousand users per game. And that's the kind of thing here, it's like, it's mostly a trading game and often referred to as spreadsheets in space.
Spreadsheets in space.
I know, but it's glorious because it's like... But it's so close to corporate software it's scary because you run corporations in space.
So it's actually disturbingly similar to what we do day in and day out. And this is where it gets interesting, it's like if you do have a decent size broker, which can do this proper logging thing, so you then collect all your results into one post to process. What does that free up? And what I found that, now as I said, using Kafka, I used to use Kafka for my logistics company and then when I started using it in games thinking, I originally used it for matchmaking. So this is like, I have 20,000 players who want to play against similar people.
Right. Yeah. So this is the thing where you join an online game, you need to compete against 50 people. But you don't want anyone so much better than you or so much worse than you that it's no fun.
Yes, exactly. So let's take the chess example to make it just two because it's a nice and easier number. So you have rankings in chess, and they go between 1400 as your start point and the highest people are about 2,800. And what happens is you don't want this difference to be too large. So you need to match someone with similar enough ranking that you have a good match and you go up and down afterwards. So what do you do for this? And the best way once again, is I put my join request into a Kafka topic. I then pull out a series of chunks, time sliced up again. And once I have someone close enough, I emit a new event to another topic, which then gets both of us to either accept or reject this parent.
Right. Okay. So, you're essentially streaming into a pool.
And as soon as something in that pool is worth matching, you throw it out and try and get a few more things into the pool?
Well I continually add to the pool. So this is why it has to be a stream process. The pool keeps growing. And the idea is, how the work is running on top of this to pull out changes. But then the idea is that if this pool is accepted, it's like an event source, I then send a, "I'm done." Change my state to match.
Oh okay. So they're constantly in the pool, but once they accept that proposal, another event comes in which takes them out of the pool?
Because you don't know how many people are going to accept or reject. You often add them to multiple pools and you oversubscribe these pools. So what happens...
Is to fail. So this is where it gets quite fun again is, there's a really good thing here is, why TCP networking exists as is. And how IP networking succeeded is because they built with failure in mind rather than building with success in mind. TCP only exists because it was like how do I deal with congestion? Well I'm just going to retry, sensibly. And that kind of mentality, you take it further, it's like why Kafka exists, which is I can't handle this stream at once so I'm just going to store it logically and keep going.
Separate out that storage part from the processing it later, and then you've got a much more friendly buffer.
And you have this massively resilient, surprisingly fast system with hilariously strong semantics. Which has always been powerful to me because I came from the, let's just build it from scratch in C and C++ world, right?
And every time I move to this stuff I'm like wait, what? You have all the toys. They work and it gets good fun because suddenly you can rely on this buffer to actually just give you what I want every time. I can rely on it not to die. I can rely on it to have multiple access points. And I can rely on it just to take whatever I give it from the other end. This is why torrenting stuff works really powerfully for this kind of stuff. But you can still type them if you want to.
And then you start pairing into this idea of expecting failure. So this is where the stream processor comes in again, is I have these multiple pools of I think you're going to match into this pool, but I'm going to add you as a lower priority match to the second pool. So if this one falls through, I can immediately match you into this one.
So multiple matches across different pools. And is that actually happening in real systems out in the world now?
Yes. So I can't tell you all the details behind the scenes of some of the people who use this because of once again, NDAs exist, which are sad and fun.
Oh, trade secret, huh?
But I can easily tell you how some of this stuff actually does work because in some respects this was stuff I invented and told people about. And the best thing here is, invented is a strong word of, saw how I thought something was working and invented something that worked along those lines.
Reverse engineered perhaps?
Not even that. Just like, how do I make this behavior happen almost? And then go for it. And for me it always came down to two things, which is the experience your users want is always what's going to determine how this stuff works. So I want to, let's say book a group thing. It's same way as booking. So I always use booking as the analogy because it crosses so closely. So I want to have this event which requires 10 people in the same location to do, maybe it's an escape room, maybe it's paid bowling, maybe it's a video game, doesn't really matter.
The key thing is that what I'll do is I'll be added to the group that fits me best. I'll be then a followup to the group that fits me in the next best in a kind of cascading down. Because when there's 10 slots, I'll probably try and find 15 people and I'll order them by likelihood of match, and then the least likely matches. Therefore, if two drop out I just move the next two up and say, "Hey, you've matched."
Because particularly on games, you've got people joining, to matchmake in a pool. They wait, some people will wait two minutes for a game. Some people get bored after five seconds and drop out.
This is constantly evolving pool.
And it gets worse when you start look at some of the bigger games. So when Amazon did the launch of New World, there were wait queues of 36 hours.
Yes. To actually join the game itself because there's a capped user population in the world itself. But that's like big stream outlier. A more common example is on a mobile game, let's say you're playing something like, one of the League of Legends games like Teamfight Tactics. What happens there is I'll join a queue, but I'm on a mobile, I might lose signal, so I might drop out. And therefore, I don't want the people in that world to lose out just because two of the people lost mobile signal at the wrong time. Or I might have gone up to get a cup of tea. Happens all the time. And you don't want to punish people by having timeouts happen behind the scenes.
Yeah. Yeah. It feels like that particularly is ripe for some business application. Because I don't think we quite have that extreme, or do we? Do we have quite that extreme dropout?
So think about something like, have you used Hopin back in the pandemic of course, I'm sure.
Yeah. So these virtual conferencing softwares, right? They often have a networking field where you can hit and say match me with someone to network with. So now I'm trying to pair N people who have some dimensional matches to say, "Hey let's talk." Right?
Oh, right. So this was just like a drop-in chat room for virtual conferences?
Exactly. So think chat-rouletted businesses. I'm a big fan of the absurd.
Things I try not to think about in business context, chat roulette.
Yes. But it's exactly that kind of thinking if you think about it. And as per always, that same logic applies. So a classic example would be dating apps because you have to match people with N-dimensionals. But let's flip it right back to real business context. So where I actually designed the algorithm I originally used for this stuff was logistics. How do I match the right number of people in a multi-step delivery system? So I've got to get something from Singapore to somewhere, let's say Toronto. And it has to be delivered within a certain time window and a certain cost bracket.
Therefore, I need to then book a multimodal multi-step journey. But I can't predict this until I can actually book various things. So what I'm going to do in fact is query 10 different options along my graph and say, what kind of prices can you give me to get to these various cross points, which are logical. So I basically path walk through asking these questions and then matchmake across the path walk as it goes along. To try and build up this probability space of these are some actual routes I can take to get to this place.
Okay. And then you've got the best available and then you've got to deal with some of the bookings finally failing in that last complex transaction?
Exactly. And then this is where you get that distributed transaction problem coming back. And this is why it's fascinating and why I think it really applies into the business world is, you can't guarantee a transaction will work. That's the first rule is resistance. The next rule is always that user experience is paramount. And this is where I think the gaming philosophy of, the one thing you can't do is confuse a user, has to come in.
Yeah, yeah. Because they've got brutal levels of instant customer feedback, right?
Absolutely. Because someone going on blast on Reddit can kill your game immediately. But let's say we've all suffered at the hands of various travel booking services. I won't bash anyone publicly today, but they're all varying levels of awful. You've done conference travel to feel this, I'm sure.
Yeah. Yeah. Yeah.
So imagine instead when you're booking it and that confirmation takes too long because of sadly your manager happened to just be away from their desk at the wrong moment. Imagine instead came up with a tier thing of these are the kind of things in the acceptable budgets, which are all within two hours, either way of transfer times, same day, costs going up and down by a few hundred pounds only. And you say, look, I'm going to try and reserve all of these in a line if someone can get back to approve the first one it'll go through. But I have a queue of somewhat good routes options for you.
Yeah, I can see that. I can see that working. I can see it being too confusing for the users.
The thing is you can hide it from them, that's the best thing.
You don't have to see it, that's the thing. All you have to see is you just see the top one, right? And you just display the current flight you're going to be booking is the 5:00 PM from Heathrow to Austin. And what happens is it just has a tick down on it and it just changes to the 6:30 PM to Austin, when it becomes irrelevant.
And this is why doing it as a queue, thinking it in terms of an actual event source rather than anything else actually makes sense. Because fundamentally it's an event. It's either going to happen or not, right? And this is why you create all these things and then you amalgamate some state and then you emit a series of probabilities in order.
I expected to get to the whole how gaming maps to event streams part of this talk. But I didn't expect that the idea of optimizing for the user and then hiding away some of the experience to absolutely optimize that, right?
And that bit actually is actually the really critical thing is, the problem is we're optimizing for a series of humans who are the best reconciling problems you'll ever match. I see text edit, I can deal with that, that's fine. I rub out my whiteboard every now and again to change something when we're collaborating, that's not a problem. We're actually okay with a lot of this stuff being shown to us.
But what we're not okay with, and is having too many choices or too many versions of the truth being presented to us. So in games famously, when we're working on Sea of Thieves was, let's say there are eight of you in this pirate sea battle going by. There are not two ships, there are in fact 16 ships in the server. Because every single person has their own view of time and the server will maintain all of them, right? And there's only one, which is true, which is the service version. And everyone else is it's just an impression of the truth, which happens to be locally consistent to their own mental model, which is, it's what you see.
Right. Yeah. So one thing I've always wondered is how do you then nudge the user's experience back on track? If I'm at home playing a game and I think the ship's there, but the server says it's actually over here, what's the server doing to gradually nudge me back to the truth in real time?
It quite literally is. So what happens is it's like, this is where the cleverness really comes in. So if you look at the algorithms here, this thing called client-side prediction, which is that you move and it moves you locally immediately because it's likely to happen. We assume that's going to be true. But the server will effectively invalidate your previous stuff or drag you to where it's supposed to be by just editing every single step you take slightly, going forward. So you might think you're running forwards and you end up slightly left of where you started. Well you think you should be, even though you're just holding down the forward key, is because the server has actually said, actually you weren't there, so we're going to just nudge you to the right place. It's like pay no attention, [inaudible 00:40:48].
The invisible hand of God.
Exactly. And glitches in the matrix per se, are a common thing. Except we're a lot better at concealing them than the black cat happening twice.
I didn't realize that was going on.
Continuously at full on... The fun thing here is how fast it really happens. So in tournament games you're looking at two millisecond loops in the server side. Literally, whole server state, two milliseconds.
So it's constantly nudging you in a way you couldn't possibly detect?
That's the idea. That's the dream state. There's some really good talks by Riot and by the guys behind Overwatch about this exact thing, how you predict stuff. So let's say I fire my weapon because I have an outdate ammo count and the server says actually you don't have that rocket to launch, therefore I'll predict it. But then I'll rewind my local state and just go, sorry not there. And how you handle that is local... There's a lot of cleverness you can do locally to make that not appear. Like you have a really long fire animation, you go whoop, bang. And what happens is inside the whoop bit you can check back forwards and then go click, click, click.
So everything is fake, is the short version.
They're actually pulling all these tricks to give themselves a bit more network bandwidth.
Absolutely, 100% everything is a trick. So animations, fancy animations are my favorite, favorite way to hide latency. Because I'll just, on a run, my character's running. So I'll have this lovely kind of pose stuff thing happen. Then it'll be like, sorry, out of stamina. It goes oof. I just go, okay, it just kind of flops. And you can [inaudible 00:42:28] this.
I suppose you have a parallel like that in general webby business programming. Like, you click a button and if you make it pulse, it feels like there's less of a delay to the response, right?
Whereas if you click it and nothing happens, it's instantly frustrating. Did I click it? Did something actually happen?
And this is why you have to give a visual cue when you click something, right? And the best one here is loading screens. So loading screens is just everywhere. But the first and biggest trick of loading screens is you have a minimum loading screen time, which is one to two seconds. You make all loading screens last a minimum of one to two seconds because then there's never a differential between the instant and the five-second one.
Oh, you are sneaky.
Of course. It's all about consistent experience beats fast experience every time. So it's about normalizing to a consistent, which is why once again, things like Kafka are really powerful because you can normalize away from the idea of saying a slow processor or a fast processor to a series of stream processors, which are just doing their job as fast as possible. Once again, batch sucks when the batch is huge, it takes three days. Versus the batch is really cool, which it's just done now, versus the stream, which is always going to be a little bit latent, but you're going to get the results as you need them.
Yeah, yeah. I think that's the first time at Streaming Audio we've talked about deliberately slowing things down.
It's actually far more important than you think because when you roll that kind of thinking back, it comes back to this, do I pull bigger or smaller batches? Do I put my linger time to a reasonable number so I'm not exhausting my processor? Because I want to pull chunks so I'm always actually optimally utilizing my bandwidth.
I think we've got time for one more trick. Do you have one more trick up your sleeve you can teach us from the gaming world?
So the fun one here is things like, so one of the very interesting innovations is when things go back and forwards over the line. I didn't go into this in much detail in my talk, which is why it's quite fun to talk about now, which is this column modeling stuff.
So the idea of modeling engagement is a column store rather than a row store. So if you think about this, innovations like ClickHouse, Pinot, Druid and all these things. And roll back a bit like HBase and such, and Cassandra are all these column stores. And they generate this massive velocity improvement by doing the ability to only select on columns, therefore being narrow. Games have only recently picked this up. By recently, once again, it's 10, 15 years, but it takes a while to become mainstream. And what happens there is, so ECS system, the entity components system kind of design philosophy in games means that you have the ability to rapidly iterate stuff to make it nice and actually clean to iterate through.
But what's fun is why do we do it? In games, it mostly comes down to cache coherency, which is the idea of if you do the same thing over and over again, it's really fast. Now, this is not just data cache, which is the first one. Everyone thinks data cache, so process a long row, it's very nice. You can use CMD optimizations and pull forwards at a time. But what's actually even more important more often than not is iCache optimizations. Which is the instruction cache on your CPU. Which is if you do the same thing in a row, you're not going to have a branch prediction fail, you have to reload your data because you're doing something else. Or you're going to contact switch and do something else instead, so if I do all the same calculations in a row.
So this is where particularly, or what I like talking about with Kafka is, you apply that logic back to Kafka, you think, what are my partitions doing? So this is where you want to normalize and have homogeneous partitions. If you have bigger files and smaller files, put the smaller files in one partition, the bigger ones in another. Because the thing there is, now what happens is you actually normalized your pulling strategy because you now know you can actually optimize for the partition you're on. Let's say you're pulling small records, which are single 2Ks, pulling 50 at a time might be fine. But when they get up to hundreds of K or megabytes, pulling 100 at a time is going to crunch something. And this is where we want to optimize for the system you have, therefore you normalize the data per partition.
So what happens then is you get the ability to know that if you have a lagging consumer on a partition, you now know why it's happening rather than going different big, different small, big, small, big. Because you might just pull a big chunk of big records and therefore that consumer starts lagging. Instead, you have now got consistent performance in each of your consumers.
Right. Okay. I've always thought about just chucking data into the partitions as the partitioner decides, and letting that be something I don't think about.
If you actually do this...
You're proposing grooming the partition management strategy.
Yes. I've seen this be like a five to 10x improvements performance.
Yeah, it's insane because what happens here is linger time makes a huge difference of throughput, and all this how many records you pull. But if you don't know how big the records are going to be, think about what's happening on the network. The network is the most expensive step of any of these transactions by about three orders of magnitude. It's kind of fun how expensive it really is. So what happens here is let's say you're expecting to pull five records and the records are now 10 megabytes rather than five kilobytes. That's going to completely crunch that latent loop, and you're much more likely to have a problem, right?
The network drops in and out, retries, head of line blocking, it's going to get worse. It's all TCP. So instead, if you know it's going to be big records, you only pull one at a time, therefore you're far less likely to have a failure. The failures are smaller. But also we now know that that particular consumer is going to go at the speed it's supposed to go at, right? And we're not going to be like, we've got a lagging consumer here, what's going on? It was going fine and then it dropped, the performance. Nah, it's always processed in the same sort of records. So we now know what it should do. And now you're not actually [inaudible 00:48:19].
You're pushing for consistent experience over the fastest possible experience.
But you actually get the fastest possible experience is where it's fun, is because now the small records are always processed really fast, right?
Right. Yeah. Yeah.
And so what you have is predictable and fast and you can basically now choose rather than just get what you get.
Yeah. So can you think of a real-world place where you see this kind of massive disparity?
So I literally gave this advice to a government outfit recently. Randomly we met up at QCon in London a few months back because this is at the home office. They were having some problems, this exact thing. And it was just like they have records that go wildly in size. So let's say you're processing, well even if you just roll it back to the more usual business stuff, you have small records which are events, like do this thing, do that thing. And then you have bigger ones, which are the data itself or a chunk of data from one system is much bigger than the chunk of data to the other. So what happens there is if you have these big chunks, in general it's not, the immediacy is going to be lower because it's chunks. It's really just a batch, but at real time. So we are not going to worry about those so much.
So instead anything which is high velocity now goes through all the high velocity lines, which is the, oh crap, something's gone wrong. Or this is really cool, do this now. Now you have the low latency lines, the high latency lines in the same topic. So fraud detection event goes through high speed, single transaction. But I now need to do a historic check to see if someone's going over their credit limit, less important. So I'm going to throw that through in a big chunk because I need more transactions to deal with that.
Right. And you do that on a partition basis rather than on partitioning by topic?
I would do both. So it depends what level of granularity you have. Sometimes you can't create additional topics because a topic is actually really about a logical grouping of things. Like this is credit processing inside Europe maybe. Or in another case it might be, this is, I don't know, license records for this type of insurer. And it depends what grouping you have to have at the topic level. The trick here is even if you do this per topic, that's great, but then even sub-topic, you want to have homogeneous things per partition if you can. Because it works at any level is the best thing. It just depends what granularity you're allowed.
Right. Okay. Crikey, I feel like I've got a whole raft of new strategies from this conversation I need to go away and think about.
The best thing is, all these strategies come from an application of two or three very core ideas. Which is it's nearly as cheaper to do the same thing again that you've just done. The network is really, really expensive, so try and work out what you actually care about from the network. And latency only matters once you work out what the acceptable latency is, and then it's about predictability from that point onwards.
Right. I like that principle as a headline.
Yeah, it's about predictability.
That feels like the least predictable one. Latency isn't just bad, you've got to measure against it.
Exactly. You got to work out what's acceptable because there is always going to latency. There's this lie we often tell ourselves, which is we don't live in a distributed world. Where all our browsers are right now, which are distributed system by default, because it's the server somewhere. So nothing is not a distributed system anymore. So latency is real. We've got to accept that.
And we all work remotely.
Our lives are distributed systems now.
And so there's always going to be latency, even if it's just me thinking and then you responding and thinking as well. So latency is inherent in the system. So it's just about working out what is an acceptable amount and how to make it predictable.
I think I'm going to go away and apply this to my real life. I'm going to start replying to emails deliberately a bit more slowly so that my average is more consistent. And we'll take it from there.
So remember, Gmail has a function for that as well.
Yeah. So email delay is actually a thing for this exact reason. I'm not kidding. So does Slack.
I didn't know that. Okay. I'm going to go away and research that. Ben, even more than I thought before we started recording this, you've got a very populated and scatter-gun mind. Thank you for giving us a glimpse into it.
Well, thank you so much for having me. It's been wonderful to chat.
Okay, bye for now.
Thank you, Ben.
Okay. That one was pretty packed even by usual standards, so I'm going to think, I'm going to try and summarize my takeaways from it. Latency: Ben saying, decide on the latency that you're aiming for upfront and then optimize around that figure. Which feels like an extension of the usual advice to measure performance first before you try and optimize it.
Slicing: slicing was a big one. If you want to deal with potentially conflicting events, then slice up time into discrete chunks. Process each chunk on its own and then emit the results of that decision as downstream events. I think I'm going to call that technique nano batching because it's kind of really, really tiny batches you decide over.
As an extension of that idea, we talked about pooling, which is like when you want to do things like matchmaking or collaboration for open requests, like matching trades comes to mind. Then you maintain a pool of open requests coming in from an event stream, emit all the potential matches, and then let the acceptance of those potential matches feed back into that first event stream. So they eventually get removed from the pool once they're confirmed.
I immediately want to go and play with some of these ideas. I've got some ideas for games I could make with Kafka. But I'll leave that for now and just say, those are my takeaways. What are yours? Leave us a comment, drop us a line. There's a comment box below for things you want to tell us. And if there isn't, then you will find my contact details in the show notes, of course.
As ever, Streaming Audio is brought to you by Confluent Developer, which is our site to teach you all about event streaming. We have a lovely section of patterns of different architectural ideas you can use for building event streaming systems. And I think we might have to write up a couple of new ones and add Ben's ideas to that repository. You'll also find courses, tutorials, blog posts. Everything we know about Kafka is there for the taking for free. So check it out at developer.confluent.io. And with that, it remains for me to thank Ben Gamble for joining us and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.
What can online gaming teach us about making large-scale event management more collaborative in real-time? Ben Gamble (Developer Relations Manager, Aiven) has come to the world of real-time event streaming from an usual source: the video games industry. And if you stop to think about it, modern online games are complex, distributed real-time data systems with decades of innovative techniques to teach us.
In this episode, Ben talks with Kris about integrating gaming concepts with Apache Kafka®. Using Kafka’s state management stream processing, Ben has built systems that can handle real-time event processing at a massive scale, including interesting approaches to conflict resolution and collaboration.
Building latency into a system is one way to mask data processing time. Ben says that you can efficiently hide latency issues and prioritize performance improvements by setting an initial target and then optimizing from there. If you measure before optimizing, you can add an extra layer to manage user expectations better. Tricks like adding a visual progress bar give the appearance of progress but actually hide latency and improve the overall user experience.
To effectively handle challenging activities, like resolving conflicts and atomic edits, Ben suggests “slicing” (or nano batching) to break down tasks into small, related chunks. Slicing allows each task to be evaluated separately, thus producing timely outcomes that resolve potential background conflicts without the user knowing.
Ben also explains how he uses pooling to make collaboration seamless. Pooling is a process that links open requests with potential matches. Similar to booking seats on an airplane, seats are assigned when requests are made. As these types of connections are handled through a Kafka event stream, the initial open requests are eventually fulfilled when seats become available.
According to Ben, real-world tools that facilitate collaboration (such as Google Docs and Slack) work similarly. Just like multi-player gaming systems, multiple users can comment or chat in real-time and users perceive instant responses because of the techniques ported over from the gaming world.
As Ben sees it, the proliferation of these types of concepts across disciplines will also benefit a more significant number of collaborative systems. Despite being long established for gamers, these patterns can be implemented in more business applications to improve the user experience significantly.
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.Email Us