In this week's podcast, we are talking to a streaming data expert from Thoughtworks. You might know him: it's Simon Aubury, and he has recently won the Confluent Hackathon with a really great project. It's a fun use of machine learning, some field hardware, and Apache Kafka. He has been using Google's TensorFlow to do some AI stuff to track the animals of the Australian Outback, technically the suburbs of Sydney, but it's outback enough for me.
I thought we'd get him on because I was on the judging panel for the Hackathon and this project really hit that sweet spot between doing something fun and cool and playful. But then you step back for a second and you can easily see some real-world use cases that he's addressing with really just a few hundred lines of code. It's really great and we're going to link to the project in the show notes. But before we get started, this podcast is brought to you by Confluent Developer, which is our education site for Kafka. We'll get more on that at the end, but for now I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.
Joining me today is Simon Aubury. Simon, welcome to the show.
Hi Kris. It's an absolute pleasure to join you today. Thanks for having me on as a guest.
It's great to have you. Last time we met, we were in person. I was on the other side of the planet with you in Sydney. We'll have to make do with a wired link this time.
Yes. But I do appreciate that you made the long voyage out to Australia. It was so good to see you in person.
It's always great to meet people in person and I used to live in Sydney, it was nice to be back, if only for a while.
I'm sure it's changed a bit. Anyway, it's definitely great to be able to join you on Streaming Audio today.
It's a pleasure. Let's see, in Sydney you work for Thoughtworks as a principal data engineer. What is a principal data engineer?
I think I'd describe my job as doing cool things with data, if I had to explain it.
Isn't that the whole of computer science?
Absolutely. But if I was trying to do this as an elevator pitch, I'd describe it as doing interesting things with highly available distributed data systems. And that typically gets me in the door with clients, and they might be in the worlds of finance or transport or healthcare or insurance. But it's definitely an interesting overlap between interesting business problems and the opportunity to play with really cool technology such as Apache Kafka. And I also love to mix it up and play with some concepts that come out of data mesh, and the concepts of data mesh and data streaming platforms seem to be a nice sort of overlap. It's an interesting place to play.
It sounds like a sweet gig to be honest.
Yeah, for sure.
The idea of play is why you're joining us this week because we recently held a Hackathon and you are our glorious crowned winner.
It was a fabulous opportunity to have some of the typical constraints of day-to-day project life removed and start thinking about, left unshackled, what cool things could you build? And I think the challenge out of the Confluent Hackathon could be boiled down to: do something cool and something that interests you with Apache Kafka. It's great to have the opportunity to play with cool tech.
Always. Tell people what you built and then we're going to go through how you built it.
Absolutely. I think the short summary would be that this is a wildlife monitoring system, and as any good Hackathon project would, it mixes technology with a real-world problem. I essentially wanted to solve for how you could identify animals in the world, count them, work out where they were, and maybe draw some insights around population trends and animal movements. And it's particularly good to be able to use something like Apache Kafka and streaming technologies, and an excuse to use a Raspberry Pi.
It's all you need for a Hackathon project.
Absolutely.
And you live in the best place in the world for wildlife tracking because doesn't Australia have more things that can kill you than any other country?
That might be quite a stretch, but it was definitely a bit of an advantage of having an Australian back garden. When I literally put a Raspberry Pi in the back garden with a camera to see what animals could wander across, there really were local birds, local wildlife, and some stray cats and dogs that happened to wander past the camera. I did have the advantage of having a large collection of wildlife in our back garden to play with.
Diverse flora and fauna.
Absolutely.
Of APAC, yes, absolutely. So you've got a Raspberry Pi with a camera in the garden and you're remotely... you're automatically tracking wildlife and population statistics?
Yes. Maybe to paint a bit of a broader picture here, a Raspberry Pi is a low-cost computing device. You probably don't want to put your expensive laptop in your garden, but you might be happy to put a $20 or $50 low-cost microcomputer in your garden. And they're quite amazing computing platforms. You can actually deploy essentially a Linux computer to your back garden with a little camera. And it was a nice project to be able to put a detection model on a Raspberry Pi, do some edge processing, and actually demonstrate what it looks like when you've got an edge device and you actually want to connect it up to Kafka in the cloud.
Yeah, the thing I like about the Pi is, while we weren't looking, it snuck up and became a serious high power computing thing that you can just hide under a rock.
Absolutely. And I think the genesis of this project was actually from my day job, where essentially we were using a number of edge devices to do image recognition in the field. And some of this technology is quite proprietary, but you can simulate a lot of that proprietary technology with quite accessible systems such as Raspberry Pis. It's great to be able to have that capability as a hobbyist and demonstrate some of these same capabilities.
The computer aided vision stuff, I am completely out of date on. Where do you start with that?
Like many a good project, I think Google is your friend. Other search providers are available, but I should probably stress, I would not consider myself a data scientist or a research engineer, but I can find some established Python packages which do the heavy lifting when it comes to things like object detection. And for this project I ended up using a framework called TensorFlow.
I know that one.
Again, it's a wonderful framework. It's open source, it gives you a lot of capabilities, but one of the nice things is it's got a whole community around it, including detection model zoos. Not only do you get the advantage of a well-opinionated framework for things like object detection, you can actually build on the shoulders of giants and find an object detection library for your own needs, such as identifying animals in your backyard.
Did you download a specific package for detecting wildlife as opposed to Lego bricks?
Exactly.
Cool.
And quite nicely, the TensorFlow area where you go and start looking for these things is called a model zoo. You go and find an appropriate zoo for identifying, in my case, backyard animals, but you can also find zoos for solving all sorts of things. And again, one of the advantages of a well-opinionated framework is that even if the object library isn't completely tailored for your use case, you can actually do what's known as transfer learning to take a model that might be optimized for one use case, tweak it, train it, and get it to start identifying Lego bricks or native flora and fauna.
Just so I'm understanding this, essentially they've pre-trained the neural net on a data set that's relevant to you and then you just download the pre-trained net?
Correct.
That's what we're talking about? Okay.
Exactly. Yeah. And this is quite accessible, and for a deployment such as to a Raspberry Pi, having these libraries in Python makes it very approachable. Actually for this project, I think to be completely accurate, I used a cut-down version of TensorFlow called TensorFlow Lite, which is a slimmed-down model evaluation library optimized for battery-powered devices that you actually want to deploy into a back garden or a field somewhere.
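For anyone curious, loading one of those pretrained TensorFlow Lite zoo models on a Pi really is only a few lines. This is a minimal sketch, assuming the tflite_runtime package; the model file name is a placeholder rather than the one Simon used, and a fuller per-frame loop sketch appears a little further down.

```python
# Minimal sketch: load a pretrained TensorFlow Lite detection model on a
# Raspberry Pi. "detect.tflite" is a placeholder for a model downloaded
# from the TensorFlow detection model zoo.
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# SSD-style zoo models typically expect a fixed-size RGB frame, e.g. 300x300.
_, input_height, input_width, _ = input_details[0]["shape"]
```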
It's a power thing rather than you're just trying to save money.
Exactly. But when I want to multiply my real estate, I've now got the capability of deploying hundreds of these to hundreds of back gardens.
All you need is to become a property tycoon and you're away.
I know. But I've got the dream and the vision. It's just execution between here and there.
If anyone would like to fund Simon, please contact us directly at the podcast.
Yes.
There's a lot to break down here, but what does it actually look like to program? Because this is something I would like to do at some point. What's the actual Python? I know it's hard on a podcast, but give us an idea of what the coding actually looks like for that.
In short, I'd encourage anyone who's even mildly interested to either look at the project repository for the Hackathon for my code base or look at some of the TensorFlow libraries. They're quite easy to read. But essentially, for an object detection routine, you've got a big while loop. You open up a feed to a camera using a Python library; typically, OpenCV gives you a nice connection to an onboard camera. And for each frame that you bring down, you do a bit of object masking and crop it to reduce the pixel density. And then for each frame you apply an inference function, which is an evaluation of the frame that you've got against the model. And then you'll get a Python dictionary back describing the objects that it's found in each frame. And then you can keep going through that loop again and again and again.
And are these probabilistic results? As in, there's a 50% chance there's a cat in this frame.
That's correct. You can essentially ask the inference library to look out for, maybe, cats and dogs and teddy bears, and then it will give you essentially a dictionary back saying there were two cats and the probability was 50% or 80%. With inference functions you never get a 100% or a 0%; to your point earlier, everything is a probability and you apply a threshold against it.
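As a rough illustration of that loop, here is a sketch assuming a tflite_runtime interpreter loaded from a zoo model and the Pi camera exposed through OpenCV. The output tensor ordering and the label indices vary by model, so treat them, the file name, and the threshold as placeholders rather than what Simon's repo actually uses.

```python
# Rough sketch of a per-frame detection loop with a confidence threshold.
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

LABELS = {16: "cat", 17: "dog", 87: "teddy bear"}  # subset of 0-indexed COCO labels
THRESHOLD = 0.5  # discard low-confidence detections

interpreter = Interpreter(model_path="detect.tflite")  # zoo model placeholder
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
_, height, width, _ = input_details[0]["shape"]

cap = cv2.VideoCapture(0)  # onboard camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    # Resize down to the model's expected input to reduce pixel density.
    resized = cv2.resize(frame, (width, height))
    # Quantized zoo models take a uint8 batch of one frame.
    interpreter.set_tensor(input_details[0]["index"], np.expand_dims(resized, 0))
    interpreter.invoke()
    # Typical SSD zoo exports put classes at output 1 and scores at output 2.
    classes = interpreter.get_tensor(output_details[1]["index"])[0]
    scores = interpreter.get_tensor(output_details[2]["index"])[0]

    counts = {}
    for cls, score in zip(classes, scores):
        label = LABELS.get(int(cls))
        if label and score >= THRESHOLD:
            counts[label] = counts.get(label, 0) + 1
    # counts is the per-frame dictionary, e.g. {"cat": 2, "dog": 1}
```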
I'm sure Schrodinger wishes he had access to this.
Yes. I'm sure there's a pun there, something about probability and cats.
Got to be. We'll keep researching that. It's per frame then; it's not tracking changes in the frame over time.
That's correct. Again, I'm emphasizing here that I'm very much a beginner. This is very much the naive implementation of treating each frame independently from one another. Every frame is identified with no relation to the frames either before or after it, which did lead on to some almost secondary problems. Because it was such a naive implementation, I ended up with a stream of events that required a level of massaging to actually identify what it meant when you are interpreting every image frame by frame.
Let's get into that then. How did you choose which parts to put on the Pi and then when it was time to split everything out, what's your larger architecture?
Yes. I think my mental process here was to take the simplest path, and my ultimate goal was to count animals and do trend analysis. Essentially, the job of the edge device, the Raspberry Pi in this case, was really, really simple. It had to identify things in each frame, come up with a payload and send the payload onwards. And my mental process here was to make sure that I was playing to the strengths of the Raspberry Pi, which is that it's in the field and it can look at the images one by one, but apart from that it need do nothing more than send out a payload of what it's seen in each frame. And all of the processing is best done by a stream processing framework; that's a good place to do that transformation. The separation of the architecture here is image processing at the edge, get a payload, which is the objects identified in each frame, and send that off to a Kafka broker for some subsequent processing.
Fair enough. Take us through the subsequent processing. You've got this fairly frequent stream of that dictionary saying what animals are there and the probability. Do you also include the image itself that they were detected in?
No. I'm not actually sending the image. I'm just sending an identifier describing where the image came from. Because I did actually want to start building for my eventual goal of having hundreds of back gardens. Every camera has a header payload describing where the image came from. But you're perfectly correct. The payload itself is just a... Call it a nested bit of JSON, which describes the object seen and the count of each object. If there were three zebras and two cats, that's what is sent in that record payload.
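To make the shape of that record concrete, here is a hedged sketch of an edge producer using the confluent-kafka Python client. The topic, field names, and camera ID are illustrative, not lifted from Simon's code.

```python
# Hedged sketch: publish a per-frame detection summary (no image bytes)
# to a Kafka topic. All names here are illustrative placeholders.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker:9092"})

event = {
    "camera_id": "backyard-pi-01",         # which garden or webcam feed
    "timestamp": "2022-09-01T07:15:02Z",
    "detections": {"zebra": 3, "cat": 2},  # counts per detected class
}
producer.produce("animal_detections", value=json.dumps(event))
producer.flush()
```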
For those zebras that wander through the Australian outback.
It's funny you should say that, though, because I wasn't joking when I said I actually had multiple endpoints. I did have a Raspberry Pi initially, and then I souped it up a little bit by adding a web feed from a number of local zoos and running the same model. That was actually quite important.
I take back my sarcastic comment. You did have zebras.
Yes.
Brilliant.
But just to emphasize the point, they weren't my zebras, they were someone else's zebras. I should clarify the point around ownership of zebras. If you're not thoroughly confused by now, the point here is that there are edge devices, be they Raspberry Pis, sending out payloads, and there's another Kafka producer which is streaming observation events from webcams, and they're all ending up in exactly the same Kafka broker with a bit of a payload that describes where the image came from and that JSON, which is a dictionary of things identified in each of those image frames.
And is it all one big topic?
That's correct. Everything's landing in one enormous topic as a landing area. And then the idea is to do a level of stream processing over and above it, which is to sort out the trends of animal movements. If you can imagine that across several frames you might have identified two cats in one frame, then one cat in the next, and then two cats again in a third frame, you can infer that you actually had two cats and maybe you just missed counting the second cat in that middle frame. That's why it made sense not to have that responsibility on the edge device, but to move that processing to a stream processing layer.
You got like a smoothing function over it?
Exactly.
How's that actually coded?
I chose to use ksqlDB and coded it as a windowing function, which, again, in the context of a Hackathon, is a quick way of achieving an outcome.
Absolutely.
One of the nice things about ksqlDB is you can think through several iterations quite quickly and think, "Actually what do I want to do? I want to smooth over maybe a trailing 30 seconds and find the highest number of identified animals per class." And that's a nice construct in ksqlDB to achieve that outcome.
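A ksqlDB statement along those lines might look roughly like the sketch below. The stream name and columns are hypothetical (one row per class per observation), and the window sizes are just an example, not the exact values Simon used.

```sql
-- Hedged sketch: over a trailing 30-second window, keep the highest count
-- observed for each animal class. Names are illustrative.
CREATE TABLE animal_counts_smoothed AS
  SELECT camera_id,
         animal_class,
         MAX(animal_count) AS peak_count
  FROM animal_observations
  WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 5 SECONDS)
  GROUP BY camera_id, animal_class
  EMIT CHANGES;
```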
Okay. You're putting that into a result topic. Is it just raw input topic and smoothed over result topic? Is it just those two?
Correct. There was actually... To make my life easier, I ended up exploding the JSON payload. If you can imagine, as far as the payload is concerned, it's a dict structure: you might have cats, you might have elephants, you might have koalas. And I found it actually easier within ksqlDB itself to extract the count of each animal class and pivot it out to be a wide table. It made some of the later processing a little bit easier, exploding that JSON and pivoting it out. It was easier to visualize later.
Okay. Do you mean you've got almost a separate column for the interesting animals?
That's correct.
Or you've got one row per kind of animal?
No, in my case there was literally a column called the Cat Column, the Dog Column and the Zebra Column. And it was a quick way to pave the way for some of the visualization and notification systems later on.
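That kind of pivot can be done directly in ksqlDB. A hedged sketch, assuming the raw topic value carries the detections as a JSON string and with purely illustrative names, might look like this:

```sql
-- Hedged sketch: explode the nested JSON into one column per animal class.
CREATE STREAM animal_pivot AS
  SELECT camera_id,
         CAST(EXTRACTJSONFIELD(detections, '$.cat')   AS INT) AS cat_count,
         CAST(EXTRACTJSONFIELD(detections, '$.dog')   AS INT) AS dog_count,
         CAST(EXTRACTJSONFIELD(detections, '$.zebra') AS INT) AS zebra_count
  FROM animal_detections
  EMIT CHANGES;
```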
But you're going to have to extend it to include Tasmanian devils.
That's right. One of my short-term design decisions might have a long-term maintenance cost.
It's a Hackathon project. You're allowed to have future work in the notes. What was your onward processing from there?
Although it's interesting to think about this stream processing happening in Kafka, when you want to demonstrate changes in populations or notify on interesting outcomes, you obviously want to send the data somewhere so you can see it and draw some insights from it. I chose to use a Kibana dashboard. And using Kafka Connect, it's a straightforward task to use an Elastic sink connector, send my newly created stream into an Elastic index, and then put a Kibana dashboard on top of it.
And that gives me the advantage of being able to build something quite nice, which is constantly updating graphs and trends, and doing all of that visualization in something like Kibana. Kibana's quite nice for doing those real-time, constantly updating graphs and pie charts and trend analysis. And the path from a Kafka broker into Elastic is being able to light up some specific Kafka Connect sink plugins. And again, it's nice to think of this as a problem and have the implementation be some quick bits of configuration. It's solving some of that integration headache with some relatively straightforward configuration.
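That sink side really is mostly configuration. A sketch of the sort of JSON you might POST to the Kafka Connect REST API for the Elasticsearch sink connector is below; the connector name, topic, and URL are placeholders, not values from Simon's setup.

```json
{
  "name": "animal-elastic-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "ANIMAL_COUNTS_SMOOTHED",
    "connection.url": "http://elasticsearch:9200",
    "key.ignore": "true",
    "schema.ignore": "true"
  }
}
```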
How much code did you end up with to put this all together?
I think this whole project from beginning to end was probably less than 200 lines of code. It's probably about a hundred lines of Python and some REST payloads to go and configure the indexes and the configuration for Kafka Connect. And maybe there's... I don't know, I'm going to say 60 to 80 lines of KSQL for producing the transformations. I'm glad I'm not being paid per line of code.
Is there anyone that still does that? I hope that died out in our industry a long time ago.
If you want to think about how you can build on the shoulders of giants and stitch together some really cool existing infrastructure and frameworks, I think it's neat to be able to talk about lighting up these integration patterns and these stream processing patterns with configuration-based outcomes, and in lines of code, it's a handful.
That seems incredibly low for something that would be that reusable. I was speaking to someone recently who's working in the same area, tracking footfall in front of shops and through shopping malls. I can't see any reason why what you've done couldn't be repurposed into that space.
A hundred percent. I think one of the initial ideas was actually to do a count of the queue length outside my local coffee shop, outside my office. And it's exactly the same process. You want to take a stream, do some object identification, and you actually don't care about the individual pixels. You actually care about the number of people in the queue because you actually care about how quickly you're going to get your coffee on a Monday morning.
That would actually be really useful to the business presumably as well. When are the peak hours? To a degree they'll know. It's one of those things that people in the shop will always know, but the management are probably completely unaware of when those spikes come in.
Or you could think about it, you know it from the transactions that you get on the register, but you don't know how many people got annoyed because the queue was more than five people deep and they wandered off to another coffee shop. You can actually think about some real world business implications of doing that observational analysis.
This is the don't-drop-data stuff that we end up talking about a lot in event streaming: if you capture the raw data and keep it permanently, you get to see not just the successes but also the failures.
Absolutely. And sometimes if you are thinking about just one problem, you can count the things that are in front of you, but sometimes the real opportunities are the things that are ancillary or the things that you didn't count.
Especially if your competitors aren't doing it.
Absolutely. If you want to be a little bit better at solving for the important task of serving coffee on a Monday morning, that could be the thing that you want to measure for and optimize for.
Especially worth optimizing. How long did you leave this running for, and did you learn anything about the Australian wildlife in your area?
I think I left the initial Raspberry Pi running in my back garden for... it was over a week, and I did actually get a number of interesting encounters in the back garden. And then when I was running some of the zoo observation ones, again that was running for over a week or so to prove the viability. It was quite interesting. I was expecting this stuff to work for a few minutes and then fall over, but a lot of these frameworks are extremely robust and they keep going.
That's impressive. And are they reliable? Did you ever get, "I've detected a zebra in my back garden," which seems unlikely?
It was actually quite the disappointment to realize there were no zebras in my back garden. But I did get what I thought was a very exciting event. I got a rabbit event, but it was a misclassified cat, which was highly disappointing.
Perhaps a long-eared cat.
Absolutely. We live in hope that one day we will find an exotic animal in the back garden.
If you can't find one in Australia, you're not trying. Maybe the camera resolution can't detect all those interesting spiders you have.
Yes. But that does remind me, I think one more thing that I wanted to demonstrate in this project was notification of unexpected events. To go beyond the happy path of being able to build pretty population graphs of things that you were expecting, I put in a second alerting condition for the arrival of... and I chose to do it for the arrival of a teddy bear. If a teddy bear walked in front of the camera, it actually sent a special alert to my phone. I got instantaneous push notifications if a teddy bear appeared in the back garden.
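The alerting condition itself can be a simple filtered stream in ksqlDB, with a small consumer turning matching records into push notifications. A sketch with hypothetical stream and column names:

```sql
-- Hedged sketch: only pass through frames where a teddy bear was spotted.
CREATE STREAM teddy_bear_alerts AS
  SELECT camera_id, teddy_bear_count
  FROM animal_pivot
  WHERE teddy_bear_count > 0
  EMIT CHANGES;
```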
Do you have kids walking them around?
I do. It was actually a neighbor who walked across with her teddy bear that actually triggered this. It's very exciting to have the real world appearance of a teddy bear in the back garden.
A surprise teddy bear. I suppose you could use this stuff for home security, but I think right now I prefer your use case.
Yes. But I'm hoping that these ideas can go in all sorts of directions and feed people's imagination. It was definitely good fun for a Hackathon, but I think the real takeaway is that it's really interesting what you can do with a combination of low-cost devices, edge processing, existing frameworks, and then being able to integrate it all and stitch it all together.
And this is one reason why Hackathons are great, because you get to play and then when you step back you realize exactly the same structure has some real world use.
A hundred percent. And it's good to be able to have a bit of free form creativity and play with some of these things.
Absolutely. Congratulations on winning the Hackathon.
Thank you very much, Kris. It was good fun putting it together and I definitely got the opportunity to play and learn on the way. It was really great to have the opportunity to play with some new technology and some familiar technology and put it all together to both have some fun and demonstrate an outcome.
And presumably at some point one of your clients at Thoughtworks will want this and you'll jump up as the expert.
It's always good to have a demonstrator, a bit of a thought starter. Stream processing, you can do it for banks, you can do it for vehicles, and you can also do it for zebra and teddy bear identification.
Motto to live by. Cool. Thank you very much for coming to talk to us about it. We'll put a link to the source code in the show notes. It's on GitHub?
Absolutely. Kris, it's been so good to have the opportunity to talk to you today.
Simon, really great to talk to you. Cheers.
Thanks very much.
I've just realized I didn't ask Simon if he's detected the great Australian bunyip yet. That is an oversight. I'll ask him next time. If you want to see more of Simon's project, check out the show notes. There's a link to the GitHub repo for the source code, and I believe he's writing a blog post, which should be linked there as well by the time you hear this. The blog post, I think, is going to be on Confluent Developer. That's a good place to look for it, and for a wealth of other resources for learning about event streaming and Apache Kafka with Python, Go, Java and loads more. Check it out at developer.confluent.io. And to make the most of all that information, you will need a Kafka cluster. Try spinning one up at confluent.cloud, which is our Kafka cloud service. You can sign up and have Kafka running reliably in minutes, and if you add the code PODCAST100 to your account, you'll get some extra free credit to run with.
Meanwhile, if you've enjoyed this episode, then do click like and subscribe and the rating buttons and all those good things. It helps us to know what you'd like to hear more of. And it also helps like-minded people to find us. I think it's a good thing. As always, my Twitter handle's in the show notes if you want to get in touch with me directly or if you have an idea to be a guest on a future podcast. And with that, it remains for me to thank Simon Aubury for lining up the timezones and joining us. And thank you for listening. I've been your host, Kris Jenkins, and I will catch you next time.
Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events.
In this episode, Simon shares his Confluent Hackathon ’22 winning project—a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges.
Open-source object detection models for TensorFlow, which are appropriately collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which would have made it untenable. Instead, he was able to utilize the open-source models, which are essentially neural nets pretrained on relevant data sets—in his case, backyard animals.
Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.)
On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector.
Simon’s system is an extremely low-cost system that can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long.