This week on Streaming Audio, we are talking about one of my favorite things to do with computers, where you just take an idea and you run with it. Pure creativity, stuff where there's no standup, there's no backlog, there's no project manager. There's just one programmer saying, "What can I build? How can I solve my pet problem? And what can I learn along the way?" Today, Danica Fine is going to take us through one of her projects. She's been putting together hardware and software and sensors and wires, and a couple of cloud services for a very interesting take on home automation.
Before we dive into that, let me tell you that the Streaming Audio podcast is brought to you by Confluent Developer, which is our site that teaches you everything we know about Kafka. How to learn it, how to use it, how to grow with it. Check it out at developer.confluent.io. And while you're learning, you can easily get a Kafka instance running, using Confluent Cloud. Sign up with the code PODCAST100, and we'll give you $100 of extra free credit to get you started. And with that, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.
My guest today is Danica Fine, who is a developer advocate at Confluent. In fact, I know her well because she's on the same team as I am. She is a former Kafka developer and wrangler for Bloomberg. And you may have seen her as the face of a few release notes recently on the Confluent YouTube channel. Danica, thanks for joining us.
Yeah, thanks for having me. I'm really excited to be here again on Streaming Audio.
Yes, yes. As long as you say that the new host is better than the old host or at least has better hair then we can work with that.
No comment.
That's wise. That's wise.
Ask me afterward, but yeah.
Okay. Let's move swiftly on from that. So you and I share a hobby, which is learning things and researching things by mucking about with computers.
Yeah, absolutely.
And you've come in to tell us about your latest research project.
It sounds really cool when you call it a research project. That's, I like that. I'm going to keep using that. But yeah, I think that I'm a pretty practical person and when I learn new technologies, I really like to have a practical application of it. It just really helps me to understand what's going on. And so, as my first project as a developer advocate, I thought that would be a good thing to do. Right? For everybody else, I hope that everyone appreciates the practical nature of this. So yeah, I wanted to build out a Kafka pipeline and I decided to make it a lot more tangible in my life and build out a hardware pet project as well. So now I have a lot of expensive hobbies, so it's great.
So tell us what it is. What did you actually settle on building?
Yeah. So I have a large number of house plants. You see a couple in the background. There are many, many more elsewhere in the house and it's kind of a hassle to take care of them. I love them, but every day I have to waste time, well, not waste time, but spend a lot of time checking whether they need to be watered, whether they need anything from me. They're very needy. So I built out a house plant monitoring system to tell me when they need to be watered so that I can get back those precious minutes of my day. Right?
And presumably, never neglect them again.
Yeah, that's the hope. That's the hope, if every, yeah. If everything works well, yes. They will not be neglected.
I'm getting the impression, lack of neglect is a version two feature.
Yes. Absolutely, that'll be in the release notes for version two. Yeah. Keep an eye out.
Well, let's start with version one. Give me the overall architecture and then we'll go through it. How do you do it?
Yeah. So, well, step one. I got a Raspberry Pi, which are apparently in short supply right now, so I feel pretty lucky that I was able to get one. Yeah. And after a little bit of research, I found a couple of moisture sensors that I thought would work well for my at home setup. I crimped a lot of wires, which no one really has to do. I mean, at this point, if you're a hobbyist with hardware projects, you can get everything basically plug and play.
I wanted to make it a little more difficult for myself and actually use, and just actually wire everything up. So spent a lot of hours doing that. But yeah, once I actually got the physical system set up, so I have four sensors right now. I actually have a couple of them here.
Oh, [inaudible 00:05:13], for the people watching ...
They're really cute.
At home.
Yeah.
Danica's holding them to the camera. For the people listening, it looks like a green rectangle with a black rectangle on the end.
Yeah. That's a lot ...
And that's a moisture sensor.
That's a lot of hardware projects. Yes. So this is like, I love ... This is my first hardware project and I just love how cheap and accessible everything was. Right?
Yeah.
So this was, I don't know, like $2 or something and you really can't break it. I've tried. But a number of those, and they're all wired into the Raspberry Pi. And I wrote a couple quick Python scripts to intermittently pull moisture readings from these plants. And once it's in Kafka, from there, it's pretty simple. Right? And so the rest of the pipeline is transforming the data, and then I get a really convenient alert on my phone now when a plant needs to be watered. It's pretty cool. I think it's cool.
That is cool. Right. I'm going to make you take me through in gory detail because I want to know exactly how this works. So I assume these moisture sensors are using the conductivity of soil. Is that where it starts from, that the wetter the soil, the more it conducts?
Yes, yes. That is. That's the general vibe there. There are a couple of different, some soil moisture sensors that you can get. These ones are actually a little hardier, right. They don't wear over time as much as other ones. So, yeah. They're using, they're capturing the moisture of the soil. And so I just basically have just a for loop checking that, never ending for loop every, I don't know, two seconds. And pulling that information from all of the sensors. I also conveniently, for free, get some temperature readings as well.
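As a rough illustration of the never-ending polling loop described here, the following is a minimal Python sketch, not the actual script from the project. The sensor IDs, the `read_raw` callback, the field names, and the `fake_read` stand-in are all hypothetical; a real script would call whatever driver the sensor vendor provides. The two-second interval is the "I don't know, two seconds" figure from the conversation.

```python
import itertools
import time
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical sensor IDs; the episode only says there are four sensors.
SENSOR_IDS = [0, 1, 2, 3]


def poll_sensors(
    read_raw: Callable[[int], Dict[str, Any]],
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield one batch of readings per polling cycle, forever."""
    for _ in itertools.count():  # the never-ending for loop
        batch = []
        for sensor_id in SENSOR_IDS:
            reading = read_raw(sensor_id)
            batch.append({"sensor_id": sensor_id, **reading})
        yield batch
        sleep(interval_s)


def fake_read(sensor_id: int) -> Dict[str, Any]:
    """Stand-in for the real hardware call (an assumption), with fixed values."""
    return {"moisture_raw": 300.0 + sensor_id, "temperature_c": 21.5}


# Take just two cycles from the endless loop, without actually sleeping.
batches = list(itertools.islice(poll_sensors(fake_read, sleep=lambda s: None), 2))
```

Passing the read and sleep functions in as parameters keeps the loop testable on a laptop with no hardware attached.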
Oh. This is a thing. You often get these boards where they're like, "We have other sensors if you're interested for free," because it's ...
Yeah.
Too expensive to do individual components, right?
Yeah, yeah.
Yeah.
But the thing is, I don't actually. I mean, that's a nice to have reading. I am capturing it, but yeah. No, I've never really had a problem with my indoor plants getting too cold. So, it's not very useful in this case, but maybe I'll find out ...
You're out in California, right?
Yes.
Yeah. So I can see how the coldness wouldn't be, I assume, too bad a problem. Okay.
Not too much of an issue.
How does it work? Because, I've only ever played with Arduino stuff. So I would be literally soldering one of those boards into pins on an Arduino. Is it the same in Raspberry Pi?
It doesn't need to be. So, this is where it comes back to where I made it a little more difficult for myself and definitely didn't have to, especially for a first project. But yeah, so since I did opt for crimping all of those cables myself and setting up the connectors to plug into the sensor. In order to also connect all of the sensors and get them to use the same wires going into the Raspberry Pi, I had to connect into a breadboard. That did involve some soldering, so I am now adept at soldering. Well, I'm at least not the worst. I think so.
Is it a real hardware project if you don't burn yourself at least once with the soldering iron?
That is what I told myself. So I do feel fully initiated. In the future, I think I will choose maybe some easier things and not do it again like that, but yeah. So you don't need to do that. Like I said, a lot of these sensors that you can get are just plug and play. It's meant to be easy. It's meant to have a very, very low barrier to entry. So, it can be as easy as you want it to be. Right. You don't have to get a soldering iron. You do not. I do not recommend that for people listening. Do not get one.
It would have to be for the people there doing Raspberry Pi with their children, that's not a good time.
Yeah. It's not really. Yeah. It's not that friendly. I remember when I was starting off welding and soldering, pretty much the same thing. Right? When I was starting off, I lived with a mechanical engineer and I'm borrowing the soldering iron from him and he just walks in. He's like, "Don't lick your fingers when you're doing this, right, because there's just lead. There's just lead in everything apparently, still." And I'm like, "I'm three hours into this. You're telling me this now?" So, not child safe.
I was once discussing one of these projects with a friend of mine and he said, "What kind of extractor fan do you use when you're soldering?" I'm like, "What, now?"
Oh, is that supposed to be ...
I'm not supposed to breathe those fumes? They're so tasty.
Oops. Oh, you're not supposed to do that in a closed space? No.
Oh, God.
Ooh.
Right.
Yeah. Good to know. Good to know.
So, okay. So then Raspberry Pi, is it running Linux, or is it running something simpler?
That is a great question. I set this up so, so long ago and I pretty much just followed the straight setup instructions.
Give me a [inaudible 00:10:35], leave me alone. Right?
Yeah. Honestly, once I got it functioning, I just stopped. I stopped looking at the Raspberry Pi, like the actual operating system. Although, before I had SSH set up, I did have to have it connected to a monitor. And that was pretty adorable because it was just really cute. And I'm like, "No, no, I just want a command prompt. Just give me the bare minimum here." So it's so much easier to set everything up now, with not having to plug the Raspberry Pi into anything and physically look at the UI.
Right. So, okay. So you've got your Python script, that's able to scan the external sensor running in for loop. You get what, give me the data packet. Let's talk about how we make this into an event that we're going to ship off.
Yeah. And data modeling is extremely important so that's, yeah. That's a great thing to talk about. I think that it's very easy when you build out toy projects or things at home to just ignore the data model. And it gets sloppy, cut corners. Right. So I didn't want to do that. I spent a decent amount of time figuring out how I wanted this event to be modeled. So yeah, for the actual readings, the temperature and moisture that we're capturing, that's happening in the same moment. Right, per plant. So I decided to put those into the same event, right, so per plant. In an event, we have the plant ID, right. I've given every plant an ID like everyone. Right? Everyone does that at home.
Yeah, absolutely.
Yeah. So I got our plant ID. We're obviously capturing some sort of timestamp, either just in the Kafka message itself. I also explicitly put that timestamp in there, just in case we get our moisture reading ...
A different stream, internet time and plant time, just in case there's a discrepancy.
Exactly, just in case that comes up later on. And then we also have our moisture value and our temperature value. It's temperature, I think it's being captured in Celsius. And then the moisture value is a percentage of total possible moisture. It isn't like it's, which is very bizarre to configure with the sensors. When you receive the sensor, they actually tell you when you start to set it up, that you may not. You may have to configure this and figure out what the max and min value for your moisture sensor is, what makes sense for you. So I'm sitting there with a glass of water, trying to figure out what's the max moisture of this sensor.
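The event described here (plant ID, explicit timestamp, moisture, temperature) could be modeled roughly as below. This is a sketch under assumptions: the field names, the namespace, and the Avro types are guesses for illustration, since the episode only says the readings are serialized as Avro and does not spell out the schema.

```python
import json
import time
from dataclasses import asdict, dataclass

# Hypothetical Avro schema for one reading; names and types are assumptions.
READING_SCHEMA = {
    "type": "record",
    "name": "Reading",
    "namespace": "houseplants",
    "fields": [
        {"name": "plant_id", "type": "int"},
        {"name": "timestamp", "type": "long"},   # epoch milliseconds
        {"name": "moisture", "type": "float"},   # percent of calibrated range
        {"name": "temperature", "type": "float"},  # degrees Celsius
    ],
}


@dataclass
class Reading:
    plant_id: int
    timestamp: int
    moisture: float
    temperature: float


def make_reading(plant_id, moisture, temperature, now_ms=None):
    """Build a reading, stamping it with the current time unless one is given."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return Reading(plant_id, now_ms, moisture, temperature)


reading = make_reading(1, 42.5, 21.0, now_ms=1_650_000_000_000)
schema_fields = [field["name"] for field in READING_SCHEMA["fields"]]
schema_json = json.dumps(READING_SCHEMA)  # the form a schema registry would store
```

Carrying the timestamp inside the payload, as well as in the Kafka message itself, keeps the sensor's own time available if the two ever disagree.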
Yeah. Surely a glass of water is 100, right?
You'd be surprised, but it really wasn't. I don't think it was. Yeah. Well, so it's just a value from, I don't know, 200 to like 6000 that the sensor just pulls out. And so basically, if I treat the 6000 as being in a glass of water, I think it only got up to like 5700 or something. But then, yeah. I just translate that to a percentage. And so, yeah. So every couple seconds we have our reading, which is our plant ID, our percentage moisture and our temperature of the plant. So, that's shipped off to Kafka. That is serialized as Avro, which I feel like the average person doesn't really want to think about serializing their data. But I feel confident that my data is robust and will not be changed later on in the pipeline. So there are benefits. There are benefits.
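The translation from a raw sensor value to a percentage can be sketched as a simple linear mapping. The bounds below are assumptions taken loosely from the numbers mentioned in the conversation (roughly 200 at the dry end, about 5700 in a glass of water); each sensor would need its own calibration, and the real script may well do this differently.

```python
# Hypothetical calibration bounds, loosely based on the figures in the episode.
RAW_DRY = 200   # raw value assumed for a completely dry sensor
RAW_WET = 5700  # raw value assumed for a sensor sitting in a glass of water


def moisture_percent(raw, raw_dry=RAW_DRY, raw_wet=RAW_WET):
    """Linearly map a raw value to 0-100 percent, clamping out-of-range input."""
    if raw_wet <= raw_dry:
        raise ValueError("raw_wet must be greater than raw_dry")
    fraction = (raw - raw_dry) / (raw_wet - raw_dry)
    return max(0.0, min(100.0, fraction * 100.0))
```

Clamping matters because a cheap sensor can occasionally report a value outside the range seen during calibration.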
Yeah. I'm a fan of strong types.
I've heard, I've heard.
Trying not to dwell too much on that topic this week. Is it hard to get the Python Kafka libraries running? Because I always feel something might be a bit hairy cross platform.
Yes.
Because it runs on C, actually.
And no. Yeah. So I ended up, I thought this was going to be a very, very straightforward thing. I mean, I guess in the end, it kind of was, but I had to build it from source, librdkafka, in order to make everything work properly. But doing that is not difficult. Building from source is pretty straightforward. Getting to the point where I realized I had to do that was a little complicated. Right. Took me like an hour or so. And I'm like, "What is this error message?" Because everything starts running and then there's this one cryptic error message. So, yeah. So if things aren't working, if someone's following along, wants to do this and for some reason your Kafka isn't working on your Raspberry Pi, it's a very specific use case. Just try building from source. I think it'll work. I think it'll be fine.
If all else fails.
For a solution. Yeah.
Okay. So then, you're off to the races. The data is heading across the wire to the cloud.
It is. It is in motion at that point. Yeah, it's crazy.
In motion, as we like to say around here.
Yeah. So yeah, we have our event, right? But that event isn't super, super useful without some additional information, some metadata for lack of a better word. I also, I'm only monitoring four plants. Sometimes I switch them out, so I have six, or six or eight plants that I actually keep track of with this system. I had to write some metadata on those plants, right. So I have my plant ID in my reading, plant ID. I had to tie that plant ID to an actual plant so that when I'm looking at an alert, I actually know what's going on. Right.
So I also wrote a simple script, leveraging an Avro schema, to write some of that plant metadata into Kafka, as well. And that contained a lot of what I think is useful data. So we've got the plant ID. We have the given name of the plant because all of my plants have names. Yes.
They have names, too? Give me an example of one of your plants' names.
So this is Olaf. I have a plant on my desk, so here's a plant. But so Olaf is his given name, but then there's the scientific name. He's a type of Dracaena. There is a common name for that plant that might be useful. That's a dragon plant, for some reason. I don't, it doesn't really look very menacing, but okay. So those are the fun things that I've got in there. And then also the useful metadata, which is, okay, for that plant, what is the lowest bound of moisture that this plant is comfortable with.
Right. Yeah.
Right.
Which varies by species, right?
Yeah. So, I think generally it's about, it's very similar for most plants, but some plants have a lower tolerance or a higher tolerance depending on how you look at it. So like that plant, for example, can just, I don't even remember the last time I watered that plant because they'll just keep living. It's totally fine. So that one may have a 10 or 15% moisture boundary, right, lower bound. I also, just in case, have the upper bound in there. I've definitely oversaturated this plant, let it dry out.
Yeah.
And then I also have the similar data point for the temperature. Right, so house plants should not get below a certain threshold. My house will never get that cold hopefully, unless there's a second ice age in the next year or so, but ...
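The plant metadata described over the last few answers might look something like the record below. It is only a sketch: the field names and all numeric thresholds are hypothetical, with the 15 percent lower bound borrowed from the "10 or 15%" mentioned for this kind of plant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlantMetadata:
    plant_id: int
    given_name: str
    scientific_name: str
    common_name: str
    moisture_low: float      # percent; below this the plant needs water
    moisture_high: float     # percent; above this the plant is oversaturated
    temperature_low: float   # degrees Celsius; below this it is too cold

    def needs_water(self, moisture_percent: float) -> bool:
        """True when a moisture reading falls under the plant's lower bound."""
        return moisture_percent < self.moisture_low


# Example values are assumptions for illustration only.
olaf = PlantMetadata(
    plant_id=1,
    given_name="Olaf",
    scientific_name="Dracaena",
    common_name="dragon plant",
    moisture_low=15.0,
    moisture_high=60.0,
    temperature_low=10.0,
)
```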
It's almost believable, but that's actually ...
Yeah.
Actually, if that happens, the wetness of your plants will be the least of your worries. Right?
That is true. That is true. Also, pleasant, like kind of reassure. I don't know, reassuring a little bit. That's fine. So, yeah. I have my readings going to Kafka, I've written my metadata to Kafka. And now, I could actually start playing around with the data, which was the fun part, I think.
Tell me the fun part. What do you do with all that raw data? How do you manage it up?
Yeah. So I set up a Kafka cluster in Confluent Cloud, which was actually, you mentioned earlier my past experiences with Kafka. I was a software engineer building out Kafka applications and it's different when you're at a company and you have Kafka provided to you and whatever. Doing it on your own, it's like a little more intimidating, right. And I think that having Confluent Cloud available to me made it just that much easier. It was so cool to just be like, "Okay, here's my cluster." I just start writing.
It took like three minutes, right? It was pretty cool, pretty easy to get that up and running. So yeah, I have all of my data in my Confluent Cloud cluster and I ended up using ksqlDB to transform the data. So at this stage, I didn't even have to write any additional code, which is pretty cool. It's pretty nice to just transform the data with a couple lines of SQL.
What sort of transformations are you doing?
Yeah. Well first, I'm just registering that data in the ksqlDB application. And then, so the readings data, I treat as a stream. It's an unbounded stream of events, hopefully, unless someone unplugs the Raspberry Pi. And the house plant metadata is a table. Every so often, I might play around with the lower bounds of the house plants. I just write a new entry to that table, if you will.
Yeah.
And so I've got my stream, my table. And so in order to make sense of the readings, I need to enrich it with the house plant metadata. Right?
Yeah.
So there's an enrichment stage or like just a basic join, if you will. And then from there is where it gets interesting. How do I actually alert on this data? How do I make a decision based on this data? Obviously, what I want to be doing is saying, "Oh, when a reading comes in and once I've enriched it, I need to check if that moisture reading is lower than the lower bound for that plant." Right, because then I need to water it. But we're taking readings every couple seconds. So if I just alert on that every couple seconds, my phone's going to explode when the plant needs to be watered.
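In the project this enrichment is done declaratively in ksqlDB; the Python below only illustrates the semantics of a stream-table join followed by a threshold check, and is not the project's code. The field names are hypothetical, and the sketch ignores the timing rules a real stream-table join applies.

```python
def build_table(metadata_events):
    """Materialize a table: the latest metadata event per plant_id wins."""
    table = {}
    for event in metadata_events:
        table[event["plant_id"]] = event
    return table


def enrich(reading, table):
    """Join one reading with its plant's metadata, or drop it if none exists."""
    metadata = table.get(reading["plant_id"])
    if metadata is None:
        return None  # inner-join behaviour: no metadata, no output
    return {**metadata, **reading}


def is_low(enriched):
    """True when the enriched reading is below the plant's lower moisture bound."""
    return enriched["moisture"] < enriched["moisture_low"]


# Two metadata writes for the same plant: the second one replaces the first.
metadata_events = [
    {"plant_id": 1, "given_name": "Olaf", "moisture_low": 10.0},
    {"plant_id": 1, "given_name": "Olaf", "moisture_low": 15.0},
]
table = build_table(metadata_events)
reading = {"plant_id": 1, "timestamp": 0, "moisture": 12.0, "temperature": 21.0}
enriched = enrich(reading, table)
```

The overwrite in `build_table` is what "I just write a new entry to that table" amounts to: changing a threshold means producing a new metadata event with the same key.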
Not that urgent, right?
Yeah. It's not. So I was like, "What would be a reasonable amount of time to be alerted?" Probably twice a day or every 12 hours. If I forgot to listen to the first text, I should probably water the plant on the second one. So I aggregated the data based on those outlier readings. If the moisture's lower than that threshold, I aggregate that per 12 hours.
Okay.
And that's also ...
So you're doing session windowing.
I am doing windowing. It's just a non overlapping window of 12 hours. Again, so simple, so simple to do in SQL, oh my goodness. It's like, it blows my mind. So prior to this, I've really dealt a lot with Kafka streaming applications. So going from that to, oh, we're just going to write five lines and it's done, right. Like to window in SQL, so much easier. Yeah. So I window over 12 hours, and then I need to decide, okay, well, what is the number of readings, low readings that I would have to get in that 12 hour period to warrant an alert. Because sometimes, these sensors, it's a $2 sensor, right. Sometimes it might give a faulty reading. It just might. Yeah.
So I want to make sure it's absolutely certain that it's a low reading before I send it down the pipeline. Right. So chose an arbitrary number. We should get one hour worth of low readings in a 12 hour period to warrant an alert later. So, yeah. Which is also pretty easy. I just say, "Okay, when we get to that many readings, send it off." Yeah, it's crazy. It's so crazy how easy this was. I love it.
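The windowed count described here lives in a ksqlDB query in the project; as a way to see the logic, here is a hedged Python sketch of a non-overlapping 12-hour window that fires once when the number of low readings reaches a threshold. The two-second read interval and the resulting threshold of 1,800 readings (one hour's worth) are assumptions based on the figures mentioned in the conversation, and the field names are hypothetical.

```python
from collections import Counter

WINDOW_MS = 12 * 60 * 60 * 1000           # non-overlapping 12-hour window
READ_INTERVAL_S = 2                       # assumed polling interval
MIN_LOW_READINGS = 60 * 60 // READ_INTERVAL_S  # one hour of low readings


def window_start(timestamp_ms, size_ms=WINDOW_MS):
    """Return the start of the fixed-size window containing the timestamp."""
    return timestamp_ms - (timestamp_ms % size_ms)


def alerts(low_readings, min_count=MIN_LOW_READINGS):
    """Yield (plant_id, window_start) once, when a window's count hits min_count."""
    counts = Counter()
    for reading in low_readings:
        key = (reading["plant_id"], window_start(reading["timestamp"]))
        counts[key] += 1
        if counts[key] == min_count:  # equality, so each window alerts only once
            yield key


def low_readings_for(plant_id, count, start_ms=0):
    """Generate evenly spaced low readings for one plant (test helper)."""
    step_ms = READ_INTERVAL_S * 1000
    return [{"plant_id": plant_id, "timestamp": start_ms + i * step_ms} for i in range(count)]


fired = list(alerts(low_readings_for(1, MIN_LOW_READINGS)))
not_fired = list(alerts(low_readings_for(1, MIN_LOW_READINGS - 1)))
```

Requiring many low readings before alerting is the de-noising step: a single faulty value from a cheap sensor never reaches the phone.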
I sort of wish we could hold up the SQL query for the folks at home, but that's not going to work for the podcast.
If I just hold up a blank piece of paper, can we just, you think they can edit on there, like post production?
I think the YouTube people will be fine, but it's really not going to work for the listeners. So we'll have to add a link to the repo in the show notes.
Okay. Yeah. Yeah, definitely.
Okay. So you've got this query that is de noising and aggregating, so it doesn't send out too often.
Yes.
And while we're talking about things we like about KSQL that aren't writing Java streams, I'm going to throw in my favorite, which is you can, it's just deployed your streaming. Your stream process app is just ... I typed in three lines and now it's deployed, which is great. So, but how, okay. So take me out, take me into consumer land. How does this event get out to your pocket?
Yeah, yeah. So also, this is pretty cool. I got the idea from our other esteemed colleague, Robin. He had written some use case where he leveraged a Telegram bot. So everyone has Telegram on their phone, that messaging app, and they make it really easy for you to set up your own bot. If you just wanted to play, to fake AI, you can write whatever you want. It's pretty cool.
So it takes a couple minutes to set up a Telegram bot. And once you start interacting with that bot, you get a chat ID and then they have an API that you can just reference that ID, and send messages to yourself from that bot. So naturally, I had to create a plant alerting bot that will yell at me when I need to water my plants. I set up that bot and using the API, the endpoints that they provided along with that chat ID, I could then use the HTTP sink connector and write that data out from Kafka to my phone.
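For reference, the HTTP call that ends up reaching the Telegram Bot API looks roughly like the request built below. In the project this call is made by the HTTP sink connector rather than by hand-written code, so this is only an illustration of the endpoint shape. The token, chat ID, and message text are placeholders, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def build_send_message_request(bot_token, chat_id, text):
    """Build (but do not send) a POST to the Bot API's sendMessage method."""
    url = f"{TELEGRAM_API}/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; a real token comes from creating the bot.
request = build_send_message_request("123456:EXAMPLE-TOKEN", 42, "Olaf needs water")

# To actually deliver the message, one would call:
#   urllib.request.urlopen(request)
```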
I'd assumed you're running your own consumer somewhere.
No, this is easier.
This is just a regular old connector.
Yeah. It's great. And ...
Oh, cool.
To make it even better, it's fully managed as well. So through this whole process, outside of the Raspberry Pi thing, I never had to leave the Confluent console. It was just all ...
Okay. That's quite cool.
Just set it up there. Yeah, pretty easy.
And it's not a dedicated Telegram bot, it's just the HTTP one. Oh, sorry. It's not a dedicated Telegram connector.
Yeah. It's just the HTTP sink connector. That's it. It's awesome. Yeah.
Okay.
So, yeah. I was able to just point to that end point, and then, also in the configuration point to that topic that I was aggregating that data in. And extract just the message that I wanted to send out. It's so simple. I keep saying that. I keep saying that, but it's like, I'm just baffled that for this first hardware project and pipeline that I wanted to build that was useful in my house, in my life. Something I wanted, it's crazy that it was like this easy to set up. I built something that I actually needed in my life in not that much time.
From scratch.
Yeah. It's so cool. It's like, looking back to my other experiences with software, it's one thing to be like, oh, you like to tell people, "I built this thing, I built this application or whatever." But unfortunately, if it's not a front end component, it's not something tangible for you. You can't really, you can't properly brag about it. But now, if people come over, I'm like, "Yeah, I built that thing. It's over there and it monitors my plants. Look, I got a text." It's pretty cool. It's got a wow factor, I guess.
Yeah. Little blinking lights by the shrubbery.
Yeah. That is. I got to get more blinking lights. I was thinking of wiring some extra ones in there, as a second. So I don't even have to look at my phone. What if there was just a light that, or a sad face on the plant, like each of them would monitor.
Yeah. I think that you should have each plant as a consumer of its feed.
Yeah.
And it can be like really sad if it's thirsty and a little bit sad if one of its friends is thirsty.
Yes. It's like real life Tamagotchi. It's going to be great. I guess real life Tamagotchi would just be having a pet. But plants are a little cleaner, so it's ...
It's definitely, definitely. They're the low, low maintenance children.
Yeah.
So is that something you're actually planning to do, this enhancement? Where next for the project?
Yeah. So that was something that I feel like I joked about from the beginning, but I think it would be pretty cool to kind of complete that pipeline and have another set of consumers pushing back to the Raspberry Pi. Yeah, that's something I would want to do. I would also like to make it easier to write metadata to Kafka. So I was thinking of building out the Telegram bot to also accept input, to register a new plant.
Oh, cool.
And then of course, I feel like every one of these projects, you kind of build some sort of visualizations. I would love to see a chart, maybe add some machine learning in there to anticipate when beforehand, like when I need to water the plant based on historical trends. Right. How much do I neglect my plant? So yeah, there's a couple of extra thing, directions that I want to take this in, so we'll see. We'll see.
Cool. That'd be groovy. Do you know what it makes me want to do? This is revealing my own extracurricular activities, but you can get sensors that will sense O2 levels. Right?
Mm-hmm (affirmative).
So can I connect one of your systems to an open bottle of wine, and it will tell me when it's breathed enough to drink?
Oh. Oh, that's good. I did have some things in mind for another, I call these practical pipelines. But I don't know if a pipeline based on, an application based on wine would really be that practical. But, that's good. There's a subset of people out there that would care, right?
Yeah. Actually, there's probably a really, really rich subset of people that would care a lot. Right?
Yeah.
That's how the wine market works.
New startup idea, throw in some Bitcoin and you're ready. This is going to be great, wine.
Great. So, if people want to follow in your footsteps, where do they get started?
Yeah. So I'm actually in the process of drafting a blog post to outline everything, because I think that this was, as I've said a couple times, it's too easy not to do it. It's a really fun thing. You gain practical skills. I now have all these ideas in my house for weird hardware things that I want to do now. I'm like, "Oh, there's a sensor for that. I can do it." It's pretty cool. So yeah, there's an associated blog post outlining pretty much everything that you need to do. And as well as pointing to a repo with code snippets, that might be relevant. Yeah. I want everyone to know that they too can be a hardware hobbyist.
Oh, cool. Yeah. Well, if you're tempted to follow along with that, I'm sure by the time this podcast goes up, your blog post should be up too. So we'll link to that in the show notes ...
Yes.
As well.
Absolutely.
Danica, you make me want to go and break out some Arduinos and do some damage.
You should. Why not?
I really should. Thank you very much for joining us.
Yeah. Thank you for having me. It's been fun.
We'll catch you next time.
All right. Bye.
Our very own Danica Fine there, doing a far better job caring for her house plants than I've ever managed with mine. As I said, if you want to get more details or you want to see the code for that project, then check out the show notes. We'll put the links in there. We'll also put our contact details in the show notes. So if you've got a question or a comment or you'd like to be a future guest on Streaming Audio, just drop us a line.
Or you could leave us a comment or a thumbs up or a review on whichever app is currently streaming, Streaming Audio into your life right now. Before we go, let me remind you that if you ever want to learn how to hack together your own event system, then you should take a look at developer.confluent.io, which is our site that should have all the guidance you need. You'll also find a few courses on there that are taught by Danica herself, so you can learn from someone who has both the knowledge and the battle scars to prove it.
If you need a Kafka instance to run your own project, then head over to Confluent Cloud and sign up with the code PODCAST100, and that'll get you $100 of extra free credit to run your project with. And with that, it remains for me to thank Danica Fine for joining us and you for listening. I've been your host, Kris Jenkins, and I'll catch you next time.
Apache Kafka® isn’t just for day jobs according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too!
Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted with the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.
Danica's model Kafka pipeline begins with moisture sensors in her plants streaming data that is requested by an endless for-loop in a Python script on her Raspberry Pi. The Pi in turn connects to Kafka on Confluent Cloud, where the plant data is sent serialized as Avro. She carefully modeled her data, sending an ID along with a timestamp, a temperature reading, and a moisture reading. On Confluent Cloud, Danica enriches the streaming plant data, which enters as a ksqlDB stream, with metadata such as moisture threshold levels, which is stored in a ksqlDB table.
She windows the streaming data into 12-hour segments in order to avoid constant alerts when a threshold has been crossed. Alerts are sent at the end of the 12-hour period if a threshold has been traversed for a consistent time period within it (one hour, for example). These are sent to the Telegram API using Confluent Cloud's HTTP Sink Connector, which pings her phone when a plant's moisture level is too low.
Potential future improvements include visualizations, extending the Telegram bot to accept input so that new plants can be registered, adding machine learning to anticipate watering needs, and closing the loop by pushing data back to the Raspberry Pi, which could power a visual indicator on the plants themselves.
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.