September 8, 2022 | Episode 232

Reddit Sentiment Analysis with Apache Kafka-Based Microservices

Kris Jenkins: (00:00)

Let's start this week's Streaming Audio with a question. Do you remember being a beginner? Do you remember first coming into this industry and thinking, "Oh, my God, there is so much to learn." Maybe you're still feeling that. I know I do sometimes.

Kris Jenkins: (00:19)

I was on vacation last week. I was on a boat, and I found myself wondering, "Do sailors ever get used to the size of the ocean?" I mean, they must do just like we get used to the size of our field. But it's interesting sometimes to step back and try and see it afresh, see it with new eyes and wonder, "What do you think would matter if you saw this industry again from scratch? What would you explore first? How would you navigate your way through our ocean?"

Kris Jenkins: (00:50)

Well, I had a chance to get some of that perspective fresh firsthand because we have an internship program here at Confluent. I thought I'd ask one of our interns to be brave and come and tell us about his experience diving into this world for the first time. What did he learn? What did he need to learn? What did he do? What caught his interest?

Kris Jenkins: (01:13)

One thing he did, which I thought was really interesting, was a natural language processing application for Reddit threads, but he also had some surprising takes about balancing technical skills with soft skills and business skills. He's got a really nice perspective. If you are new to this industry, I think you're going to hear this episode and realize we're all in the same boat at the end of the day. If you're a veteran, well, it might leave you better equipped to deal with some new hires and see things from their perspective.

Kris Jenkins: (01:43)

Before we get started, this podcast is brought to you by Confluent Developer, which is our education site for Kafka. More about that at the end, but for now I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.

Kris Jenkins: (02:02)

My guest today is Shufan Liu. Welcome to the show, Shufan.

Shufan Liu: (02:06)

Hi. Thank you. Thanks for having me.

Kris Jenkins: (02:08)

It's good to have you here. We have got you here because, let me see if I've got this right, you are studying at the University of Pennsylvania.

Shufan Liu: (02:17)

Yes, for master's.

Kris Jenkins: (02:19)

Yeah, you're doing your master's. You've come across from a business degree to a computer science master's.

Shufan Liu: (02:25)

That is correct.

Kris Jenkins: (02:26)

Let's start there. Why did you do that?

Shufan Liu: (02:32)

I wasn't confident that I was smart enough to handle technical stuff when first going into college, but I didn't want to say goodbye completely to that, so I picked a couple of computer science classes during the course of my undergrad. And then from those courses, I felt like, "Oh, that is my stuff," so I decided to study more in a master's program. I applied to one of the technical computer science programs. Luckily, I got in, and then I happily started to pursue my career in technology.

Kris Jenkins: (03:10)

As part of that, you've been doing an internship with us at Confluent, right?

Shufan Liu: (03:15)

Yes.

Kris Jenkins: (03:16)

Now I was trying to think back to my days when I first went from university into industry, and I remember being kind of overwhelmed by what an ocean of stuff there was to learn.

Shufan Liu: (03:30)

Well, I can assure you that feeling hasn't gone away. That's the same with me.

Kris Jenkins: (03:37)

What's the most important stuff you've been picking up?

Shufan Liu: (03:41)

Well, trying to find the balance between reaching out and asking people questions and actually diving into the documentation and trying to solve the problem myself. That's a hard balance to find, but it's something important, I think, at the start of a career.

Kris Jenkins: (04:00)

That's interesting because I would've thought a lot of people would answer that question with something technical, but you've gone straight to the really hard stuffing of [inaudible 00:04:09] science, which is balancing when to talk to people.

Shufan Liu: (04:12)

Well, that kind of concludes my struggle with technical stuff because mostly I ask technical stuff, reading documents or reaching out to people, bothering them.

Kris Jenkins: (04:27)

Well, one of the jobs of an intern is to make sure they're grabbing information from everybody they can.

Shufan Liu: (04:33)

Oh, well, I hope nobody is bored of me yet, but I ask all questions.

Kris Jenkins: (04:43)

But that's good. That's good. One of the reasons I wanted to get you in talking to us is that whole fresh perspective thing, right? A lot of people listening to this podcast will have been in the industry for quite a while, maybe have used Kafka for quite a while, and we've simply forgotten what it's like to see this with fresh eyes.

Kris Jenkins: (05:05)

I'm going to ask you some Kafka questions, but tell me what's it like coming into a technical company for the first time?

Shufan Liu: (05:13)

Everybody is so freelance. Everybody having a very chilling feeling doing their work, but they always get things done. That's something very impressive what I discovered in tech companies.

Kris Jenkins: (05:34)

You think we seem relaxed on the surface?

Shufan Liu: (05:37)

Well, yes, but I know for sure that everybody's working really hard to get things done because those things are really hard to get done.

Kris Jenkins: (05:46)

Yeah. We're just as much as anyone beating our heads against the computers and then trying to smile afterwards. But you've come in slightly... I mean my angle, I came in straight as a computer science major, I think is the American term, into a programming job. You wanted to go into DevX, DevRel, like the marriage between programming and talking to people.

Shufan Liu: (06:17)

Yes.

Kris Jenkins: (06:17)

Why choose that?

Shufan Liu: (06:21)

Well, part of the reason is because I am new to the industry. I want to experience as much as possible to discover where my interest is. Software engineer seems like a very heavy job for me to do in the first year, the first summer internship. A lot of my cohort friends got internships in software engineering, but I wanted to try something different. I wanted to try jobs that kind of help prepare me for a software engineering career, but also give me a perspective of what the industry looks like in a bigger picture.

Shufan Liu: (07:03)

Working at DevX or DevRel, I have the opportunity to work with the marketing team, the product management team, and I can learn the demand of our customer, that is a developer. I can try to learn what they're thinking and practice the way that I communicate with them. I think that's going to be beneficial for me in my career overall.

Kris Jenkins: (07:31)

Yeah, that's a very smart perspective. I think a lot of us focus on getting really good at programming, and then wake up one day and realize we actually have to talk to the rest of the company for it to be worth anything.

Shufan Liu: (07:43)

Well, I still envy them for having good programming skills. I need to practice that. That's something I really want as well.

Kris Jenkins: (07:53)

Yeah. Well, the great thing about computers is they'll hold you to account. It's the human stuff you have to kind of mercurially judge your way through, right?

Shufan Liu: (08:02)

Yes.

Kris Jenkins: (08:04)

What's it been like being an intern? What have you learned? Tell me some things.

Shufan Liu: (08:08)

Well, I think the first stuff is what is DevX? What is a Developer Advocate? What do we do? Coming into Confluent, before this, I had no idea what a developer advocate is until I met Danica, who just introduced a whole different world to me. I realized why it's important to a company, especially to a business-facing company like Confluent. I'm sure Confluent spends a lot of time with our DevX team, Confluent spends a lot of time educating the developer community. It's our job to maintain that activity and try to help them with the best knowledge we can.

Kris Jenkins: (08:55)

Which parts of that did you get involved in and which parts did you enjoy?

Shufan Liu: (09:00)

I enjoy writing a blog and thinking as an audience. What do they want to hear from me? What do they want to know from Confluent and try to introduce my best knowledge to them. That's some really good experience. I think that's some unique experience that I wouldn't have gotten if I were a software engineering intern.

Kris Jenkins: (09:27)

Yeah, it's something I think I mostly learned from user interface design, but that empathy for the user is such a vital skill.

Shufan Liu: (09:36)

I agree, yes. Empathy.

Kris Jenkins: (09:39)

I kind of feel your business degree sneaking in here, too.

Shufan Liu: (09:43)

Oh, wow.

Kris Jenkins: (09:43)

The whole holistic, what's actually going to benefit the business perspective. You think that's true?

Shufan Liu: (09:50)

Well, now it reminds me. That's probably intuitive. I didn't realize, but it just blends well with my work.

Kris Jenkins: (10:04)

Yeah. What was the blog post you wrote about which I'm sure we will link to in the show notes?

Shufan Liu: (10:12)

My internship kind of divides into two parts. The first part is working with Danica, and I tried to extend her data pipeline. I wrote my first blog from a rookie's perspective on Kafka. I specifically described the process of building a data pipeline with Apache Kafka, and how to extend the data pipeline using Cluster Linking on Confluent Cloud.

Kris Jenkins: (10:42)

Right, you went straight into cluster linking.

Shufan Liu: (10:45)

Well, yes.

Kris Jenkins: (10:49)

That's biting off a big topic.

Shufan Liu: (10:51)

Well, thanks to Danica. We did pretty much everything together, and she's helpful. I keep asking question to her instead of diving into docs. That's my cheat code.

Kris Jenkins: (11:08)

Well, we could get Danica in the room for a follow-up podcast reviewing you, but I don't think we'll do that. We'll skip over that.

Shufan Liu: (11:16)

She'll work it out with you.

Kris Jenkins: (11:19)

I know you two have been working very closely, and she is great to work with. Let's just say that for the record.

Shufan Liu: (11:24)

Everyone in DevX team is great to work with.

Kris Jenkins: (11:28)

Oh, you're too kind. Was there a particular reason or something that interested you about getting involved in the Kafka side of things?

Shufan Liu: (11:44)

I didn't know what Kafka is before coming into Confluent, let's say before my interview with Confluent. I know Confluent is a great company to work with, but I really didn't know what the product is for Confluent. But the interview opportunity showed up in front of me. I started to learn what Kafka is, what Confluent is doing. I started to see the business value inside Confluent, why it's so important to developers, why it's so important to businesses, especially those have demand with fault tolerance and highly elastic message queue stuff.

Kris Jenkins: (12:34)

Yeah. I wonder, there must be a lot of people listening to this thinking, "Okay, well they've got some maybe junior developers coming in or someone of about your experience." What is it that we should be teaching them that matters about this? From what you've learned, what would you say to someone with a similar level of experience to yourself? What are the things to grasp mentally from it all?

Shufan Liu: (13:04)

Well, I think the Confluent Developer 101 course is really helpful. Setting it up was good for me. Thanks to Tim Berglund, I had a very holistic picture of what Kafka is before coming in. I think the best way to get to know Kafka is to start trying to build a project with Kafka and see what Kafka can do with the project.

Shufan Liu: (13:31)

That's what I did for my second project. Once I got familiar with building a data pipeline using Kafka, I started to build my own project with a purpose to educate our developer advocates. Well, educate seems too much of a selfish word. It seems like a big workpiece with-

Kris Jenkins: (13:57)

No, no, not at all. You've learned some things, you're sharing them. That's all education is.

Shufan Liu: (14:01)

Well, thank you. I'm not sure if I'm qualified now to say educating people as I'm a fresh rookie here.

Kris Jenkins: (14:13)

Okay. Let's move on from the humility. Tell me what you built.

Shufan Liu: (14:19)

The second project is building a microservice architecture application with Kafka and that's kind of inspired by Dave Klein's microservice pizza application and-

Kris Jenkins: (14:34)

Oh, yeah. I've seen that one.

Shufan Liu: (14:36)

I was thinking maybe I can build something more interesting than building a pizza. Sorry, Dave Klein. That was a very good project, but I was thinking maybe I can do something a little bit more complicated than that, so I decided to use a microservice application to build a sentiment analysis on Reddit. My application prompts the user to give a request on which sub-Reddit they want to analyze in a specific time range. With my microservice architecture, the application first pulls Reddit threads from the Reddit API, and then the data flows through the Apache Kafka topics and microservices, applying a sentiment score to each of the sub-Reddit threads, calculating an average in the end, and figuring out a way to display it in front of our user.

Kris Jenkins: (15:40)

Okay. So you are splitting out the idea of gathering a large amount of data.

Shufan Liu: (15:45)

Yes.

Kris Jenkins: (15:46)

Then somehow processing that in probably quite a time-consuming way, sentiment analysis.

Shufan Liu: (15:53)

Actually no.

Kris Jenkins: (15:54)

It's not?

Shufan Liu: (15:56)

Yeah, it processes pretty fast.

Kris Jenkins: (15:59)

Okay. But it's something you wanted to separate out from the gathering phase?

Shufan Liu: (16:03)

What do you mean the gathering phase?

Kris Jenkins: (16:07)

The phase where you gather the data from Reddit. You've got that as a separate thing to let's process this and analyze it.

Shufan Liu: (16:13)

Yes, they are separate microservices.

Kris Jenkins: (16:17)

And then from there it dumps the analysis output and you think about displaying it separately.

Shufan Liu: (16:24)

Yes. Those are different microservice applications, and they communicate through Apache Kafka topics.

Kris Jenkins: (16:34)

That's a juicy enough template, kind of application for this kind of thing. It's something we'll all end up doing in one shape or form.

Shufan Liu: (16:44)

Yes.

Kris Jenkins: (16:44)

If you're dealing with Kafka in anger, right?

Shufan Liu: (16:47)

Right.

Kris Jenkins: (16:50)

I have to ask this, did you do that because you thought it was the right solution to the problem or because you wanted to exercise the system and get a feel for it or both?

Shufan Liu: (17:03)

I wanted to build a data pipeline from scratch completely myself because the first project is building upon Danica's project. For the second project, I really wanted to do something that I really enjoy doing. I enjoy doing extending Danica's project, but I wanted to have something from my own and to think about something interesting. What I've learned from university, sentiment analysis came up to me and I start to think from there, how do I integrate Kafka application with sentiment analysis? I talked to a lot of people and here's the question problem I can solve with Apache Kafka and with microservices.

Kris Jenkins: (17:52)

It's a good one, I think. They're kind of what is the sentiment behind a Reddit thread or a sub-Reddit? That's the word I'm looking for.

Shufan Liu: (18:01)

Yes.

Kris Jenkins: (18:01)

I think we can probably all think of a couple of sub-Reddits where we know the sentiment just from the topic, right? It's a very polarized place sometimes.

Shufan Liu: (18:11)

The worst sub-Reddit is politics. I think that's not [inaudible 00:18:20].

Kris Jenkins: (18:20)

I think regardless of people's politics, they probably agree that the politics sub-Reddit is pretty spicy. Let's use that word.

Shufan Liu: (18:28)

That's a good word. That's a good word choice. On the other hand, I think one of the benefits of my application is to provide a way to quantify sentiment from a time range. When you go over social media, you feel like, "Ah, I can feel the bad vibe. I can feel the positive vibe," looking through the thread, but it's hard to quantify and compare those sentiments. Thanks to Kafka and microservices, my application actually provides a pretty good way to summarize and quantify them and maybe for users to compare them.

Kris Jenkins: (19:14)

Yeah. That's what gets really interesting, something like that, where you are automating, comparing lots of different threads or lots of different sub-Reddits and getting a quantitative balance between them, I would think.

Shufan Liu: (19:27)

Another example is analyzing the same sub-Reddit, but over different times. One example is there's all kind of sports on sub-Reddits. The sentiment when the team is on a winning streak is probably different from the sentiment when the team is on a losing streak. I had one example shown in a blog that is coming out soon, so stay tuned.

Kris Jenkins: (19:57)

Okay. We'll link to that in the show notes. I'm not really a sports person, but I would've thought at the start of each season, all the supporters are really optimistic, and you've got a varying length window for the team to actually do well before they all start trashing on the team. Are any patterns like that?

Shufan Liu: (20:21)

Well, before the season starts, there's a bunch of pre-season games. If your team performed well in those pre-season games, the sentiment is probably at a higher rate, but that really depends on each team.

Kris Jenkins: (20:38)

Is sports the sentiment that you personally wanted to get to?

Shufan Liu: (20:42)

Yes.

Kris Jenkins: (20:43)

That's your hobby topic?

Shufan Liu: (20:45)

Yes. I really want to use that as an example to test if my sentiment application works, and luckily it worked just the way I thought.

Kris Jenkins: (20:59)

Test in what way?

Shufan Liu: (21:02)

Whether the sentiment is accurate because I can choose a time period when my team is doing great and then quantify the sentiment and compare them to a period of time that the team is doing bad. You can look at a negative score, which is significantly higher when the team is doing bad.

Kris Jenkins: (21:24)

So you're testing it against your experience and intuitions about the team's performance.

Shufan Liu: (21:30)

Right.

Kris Jenkins: (21:31)

I see. Did you find any other interesting patterns from the data?

Shufan Liu: (21:39)

Well, it really depends on the sub-Reddit, right? Another interesting stuff is maybe... Are you playing video games?

Kris Jenkins: (21:51)

Sometimes. Probably more than I should.

Shufan Liu: (21:54)

Well, there's some video games that really got their fans hyped up before releasing. Well, unfortunately, sometimes when the game get released, the fans start to realize it is not what the marketing team promised to be and the sentiment dropped. That's another example of how people react on social media.

Kris Jenkins: (22:17)

Oh, yeah. Do you know, that makes me think, change in sentiment over time. I would love to see this project run against a few NFT projects. I'm not blanket covering all of them, but some of them are hype trains that build and build and crash.

Shufan Liu: (22:36)

Well, I mean there's blockchain sub-Reddits, and there's some of the stock investing sub-Reddit. It's really interesting to see their reaction over time. Right.

Kris Jenkins: (22:51)

Yeah. I wonder if it ever leads. I mean, if you could use that change in sentiment on Reddit to analyze where you think the market's going to go.

Shufan Liu: (23:03)

I doubt it, but I think that's a good application.

Kris Jenkins: (23:09)

I think you've come up with a good project because it sparks questions that you then want to use the tool to answer. Right?

Shufan Liu: (23:16)

Well, I... Go ahead.

Kris Jenkins: (23:20)

You've written this up and you're going to publish the source code, right?

Shufan Liu: (23:24)

Yes.

Kris Jenkins: (23:26)

Okay. So maybe you should talk to us. It's going to be there for people to try out and try their own predictions. Talk to us a bit more about how it's built. What language did you use? What are the services actually coded as, that kind of thing?

Shufan Liu: (23:41)

I used Python for each of the microservices and there are, I think, four microservices. The first one gets user input for each request. Each request consists of a sub-Reddit name and a start date and an end date. After that microservice, the request will be appended as a message onto a Kafka topic, and then another microservice called API Poller will consume that message from the previous topic and then pull all the sub-Reddit threads accordingly. Once that happens, it will append all the threads to another topic, which will be consumed by sentiment analysis. The sentiment analysis microservice just appends a sentiment score to each of the threads.
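
A minimal sketch of the request message described above, assuming JSON-encoded values and a UUID as the unique request ID. The field names and the `build_request` helper are illustrative, not taken from Shufan's code; the returned key and value are what a service like this would hand to a Kafka producer.

```python
import json
import uuid
from datetime import date


def build_request(subreddit: str, start: date, end: date) -> tuple[bytes, bytes]:
    """Return a (key, value) pair ready to hand to a Kafka producer."""
    if end < start:
        raise ValueError("end date must not be before start date")
    # Hypothetical choice: a UUID serves as the unique request ID and message key.
    request_id = str(uuid.uuid4())
    value = {
        "request_id": request_id,
        "subreddit": subreddit,
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }
    return request_id.encode("utf-8"), json.dumps(value).encode("utf-8")


key, value = build_request("nba", date(2022, 6, 1), date(2022, 6, 30))
print(key.decode(), json.loads(value))
```

Keeping the request ID in both the key and the value lets downstream services group every thread back to the request that triggered it.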

Kris Jenkins: (24:41)

How does that work? Let's just slow down on that one a bit. How do you calculate a sentiment?

Shufan Liu: (24:45)

I just use one very convenient Python package called NLTK. It's a three-sentence Python code. You can check it out on my repo.
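
A rough sketch of what "appending a sentiment score" could look like. The episode only names NLTK, so using its VADER analyzer in `vader_score` is an assumption; `toy_score` and its tiny lexicon are hypothetical stand-ins so the example runs without downloading anything, and the `title` field is illustrative.

```python
from typing import Callable

# Tiny stand-in lexicon (hypothetical values) so the sketch runs offline.
TOY_LEXICON = {"great": 1.0, "win": 0.5, "bad": -1.0, "lose": -0.5}


def toy_score(text: str) -> float:
    """Average the lexicon values of known words; 0.0 if none are known."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def vader_score(text: str) -> float:
    """Assumed real scorer: NLTK's VADER compound score, between -1 and 1."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # fetches the lexicon if missing
    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def add_sentiment(thread: dict, scorer: Callable[[str], float] = toy_score) -> dict:
    """Return a copy of the thread with a sentiment score appended."""
    return {**thread, "sentiment": scorer(thread["title"])}


print(add_sentiment({"request_id": "r1", "title": "Great win tonight"}))
```

Passing `scorer=vader_score` swaps in NLTK once the package is installed; the rest of the service would not need to change.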

Kris Jenkins: (24:59)

Okay. So you get sentiment score?

Shufan Liu: (25:01)

Mm-hmm. The sentiment score applies to each of the sub-Reddit threads. I use [inaudible 00:25:09] to calculate an average from the streaming data. After I create another table on [inaudible 00:25:18], that table also corresponds to another Kafka topic, which I will consume from in my last application, called Display. That is to-

Kris Jenkins: (25:28)

A good name.

Shufan Liu: (25:29)

Yeah. Very, very good. That is to consume the Kafka KTable from [inaudible 00:25:39] and display the results to our user.
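
The episode notes attribute the averaging step to ksqlDB. As a plain-Python illustration of what a grouped average over a stream computes (this is not ksqlDB and not Shufan's code), the sketch below keeps a running sum and count per request ID and emits the updated average for each incoming score. The `RunningAverage` class is hypothetical.

```python
from collections import defaultdict


class RunningAverage:
    """Track a per-key average that updates as each new score arrives."""

    def __init__(self) -> None:
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, request_id: str, score: float) -> float:
        """Fold one score into the key's state and return the new average."""
        self._sum[request_id] += score
        self._count[request_id] += 1
        return self._sum[request_id] / self._count[request_id]


averages = RunningAverage()
for request_id, score in [("r1", 0.5), ("r1", -0.5), ("r2", 1.0)]:
    print(request_id, averages.update(request_id, score))
```

Like a table backed by a topic, the latest value per key is the one a display service would show.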

Kris Jenkins: (25:45)

Right. I can see how that pipeline puts together. I also, I think, see how that would scale quite well. Have you load tested it at all?

Shufan Liu: (25:58)

I have not, but I built this application just to make sure that it can scale well with a microservice architecture and Apache Kafka, which is built to be fault tolerant and with high elasticity.

Kris Jenkins: (26:16)

Yeah, I can see that. I can see how the architecture would lend itself towards that kind of scale, which I think probably most of our listeners will be able to see, I would hope. But there's finite time in the day and definitely finite time when you're an intern. You probably haven't load tested that hugely in anger.

Shufan Liu: (26:34)

Well, maybe I should try to do that and put it in my blog because the purpose of that is to show people that microservices and Apache Kafka are such a good combination for making an application that is very easy to expand horizontally.

Kris Jenkins: (27:02)

What's the primary key on all those partitions? What's the key? Is it the sub-Reddit?

Shufan Liu: (27:12)

For the request ID, the key is... For this user input, the first topic, that's request ID. Each request has a unique ID, and we use that for aggregation later on in case of [inaudible 00:27:27].

Kris Jenkins: (27:28)

All right. So you could shard by user request.

Shufan Liu: (27:32)

Right.

Kris Jenkins: (27:32)

You could scale it per user request. That makes sense.

Shufan Liu: (27:35)

Each request is actually unique in this application.
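
To illustrate why keying by request ID lets the work spread across partitions, here is a small sketch of key-based partition assignment. It assumes a stable hash of the key modulo the partition count; CRC32 is only a stand-in here, since real Kafka clients choose their own hash (the Java client's default partitioner uses murmur2). The `partition_for` helper and the partition count of 6 are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


for request_id in ("req-1", "req-2", "req-1"):
    print(request_id, "->", partition_for(request_id, 6))
```

Every message for the same request lands on the same partition, so one consumer sees all of that request's threads while other requests can be handled in parallel elsewhere.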

Kris Jenkins: (27:40)

Okay. I like this midterm project. This feels very tasty to me.

Shufan Liu: (27:45)

Thank you.

Kris Jenkins: (27:47)

Did you enjoy working on it?

Shufan Liu: (27:49)

I do. I bumped into a lot of technical questions. As I said, it's a good way to practice finding the balance between reading docs and bothering people. I always choose the easy way, to bother people. Everyone in the DevX team is being bothered by me constantly.

Kris Jenkins: (28:12)

What do you think are the most important things you learn from an internship in the industry?

Shufan Liu: (28:19)

Well, the first thing I learned is what is DevX, right? I think the most important stuff is to learn Kafka and Confluent Cloud, and try to figure out the use cases for Kafka. I think at the end of the day only technical stuff really, really matters in career growth. I mean, there's a lot of practice time for soft skills, but really the most important thing to take away from the internship is the technical stuff that I learned. That is the top.

Kris Jenkins: (29:03)

That's good. I think the heart of what we do in DevX has to be development. It has to be software engineering.

Shufan Liu: (29:12)

One thing that I like about the DevX team is that you have to constantly learn new stuff and try to pick up new technology really fast. That's one thing that I really enjoy about being a developer advocate: I have the opportunity to get exposed to technologies that are new, that are interesting, and keep thinking how I can integrate those technologies with Apache Kafka.

Kris Jenkins: (29:43)

Yeah. That's something I enjoy about it. You get to be very technical, but you get to be imaginative and thinking, "How can I fit these ideas together and explain them and use them in interesting ways? How can I cheat and get paid to build things that I find interesting?"

Shufan Liu: (30:03)

I don't know that's something that we can discuss publicly, but yes, I agree with you.

Kris Jenkins: (30:09)

I think it serves both sides, to be honest. I think you are most interesting talking about the things that you find interesting. I think you've chosen a great project in that it's something you find inherently interesting and useful. And so would I actually. I think I might have to check out your code and play around with it.

Shufan Liu: (30:33)

Well...

Kris Jenkins: (30:34)

There are some synthesizer sub-Reddits I'd like to analyze.

Shufan Liu: (30:38)

Let me know, and we can do it together.

Kris Jenkins: (30:42)

Oh, yeah, absolutely, in the future. I want to wrap up by asking you a couple of questions that you may not be able to answer 100% honestly, but I'm going to ask them anyway and see how unguarded you're feeling.

Shufan Liu: (30:56)

Okay.

Kris Jenkins: (30:57)

Would you work in DevX again when you finish your master's, and would you work for Confluent again? Would you use Kafka again? All three of those. You can be as honest as you like about any of them.

Shufan Liu: (31:10)

I would definitely work for DevX team and Confluent. It's been such a great experience. I enjoy talking to everyone in this team, and everyone is so supportive and helpful to me. I think this is a great place for me to grow in terms of career and in terms of learning new stuff.

Shufan Liu: (31:33)

I probably will write more code as software developer, but I definitely will learn more tech stack with DevX team. In terms of Apache Kafka, I don't see why not. It's the best message queue that [inaudible 00:32:00]. That's a bad word to describe Apache Kafka. It's not a message queue, but it's like a message queue. It's a great tool.

Kris Jenkins: (32:09)

It's kind of a super set.

Shufan Liu: (32:12)

Yes. That's what I use for the microservices. I use that as message queue, but it can do a lot more than being a message queue.

Kris Jenkins: (32:23)

Yeah, I put you on the spot there. Thank you for fielding those questions. I'll say from my side, it's been really great working with you. I wish you could stick around longer.

Shufan Liu: (32:35)

Thank you.

Kris Jenkins: (32:36)

Maybe we'll see you on the podcast sometime in the future when you're analyzing the sentiments of sports teams in a way that... You'll probably be cast in the sequel to Moneyball one day. That's my prediction.

Shufan Liu: (32:53)

Wow.

Kris Jenkins: (32:54)

Did you see that film? That's a really good film.

Shufan Liu: (32:58)

I did not, but-

Kris Jenkins: (32:58)

Oh, you've got to see that film. It's all about data and numerical analysis of sports. Love it.

Shufan Liu: (33:05)

That's my dream job. Oh, my God.

Kris Jenkins: (33:09)

Cool. On that, Shufan, great talking to you. Thanks for joining us on Streaming Audio.

Shufan Liu: (33:15)

Thank you.

Kris Jenkins: (33:15)

I'll catch you soon.

Kris Jenkins: (33:17)

Well, by the time you hear this, Shufan is going to be back in academia. Will he be writing his thesis on real time sentiment analysis? Will he be arguing about basketball online? I don't know. Both, I hope. Whatever it is, good luck, Shufan. It was great having you around here, and I wish you all the best.

Kris Jenkins: (33:37)

If you would like to check out his code and play with it or read the blog post he's written about the project, check the links in the show notes. If you want to build your own streaming data application, then check out Confluent Developer, which is where you'll find a wealth of resources for learning about event streaming and Apache Kafka with Python, Go, Java, JavaScript, loads more. Check it out at developer.confluent.io.

Kris Jenkins: (34:05)

To make the most of that knowledge, you're going to need a Kafka cluster. So you can try spinning one up at confluent.cloud, which is our Kafka cloud service. You can sign up and have a cluster running reliably in minutes. If you add the code PODCAST100 to your account, you'll get some extra free credit to run with.

Kris Jenkins: (34:25)

Meanwhile, if you've enjoyed this episode, then do click like and subscribe and the rating buttons and all those good things. It helps people that like this kind of information to find us. It also helps us know which episodes you want to hear more of, which topics you want us to explore in more detail. If you want to get in touch with me directly, as always, my Twitter handle is in the show notes.

Kris Jenkins: (34:49)

With that, it just remains for me to thank Shufan Liu for joining us and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey.

Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog—Data Enrichment in Existing Data Pipelines Using Confluent Cloud.

The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. The project consists of four microservices:

A user input service that collects requests in a Kafka topic, which consist of the desired subreddit, along with the dates between which data should be collected
An API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topic
A sentiment analysis service that analyzes the appended topic from the API polling service using the Python library NLTK; it calculates averages with ksqlDB
A results-displaying service that consumes from a topic with the calculations
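As a rough illustration of how the four services hand data to one another, here is a minimal Python sketch. It is not Shufan's code: plain lists stand in for Kafka topics, a tiny invented word list (`TOY_LEXICON`) stands in for NLTK, a Python function stands in for the ksqlDB average, and `fetch_posts` is a hypothetical callback in place of the Reddit API. All function and field names are assumptions made for the example.

```python
from collections import defaultdict

# Toy word scores standing in for NLTK; the values are invented for illustration.
TOY_LEXICON = {"love": 1.0, "great": 1.0, "fun": 0.5, "bad": -1.0, "broken": -1.0}


def user_input_service(requests_topic, subreddit, start, end):
    """Service 1: record a request (subreddit plus date range) on the requests topic."""
    requests_topic.append({"subreddit": subreddit, "start": start, "end": end})


def api_polling_service(requests_topic, posts_topic, fetch_posts):
    """Service 2: read each request, fetch matching posts, append them to a new topic."""
    for request in requests_topic:
        for text in fetch_posts(request):  # fetch_posts is a hypothetical Reddit API stand-in
            posts_topic.append({"subreddit": request["subreddit"], "text": text})


def toy_score(text):
    """Average the lexicon scores of known words; 0.0 when no word is known."""
    words = (word.strip(".,!?").lower() for word in text.split())
    hits = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_service(posts_topic, scores_topic):
    """Service 3: score every post and append the result to the scores topic."""
    for post in posts_topic:
        scores_topic.append(
            {"subreddit": post["subreddit"], "score": toy_score(post["text"])}
        )


def average_scores(scores_topic):
    """Stand-in for the ksqlDB aggregation: mean score per subreddit."""
    grouped = defaultdict(list)
    for record in scores_topic:
        grouped[record["subreddit"]].append(record["score"])
    return {name: sum(values) / len(values) for name, values in grouped.items()}


def display_service(averages):
    """Service 4: format the aggregated results for display."""
    return [f"r/{name}: {value:+.2f}" for name, value in sorted(averages.items())]


def run_pipeline(subreddit, start, end, fetch_posts):
    """Wire the four services together through in-memory 'topics'."""
    requests_topic, posts_topic, scores_topic = [], [], []
    user_input_service(requests_topic, subreddit, start, end)
    api_polling_service(requests_topic, posts_topic, fetch_posts)
    sentiment_service(posts_topic, scores_topic)
    return display_service(average_scores(scores_topic))
```

Calling `run_pipeline("gaming", "2022-06-01", "2022-08-31", my_fetch)` with any function that returns a list of post strings yields one formatted line per subreddit. In a real deployment each function would be its own process reading from and writing to Kafka topics, which is what lets the services be developed and scaled independently.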

Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it.




