July 15, 2021 | Episode 167

Powering Real-Time Analytics with Apache Kafka and Rockset

  • Transcript
  • Notes

Using large amounts of streaming data increasingly requires interactive, real-time analytics and dashboards—and this applies to any industry, including tech. CTO and Co-Founder of Rockset Dhruba Borthakur shares how his company uses Apache Kafka® to perform complex joins, search, and aggregations on streaming data with low latencies. The Kafka database integrations allow his team to make a cloud-native analytics database that is a fundamental piece of enterprise infrastructure. 

Especially in e-commerce, logistics and manufacturing apps are typically receiving over 20 million events a day. As those events roll in, it is even more critical for real-time indexing to be queried with low latencies. This way, you can build high-performing and scalable dashboards that allow your organization to use clickstream and behavioral data to inform decisions and responses to consumer behavior. Typically, the data follow these steps:

  1. Events come in from mobile or web apps, such as clickstream or IoT data
  2. The app data is sent to the cloud
  3. Data is fed into the database in real time
  4. This information is shared live on a dashboard or via SaaS application embeds

For example, when working with real-time analytics in real-time databases, both need to be continuously synced for optimal performance. If the latency is too significant, there can be a missed opportunity to interact with customers on their platform. You may want to write queries that join streaming data across transactional data or historical data lakes, even for complex analytics. You always want to make sure that the database performs at a speed and scale appropriate for customers to have a seamless experience. 

Using Rockset, you can write ANSI SQL on semi-structured and schemaless data. This way, you can achieve those complex joins with low latencies. Further data is required to supplement streaming data, but it can be easily supported through supported integrations. By having a solution for database requirements that are easily integrated and provide the correct data, you can make better decisions and maximize the result. 

Continue Listening

Episode 168July 22, 2021 | 29 min

Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka community, on consistency and completeness in streaming processing in Apache Kafka in order to shed light on what a reimagined, modern infrastructure looks like.

Episode 169July 27, 2021 | 25 min

Collecting Data with a Custom SIEM System Built on Apache Kafka and Kafka Connect ft. Vitalii Rudenskyi

The best-informed business insights that support better decision-making begin with data collection, ahead of data processing and analytics. Enterprises nowadays are engulfed by data floods, with data sources ranging from cloud services, applications, to thousands of internal servers. The massive volume of data that organizations must process presents data ingestion challenges for many large companies. In this episode, data security engineer, Vitalli Rudenskyi, discusses the decision to replace a vendor security information and event management (SIEM) system by developing a custom solution with Apache Kafka and Kafka Connect for a better data collection strategy.

Episode 170August 5, 2021 | 31 min

Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour

Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. Joining us today in this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company’s largest source of streaming data.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.