RocksDB is a really cool, increasingly popular open-source database that gets integrated into lots of places, [inaudible 00:00:07] which is Kafka Streams and ksqlDB. Today, I talked to Dhruba Borthakur, who is the CTO and co-founder of Rockset, about RocksDB, how it works, and how it fits into their overall plans to make a cloud-native analytics database.
Check it out. It's all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud. Hello, and welcome to another episode of Streaming Audio. I am, once again, seemingly without fail, your host, Tim Berglund, and I'm joined in the virtual studio today by the CTO and co-founder of Rockset, Dhruba Borthakur. Dhruba, welcome to Streaming Audio.
Thank you. Thanks for inviting me, Tim. Great to be here.
Now, first of all, I have to get the obligatory Rockset jokes out of the way. I think there's a very subtle difference in pronunciation. Rockset is a start-up that seems to be based around RocksDB and building cloud services and things like that on top of that. Roxette was a band in the late 80s known for the hit single "It Must Have Been Love." So you are definitely the RocksDB startup company and not the 80s band.
Yes, absolutely. Yeah, we use RocksDB a lot [crosstalk 00:01:29]
People of a certain age in our audience are going to need that clarification, and I just wanted everybody to be clear. So tell us, Dhruba, I guess, start with a little bit about you, a little bit about your background. What have you done that got you to what you're doing right now?
Sure. Yeah, absolutely. Thank you again. Thanks for inviting me here. Yeah, like you said, I have worked with RocksDB. I'm the founding engineer of the RocksDB database that we built when I was at Facebook. This was probably 10 years back or so. RocksDB is an open-source key-value store that is essentially used by Kafka Streams, and we also use it at Rockset in the cloud service that I'm currently building, but it's a core foundational piece of a lot of infrastructure that's being built.
So yeah, I spent a lot of time building RocksDB. I spent a lot of time building other database systems, including HBase at some previous point in my life. And I was also the Apache project lead for the Hadoop file system. This was way back in 2005 or 2006; this was when Hadoop was only two or three people. I was at Yahoo at the time. So I have built a lot of backend systems over the last maybe two decades now, but right now I'm at Rockset. I'm the CTO and co-founder, and we have a great cloud service that uses RocksDB and gives people the power to use RocksDB for real-time analytics.
Nice. I want to get into that stuff by the time we're done. And this is a judgment on my biographical research of you because I didn't know you were a project lead of HDFS, like you said, that's back when Hadoop was HDFS and a job scheduler and not this menagerie of things named after various animals.
Yeah. This was a time, I still remember the early days, when there was one summer intern in the Yahoo search team and he was the first user of Hadoop. So I still remember those days when the first person was trying to use this [inaudible 00:03:30] large-scale indexing of big datasets. And it was a fun time. This was 2006, I think, something like that.
Yes, the heady days, the early days of Hadoop. So let's get started with RocksDB itself. Like you said, it's the default state store for Kafka Streams and plays a key role there that maybe we'll talk about some more, and I'm drawing a blank on other examples, but I know I've seen it finding its way into other infrastructure cases like that: I need a particular key-value store embedded in my, whatever it is. And I've just been saying for the last couple of years, "Hey, if you don't know RocksDB, you should probably check it out, because it's becoming an important technology."
Yeah, sure. It's like a storage engine that is designed for storing large volumes of data, right? It's a key-value store. You can store a lot of data. It can support high write rates and also it can support high query rates. So it's essentially built as a storage engine that works optimally when your storage is either memory or SSDs, very different from the spinning disks that other storage systems were built for earlier, right?
So this is basically the innovation of RocksDB: how can we leverage the power of flash drives and memory systems and build a database, or a storage engine, that can support high write rates and high query rates as well? This was one of the requirements when we were building RocksDB at Facebook, because a lot of data systems at Facebook needed a lot of writes and also a lot of queries at the same time. Does it make sense?
I've got you, it does. But you said one thing that surprised me. And so it's like a three-part question in my mind, and I don't want to ask it as three parts because that's confusing. So I'm going to ask it a part at a time, and it feels like I'm being the prosecuting attorney, but I'm not trying to trap you. First of all, my understanding is, I thought RocksDB could fairly be described as a log-structured merge-tree structured database. Is that right?
Okay. Can you tell us then what is a log-structured merge-tree?
Absolutely. Yes. So this is the difference from traditional storage engines that were pre-RocksDB. Most traditional systems, like say MySQL or Postgres or any other systems you're talking about, those mostly used B-trees, and that's widely known in the industry, right? But for RocksDB, what we did was we essentially implemented a log-structured merge algorithm that was published in a paper a long time back.
The difference here is that in a log-structured merge-tree, when a write happens to your database, it is just written to a new place in the storage system. It's not merged along with existing data that is already out there, right? And then a background process comes along and tries to merge these things on the side. So now what happens is, think about it: in a B-tree or a binary tree or traditional systems, when the write comes in, you need to go find the place where the existing data is, then you modify it and you write it back, right?
So there are a lot of read-modify-writes in B-tree-based backend systems, whereas in LSM systems the read-modify-writes are fewer, because when a new write comes in and you have to update something, you don't have to read the original data from the disk, apply the update, and then write it back. You just write the new data into a new place in the storage system, right?
And then over time, there are some background compaction processes, very interesting pieces of software that run on the backend transparently for you, and they compact these things and make sure that you can always see the latest version of the data when you do a read. So this is essentially the difference between a B-tree and an LSM. It can support high write rates because all the writes are going to new places in the storage system. Am I making sense?
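Dhruba's description of the LSM write path can be sketched in a few lines of purely illustrative Python: writes land in an in-memory memtable, full memtables are frozen into immutable sorted runs, and reads check the newest data first. This is a toy model of the idea, not how RocksDB itself is implemented.

```python
# Toy sketch of an LSM write path (illustrative only, not RocksDB's
# actual implementation). Writes go to an in-memory memtable; when it
# fills up, it is frozen into an immutable sorted run. Nothing is ever
# modified in place, so every write is an append.

class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}   # newest, still-mutable writes
        self.runs = []       # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # No read-modify-write: just record the newest value.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Freeze into a sorted run; on disk this would be one
            # sequential write of an SSTable-like file.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:
                if k == key:
                    return v
        return None
```

A real engine would binary-search each run and use Bloom filters to skip runs that cannot contain the key; the linear scan here just keeps the sketch short.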
They're all sequential. That's the log-structured part.
Everything's log basically. And so you make these sequential writes.
Yeah. So what I was trying to say is that your point about logs is super accurate. So what happens is that when new data comes in, it gets written to a sequential log, or a sequential place on the storage system, right? And the compaction process does the same: it reads all these things that are written sequentially and writes them to new places, also in a very sequential manner.
So all the reads and all the writes are happening in a sequential pattern. There are no random writes on storage anymore if you use LSM technology. All the writes are always sequential in nature. So it absolutely is like a log at a very small scale, not at the Kafka scale, but a very small scale inside one machine.
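The compaction step described here, reading several sorted runs sequentially and writing out one merged run, can be sketched as follows. Assume runs are ordered newest first and each run is sorted by key; this is a conceptual sketch, not RocksDB's actual compaction code.

```python
import heapq

# Conceptual sketch of LSM compaction: several sorted runs are read
# sequentially, merged, and written out as one new sorted run. Runs
# are ordered newest first, so for duplicate keys the newest value
# wins and older versions are dropped.

def compact(runs):
    """Merge sorted runs (newest first) into a single sorted run."""
    # Tag each entry with the run's age; for equal keys, heapq.merge
    # yields the entry from the lower age (newer run) first.
    tagged = [[(key, age, value) for key, value in run]
              for age, run in enumerate(runs)]
    merged = []
    for key, age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # older version of a key we already kept
        merged.append((key, value))
    return merged
```

Because both the inputs and the output are read and written in key order, the whole pass is sequential I/O, which is exactly the property being described.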
Somebody needs to build a giant LSM database on top of Kafka. Hacker News, somebody go do it, and that'll make the front page, no problem. No, that's a terrible idea, please don't. So that makes sense. Everything's a log. You do compaction because reading logs is a pain and you want all that to be efficient.
The thing that set me off down this path is you said it's designed for in-memory and SSD. I've always thought LSMs were optimized for spinning rust-based drives because they package up writes as these sequential things. And that's more efficient when you've got seek time and moving sectors under the head and all that. So am I wrong about that?
No, that's a great point. Actually, the first commercial implementation of LSM was with HBase. That is where I also wrote a lot of HBase code. This was an LSM engine, and it was mostly storing data on HDFS, which is all disk-based. This was 10 years ago. So you're absolutely right.
It came up more for disk systems, but what has happened is that it can essentially drive very high write rates to your disk, because every write is sequential in nature. So irrespective of whether it's a disk or an SSD, if you want very high write rates, you need to use an LSM. Otherwise, you can't really get to the 500 megabytes a second that an SSD might let you write to it.
Sure. Okay. So there's still money to be made in making your writes sequential on an SSD, even though SSDs are better at random writes.
You're still going to have to try to saturate the bandwidth in an LSM.
Right. Exactly. Yeah. I saw this happen up close, because the first use case for RocksDB was indexing the Facebook social graph. So the Facebook social graph is all users posting comments, likes, photos, this and that, but then you need an inverted index, or a reverse index, for all these things, right? So RocksDB was super useful because, at that time, flash drives also became very popular, in the sense that the price of flash started to fall. And it became a real feasible thing for Facebook to deploy large-scale flash drives.
And so they needed an indexing engine which can index all the social graph and give great query latencies, as well as update latencies, to the Facebook newsfeed. So when you look at Facebook, when you use the Facebook app, the newsfeed that you see, all the posts, likes, comments, that's all served from a RocksDB database in the backend. Not one machine, of course; there are hundreds of machines, but they're all RocksDB based, because that's an analytics engine that is out there. And it needs very low latency queries from hundreds of millions or billions of people refreshing their feeds.
Their feeds, compulsively, doom scrolling, and doing all those things people do. And that's what I want to get into. Not so much what's the architecture of Facebook because I know you're not at Facebook anymore. And I'm not asking you about the internals there, but as I understand it, RocksDB itself, the open-source database is like a single node affair, right? It's a thing that runs on some cores and it expects to own a file system and writes files to that file system. And so it's not a distributed system. It's a program that runs on a computer like the good old days.
Well, two questions, number one, let's just start with a simple one. Where are some other places where it has found its way into the world? So it started, you use it at Facebook, Kafka Streams adopted it. Are there any other big things we know about that people would recognize where it's RocksDB under the covers?
Yeah, absolutely. So for example, Apache Flink, there's some streaming software that uses RocksDB under the covers. There are things like Cassandra; I know about your association with Cassandra before, but now Cassandra also has something called Rocksandra, which Facebook open-sourced, where the storage engine for Cassandra is actually RocksDB. That is what Facebook has in production. And then of course the startup that I work for, Rockset: we use RocksDB heavily. It's a distributed version of RocksDB that we have made into a service, so that people can use it without worrying about the internals of how to run a database system.
Yeah. I think all of these are essentially related to an analytic system or somebody who is trying to do analytics on large data sets. All of them fall in that category at least from what I understand. Even MySQL has a RocksDB backend. Now it's called MyRocks. I think it is commercialized by some companies nowadays. So yeah, you would see RocksDB being the default storage engine for most data structures or data storage systems that are out there these days.
Okay. Turns out it's a good idea to throw an LSM under your new whatever, right?
Yeah. I think if you put your storage not on spinning disks but on flash drives, you get a lot of performance benefit if you just use RocksDB under the covers.
Nice. And I guess, just to dig into that a little bit, and I'll volunteer this, but yeah, Kafka Streams has a lot of state to manage. That's one of the things that the Kafka community discovered early on: the consumer library is great, but most of the interesting things you're going to do reading streaming data are going to be stateful, and you've got no help, right? It's just leaving you to solve that state problem. So Kafka Streams came along.
It had a solution to state, and there's a lot more to the thing than just RocksDB, but per node of your streaming application, there needs to be a local key-value store to hold whatever it is. There's lots of state that you could create in the stream processing topology, and it needs to live somewhere. And so by default, that's in RocksDB. In the case of Kafka Streams, it's also replicating it back to an internal Kafka topic. And there's more to the story, but that local copy, in memory and on disk, is important. In off-heap memory, I should say, and on disk; that off-heap part is always a good thing.
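The arrangement described above, a local key-value store plus a changelog written back to an internal Kafka topic, can be modeled with a toy sketch. The class and function names here are invented for illustration; this is not the Kafka Streams API.

```python
# Toy model of the Kafka Streams state-store idea (invented names, not
# the real API): each write goes to a local key-value store (the role
# RocksDB plays) and is also appended to a changelog (the role of the
# internal Kafka topic), so a replacement node can rebuild the state
# by replaying the changelog.

class ChangeloggedStore:
    def __init__(self):
        self.local = {}      # stands in for the local RocksDB instance
        self.changelog = []  # stands in for the internal Kafka topic

    def put(self, key, value):
        self.local[key] = value              # fast local read path
        self.changelog.append((key, value))  # durable replicated log

    def get(self, key):
        return self.local.get(key)

def restore(changelog):
    """Rebuild local state on a fresh node by replaying the changelog."""
    store = ChangeloggedStore()
    for key, value in changelog:
        store.local[key] = value  # replay without re-logging
    return store
```

The point of the design is that the local store gives low-latency reads and writes, while the changelog makes that state recoverable when a node fails and its tasks move elsewhere.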
So tell us more about where you go from there. I mean, you're building a cloud service around this, but a cloud service around a single-node embeddable database? I know that's not your story. This isn't giving me a hosted RocksDB in the sky. So what is it that you guys are doing and how are you doing it?
Yeah. No, absolutely. I can explain a little bit. So what happens is that, like you said, Kafka is so popular nowadays, right? People put all kinds of data in Kafka. So Kafka in my mind is the pipes in your building that are carrying all the water, all the gas; it's a very core part of the utility system of a whole building, right? But then at the end of the pipes, you need some faucets to get the water out, right?
You might want to water your garden, you might want to put it in your kitchen, whatever else; you need different kinds of faucets that can store some data and give a different color to the utility that's coming through the pipes. So RocksDB definitely serves this purpose, where you can attach, like you're saying, Kafka Streams or RocksDB instances to the end of these Kafka topics.
And these RocksDB instances, they're streaming data in from the Kafka topics and then indexing all the data, because, I mean, the data is all in the log, but the value that you can extract is all in the index. So once you index this data, in Kafka Streams or on Rockset like we do, people can query these indices and extract the value out of this data. So, for example, when you asked about Rockset: it's similar to some streaming systems, but essentially it's a distributed version of RocksDB, where these RocksDB instances are tailing data from different Kafka topics and partitions, indexing it, and giving you a queryable form of this data, right?
And then you can essentially run SQL queries on Rockset. You can issue a SQL query, the SQL is distributed among all the nodes that are running the RocksDB database, and you're doing joins, aggregations, group bys, order bys, whatever it is, the power of standard SQL, without having to think about RocksDB or how to configure this database or anything. It's just an easily usable SQL interface on lots of streaming analytics data that you might have in Kafka. Am I able to explain it-
Yeah, absolutely. It sounds like at that point, the fact that RocksDB happens to be the storage and key-value store layer underneath in that backend is interesting. That's cool, but what you're providing is a hosted analytics database, it sounds like.
Correct. Exactly. Yeah, absolutely. There are lots of analytics systems out there these days, right? But the focus for Rockset, again, using RocksDB, is how can we make sure that you can power real-time analytics, where when data comes in, you can query it as soon as it is coming in from the Kafka stream. Let's say you have an event processing application and you're writing events to Kafka.
And you want to query, and you want to get some intelligence out of this data set as soon as the events get produced; it's mostly focused on real-time analytics. So RocksDB, because it can do high write rates as well as high query rates, we can actually update these things in place in the indexing systems, which are components that are hanging at the edge of the Kafka topics.
There's no delay, there's no ETL that you need to get data from Kafka and then transform it and then put it in a warehouse. You don't need any of those things. You can immediately query data, which is why Rockset is mostly focused on, and gets a lot of value from, real-time analytics systems where you need to query your data as soon as it gets produced.
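To make the "standard SQL on freshly ingested events" idea concrete, here is a small sketch of the kind of join-plus-aggregation query being described, using Python's sqlite3 as a stand-in for Rockset's distributed SQL engine. The table and column names are invented for illustration.

```python
import sqlite3

# Sketch of ANSI SQL over event data: a join against a dimension table
# plus a per-user aggregation. sqlite3 stands in for Rockset's
# distributed SQL engine; the schema here is made up.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events(user_id INTEGER, action TEXT);
    CREATE TABLE users(user_id INTEGER, name TEXT);
    INSERT INTO events VALUES (1, 'click'), (1, 'view'), (2, 'click');
    INSERT INTO users  VALUES (1, 'ada'), (2, 'lin');
""")

# Join the event stream against the users table and aggregate per user.
rows = conn.execute("""
    SELECT u.name, COUNT(*) AS actions
    FROM events e JOIN users u ON e.user_id = u.user_id
    GROUP BY u.name
    ORDER BY actions DESC
""").fetchall()
print(rows)  # [('ada', 2), ('lin', 1)]
```

In the scenario described above, the `events` table would be continuously fed from a Kafka topic rather than loaded up front; the SQL the user writes stays the same.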
Nice. That sounds great. In your mind, why is it an analytics database and not an application database? You're talking about analytics and I get it but why?
Yeah. Great question. So I think the difference is actually partly in your question itself, because what we see is that real-time analytics is mostly used by applications. In the sense that there are two types of analytics that I've seen, at least in the last 15 years, based on my experience with Hadoop, Kafka and other systems: there's analytics that makes your business better, which is what traditional analytics is.
But then there's a lot of analytics happening now which makes your products better. These are products that people are building, to make the product better. Take, for example, online gaming systems or the Facebook application. It's an online product, it needs analytics. Gaming systems, they need analytics. A lot of distribution systems, they need analytics that are built into the product. Am I making sense?
It's not like analytics to make your business better. So the line between using analytics for applications and real-time analytics is merging, in that sense, where real-time analytics is more about making your products better, because if the analytics system is stale, let's say it's 10 minutes old data, or it is not running at 99.99 uptime, your product might suffer.
Whereas traditional analytics in earlier times was more about a backend system that is making your business better, giving reports to your CEOs and CIOs and giving answers to what-if questions, where you don't really need high uptime, and it's not really application focused. So you are absolutely right. Yeah. Even in the Kafka Streams use case, I think you might be seeing use cases where people are trying to hook some of these systems into the products that they ship to their customers. [crosstalk 00:21:44]
Yes. That's absolutely Kafka Streams, ksqlDB. They're totally oriented around that. Is there a connector? A Kafka Connect connector?
Yes. So there's a Kafka Connect connector that Rockset has. So if you have data in Kafka, it's very easy to use the standard connector. It's there in the Confluent store that you can download the Kafka connector from.
Confluent Hub.
Confluent Hub. Yes. And Rockset also has a write API. So Rockset is like a database; you can also write your own application, get data from Kafka, do some processing in your application, and then write it to Rockset. Both of them are possible. So the Confluent Hub connector that we did was easy, because I think we got a lot of support from Confluent trying to figure out how to write a very optimized connector to be able to put data from Kafka into Rockset.
Yeah. Writing a connector is easy. Writing an optimized connector is not easy.
It's sometimes difficult, because the person has to manage the connectors. So we sometimes have to teach the people who use the Kafka connector, saying, "Hey, use it in this fashion, make sure it's always running, make sure this, that." So there are some operational challenges that, at least, I have seen when I'm talking to people who are using our connectors.
Cool. If anybody wanted to kick the tires on your cloud service and check it out, where should they go to learn more?
Yeah. So we have a cloud service at rockset.com. So if you go to rockset.com, you can sign up for it. You can create an account. And the only thing you need to do is to set up the Kafka connector on your Kafka side and say, "Hey, deposit this to this Rockset account in the cloud." And essentially a few point-and-clicks are going to get you a SQL query interface to all the data that you have in your Kafka cluster.
Sounds good. My guest today has been Dhruba Borthakur. Dhruba, thanks so much for being a part of Streaming Audio.
Thank you. Thanks a lot for having me here, Tim. It was fun talking to you.
And there you have it. Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST—that's 60PDCAST—to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value after the expiration date is forfeit and there are a limited number of codes available. So don't miss out. Anyway, as always, I hope this podcast was useful to you. If you want to discuss it or ask a question, you can always reach out to me on Twitter @tlberglund, that's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out on Community Slack or on the Community Forum. There are sign-up links for those things in the show notes. If you'd like to sign up and while you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple podcasts, be sure to leave us a review there that helps other people discover it, especially if it's a five-star review. And we think that's a good thing. So thanks for your support, and we'll see you next time.
Using large amounts of streaming data increasingly requires interactive, real-time analytics and dashboards, and this applies to any industry, including tech. CTO and Co-Founder of Rockset, Dhruba Borthakur, shares how his company uses Apache Kafka® to perform complex joins, search, and aggregations on streaming data with low latencies. The Kafka database integrations allow his team to make a cloud-native analytics database that is a fundamental piece of enterprise infrastructure.
Especially in e-commerce, logistics, and manufacturing, apps typically receive over 20 million events a day. As those events roll in, it is critical that they be indexed in real time and queryable with low latencies. This way, you can build high-performing and scalable dashboards that allow your organization to use clickstream and behavioral data to inform decisions and responses to consumer behavior.
For example, when working with real-time analytics on real-time databases, the two need to be continuously synced for optimal performance. If the latency is too significant, there can be a missed opportunity to interact with customers on the platform. You may want to write queries that join streaming data with transactional data or historical data lakes, even for complex analytics. You always want to make sure that the database performs at a speed and scale appropriate for customers to have a seamless experience.
Using Rockset, you can write ANSI SQL on semi-structured and schemaless data. This way, you can achieve those complex joins with low latencies. If further data is needed to supplement the streaming data, it can easily be brought in through supported integrations. By having a solution for database requirements that is easily integrated and provides the correct data, you can make better decisions and maximize the results.
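As an illustration of what "SQL on semi-structured, schemaless data" looks like, here is a sketch using sqlite3's JSON functions as a stand-in for Rockset, which ingests such documents directly. The document shape is invented, and Rockset's own SQL dialect addresses nested fields differently.

```python
import sqlite3

# Sketch of SQL over semi-structured, schemaless documents, using
# sqlite3's JSON1 functions as a stand-in for a real-time analytics
# database. The document shape here is made up for illustration.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs(body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [
    ('{"user": "ada", "cart": {"total": 30}}',),
    ('{"user": "lin", "cart": {"total": 12}}',),
])

# Pull nested fields out of the raw JSON and filter on one of them.
rows = conn.execute("""
    SELECT json_extract(body, '$.user')       AS user,
           json_extract(body, '$.cart.total') AS total
    FROM docs
    WHERE json_extract(body, '$.cart.total') > 20
""").fetchall()
print(rows)  # [('ada', 30)]
```

The point is that no schema was declared for the nested `cart` object up front; the query reaches into the documents at read time.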
If there's something you want to know about Apache Kafka, Confluent, or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.