So much of the modern world runs on REST APIs, but is that the only choice? Well, of course, it isn't. Today, we'll be talking about GraphQL, where it fits in with event streaming, and where you might and might not want to use it. All that and more on today's episode of Streaming Audio. Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io, which is the one-stop website to teach you everything you need to know about Apache Kafka, event streaming, event-driven architectures, and systems.
You'll find getting started guides, which will teach you how to connect your language to Kafka, all the way up to high-level architectural patterns of how you should build event-driven systems, and then back down into the guts of Kafka to teach you about the internals, and how it works under the hood. You'll also find plenty of educational courses, which you can walk through with Confluent Cloud. If you choose to do that, sign up to Confluent Cloud with the code PODCAST100, and you'll get a hundred dollars of extra free credit to play around with. With that said, let's start today's episode.
I'm your host, Kris Jenkins for another episode of Streaming Audio. With me today is Gerard Klijs. He's been a backend engineer for 10 years. He's been working with Kafka for six of those, and he's here today to talk to us about GraphQL. Gerard, thank you for coming to the show.
Yeah. Thanks for having me.
Pleasure. So let me step back, and set the scene for people that don't know about GraphQL. So here I am, I'm writing a web server as I've done 100 times before, I run a query against the backend. I get an object back. I serialize it to JSON, and I spit it out over my REST API. Why should I step back, and consider GraphQL in that chain?
There could be several reasons, but one of the most important ones, I think, is also why Facebook, the company that invented GraphQL, started this. Sometimes you really want to support all the clients. In the case of Facebook, that could be many different applications, including things like Android apps, and people might not always update those. Then there is the data they request. For example, you have a name field, and at a later point you decide it becomes first name and last name. With GraphQL you have to say which bits of data you want to get. That means that if I deprecate name and introduce first name and last name, then on the server side they can see exactly, "Oh, this field is still being used even though we deprecated it, so we don't throw that code away yet." So it becomes more graceful to deprecate things than with something like REST.
Because with REST, for example, you would have one payload that has just the name, and you would probably call it v1, because if you don't version it, the old clients will break. Then you have a v2 version, which has the first name and the last name as separate fields. That's fine, but you still don't know what the clients that ask for the v1 version will use, because there might be 30 fields in there and maybe they only need one. And that's the other issue: over-fetching. Maybe for some reason a client only wants to display the name, but you return a JSON with 30 fields. Then they have to parse that JSON with all 30 fields just to get the name part. With GraphQL, you can say, "I just want the name," and use just that to display something.
Right. So it's partly migration, but it's also kind of analytics on the API. You can see which fields are used a lot, and which fields are perhaps completely useless.
Yes. And that makes migration a little easier.
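The field-level visibility discussed here can be sketched in a few lines of Python. This is only an illustration, not how any particular GraphQL server does it: it assumes each incoming query has already been reduced to a list of requested field names, and `FieldUsageTracker`, `record`, and `safe_to_remove` are hypothetical names.

```python
from collections import Counter


class FieldUsageTracker:
    """Counts how often each field is requested, to guide deprecation."""

    def __init__(self, deprecated):
        self.deprecated = set(deprecated)
        self.counts = Counter()

    def record(self, requested_fields):
        # Called once per incoming query with the fields the client selected.
        self.counts.update(requested_fields)

    def safe_to_remove(self):
        # Deprecated fields that no client has asked for since tracking began.
        return sorted(f for f in self.deprecated if self.counts[f] == 0)


tracker = FieldUsageTracker(deprecated=["name"])
tracker.record(["firstName", "lastName"])  # an updated client
tracker.record(["name"])                   # an old client still using the deprecated field
print(tracker.safe_to_remove())            # "name" is still in use, so nothing is listed
```

With REST, the server only sees that the whole payload was requested, so a counter like this would have nothing field-specific to count.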
Oh God. Yeah, I can imagine. Interesting. So let's get a sense of the shape of how you actually use it. So it's not like REST where your URI tells the server what you want, right?
Yes. Mostly [inaudible 00:04:20], depending on the actual implementation, you use a POST, and then you have an object that can become quite big. In there you really define a sort of graph of what you want to have. So in this case, for the name part, you could say, "I want a person based on some ID, and from the person I just want the name value." And what you get back is proper JSON with a person and a name, and then the surname.
And you can also dig further down. So I could say, I want a person and their address and I want the address to be resolved to a latitude, longitude.
Yes. For example, [crosstalk 00:05:03] and the friends of their friends. You can make pretty crazy queries. This is also one of the risks, but yeah.
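To make the idea that the query defines the shape of the response concrete, here is a minimal Python sketch. It is not a GraphQL implementation: it assumes the query has already been parsed into a nested dictionary (a field name maps to `None` for a leaf, or to a nested selection), and `project` and the sample `person` record are made up for illustration.

```python
def project(data, selection):
    """Return only the parts of `data` named in `selection`.

    `selection` maps a field name to None (leaf) or to a nested selection.
    """
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is None else project(value, sub)
    return result


person = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "address": {"street": "1 Main St", "lat": 52.1, "lon": 5.1},
    "friends": [{"id": 2, "name": "Bob", "email": "bob@example.com"}],
}

# Roughly the GraphQL selection: { name address { lat lon } friends { name } }
selection = {
    "name": None,
    "address": {"lat": None, "lon": None},
    "friends": {"name": None},
}
print(project(person, selection))
```

The client gets back exactly the name, the coordinates, and the friends' names, and nothing else from the underlying record.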
Right. That feels like it would lead into sort of different documentation because you wouldn't... In a REST API, you often want to know the shape of the thing coming back, but in GraphQL, you're often in the state where the question you ask determines the shape of the thing coming back so it's more in your control, right?
Yes. You can also fully control [inaudible 00:05:34]. Even when you query for a person, maybe for some reason you want the JSON back as, I don't know, important person. Then you can say, "I want to call the person query, but I want the result back as important person," and you get JSON back with important person as the key. So it's also very flexible in how you get the JSON back.
So you can do renaming on the fly to suit you as a client?
Yeah. Or you could, for example, query for three persons at the same time with the same query. And then of course you have to give them names so you know which is which, but things like that you can also do.
Right. Okay, yeah. So you're not restricted to say getting one user out or all the users out. You can say, "I want these three because it's mother, father, child."
Yes.
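The renaming described here is what GraphQL calls aliases: `alias: field(arguments) { ... }`. As a small sketch, the Python helper below builds one query that fetches three persons under different aliases. The `person(id: ...)` field is an assumed, hypothetical schema, and `aliased_person_query` is a made-up helper name.

```python
def aliased_person_query(ids_by_alias, fields):
    """Build one GraphQL query that fetches several persons under different aliases."""
    selection = " ".join(fields)
    parts = [
        f"{alias}: person(id: {person_id}) {{ {selection} }}"
        for alias, person_id in ids_by_alias.items()
    ]
    return "query { " + " ".join(parts) + " }"


query = aliased_person_query({"mother": 1, "father": 2, "child": 3}, ["name"])
print(query)
```

The response JSON would then be keyed by `mother`, `father`, and `child`, so the client can tell the three results apart.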
Okay. Yeah. I see how that would work. How do people typically get started with this?
One of the things you often see is that the front-end team is getting kind of frustrated, for example because the backend team is often late with changing things, or is changing things without communicating, and then things start breaking. So then they start building a BFF, that's called a Backend For Frontend, which they put in between, and then they typically consume [inaudible 00:06:56]. At the server level they kind of transform it into a GraphQL endpoint: they translate the queries coming in to REST calls, combine the results, and send them back to the clients.
Right. So I don't have to throw away my REST client, I can build this on top as a kind of more flexible query layer.
There are multiple ways to build GraphQL on top of REST APIs.
Okay. That's interesting. My other big question, I think, is: is it just for querying, or if I need to create a new user, is that handled similarly?
Yeah. I always find it a bit confusing with REST, because I think in typical REST you have seven types of operations, and GraphQL just has... And then it's always a debate which one it really should be.
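A Backend For Frontend of this kind can be sketched as a resolver that turns one incoming selection into only the REST calls it needs. In the Python sketch below, `fetch_person` and `fetch_address` are stand-ins that return canned data where a real BFF would issue HTTP requests, and the endpoint split, field names, and `resolve_person` are all assumptions for illustration.

```python
# Stand-ins for REST calls; a real BFF would issue HTTP requests here.
def fetch_person(person_id):
    return {"id": person_id, "name": "Ada", "addressId": 7}


def fetch_address(address_id):
    return {"id": address_id, "city": "Utrecht", "street": "1 Main St"}


def resolve_person(person_id, selection):
    """Answer a person query, calling only the REST endpoints the selection needs.

    `selection` maps a field name to None (leaf) or to a nested selection.
    Returns the combined result and the list of endpoints that were called.
    """
    calls = ["person"]
    person = fetch_person(person_id)
    result = {f: person[f] for f, sub in selection.items() if sub is None}
    if "address" in selection:
        # Only hit the second endpoint when the client actually asked for it.
        calls.append("address")
        address = fetch_address(person["addressId"])
        result["address"] = {f: address[f] for f in selection["address"]}
    return result, calls


print(resolve_person(1, {"name": None}))
print(resolve_person(1, {"name": None, "address": {"city": None}}))
```

The existing REST services stay as they are; the front end sees a single endpoint and a single combined response.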
[crosstalk 00:07:52] On the team. Who's the true believer in the right REST way, right?
Yes, things like that. With GraphQL [inaudible 00:07:59] you have queries, which shouldn't change anything and are typically for asking for specific information. Then you have mutations, which do change things. And because they can change things, it's important that if you have multiple mutations in one request, they are executed in order. That's also part of the specification. And then you have subscriptions, which are kind of streaming APIs.
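The ordering guarantee for mutations matters when a later mutation depends on an earlier one. The Python sketch below shows the idea with a plain loop over an in-memory state; it is not a GraphQL executor, and `execute_mutations`, `create_user`, and `rename_user` are hypothetical names.

```python
def execute_mutations(mutations, state):
    """Apply mutations one at a time, in the order they appear in the request."""
    results = []
    for name, handler, args in mutations:
        results.append((name, handler(state, **args)))
    return results


def create_user(state, username):
    state["users"].append(username)
    return len(state["users"])


def rename_user(state, old, new):
    index = state["users"].index(old)  # fails if the user was not created first
    state["users"][index] = new
    return new


state = {"users": []}
request = [
    ("createUser", create_user, {"username": "alice"}),
    ("renameUser", rename_user, {"old": "alice", "new": "alicia"}),
]
print(execute_mutations(request, state))
print(state)
```

If the two mutations ran in the opposite order or in parallel, the rename could fail because the user would not exist yet. Queries, which only read, do not need this guarantee.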
Tell me about those because we're very interested in streaming APIs around here.
Yeah, I understand. That was also one of my biggest interests, to get it working. I didn't mention the demo application yet, but I once built a demo application of a kind of banking app, I think about four years ago, with Clojure. I used Kafka for the messaging, and there was also a front end where all the transactions would be streamed, so you could see them popping up on your screen in real time using subscriptions. But when I started playing around with different server and client implementations of those subscriptions, I also found some rough edges. Some of them do things just a bit differently. For example, in the Kotlin GraphQL library I also found some errors, so I found them and fixed them myself. I couldn't [inaudible 00:09:21] them, so that was kind of nice, that I could do something back for the open source community. But yeah, people mostly care about the mutations and queries. Subscriptions are not that often used in production, also because it's kind of hard to scale them properly.
Yeah. If you've got too many connected clients, right? That can be a challenge.
Yeah. And since it's a stateful connection and yeah.
Yeah. I've worked on UIs where it's all WebSockets, it's all streaming data, and the end user experience is so much nicer when everything's live.
Yes.
So I'm a big fan of doing it when you can. [crosstalk 00:10:04] So give me some gory details.
For WebSockets, that's kind of the problem: there is no official specification of GraphQL over WebSockets. There is something that kind of says, "You can implement this like this," and most of the libraries implement it like this, but some diverge just a bit, and then some clients won't work with some servers. Yeah.
Right. So as you move from language to language, which it sounds like you do quite a lot, you get a slightly different experience. We've already mentioned Clojure and Kotlin, and I know you do some Rust, so...
Yes.
We're clearly dealing with a polyglot here.
Yeah, a bit.
Yeah.
For my work it's still mostly just the JVM, and mostly just Java too, but yeah.
Okay. So give me some gory details. What do subscriptions actually look like? How do I set one up? Is it like running a query or?
Much like it, but like I said, the one I have the most experience with is over WebSockets. I know there are also some that use Server-Sent Events, but I've never used those. So you set up a WebSocket, and then basically you also have something like a running query. What you can do on the server side, and what I did with the bank application, is create a kind of filter for all the incoming transactions. If the filter lets the transaction through, for example because it matches the account number and the subscription was started saying it should match this account number, then it's sent to the client via the WebSocket.
Okay. So is it the backend that's setting up a GraphQL subscription, or is it still the front end saying, "Hey, I want you to subscribe to these things for me"?
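The per-subscription filter described here can be sketched without any WebSocket or Kafka code. In the Python sketch below, `TransactionFeed` is a hypothetical in-memory stand-in: `subscribe` plays the role of a client starting a subscription for one account, `publish` would be called for every incoming transaction (for example from a Kafka consumer loop), and `send` stands in for writing to that client's WebSocket. The transaction fields `from`, `to`, and `amount` are assumptions.

```python
class TransactionFeed:
    """In-memory stand-in for the server side of a subscription."""

    def __init__(self):
        self.subscribers = []  # (predicate, send) pairs, one per subscription

    def subscribe(self, account, send):
        # The client starts the subscription and names the account it cares about.
        def matches(tx):
            return account in (tx["from"], tx["to"])

        self.subscribers.append((matches, send))

    def publish(self, tx):
        # Every incoming transaction is checked against every subscription's filter.
        for matches, send in self.subscribers:
            if matches(tx):
                send(tx)


received = []
feed = TransactionFeed()
feed.subscribe("NL01", received.append)
feed.publish({"from": "NL01", "to": "NL02", "amount": 10})
feed.publish({"from": "NL03", "to": "NL04", "amount": 5})
print(received)
```

Only the first transaction reaches the subscriber, because the second one does not involve the subscribed account.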
The front end is always starting.
It's always-[Crosstalk 00:12:05].
But then of course on the backend, since it's a WebSocket, you get a [inaudible 00:12:08] on the backend part.
It kind of sounds a bit like ksqlDB to me, in that you set up these streaming queries that run constantly in the backend, but it's the client that defines them.
Yes.
Oh, I see that parallel. Yeah. Having worked with a lot of front and backend teams, I can see the backend saying, "Oh, we don't want to give up the control to the front end, but we do want them to bother us less." There's a trade off there, right? Interesting. So you've used this in production?
Kind of.
Tell me that story.
I think, yeah. I wanted to use it at some point, and we used it at a hackathon, but especially for the streaming parts it became much too complex to fix in the cloud environment in just one day. So that was kind of a bummer, but that was because the hackathon was on a production-like environment, so that was interesting. I also did something for the national police. It was already there before I came, and we also used GraphQL. But there GraphQL was used mostly server to server, and I had my doubts about that.
Oh, that's interesting. Why would you use it server to server? Is it just, you had another query layer to speak to different machines?
Yeah, kind of. Of course you still have the overhead issue, but backend services are mostly close together, so the overhead doesn't really matter that much. And for versioning, since you probably deploy all the servers yourself, or at least can see which version they are on, you are much more in control of the applications, so in that case it also makes less sense. And it gets pretty verbose. If you just use REST, you can just call the endpoint, and here you really have to specify all the fields you need. Often the use case was that we wanted to have everything, to put it in some other database. And everything, that can be quite a huge thing. Then also, if something changes on the other side, you might run the risk of missing something.
Right. [crosstalk 00:14:36]
Because they might update the schema and add some field, but then you also have to update your schema... You have to update the query, otherwise you don't get that additional field.
Right. Yeah. So there's always backwards compatible migration and forwards compatible, right?
Yes.
You want to be able to anticipate future changes. Again, that's reminding me of Avro right?
Yes.
Migrations in Avro. It's not enough to think about a serialization format. You've got to think about change as well. Okay. So do you think GraphQL is particularly suited to Kafka event streaming? Is it particularly suited to any backend? Is there a synergy there?
It might be used as part of a solution to help with scalability. With Kafka, you can just start multiple consumers, where each consumer could potentially consume all the events, and then you have a WebSocket endpoint for each consumer. Once they see an event, they send it to the ones that subscribed on the GraphQL endpoint. So I think it could be part of scaling out.
Yes. Yeah. I can see that. So you deploy every consumer with its own GraphQL?
Of course, if you want to do it smarter and you have really a lot of messages, you would probably kind of want to direct people to the correct instance that only gets messages that might be related to them, but then it becomes a bit more complicated and then probably just a custom WebSocket solution would be better because then you're more in control of what you're sending to the clients.
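The "direct people to the correct instance" idea can be sketched as a stable hash from an account to a server instance, so that a client's subscription and the messages for its account end up on the same instance. This Python sketch is an assumption-laden illustration: `instance_for` is a hypothetical helper, CRC32 is chosen only because it is stable across processes, and it is not the hash Kafka's default partitioner uses, so a real setup would have to align the routing with how the topic is actually partitioned.

```python
import zlib


def instance_for(account, instance_count):
    """Pick the server instance responsible for an account via a stable hash."""
    return zlib.crc32(account.encode("utf-8")) % instance_count


# The same account always maps to the same instance, in every process.
for account in ["NL01", "NL02", "NL03"]:
    print(account, "->", instance_for(account, 4))
```

Python's built-in `hash` is deliberately not used here, because string hashes are randomized per process and different servers would disagree on the routing.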
Right. So you are not saying that GraphQL is the solution, just that it should be one of your options on the table?
Yes.
Yeah. Okay. So tell me if I wanted to get started with it, where should I start? What's the best way to get learning this?
That's a good question. I don't really know anymore, since I started a while ago. I know Apollo is a big company that's very big on GraphQL, so it has a lot of documentation, and I think also interactive playgrounds and stuff like that, so that would probably be a good place to start. Of course, if you really like documentation and reading things, you could also read the GraphQL specification. Some people like it. It's quite easy to read.
Should I confess at this point? I have. I read it a few years ago for a project and yeah, I would agree it's not a bad read as specs go.
If you really want to know what really GraphQL [inaudible 00:17:31] is about then I think that's a good place to start.
Yeah. For specs, not too heavy and not too brief. It often is the best place to get started, right? And I was going to ask you about types. That's another question in my mind about GraphQL: can you introspect the schema, or the types of the different fields? [crosstalk 00:17:57]
If we have names, then it could be a string. So then on the client side you also know that it's a string. For example, in your front end, if you use something like TypeScript and you generate your API based on the GraphQL schema, then you know this is a string and there are certain operations you can do on it. So that really helps, also compared to REST, where you can sometimes also generate something like that based on the API specification. But what I've seen in practice, and I've not done that much front end, is that it's often just done by hand: I need these and these fields, and I put them in a TypeScript definition because I know that's what I get back. But yeah.
Yeah. [crosstalk 00:18:45] TypeScript, definitely. Those are easy to write, but...
Yeah. But then you have to maintain them too. And I think, especially with things like React, there are really nice things that also do some state management and stuff like that. So if all your data is coming from GraphQL, you can kind of hook that into it, and then you get auto-completion and stuff like that.
Oh really? That could be an interesting side project because I've got this little hack I do with WebSockets and Kafka and Python and just straight WebSocket API. I might see if I can do something with GraphQL with that because this sounds really interesting to me. Before you sort of wrap that up, any last tips for someone working with GraphQL? Anything to watch for?
There are some downsides to GraphQL. Most of them apply especially if you have a public API and you don't really know what people are going to query for. One of the most famous open GraphQL APIs is the one from GitHub, and I know they have quite some security measures around it. Before they execute a query, they check how complex it is, and if it's too complex, you just get an error back. They don't send it to the backend and have it churning away; they just send an error back. So that's one of the things.
So you could have this BFF, you called it? Backend For Frontend, acting as the police for your queries.
Yeah.
Is there any support in GraphQL for that or do you just have to roll your own?
I think there are probably some community things for handling the complexity of a query. I haven't used them myself because I haven't worked on public APIs.
Okay.
But there are some talks, mainly from GitHub and such, on how you can secure your public GraphQL API, so that's one of the concerns.
Right. I will see if we can get some of those put in the show notes so that people can find the links easily. Gerard, it's been a pleasure talking to you. I know that one of your other side projects is using Schema Registry in a Rust client.
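One simple form of the complexity check discussed here is a depth limit applied before the query is executed. The Python sketch below is a generic illustration, not how GitHub or any particular library does it: it assumes the query has been parsed into a nested dictionary (a field name maps to `None` for a leaf, or to a nested selection), and `depth`, `check_depth`, and `QueryTooComplex` are hypothetical names.

```python
class QueryTooComplex(Exception):
    """Raised instead of executing a query that nests too deeply."""


def depth(selection):
    """Depth of a nested selection; leaves are None and count as depth 0."""
    if not selection:
        return 0
    return 1 + max(depth(sub) for sub in selection.values())


def check_depth(selection, max_depth):
    """Reject the query up front, before any backend work is done."""
    found = depth(selection)
    if found > max_depth:
        raise QueryTooComplex(f"depth {found} exceeds limit {max_depth}")
    return found


# Roughly: { person { friends { friends { name } } } }
friends_of_friends = {"person": {"friends": {"friends": {"name": None}}}}
print(check_depth({"person": {"name": None}}, max_depth=3))
try:
    check_depth(friends_of_friends, max_depth=3)
except QueryTooComplex as error:
    print("rejected:", error)
```

Real complexity checks often weigh fields differently or account for list sizes; depth is just the easiest measure to show.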
Yes.
So maybe we can entice you back someday to talk about that.
Maybe. Yes.
Keep it so. Thank you very much Gerard Klijs. Thank you for joining us on Streaming Audio.
Yes. Thanks. Bye
And that brings us to the end of another episode of Streaming Audio. My guest today has been Gerard Klijs and we've been talking about GraphQL. I've been Kris Jenkins and I will continue to be in the next episode so I hope you'll join me for that. If you've got any comments or questions, then please do drop us a line. And especially if you've got your own story to tell on a future episode, let us know about that. If you are listening to this, you'll find contact details in the show notes of your podcasting app. And if you're watching it, there's probably a comment box just down there you could drop us a line with. So please do that. Your reminder that Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io, which is your one stop shop for learning all about Apache Kafka, event streaming and event driven architectures.
You'll find courses there, getting started guides, interviews with people who've really kicked the tires and put Kafka in production and seen what it's like in the trenches. And you'll also have the chance to go through various courses that will teach you in depth about how to work with Kafka. If you choose to sign up for one of those and you sign up for Confluent Cloud to run your Kafka instance, remember to use the podcast code, which is PODCAST100 and that will give you a hundred dollars worth of free credit. And with that, let me just once again thank Gerard for joining us and thank you for listening.
What is GraphQL? And how can you combine GraphQL with Apache Kafka® to query data in real time?
With over 10 years of experience as a backend engineer, Gerard Klijs is a Confluent Community Catalyst, a contributor to several GraphQL libraries, and also a creator and maintainer of a Rust library to use Confluent Schema Registry with Java client. In this episode, he explains why you want to use Kafka with GraphQL and how they work together to bridge the gap between backend and frontend to make data more easily accessible in the frontend.
As an alternative to REST, GraphQL is an open source query language developed by Meta, which lets you pull data from multiple data sources via a single API call. GraphQL lets you migrate and deprecate fields gracefully. For example, if you have a `name` field that you later decide to replace with `firstName` and `lastName`, you can serve all three fields side by side and monitor which ones clients still request. Once no more queries ask for the deprecated field, it can be removed from the server.
Usually, GraphQL is used in the frontend with a server implemented in Node.js, while Kafka is often used as an integration layer between backend components. When it comes to connecting Kafka with GraphQL, the use cases might not seem as vast at first glance, but Gerard thinks that it is due to unfamiliarity and misconceptions on how the two can work together. For example, some may think Kafka is merely a message bus and GraphQL is for graph databases.
Gerard also talks about the backend for frontend (BFF) pattern as well as tips on working with GraphQL.
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.