April 29, 2021 | Episode 156

Data Management and Digital Transformation with Apache Kafka at Van Oord

Transcript
Notes

Tim Berglund:

What would a 150-year-old Dutch company that builds wind turbines and does oil and gas infrastructure and seawalls and land reclamation and built the Palm Islands of Dubai, what would they have to do with Kafka? Well, I don't know, but I want to hear about it because all those things are really cool. So, that company is Van Oord. And today I have Marlon Hiralal and Andreas Wombacher on the show to talk about their initial Kafka use case and really their entrance into the world of being a technology company. It's all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund:

Hello, and welcome to another episode of Streaming Audio. I am as always your host, Tim Berglund. And I'm joined in the virtual studio today by a couple of guests from a company called Van Oord. By name, you are Marlon Hiralal and Andreas Wombacher. Marlon and Andreas, welcome to Streaming Audio.

Marlon Hiralal:

Yeah. Thank you for having us with you, Tim.

Tim Berglund:

I want you to tell us a little bit about yourselves and I'd love to hear a little bit more about Van Oord. I have just the briefest background on the company, but who wants to tell us about Van Oord? It's pretty cool. Pretty cool company.

Marlon Hiralal:

Okay. So at Van Oord, I am the enterprise architect, also taking care of the data management environment. In fact, we started this journey from scratch. Van Oord itself is a family-owned company with 150 years of experience as an international marine contractor. So as you can understand, we know a lot about how to do dredging, how to do windmills at sea, lay pipes and cables on the sea surface, as well as building infrastructure. Okay. I will just let Andreas also briefly introduce himself before I go further into telling you more about Van Oord.

Andreas Wombacher:

Hi, my name is Andreas and I'm responsible for the data architecture of this platform environment. Van Oord is a big company doing infrastructure projects and, as you can imagine, it's important that you understand how much time certain aspects will take. In these projects a lot of weather is involved, because if you have certain tides or waves, you cannot sail with a ship which has a 20-meter pole on top of it, right? It will tumble. So there are certain limitations from the weather. So the data is really important, both the environmental data and the data on how the project actually ran, to make good estimates and to make good offers to customers on what can be performed in which time, around the globe and around the year.

Marlon Hiralal:

Okay. So for this, in fact, Van Oord has its own vessels, over 100 large vessels. We also call them the sailing manufacturers. So, [inaudible 00:03:30] what it means in fact is that those vessels have all types of equipment to be able to do those types of projects. It also means that we need all these types of data, not only to report daily to our customers how well we're doing on the project and how far along we are, but also to give headquarters information about our financial performance on the project and the planning, as well as to have this data available to show the government that the environment has not been damaged. Okay, because a lot of times when we do those projects, one of the small letters is that it's our risk to keep the environment sustainable.

Tim Berglund:

Of course, that's reasonable fine print, I suppose. It's on you to get that right.

Marlon Hiralal:

We need to prove and show where we have sailed, what we have done and that it was not our fault if something has happened.

Tim Berglund:

That's a lot of data. Obviously we're here to talk about what you're doing with Kafka, and you're doing some things with some Confluent technology, and that's what we're here for. But this is like when you see a dump truck or something: the little boy in me kind of comes to life, and I just want to look at the dump truck. So I kind of want to talk about windmills and reclaiming land and stuff just for a minute. So if I've flown in to Amsterdam Schiphol airport from the north, there are windmills in the water, there are wind farms out in the water. Is that possibly Van Oord-built infrastructure out there?

Marlon Hiralal:

Yes.

Tim Berglund:

Cool. Okay. Now I know, cause I like this vivid, I'm usually jet lagged at the time, right. This is flying in from Newark or something like that or Dallas and I'm like waking up and sleepy, but I see windmills as beautiful and I love them.

Andreas Wombacher:

If you also see we have some name [inaudible 00:05:51] In fact, when it's really storming from the North Sea, that we close. Right, we have the gates that we can close to keep the sea out.

Tim Berglund:

Oh, okay. No I haven't seen that, that's cool.

Andreas Wombacher:

Also, from Van Oord. Yeah.

Tim Berglund:

Okay, and if I've been in the Northern part of the Netherlands, is it possible that's land? Has there been interesting reclaiming going on in the history of Van Oord where there's dirt that wasn't dirt?

Marlon Hiralal:

Well, I would say one of our major landmarks is the Palm Islands in Dubai.

Tim Berglund:

Okay.

Marlon Hiralal:

Created by Van Oord.

Tim Berglund:

Cool. The Palm Islands. Okay.

Marlon Hiralal:

Yes.

Tim Berglund:

Nice. All right. We'll have to link to a picture of those in the show notes that's created by Van Oord. Okay. See, now this is a really dangerous podcast because frankly, I just want to talk about these things now. And we have some technology cause that's what our jobs are. So we need to move on because that's super cool and really interesting. And it's kind of neat, 150 year old family-owned company, builds windmills, reclaims land. Love it.

Tim Berglund:

So anyway, you were talking about there are regulatory pressures for environmental impact and when you are explicitly reshaping the environment, you want to do that carefully. You want to do that in a responsible way. And that makes sense that you'd have a lot of scrutiny on how you do that kind of thing. And also being an old company, it's one thing to be like Uber, right? And I don't know how old they are at this point, maybe 15 years, but they're a digital native, and everybody's got legacy code. You know, if you've been around longer than a year, you've got legacy code. But the legacy systems at internet startups don't go back that far.

Tim Berglund:

But when you talk to old companies who were big enough to be using computers, when there were first computers, you have interesting legacy histories. There's stuff, that's old stuff. And so, I imagine a lot of that, you keep it running. It does its job. It's fine. But some of that you want to modernize so, that's what I'm interested in. What are some of the things that you're able to talk about that you're wanting to modernize? What systems do they come from?

Marlon Hiralal:

So for example, correct. As I told you, Van Oord is really a civil engineering type of company, so there was not a lot of software or application knowledge at that time. But you need an HR system. You need a financial system. So those systems have been bought. They have been implemented the best way they could, but they have become quite monolithic and quite old. And then, just like you were saying about Uber, we also have an Engineering Department that a couple of years ago started with, "We need to build some really fancy applications to be able to do fancy stuff." So on one side we have the old monolithic world in Van Oord. On the other side we have the more modern one that builds microservice types of applications. But they are two totally disconnected worlds that are not speaking to each other.

Tim Berglund:

The old legacy, we were using computers 50 years ago and the new, we're building shiny things. And we're cool and you're not.

Marlon Hiralal:

Yes. So here you can see, and you need to understand, that we are an environment that is really about dredging. So software knowledge, IT, was not really our piece of cake. So there are no ESB types of things. There are no data warehouse types of things.

Tim Berglund:

Okay. Okay.

Marlon Hiralal:

So we are looking into this, and we have all types of challenges because our data needs to be real time. We have, for example, data coming from applications on the vessels, and we have geographical types of data. So a lot of those things were big requirements for us.

Marlon Hiralal:

In fact, that's when we started looking into it. Now, a lot of companies would say: we buy an ESB, we buy a master data management system, a data warehouse, a data lake, call it what you like. That's when we said, "Hey, we need to take a step back. We need to learn from the Ubers, from the FinTech companies, from [inaudible 00:10:51], call them." And that's when we started to learn about Kafka.

Tim Berglund:

So, I kind of like this. I'm thinking of, you've got this long history of all this legacy technologies. And I think what you said was really, yeah, HR and Finance, that stuff was automated, but really it's a company that's about drills and shovels and ships and not a technology company. And you are now, and, kind of the saying that every company is becoming a software company, you're getting on board with that. But late enough in the game that you really don't have a whole bunch of mainframy kinds of stuff to deal with, you really do get to kind of come in like digital natives and build things fresh. Just kind of nice.

Marlon Hiralal:

Yeah. So exactly. And that's why, as I was saying, we can start fresh. We can go to the coolest, but also proven, technology. But at the same time, we didn't have what the other companies have: we don't have the lessons learned. All those companies started with an ESB and saw the pros and the cons.

Tim Berglund:

They know the cons, right?

Marlon Hiralal:

Exactly. So we were able to choose what we think is the best-fit solution, going with Confluent Kafka. But at the same time, we didn't have a history to learn from to get here. So that's also a challenge for us.

Tim Berglund:

Right, right of a different kind. Were you guys involved Andreas and Marlon, were you involved with the choice to use Kafka or to go with Confluent or anything like that? Do you know any of the history of how you got down that path? You said you never had any ESB legacy to overcome, which is good, but event driven architecture is apparently interesting. So how did that all come about?

Marlon Hiralal:

So, from our background, we know ESBs quite well, the pros and the cons. We knew streaming platforms by that time, not Kafka, but some other ones. We know data warehouse and data lake technology. What we saw is that after 15, 20 years or longer, the companies that had bought all those types of technologies were still busy with integration and had still not finished. And in the end, we started looking at Kafka as the solution. So what we thought then was: why not change it around? Why buy all of that? We said, okay, learn from these companies. So we will start with Kafka and see how much we can solve with that. And if needed, we will look at other types of partial solutions.

Tim Berglund:

Sure, if Kafka doesn't work.

Marlon Hiralal:

Yes. So we changed it around and said, even if it's 80, 90%, it's good enough, right? We are coming from 0%, right?

Andreas Wombacher:

And the mindset was a bit that, on the one hand, we needed infrastructure which allowed us to connect different applications in a dynamic, flexible way. And on the other hand, we needed infrastructure which allowed people to look at the state of a system at a given time without connecting to the system itself. So people want to see how many employees are running around in the company as a dashboard, and the other applications, like payment applications, project planning and so forth, need the information about which people will be around next week or next month in a timely manner. And this is where Kafka fits in very well. So we set it up so that, on the one hand, changes from one system are communicated to other systems, and on the other hand, a kind of persistent state of a system at a given point in time is kept available for the users through different interfaces.
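
Editor's sketch: the two roles described here, a stream of changes and a persistent state derived from it, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, with no Kafka client involved. The event shape (key, op, value), the sample employee records, and the function name apply_changes are all hypothetical and are not Van Oord's actual schema.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical change event (an assumption for illustration):
# {"key": ..., "op": "upsert" | "delete", "value": {...} or None}
ChangeEvent = Dict[str, Any]


def apply_changes(events: Iterable[ChangeEvent],
                  state: Optional[Dict[str, dict]] = None) -> Dict[str, dict]:
    """Fold an ordered stream of change events into a key -> latest-value table."""
    table: Dict[str, dict] = dict(state or {})
    for event in events:
        if event["op"] == "delete":
            # A delete removes the key; deleting an unknown key is a no-op.
            table.pop(event["key"], None)
        else:
            # An upsert replaces whatever was stored for that key.
            table[event["key"]] = event["value"]
    return table


# Made-up sample data: two employees, one update, one delete.
changes = [
    {"key": "e1", "op": "upsert", "value": {"name": "Employee One", "available": True}},
    {"key": "e2", "op": "upsert", "value": {"name": "Employee Two", "available": True}},
    {"key": "e1", "op": "upsert", "value": {"name": "Employee One", "available": False}},
    {"key": "e2", "op": "delete", "value": None},
]

current = apply_changes(changes)
print(current)
```

A dashboard would read the resulting table, while other applications would react to the individual change events. Both views come from the same ordered sequence of changes.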

Tim Berglund:

I love it. There's a talk I've been giving as a keynote a few times over the past year that talks about the development of Kafka itself over time, from its beginnings and the features it's added. And you get into current times with ksqlDB and Confluent Cloud, and things like that are expansions of the Kafka ecosystem. And also the way it develops inside a company. Like, when Kafka is a toddler inside of Van Oord, what is it doing? And the example I give is: you have one application, things are happening, and you need the changes to get over here. And that's a great place to start, because you're already trying to think about how to operate a new technology. And there are new event-driven paradigms. Nobody's all that good at them because they're new, but it's a simple and easy thing to think about. For everything that's hard about it, at least the diagram on the whiteboard is easy. And so you can take those first steps. So I love that that's your first use case.

Andreas Wombacher:

Yeah. And you see, we obviously have legacy applications, quite a few. And they are often less equipped to deal with these event-driven environments. And we are not in a position to replace them directly. You really have to be pragmatic there. And so the approach of going with change data capture, propagating changes through the system, is a pragmatic way to integrate legacy systems as well.

Tim Berglund:

Gotcha. But this is an internal system. This is, you're speaking in terms of HR things. So it's one of those old legacy HR applications that you're modernizing and not a data collection of what the vessels are doing, dredging.

Andreas Wombacher:

It goes across the board, because you have to be an employee of Van Oord to be able to participate in a project. So we need the people who are registered in the HR system to be in our project planning tool, so that they can be planned and assigned to a project. And these people can also only record effort in relation to executing a project. So there, too, we need the information about the people. So the knowledge of who is participating, who can do what, and who is around is just one of the essential concepts, one of the essential data structures, that we have to share in the organization.

Tim Berglund:

Got it, got it.

Andreas Wombacher:

The same goes for these vessels. Vessels are used everywhere in Van Oord, as you can imagine: for project planning, for execution, for maintenance, for assigning locations, but also for the consumables of these vessels and all this, so procurement. So in all places, these base concepts are really essential to use.

Tim Berglund:

So you're doing change data capture out of a legacy HR system.

Andreas Wombacher:

Yes.

Tim Berglund:

And using, integrating that data into we'll say more operational...

Andreas Wombacher:

Partly legacy systems also, but also partly new applications, which are more capable of dealing with event-driven systems. So it's both sides. A lot of the challenges we have are, obviously, what is a good event to publish in this event structure, how to represent an employee, how to represent a ship, and so forth. And for us, it's essential to understand who is the owner of this data and what the different fields actually mean. So data governance, data ownership, data stewardship: this is one of the essential concepts which we are also representing here. And therefore, we connect Confluent Kafka with a data dictionary to maintain that.

Tim Berglund:

To maintain the...

Andreas Wombacher:

Well, the ownership, and to have that understandable, along with the meaning of it. And this also helps us define the different event structures we are setting up, both in terms of the data structure and in terms of the organization of the topics in Kafka itself.

Tim Berglund:

Got it. Okay. So it sounds like a little bit of a governance layer that you've had to build internally.

Andreas Wombacher:

Yes.

Tim Berglund:

What, and just tell us kind of, from a technical standpoint, for a few minutes here: your use case, application integration, integration of legacy applications with other legacy applications and things under new development, all that is textbook. But in a real company, there are always little friction points and things that you have to build. Like maybe there's a connector that wasn't there or something like that. What have been, to you, the interesting parts of the development?

Marlon Hiralal:

Yeah. [crosstalk 00:20:50] No, no. It's just us.

Tim Berglund:

That's what people learn the most.

Andreas Wombacher:

A lot of our systems are Oracle based and you know with Oracle and CDC, it's not so easy. So there's obviously the GoldenGate option and then there's a Confluent option now also.

Tim Berglund:

It's plenty easy if you buy GoldenGate, right?

Andreas Wombacher:

Yeah.

Tim Berglund:

I don't see what's so hard about this.

Andreas Wombacher:

Yeah, to convince the right people to spend the money, maybe. But yeah. So there, we needed pragmatic solutions. Information is important for Van Oord to do its business, but it's not the core business. So it has to be cost-efficient to be able to do that. So this is one thing. The other challenge is really about organizing the internal processes. What we have learned over time is that when you dig out who is actually responsible for which part of the data, you figure out: okay, we have one system, and there are actually three different parts of the organization operating on that system. They all claim to have ownership of certain parts of the information. And now we have to divide that again, to make sure that we can also ask the right people for access rights to the right data. So again, imagine, if we are dealing with HR systems, it's more than unpleasant if some people get the wrong information. It can also be costly, with GDPR and everything going around.

Tim Berglund:

Oh yes. Knowing this person is currently an employee here and is available for these kinds of assignments, okay. But, you know, here's this person's national ID number, and that kind of stuff, you don't want them...

Andreas Wombacher:

To read the person's earnings.

Tim Berglund:

Right, you don't want that in the wrong topic.

Marlon Hiralal:

Exactly. And at the same time, we need to know the total cost of the person to be able to understand...

Tim Berglund:

Yeah.

Marlon Hiralal:

To make, yes, exactly, to make the proposal.

Tim Berglund:

Of course. So this is why [crosstalk 00:23:21] costing has to know something, but...

Andreas Wombacher:

This is why ownership is so important. And it's important to understand how the different concepts, or events and event structures, relate to the processes we have in the rest of the organization. So we are using enterprise architecture models to understand the relation of events and topics to the business processes. And with that, we relate it to data governance to explain what the different fields mean, who the owner of these is and why, and what the related processes are. So you get a pretty nice understanding of how the different pieces of the data platform relate to each other.
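
Editor's sketch: the concern in this exchange, that planning needs to know who is available while only costing may see what a person costs and nobody downstream should see a national ID, can be expressed as a field-level projection driven by a data dictionary. This is a minimal sketch under stated assumptions. The field names, the classification labels, the consumer names, and the function project_for are hypothetical and do not describe Van Oord's implementation or any Confluent or Apache Atlas API.

```python
from typing import Any, Dict, Set

# Hypothetical field classification, as a data dictionary might record it.
FIELD_CLASS: Dict[str, str] = {
    "employee_id": "internal",
    "name": "internal",
    "available_from": "internal",
    "total_cost": "cost",
    "national_id": "identity",
}

# Hypothetical clearance per consuming application.
CONSUMER_CLEARANCE: Dict[str, Set[str]] = {
    "project_planning": {"internal"},
    "costing": {"internal", "cost"},
}


def project_for(consumer: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the fields the consumer is cleared to see.

    Deny by default: unknown consumers get an empty record, and
    fields without a classification are dropped.
    """
    allowed = CONSUMER_CLEARANCE.get(consumer, set())
    return {
        field: value
        for field, value in record.items()
        if FIELD_CLASS.get(field) in allowed
    }


# Made-up sample record; "notes" is deliberately unclassified.
record = {
    "employee_id": "e1",
    "name": "Employee One",
    "available_from": "2021-05-01",
    "total_cost": 1000,
    "national_id": "000000000",
    "notes": "free text",
}

planning_view = project_for("project_planning", record)
costing_view = project_for("costing", record)
print(planning_view)
print(costing_view)
```

The deny-by-default choice means that a field nobody has claimed ownership of never leaves the source system, which matches the point that ownership has to be settled before access can be granted.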

Marlon Hiralal:

And that is integrated with the Confluent Kafka solution.

Tim Berglund:

Yes, yes, yes. Now Confluent Platform and Confluent Cloud, or just broadly just say Confluent, doesn't do all of that. We've got components of that. So it sounds to me, and this could be secret sauce that I'm asking about, and we're kind of coming up on time and I'm asking this big question, but I really want to know. Can you talk about what you had to build to make that work? Cause it's a hard problem.

Marlon Hiralal:

Yeah. Well, to be honest, we quite often source it to different guys. So we don't have a lot of secrets. Andreas go ahead...

Andreas Wombacher:

Yeah. We are using Apache Atlas and Models4Insight for the data dictionary.

Tim Berglund:

Excellent.

Andreas Wombacher:

And Atlas already has, from its history with Hadoop systems, a relation to Kafka and has some features for that already, and we use Models4Insight to manage and actually make the models, which are enterprise architecture models.

Tim Berglund:

Okay. Nice.

Andreas Wombacher:

And yeah. In Confluent, you now have these extended capabilities of recording the flows and seeing what is going over which flows, which then also enables us to visualize that information in ArchiMate models. And you can see that this process actually has this kind of traffic going on, and the connector here is stalling or having a problem because the message format is not compliant. And so this is the kind of impact on my processes and organization, and things like this.

Tim Berglund:

So two things, number one, you guys should really consider proposing to Kafka Summit, APAC or Americas. That's a really interesting thing. And again, just the work the company does is cool. People want to hear about that, right? Oh, wow. I'm coming, and I'm also, and I'm just sitting here soliciting additional content. That'd make a great blog post as well. I mean, there's cool things going on there. So if you want to talk more about it, I love to find other ways to just kind of make this stuff more famous because you're solving problems that other people are solving. And I think other people could benefit from it.

Marlon Hiralal:

That's cool to hear. Thanks.

Tim Berglund:

Yeah. Well, my guests today have been Marlon Hiralal and Andreas Wombacher. Marlon and Andreas, thanks for being a part of Streaming Audio.

Andreas Wombacher:

Thanks for having us.

Marlon Hiralal:

Thanks for being on your podcast.

Tim Berglund:

And there you have it. Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST—that's 60PDCAST—to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value after the expiration date is forfeit and there are a limited number of codes available. So don't miss out. Anyway, as always, I hope this podcast was useful to you. If you want to discuss it or ask a question, you can always reach out to me on Twitter @tlberglund, that's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out on Community Slack or on the Community Forum. There are sign-up links for those things in the show notes. If you'd like to sign up and while you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple podcasts, be sure to leave us a review there that helps other people discover it, especially if it's a five-star review. And we think that's a good thing. So thanks for your support, and we'll see you next time.

Imagine if you could create a better world for future generations simply by delivering marine ingenuity.

Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure.

Real-time insights into costs spent, the progress of projects, and the performance tracking of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data be connected, synchronized, and visualized, in fact truly digitized.

This requires a central nervous system that supports:

Legacy (monolith environment) as well as microservices
ELT/ETL/streaming ETL
All types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry
Master data/enterprise common data model

The need for agility and speed makes it necessary to have a fully integrated DevOps and infrastructure-as-code environment. Thousands of topics need to be developed, updated, tested, accepted, and deployed each day. This, together with the different scripts for connectors, requires a holistic data management solution in which data lineage, data governance, and enterprise architecture are integrated parts.

Thus, Marlon Hiralal (Enterprise/Data Management Architect, Van Oord) and Andreas Wombacher (Data Engineer, Van Oord) turned to Confluent for a three-month proof of concept and explored the pre-prep stage of using Apache Kafka® on Van Oord’s vessels.

Since the environment in Van Oord is dynamic with regard to the application landscape and offered services, it is essential that a stable environment with controlled continuous integration and deployment is applied. Beyond the software components themselves, this also applies to configurations and infrastructure, applying the concept of CI/CD with infrastructure as code. The result: using Terraform and Confluent together.

Publishing information is treated as a product at Van Oord. An information product is a set of Kafka topics: topics to communicate change (via change data capture) and topics for sharing the state of a data source (Kafka tables). The set of all information products forms the enterprise data model.
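
Editor's sketch: one way to picture an information product is as a small record that names its owner and its topics, with the enterprise data model being the collection of all such records. This is a minimal sketch under stated assumptions. The class InformationProduct, the ".changes" and ".state" topic suffixes, the product names, and the simplification to exactly one change topic and one state topic per product are hypothetical and are not Van Oord's naming scheme.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InformationProduct:
    """A named, owned set of topics (hypothetical structure)."""
    name: str
    owner: str

    @property
    def change_topic(self) -> str:
        # Topic carrying change events (fed by change data capture).
        return f"{self.name}.changes"

    @property
    def state_topic(self) -> str:
        # Topic holding the latest state per key (a "Kafka table").
        return f"{self.name}.state"

    def topics(self) -> List[str]:
        return [self.change_topic, self.state_topic]


def enterprise_data_model(products: List[InformationProduct]) -> Dict[str, List[str]]:
    """Map each information product name to the topics that make it up."""
    return {product.name: product.topics() for product in products}


# Made-up products based on the two concepts discussed in the episode.
employees = InformationProduct(name="hr.employee", owner="HR")
vessels = InformationProduct(name="fleet.vessel", owner="Fleet Management")

model = enterprise_data_model([employees, vessels])
for product_name, topic_names in model.items():
    print(product_name, topic_names)
```

Keeping the owner on the product itself reflects the emphasis on data ownership in the conversation: every topic can be traced back to someone who can answer questions about its fields and grant access.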

Apache Atlas is used as a data dictionary and governance tool to capture the meaning of different information products. All changes in the data dictionary are available as an information product in Confluent, allowing for consumers of information products to subscribe to the information and be notified about changes.

Van Oord’s enterprise architecture model must remain up to date and aligned with the current implementation. This is achieved by automatically inspecting and analyzing Confluent data flows. Fortunately, Confluent embeds homogeneously in this holistic reference architecture. The basis of the holistic reference architecture is a change data capture (CDC) layer and a persistent layer, which makes Confluent the core component of the Van Oord future-proof digital data management solution.

