April 29, 2021 | Episode 156

Data Management and Digital Transformation with Apache Kafka at Van Oord

  • Transcript
  • Notes

Imagine if you could create a better world for future generations simply by delivering marine ingenuity. 

Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure.

Real-time insights into costs spent, the progress of projects, and the performance tracking of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data connected, synchronized, and visualized—in fact, truly digitized.

This requires a central nervous system that supports:

  • Legacy (monolith environment) as well as microservices
  • ELT/ETL/streaming ETL
  • All types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry
  • Master data/enterprise common data model

The need for agility and speed makes it necessary to have a fully integrated DevOps-infrastructure-as-code environment, where data lineage, data governance, and enterprise architecture are holistically embedded. Thousands of topics need to be developed, updated, tested, accepted, and deployed each day. This together with different scripts for connectors requires a holistic data management solution, where data lineage, data governance and enterprise architecture are an integrated part.

Thus, Marlon Hiralal (Enterprise/Data Management Architect, Van Oord) and Andreas Wombacher (Data Engineer, Van Oord) turned to Confluent for a three-month proof of concept and explored the pre-prep stage of using Apache Kafka® on Van Oord’s vessels.

Since the environment in Van Oord is dynamic with regards to the application landscape and offered services, it is essential that a stable environment with controlled continuous integration and deployment is applied. Beyond the software components itself, this also applies to configurations and infrastructure, as well as applying the concept of CI/CD with infrastructure as code. The result: using Terraform and Confluent together.

Publishing information is treated as a product at Van Oord. An information product is a set of Kafka topics: topics to communicate change (via change data capture) and topics for sharing the state of a data source (Kafka tables). The set of all information products forms the enterprise data model.

Apache Atlas is used as a data dictionary and governance tool to capture the meaning of different information products. All changes in the data dictionary are available as an information product in Confluent, allowing for consumers of information products to subscribe to the information and be notified about changes.

Van Oord’s enterprise architecture model must remain up to date and aligned with the current implementation. This is achieved by automatically inspecting and analyzing Confluent data flows. Fortunately, Confluent embeds homogeneously in this holistic reference architecture. The basis of the holistic reference architecture is a change data capture (CDC) layer and a persistent layer, which makes Confluent the core component of the Van Oord future-proof digital data management solution.

Continue Listening

Episode 157May 4, 2021 | 27 min

Resilient Edge Infrastructure for IoT Using Apache Kafka ft. Kai Waehner

What is the internet of things (IoT), and how does it relate to event streaming and Apache Kafka? In this episode, Kai Waehner, field CTO and global technology advisor at Confluent, discusses the intersection of edge data infrastructure, IoT, and cloud services for Kafka. He also details how businesses get into the sticky situation of not accounting for solutions when data is running dangerously close to the edge.

Episode 158May 13, 2021 | 31 min

The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka ft. Jason Gustafson and Colin McCabe

Jason Gustafson and Colin McCabe, Apache Kafka developers, discuss all things KIP-500 adoption, the removal of ZooKeeper, and how that’s played out on the frontlines within the event streaming world. A previous episode of Streaming Audio featured both developers on the podcast before the release of Apache Kafka 2.8. Now they’re back to share how everything is working in reality.

Episode 159May 20, 2021 | 42 min

Engaging Database Partials with Apache Kafka for Distributed System Consistency ft. Pat Helland

When compiling database reports using a variety of data from different systems, obtaining the right data when you need it in real time can be difficult. With cloud connectivity and distributed data pipelines, Pat Helland (Principal Architect, Salesforce) explains how to make educated partial answers when you need to use the Apache Kafka® platform. After all, you can’t get guarantees across a distance, making it critical to consider partial results.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free