course: Streaming Data Governance

Streaming at Scale

4 min
Wade Waldron

Staff Software Practice Lead

Overview

At scale, modern businesses face challenges with data volume, regulatory requirements, data security, and more. Managing these challenges is part of the discipline known as Data Governance. However, as applications move from data-at-rest to data-in-motion, it becomes important to consider Stream Governance as well. This video introduces the challenges facing modern businesses as they try to scale their applications.

Topics:

  • Digital Transformation
  • Data-At-Rest
  • Data-In-Motion
  • Regulatory Requirements (e.g., GDPR) and Data Security
  • Data Governance and Stream Governance

Resources

Use the promo code GOVERNINGSTREAMS101 to get $25 of free Confluent Cloud usage

Streaming at Scale

Modern businesses have entered a new world. We are collecting vast amounts of data about users, their behaviors, system resources, Internet of Things devices, and a slew of other subjects. This data is being fed into data lakes, pipelines, and machine learning algorithms to derive various insights. The collection and processing of this data has become fundamental to modern businesses, and it has forced them into a digital transformation.

One aspect that has grown out of these digital transformations is an increased need to manage the staggering amount of data being collected. Companies have been forced to build entire teams to manage this data. However, solving problems with large teams isn't really a digital transformation. If we want to transform our business, we need to think more about the tools we use than about the people.

The volume of data we are working with is a big concern, but it's only part of the larger puzzle. While the volume has been increasing, the method of consumption has also changed. In older, legacy systems, it was common for data to be consumed at rest. This means that the data was pushed into some kind of static storage mechanism, such as a database. Analysts would run queries against this data to extract the information they needed, or perhaps a periodic batch job would run against the data and transform it for later consumption. In either case, we are dealing with a significant lag between when the data is collected and when it is consumed.

Similar to how users have grown to expect 100% uptime, they've also been conditioned to expect instant results. We no longer have the luxury of waiting for the data to be ready. This has sparked another change in the industry: we've been forced to move from "data at rest" to "data in motion". This transition means that data needs to be processed as it arrives, in real time. We need to ensure all of that data is being pushed to where it is needed as fast as possible. Our users aren't going to wait for it. If it takes too long, they might just find a solution that solves the problem faster, and at that point we'll lose them to the competition. Obviously, that's not what we want. For businesses to be successful in this modern world, they need to adopt data streaming solutions.

But adopting data streaming has its challenges. As the amount of data flowing through our systems has increased, so has the scrutiny from external forces. In recent years, there has been a rash of security breaches at some of the largest companies in the world. Meanwhile, users have become increasingly concerned about where and how their data is being managed. As a result, governing bodies have stepped in with regulatory requirements, such as the General Data Protection Regulation, or GDPR. The teams who've been tasked with managing our data streams now have to contend with these additional regulatory requirements.

This puts us in a difficult situation. We have massive amounts of data streaming through our systems in real time. That data is constantly under threat from potential attackers. Meanwhile, government regulations have increased the burden of managing all of that data. And through all of this, we still need to provide services to our users with nearly 100% uptime. If we want to achieve this, we are going to need a new set of principles and some powerful tools in our toolbox. Those principles and tools are the focus of this course, and they fall under the category of stream governance.
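To make the contrast between the two consumption styles concrete, here is a minimal sketch. It is not part of the course materials: the broker address, the "page-views" topic keyed by user ID, the events.db SQLite file with its page_views table, and the use of Confluent's confluent-kafka Python client are all assumptions made for illustration.

```python
import sqlite3
from confluent_kafka import Consumer

# --- Data at rest: query a static store after the fact ---
# Assumes events.db already contains a page_views table loaded
# by some earlier ingestion job (hypothetical for this sketch).
db = sqlite3.connect("events.db")
rows = db.execute(
    "SELECT user_id, COUNT(*) FROM page_views GROUP BY user_id"
).fetchall()
# The result reflects only what had been loaded when the query ran.

# --- Data in motion: react to each event as it arrives ---
# Assumes a broker at localhost:9092 and a "page-views" topic
# whose messages are keyed by user ID (also hypothetical).
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "page-view-counter",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["page-views"])

counts = {}
try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1s for the next event
        if msg is None or msg.error() or msg.key() is None:
            continue
        user_id = msg.key().decode("utf-8")
        counts[user_id] = counts.get(user_id, 0) + 1  # current as of now
finally:
    consumer.close()
```

The batch query answers with whatever the store held at query time; the consumer loop keeps its counts up to date as each event lands, which is the "data in motion" behavior described above.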
If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.