StreamSets

Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions - Episode 262

As our societies globalize, there is a growing need to analyze the geospatial and geotemporal data required to manage the growth in commerce, communications, and other activities. To make geospatial analytics more maintainable and scalable, a growing number of database engines provide extensions to their SQL syntax that support manipulation of spatial data. In this episode Matthew Forrest shares his experiences of working in the domain of geospatial analytics and the application of SQL dialects to his analysis.


A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know - Episode 260

The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, covering a broad range of topics. In addition, the host curated the essays contained in the book “97 Things Every Data Engineer Should Know”, using the knowledge and context gained from running the show to inform the selection process. In this episode he shares some reflections on producing the podcast, compiling the book, and relevant trends in the data engineering ecosystem. He also provides some advice for those who are early in their data engineering careers and looking to advance in their roles.


The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam - Episode 258

Data platforms are characterized by a complex set of connections that are subject to constantly evolving requirements. To make this a tractable problem, it is necessary to define boundaries for communication between concerns, which brings with it the need to establish interface contracts for communicating across those boundaries. The recent move toward the data mesh as a formalized architecture that builds on this design provides the language that data teams need to make this a more organized effort. In this episode Abhi Sivasailam shares his experience designing and implementing a data mesh solution with his team at Flexport, and the importance of defining and enforcing data contracts at those domain boundaries.


An Introduction To Data And Analytics Engineering For Non-Programmers - Episode 255

Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Data is now being used to power consumer-facing services, influence organizational behaviors, and build sophisticated machine learning systems. Given this increased level of importance, it has become necessary for everyone in the business to treat data as a product, in the same way that software applications have been treated since the early 2000s. In this episode Brian McMillan shares his work on the book “Building Data Products” and how he is working to educate business users and data professionals about the combination of technical, economic, and business considerations that need to be blended for these projects to succeed.


Open Source Reverse ETL For Everyone With Grouparoo - Episode 254

Reverse ETL is a product category that evolved from the landscape of customer data platforms, with a number of companies offering their own implementations of it. While struggling with the work of automating data integration workflows with marketing, sales, and support tools, Brian Leonard accidentally discovered this need himself and turned it into the open source framework Grouparoo. In this episode he explains why he decided to turn these efforts into an open core business, how the platform is implemented, and the benefits of having an open source contender in the landscape of operational analytics products.
