Bigeye

Introduce Climate Analytics Into Your Data Platform Without The Heavy Lifting Using Sust Global - Episode 322

The global climate impacts everyone, and the rate of change introduces many questions that businesses need to consider. Getting answers to those questions is challenging, because the climate is a multidimensional and constantly evolving system. Sust Global was created to provide curated data sets for organizations to be able to analyze climate information in the context of their business needs. In this episode Gopal Erinjippurath discusses the data engineering challenges of building and serving those data sets, and how they are distilling complex climate information into consumable facts so you don’t have to be an expert to understand it.

Read More

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality - Episode 320

The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. In this episode Sean Knapp shares his views on what constitutes proper automation and the work that he and his team at Ascend are doing to help make it a reality.

Read More

Understanding The Role Of The Chief Data Officer - Episode 318

The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not everyone understands what the responsibilities of the role are, when you need one, and how to hire for it. In this episode Tracy Daniels, CDO of Truist, shares her journey into the position, her responsibilities, and her relationship to the data professionals in her organization.

Read More

Bringing Automation To Data Labeling For Machine Learning With Watchful - Episode 316

Data engineers have typically left the process of data labeling to data scientists or other roles because of its nature as a manual and process heavy undertaking, focusing instead on building automation and repeatable systems. Watchful is a platform to make labeling a repeatable and scalable process that relies on codifying domain expertise. In this episode founder Shayan Mohanty explains how he and his team are bringing software best practices and automation to the world of machine learning data preparation and how it allows data engineers to be involved in the process.

Read More

Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab - Episode 314

Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at AgileLab have first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In this episode Paolo Platter shares the lessons they have learned in that process, the Data Mesh Boost platform that they have built to reduce some of the boilerplate required to make it successful, and some of the considerations to make when deciding if a data mesh is the right choice for you.

Read More

Build Your Python Data Processing Your Way And Run It Anywhere With Fugue - Episode 266

Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when scaling from single machines and exploratory development to massively parallel distributed computation. In answer to that challenge the Fugue project offers an interface to automatically translate across Pandas, Spark, and Dask execution environments without having to modify your logic. In this episode core contributor Kevin Kho explains how the slight differences in the underlying engines can lead to big problems, how Fugue works to hide those differences from the developer, and how you can start using it in your own work today.

Read More

Build Your Own End To End Customer Data Platform With Rudderstack - Episode 263

Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even more complex. To simplify the work of managing the full flow of your customer data and keep you in full control the team at Rudderstack created their eponymous open source platform that allows you to work with first and third party data, as well as build and manage reverse ETL workflows. In this episode CEO and founder Soumyadeb Mitra explains how Rudderstack compares to the various other tools and platforms that share some overlap, how to set it up for your own data needs, and how it is architected to scale to meet demand.

Read More

Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets - Episode 261

There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses then it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores the many ways that sensitive data can be leaked, re-identified, or otherwise be at risk, as well as the different strategies that can be employed to mitigate those attack vectors. He also explains how he and his team at Privacy Dynamics are working to make those strategies more accessible to organizations so that you can focus on all of the other tasks required of you.

Read More

Effective Pandas Patterns For Data Engineering - Episode 259

Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has become a standard tool for data engineers for a wide range of applications. Matt Harrison is a Python expert with a long history of working with data who now spends his time on consulting and training. He recently wrote a book on effective patterns for Pandas code, and in this episode he shares advice on how to write efficient data processing routines that will scale with your data volumes, while being understandable and maintainable.

Read More