This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Support the show!Listen in your favorite app:
FountainHere are shows you might like
Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation of compute and storage.…
Data warehouse technology has been around…
01 September 2020 | 01:05:51
In order to scale the use of data across an organization there are a number of challenges related to discovery, governance, and integration that need to be solved. The key to those solutions is a robust and flexible metadata management system. LinkedIn has gone through several iterations on…
In order to scale the use of data across…
25 August 2020 | 00:51:04
Most databases are designed to work with textual data, with some special purpose engines that support domain specific formats. TileDB is a data engine that was built to support every type of data by using multi-dimensional arrays as the foundational primitive. In this episode the creator…
Most databases are designed to work with…
17 August 2020 | 01:05:44
Event based data is a rich source of information for analytics, unless none of the event structures are consistent. The team at Iteratively are building a platform to manage the end to end flow of collaboration around what events are needed, how to structure the attributes, and how they are…
Event based data is a rich source of…
10 August 2020 | 00:59:17
Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias…
Finding connections between data and the…
04 August 2020 | 01:00:43
A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and…
A majority of the scalable data…
28 July 2020 | 00:49:38
Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute…
Wind energy is an important component of…
21 July 2020 | 00:40:48
The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that…
The first stage of every data pipeline is…
13 July 2020 | 01:05:19
There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what…
There are an increasing number of use…
06 July 2020 | 00:45:36
We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition…
We have machines that can listen to and…
30 June 2020 | 00:57:29