A High Performance Platform For The Full Big Data Lifecycle - Episode 94

Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise grade analytics it provides a solution for the full lifecycle of data at massive scale. In this episode Flavio Villanustre, VP of infrastructure and products at HPCC Systems, shares the history...

Play Episode

Digging Into Data Replication At Fivetran - Episode 93

The extract and load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations that are available, it is also among the most difficult tasks that we encounter. Fivetran is a platform that does the hard work for you and replicates information from your source systems into whichever data warehouse you use. In this episode CEO and co-founder George Fraser explains how it is built, how it got started, and the challenges that creep in at the edges...

Play Episode

Simplifying Data Integration Through Eventual Connectivity - Episode 91

The ETL pattern that has become commonplace for integrating data from multiple sources has proven useful, but complex to maintain. For a small number of sources it is a tractable problem, but as the overall complexity of the data ecosystem continues to expand it may be time to identify new ways to tame the deluge of information. In this episode Tim Ward, CEO of CluedIn, explains the idea of eventual connectivity as a new paradigm for data integration. Rather than manually defining all of the mappings ahead of time,...

Play Episode

Straining Your Data Lake Through A Data Mesh - Episode 90

The current trend in data management is to centralize the responsibilities of storing and curating the organization's information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access. In this episode Zhamak Dehghani shares an alternative approach in the form of a data mesh. Rather than connecting all of your data flows to one destination, empower your individual business units to create data products that can be consumed by other teams. This was an...

Play Episode

Join The Mailing List