Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

Business intelligence is a necessity for any organization that wants to be able to make informed decisions based on the data that they collect. Unfortunately, it is common for different portions of the business to build their reports with different assumptions, leading to conflicting views and poor choices. Looker is a modern tool for building and sharing reports that makes it easy to get everyone on the same page. In this episode Daniel Mintz explains how the product is architected, the features that make it easy for any business...

Play Episode

Using Notebooks As The Unifying Layer For Data Roles At Netflix with Matthew Seal - Episode 54

Jupyter notebooks have gained popularity among data scientists as an easy way to do exploratory analysis and build interactive reports. However, this can cause difficulties when trying to move the work of the data scientist into a more standard production environment, due to the translation efforts that are necessary. At Netflix they had the crazy idea that perhaps that last step isn't necessary, and the production workflows can just run the notebooks directly. Matthew Seal is one of the primary engineers who has been tasked with building the tools...

Play Episode

Of Checklists, Ethics, and Data with Emily Miller and Peter Bull (Cross Post from Podcast.__init__) - Episode 53

As data science becomes more widespread and has a bigger impact on the lives of people, it is important that those projects and products are built with a conscious consideration of ethics. Keeping ethical principles in mind throughout the lifecycle of a data project helps to reduce the overall effort of preventing negative outcomes from the use of the final product. Emily Miller and Peter Bull of Driven Data have created Deon to improve the communication and conversation around ethics among and between data teams. It is a Python...

Play Episode

Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52

With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal specification, each project works slightly different which increases the difficulty of integration across systems. The Hive format is also built with the assumptions of a local filesystem which results in painful edge cases when leveraging cloud object storage for a data lake. In this episode Ryan Blue explains how his work on the Iceberg table format specification and reference implementation has allowed Netflix to improve the performance and...

Play Episode

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn't have to do that at all? MemSQL is a distributed database built to support concurrent use by transactional, application oriented, and analytical, high volume, workloads on the same hardware. In this episode the CEO of MemSQL describes how the company and database got started, how it is architected for scale and speed, and how it is being used in production. This was...

Play Episode

Join The Mailing List