PostHog

Bringing The Modern Data Stack To Everyone With Y42 - Episode 295

Cloud services have made highly scalable and performant data platforms economical and manageable for data teams. However, they are still challenging to work with and manage for anyone who isn’t in a technical role. Hung Dang understood the need to make data more accessible to the entire organization and created Y42 as a better user experience on top of the “modern data stack”. In this episode he shares how he designed the platform to support the full spectrum of technical expertise in an organization and the interesting engineering challenges involved.

Read More

Data Cloud Cost Optimization With Bluesky Data - Episode 293

The latest generation of data warehouse platforms have brought unprecedented operational simplicity and effectively infinite scale. Along with those benefits, they have also introduced a new consumption model that can lead to incredibly expensive bills at the end of the month. In order to ensure that you can explore and analyze your data without spending money on inefficient queries Mingsheng Hong and Zheng Shao created Bluesky Data. In this episode they explain how their platform optimizes your Snowflake warehouses to reduce cost, as well as identifying improvements that you can make in your queries to reduce their contribution to your bill.

Read More

Evolving And Scaling The Data Platform at Yotpo - Episode 285

Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their needs are being met. Yotpo has been on a journey to evolve and scale their data platform to continue serving the needs of their organization as it increases the scale and sophistication of data usage. In this episode Doron Porat and Liran Yogev explain how they arrived at their current architecture, the capabilities that they are optimizing for, and the complex process of identifying and evaluating new components to integrate into their systems. This is an excellent exploration of the decisions and tradeoffs that need to be made while building such a complex system.

Read More

Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs - Episode 283

There are very few tools which are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful library for flexibly instrumenting all of your data systems to understand the entire lifecycle of your data from source to productionized model. In this episode Andy Dang explains why the project was created, how you can apply it to your existing data systems, and how it functions to provide detailed context for being able to gain insight into all of your data processes.

Read More

Repeatable Patterns For Designing Data Platforms And When To Customize Them - Episode 277

Building a data platform for your organization is a challenging undertaking. Building multiple data platforms for other organizations as a service without burning out is another thing entirely. In this episode Brandon Beidel from Red Ventures shares his experiences as a data product manager in charge of helping his customers build scalable analytics systems that fit their needs. He explains the common patterns that have been useful across multiple use cases, as well as when and how to build customized solutions.

Read More

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera - Episode 275

Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor. Privacera is an enterprise grade solution for cloud and hybrid data governance built on top of the robust and battle tested Apache Ranger project. In this episode Balaji Ganesan shares how his experiences building and maintaining Ranger in previous roles helped him understand the needs of organizations and engineers as they define and evolve their data governance policies and practices.

Read More

Accelerate Your Embedded Analytics With Apache Pinot - Episode 273

Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user product creates a significant shift in requirements for your data layer. The Pinot OLAP datastore was created for this purpose, optimizing for low latency queries on rapidly updating datasets with highly concurrent queries. In this episode Kishore Gopalakrishna and Xiang Fu explain how it is able to achieve those characteristics, their work at StarTree to make it more easily available, and how you can start using it for your own high throughput data workloads today.

Read More