Data engineers have typically left data labeling to data scientists or other roles because it is a manual, process-heavy undertaking, focusing instead on building automation and repeatable systems. Watchful is a platform to make labeling a repeatable and scalable process that relies on codifying domain expertise. In this episode founder Shayan Mohanty explains how he and his team are bringing software best practices and automation to the world of machine learning data preparation and how that allows data engineers to be involved in the process.
Bringing Automation To Data Labeling For Machine Learning With Watchful - Episode 316 (August 14, 2022, tmacey)
Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery - Episode 315 (August 14, 2022, tmacey)
Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab - Episode 314 (August 6, 2022, tmacey)
Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery - Episode 315
Data is useless if it isn't being used, and you can't use it if you don't know where it is. Data catalogs were the first solution to this problem, but they are only helpful if you know what you are looking for. In this episode Shinji Kim discusses the challenges of data discovery and how to collect and preserve additional context about each piece of information so that you can find what you need when you don't even know what you're looking for yet.
Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab - Episode 314
Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at AgileLab have first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In this episode Paolo Platter shares the lessons they have learned in that process, the Data Mesh Boost platform that they have built to reduce some of the boilerplate required to make it successful, and some of the considerations to make when deciding if a data mesh is the right choice for you.
Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus - Episode 313
The optimal format for storage and retrieval of data depends on how it is going to be used. For analytical systems there are decades of investment in data warehouses and various modeling techniques. For machine learning applications, relational models require additional processing to be directly useful, which is why there has been a growth in the use of vector databases. These platforms store direct representations of the vector embeddings that machine learning models rely on for computing relevant predictions so that there is no additional processing required to go from input data to inference output. In this episode Frank Liu explains how the open source Milvus vector database is implemented to speed up machine learning development cycles, how...
Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems it is valuable for debugging, governance, understanding context, and myriad other purposes. This means that it is important to have an accurate and complete lineage graph so that you don't have to perform your own detective work when time is in short supply. In this episode Ernie Ostic shares the approach that he and his team at Manta are taking to build a complete view of data lineage across the various data systems in your organization and the useful applications...