Stretching The Elastic Stack with Philipp Krenn – Episode 23

Search is a common requirement for applications of all varieties. Elasticsearch was built to make it easy to include search functionality in projects built in any language. From that foundation, the rest of the Elastic Stack has been built, expanding to many more use cases in the proces. In this episode Philipp Krenn describes the various pieces of the stack, how they fit together, and how you can use them in your infrastructure to store, search, and analyze your data.

Database Refactoring Patterns with Pramod Sadalage – Episode 22

As software lifecycles move faster, the database needs to be able to keep up. Practices such as version controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. Pramod Sadalage saw the need for these capabilities during the early days of the introduction of modern development practices and co-authored a book to codify a large number of patterns to aid practitioners, and in this episode he reflects on the current state of affairs and how things have changed over the past 12 years.

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman – Episode 18

As communications between machines become more commonplace the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not all built to solve the same problems or to scale in the same manner. In this episode the founders of TimescaleDB, Ajay Kulkarni and Mike Freedman, discuss how Timescale was started, the problems that it solves, and how it works under the covers. They also explain how you can start using it in your infrastructure and their plans for the future.

CRDTs and Distributed Consensus with Christopher Meiklejohn – Episode 14

As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources the requirement to distribute the computational resources for managing that information becomes more pronounced. In order to ensure that all of the distributed nodes in our systems agree with each other we need to build mechanisms to properly handle replication of data and conflict resolution. In this episode Christopher Meiklejohn discusses the research he is doing with Conflict-Free Replicated Data Types (CRDTs) and how they fit in with existing methods for sharing and sharding data. He also shares resources for systems that leverage CRDTs, how you can incorporate them into your systems, and when they might not be the right solution. It is a fascinating and informative treatment of a topic that is becoming increasingly relevant in a data driven world.

Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens – Episode 13

PostGreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports has allowed it to be used in virtually every environment. At Citus Data they have built an extension to support running it in a distributed fashion across large volumes of data with parallelized queries for improved performance. In this episode Ozgun Erdogan, the CTO of Citus, and Craig Kerstiens, Citus Product Manager, discuss how the company got started, the work that they are doing to scale out PostGreSQL, and how you can start using it in your environment.

SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden – Episode 11

Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in production. In this episode Jeroen van der Heijden explains his motivation for writing a new database, SiriDB, the challenges that he faced in doing so, and how it works under the hood.

Data Serialization Formats with Doug Cutting and Julien Le Dem – Episode 8

With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.