Summary
Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in production. In this episode Jeroen van der Heijden explains his motivation for writing a new database, SiriDB, the challenges that he faced in doing so, and how it works under the hood.
Preamble
- Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show.
- Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- You can help support the show by checking out the Patreon page which is linked from the site.
- To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
- Your host is Tobias Macey and today I’m interviewing Jeroen van der Heijden about SiriDB, a next generation time series database
Interview
- Introduction
- How did you get involved in the area of data engineering?
- What is SiriDB and how did the project get started?
- What was the inspiration for the name?
- What was the landscape of time series databases at the time that you first began work on Siri?
- How does Siri compare to other time series databases such as InfluxDB, Timescale, KairosDB, etc.?
- What do you view as the competition for Siri?
- How is the server architected and how has the design evolved over the time that you have been working on it?
- Can you describe how the clustering mechanism functions?
- Is it possible to create pools with more than two servers?
- What are the failure modes for SiriDB and where does it fall on the spectrum for the CAP theorem?
- In the documentation it mentions needing to specify the retention period for the shards when creating a database. What is the reasoning for that and what happens to the individual metrics as they age beyond that time horizon?
- One of the common difficulties when using a time series database in an operations context is the need for high cardinality of the metrics. How are metrics identified in Siri and is there any support for tagging?
- What have been the most challenging aspects of building Siri?
- In what situations or environments would you advise against using Siri?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering podcast, the show about modern data management. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production, and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today.
Enterprise add-ons and professional support are available for added peace of mind. And go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show, you can leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host is Tobias Macey, and today I'm interviewing Jeroen van der Heijden about SiriDB, a next generation time series database.
[00:01:23] Unknown:
So, Jeroen, could you please introduce yourself?
[00:01:26] Unknown:
Yes, Tobias. Thank you for inviting me to this interview. I started my career as a system engineer and then shifted more towards development. About five years ago, I made the switch to become a full-time developer at the company I currently work with. At this company, we are building a solution for monitoring IT infrastructure and IT components, and therefore we are collecting a lot of data. So basically, that's what we are doing right now.
[00:02:00] Unknown:
And how did you first get interested or involved in the area of data management?
[00:02:05] Unknown:
It was at this company that we needed to build this system for monitoring all these IT infrastructure components. So we needed to find a way to store all this data, and that's how we automatically got involved with data engineering. This is also the time when we started needing a time series database.
[00:02:30] Unknown:
And so can you take a minute to describe what Siri DB is and, how the project first got started?
[00:02:38] Unknown:
Yes. It's a database for storing time series, and a time series is like a metric, for example, where you store points in time. Each point in a time series database has a time value and an actual value. For example, you can measure the CPU of your notebook, and at each second you want to store the value of the current CPU usage. This is stored in a time series, or a metric, as we call it. SiriDB is a database which is specialized in storing that type of data. We started this because the monitoring solution that we are building at our company had a need for such a database.
Because if you wanted to store everything in a SQL-like database or something like that, it would probably not scale and not fit.
[00:03:31] Unknown:
And what is it about time series in particular that requires a different format of database and way of storing the data that doesn't work with more traditional models?
[00:03:42] Unknown:
Well, it scales in two dimensions. You can have a lot of series, which you could translate to tables in, for example, SQL. But with that approach, you would need a table for every time series that you want to store, so that would require a lot of tables. For example, with our monitoring solution, we currently store more than two million metrics, so to say. So it scales in two dimensions: it scales in time, and it scales in the number of metrics. And I think that's different from how SQL, or a traditional database, usually works.
[00:04:23] Unknown:
And just briefly, where does the name come from?
[00:04:27] Unknown:
Well, my daughter's name is Iris; in Dutch, you say "Iris". Turning it around gives "Siri", and it's a database storing time series, so basically series of points. That's how we got the name, and we just stuck with it.
[00:04:47] Unknown:
clever. Thank you. And what was the landscape of time series databases like at the time that you first began working on Siri, such that you felt it was necessary and you couldn't just use one of the off-the-shelf options?
[00:05:01] Unknown:
Well, at the time, Influx was just available as a beta version, and I believe InfluxDB was using LevelDB as its underlying storage system. We actually used InfluxDB for some time, but this was a very early version, and it had some issues. And, of course, there were other options like OpenTSDB. But in general, I think it was a time when time series databases really started to grow. I started SiriDB just as a proof of concept, just to see: can we do this ourselves? Probably we could have used something else, but we only tried Influx, and at the time it was too new. So that's why we started to create our own time series database.
[00:05:53] Unknown:
And now that you've got it to a point where it's production ready and being used, how do you feel that it compares to the other options, such as InfluxDB that you mentioned, or Timescale, which is a somewhat newer one, or Kairos, or, as you mentioned, OpenTSDB?
[00:06:08] Unknown:
Well, I think we are the most similar to InfluxDB; we are often compared with InfluxDB. But I like the idea of Timescale, because they are fully compatible with SQL and everything, so that's also a nice time series database. I think a key difference is that SiriDB started off with its own storage system; we never relied on something else. I know that InfluxDB has now also switched to its own storage system, but back then they used LevelDB, and they switched to something else in between, BoltDB, if I remember correctly.
But, yeah, I think one key difference is that we had, from the beginning, our own storage system underlying SiriDB. Probably there are now a lot of time series databases which do the same, but that's at least one thing which is different from the ones you mentioned.
[00:07:04] Unknown:
And do you feel that Siri is more suited to particular use cases than some of the others, or are there areas or feature sets where Siri differs drastically from some of the other available options?
[00:07:19] Unknown:
Well, not drastically, I think, but it's like a combination of things. We have the combination that we are scalable: we can scale across nodes, and not all the other time series databases have that scalability themselves yet. I know that InfluxDB has scaling options, but I believe the open source version does not, and Timescale is still working on that. So in that sense we are different from those. And then, I think, it's when you have a lot of metrics, and each metric has, for example, a million data points. We scale pretty well in having both a lot of metrics and a lot of data points. So when you only need integer and floating point values, because those are the only values SiriDB at this point is able to store, then I think SiriDB is a good option.
[00:08:24] Unknown:
And on the point of the data types that it supports, as you mentioned, they're floating point or integer values. But is there any capability for storing things like events, where you might want to indicate, for instance, that a deploy happened, or any particular single point in time as a reference? Or is that something you would store in some external system and then merge in at the time that you're trying to overlay it as you're displaying the data in some sort of dashboard?
[00:08:55] Unknown:
Yeah. Currently, we are storing this in another database, so we are not using SiriDB for that purpose. But, actually, I am also developing this in SiriDB right now. So it might be that in a few weeks, or a few months at most, we will be able to store this type of value in SiriDB itself. But at the moment, we just focus on integers and floating point values. So this is something that will change in the future, but at the moment, we do not support it.
[00:09:23] Unknown:
And can you take some time to describe how the server is architected, how the internals of it work, and how that design has evolved over the time that you've been working on it? Well, I first created SiriDB as a single node, and,
[00:09:38] Unknown:
yeah, only later I added the cluster mechanism that it has. If you take the single node, it accepts data coming in and first stores all the data in a sort of write-ahead log, or you can call it a buffer if you want. Then, when it has collected a certain amount of points, these points are moved to a shard. You can see a shard as a window in time: for example, one shard can store a couple of days or maybe a couple of hours. You have to choose that when you create a database, and the shards get optimized over time. So if you store, for example, a few chunks of data inside a shard, then SiriDB will run an optimize task over the shard so that these points are sorted. They are mostly already sorted, but they may have overlaps. These overlaps can happen because SiriDB allows you to write points from the past.
For example, if you miss some points, you can later add them to SiriDB. But this way, on disk, you can get an overlap in time, and we want to sort this overlap out. So we run an optimize task over these shards so that they are written optimally. Then, when I had the single node fully working, I extended SiriDB so that it works like a cluster, which makes SiriDB scalable and fault tolerant. Those are things I created later.

And how does that clustering mechanism operate now that you've got it functional?

Well, it assigns time series, or metrics, to a pool, based on the time series name. So when you add a new pool, for example, SiriDB sort of re-indexes the time series so that each existing pool moves a part of its series to the new pool, and when this process is finished, all pools have approximately an equal amount of series.
So your data is always spread across multiple pools, in a mostly equal way. It is important to note that when series move from one pool to another, they only move to the new pool. So if you add a new pool to your cluster, only series will move from the existing pools to this new pool, and there is no re-indexing between the existing pools. It is also possible to add a second server to each pool, so that your database has some fault tolerance. The second server acts as an active replica: when they are both online, they can both process queries, and there is not really a master/slave relationship in that sense.
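To make the single-node write path concrete (points accumulate in a buffer and are then flushed into time-window shards, which a later task sorts to resolve overlaps from out-of-order writes), here is a minimal, illustrative Python sketch. The class, names, and thresholds are invented for illustration; this is not SiriDB's actual on-disk implementation.

```python
from collections import defaultdict

SHARD_DURATION = 6 * 3600  # seconds covered by one shard window (configurable per database)
FLUSH_THRESHOLD = 4        # flush a series' buffer once it holds this many points

def shard_start(ts, duration=SHARD_DURATION):
    """Map a timestamp to the start of its shard window."""
    return ts - (ts % duration)

class TinySeriesStore:
    def __init__(self):
        self.buffer = defaultdict(list)   # series name -> buffered (ts, value) pairs
        self.shards = defaultdict(list)   # (series, shard_start) -> stored points

    def insert(self, series, ts, value):
        self.buffer[series].append((ts, value))
        if len(self.buffer[series]) >= FLUSH_THRESHOLD:
            self.flush(series)

    def flush(self, series):
        # Move buffered points into the shard that owns their time window.
        for ts, value in self.buffer.pop(series, []):
            self.shards[(series, shard_start(ts))].append((ts, value))

    def optimize(self):
        # Late ("past") writes can leave points out of order; sort each shard.
        for points in self.shards.values():
            points.sort()
```

Writing four points that span two six-hour windows lands them in two separate shards, which is the behavior the answer above describes.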
[00:12:17] Unknown:
And is it possible to add more than two servers to a pool, in case you wanted to, for instance, spread the pool across three availability zones in AWS? No, it's only possible
[00:12:31] Unknown:
to have two servers in a pool. We do the same thing with Oversight: we spread it across two locations. But at the moment, at least, it's not possible to use three locations. This is mainly because synchronizing is a lot easier when you limit yourself to two servers in a pool. So maybe we'll change this in the future, but at the moment we stick with this limit just because it's easier for synchronization.
[00:12:57] Unknown:
And do you have any issues with conflicts in mirroring that data between the two servers in a given pool? Or is the data, as it's written, just balanced between the two using, for instance, a load balancer, with each server filling in the gaps in the other's data storage?
[00:13:16] Unknown:
Because we limit everything to two servers for replication, it's a lot easier to prevent this from happening. If you would allow, for example, a third server, this problem becomes much bigger. It's quite easy to prevent if you limit yourself to two servers, because one server receives the data and can store it on its disk in a sort of buffer. Then, if the other server is online, it can immediately send the data to that second server, and it always knows which server still needs to receive data points or whether that server already has them, because there is no way two servers are updating a third one, for example.
[00:14:09] Unknown:
And when the data is being ingested, is there some routing mechanism? For instance, if you're trying to write a metric to a server when that metric belongs to a different pool, does it automatically route that for you? Can you just describe how that works?
[00:14:27] Unknown:
Yeah, you can just choose one server in the cluster, it doesn't really matter which one, and you just start writing to that one. That server knows about all the series. Well, it doesn't really know about all the series, but if a series exists, it knows in which pool it exists. For example, if you come up with a metric, the receiving server knows that this metric must live in, for example, pool one or pool two. So it forwards this metric to the correct pool. Each pool knows, by an algorithm, to which pool a metric belongs. That's how the metrics are spread across the pools. And it is a particular sort of algorithm, because what we don't want is that metrics which the algorithm assigns to pool zero can move to pool one when you scale, for example, from two to three pools. That would mean that when you have, for example, 20 or 30 pools, you get a lot of traffic between the pools when you just add a new one. We don't want that. We only want the minimal amount of series to move to the new pool.
So the algorithm is like a hashing, but it's a different algorithm for assigning the series to a pool, just to prevent series from being transferred back to a pool they came from, so to say.
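The property Jeroen describes, where adding a pool moves series only to the new pool and never shuffles them between existing pools, is exactly what rendezvous (highest-random-weight) hashing provides. The sketch below illustrates that idea in Python; it is not SiriDB's actual assignment algorithm, just a well-known scheme with the same minimal-movement property.

```python
import hashlib

def pool_for(series_name, n_pools):
    """Assign a series to the pool with the highest hash score.

    When a pool is added, a series only moves if the new pool wins the
    score comparison, so series never shuffle between pre-existing pools.
    """
    def score(pool):
        h = hashlib.sha256(f"{pool}:{series_name}".encode()).hexdigest()
        return int(h, 16)
    return max(range(n_pools), key=score)

series = [f"server-{i:03d}.cpu" for i in range(1000)]
before = {s: pool_for(s, 3) for s in series}   # cluster with 3 pools
after = {s: pool_for(s, 4) for s in series}    # a 4th pool is added

# Every series that moved, moved to the new pool (index 3); none were
# re-shuffled among pools 0, 1, and 2.
assert all(after[s] == 3 for s in series if after[s] != before[s])
```

With roughly a quarter of the series migrating to the new pool, the distribution stays approximately equal, matching the rebalancing behavior described above.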
[00:16:09] Unknown:
And is it possible to scale back down if, for instance, your data volumes decrease and you don't need quite as much throughput? And what would that process look like as far as decommissioning a given pool to scale back in?
[00:16:24] Unknown:
Well, at the moment it's not possible at all, but it would be very easy to make it possible to scale back by removing the last pool, if we wanted to add that in the future. It would be rather difficult to remove, for example, the first or the second pool if you have more than two. So maybe we will add this in the future, but at the moment it's not possible at all. You can, however, remove a replica or rebuild a replica. So if something really happens to a server, you can remove the replica and rebuild it. That's possible. But truly scaling down in the number of pools is not possible at the moment.
[00:17:09] Unknown:
And given the fact that it does have clustering capabilities, I'm curious what the failure modes are for a given deployment, and where SiriDB falls on the spectrum for the CAP theorem?
[00:17:22] Unknown:
At least one server in a pool must be reachable. If a pool is completely down, like both servers in the pool are down, then the whole system doesn't work anymore, so queries and inserts will not be accepted. As long as each pool has one server online, the system keeps working. If you look at the replication process, I think we try to be as consistent as possible with the data. Before we send a package to the replica server, we first save it in a buffer on the disk. So if something happens to the server, if it falls out or the network connection drops or whatever, it's still stored on disk, and when everything is back online it will be replicated to the other server. So in terms of replication, we try to be as consistent as possible. In terms of scaling across pools, each pool must be online, so that can be a problem. It's a good idea to spread your servers over two locations and make sure that each pool has one server at each site, so to say.
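The buffer-before-replicate behavior described here (queue a point durably first, then ship it to the replica once that replica is reachable) can be sketched as follows. The class and attribute names are invented for illustration and do not reflect SiriDB's actual replication protocol.

```python
class Replica:
    """A stand-in for the second server in a pool."""
    def __init__(self):
        self.online = True
        self.data = []

    def receive(self, point):
        self.data.append(point)

class ReplicatingNode:
    """Points are queued before replication, so a replica that was
    offline can catch up once it comes back."""

    def __init__(self):
        self.local = []     # points applied locally
        self.pending = []   # points not yet delivered to the replica

    def insert(self, point, replica):
        self.pending.append(point)  # durably buffer first
        self.local.append(point)
        self.drain(replica)

    def drain(self, replica):
        # Ship queued points only while the replica is reachable.
        while self.pending and replica.online:
            replica.receive(self.pending.pop(0))
```

If the replica is offline during inserts, the pending queue grows; a later `drain` call delivers everything in order, which is why a node always knows exactly which points its peer still needs.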
[00:18:32] Unknown:
And as far as the databases themselves, you mentioned that when you create the database, you need to set a boundary on the shards it uses for storage. So I'm wondering what's the reasoning for that, and what happens to those individual metric points as they age beyond the time horizon for a given shard?
[00:18:55] Unknown:
Well, that's a small misconception, because it's the duration of a shard that you're actually referring to. It just creates more shards. Siri will never throw away data; it will always keep the data, just in different shards. So the duration you are referring to is the duration of a single shard. We make this configurable because sometimes you want, for example, metrics with approximately one data point every second, and a good shard duration would then be something like six hours or maybe a day. Sometimes you have the total other end of the spectrum, with only one point per metric each day, and you want to store maybe 10 or 20 years of data, or even more. A different shard duration would then perform better. So it's actually for performance that I made this configurable. One downside is that, at the moment, you need to choose this for the whole database.
So it's not possible to choose a different shard duration for individual metrics, and I might change that in a future release. I want this to be more dynamic, so that you don't have to choose it anymore and SiriDB decides by itself what shard duration is best for a series or metric, whatever you want to call it. But at the moment, you have to choose it yourself when creating a database.
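The performance intuition behind choosing a shard duration can be shown with simple arithmetic: the goal is for each shard to hold a healthy number of points, neither a handful nor an unbounded pile. The function name here is invented for illustration.

```python
def points_per_shard(sample_interval_s, shard_duration_s):
    """Roughly how many points of one series land in a single shard."""
    return shard_duration_s // sample_interval_s

# One point per second with a 6-hour shard: a sensible chunk per shard.
assert points_per_shard(1, 6 * 3600) == 21_600

# One point per day with a 6-hour shard: most shards would hold at most
# one point, so a far longer duration (say, a year) packs points together.
assert points_per_shard(86_400, 6 * 3600) == 0
assert points_per_shard(86_400, 365 * 86_400) == 365
```

This is why a one-size-fits-all duration per database is a limitation: series sampled every second and series sampled daily want very different shard windows.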
[00:20:21] Unknown:
And one of the common issues that happens when you're using a time series database in an operations context is the need for high cardinality of the metrics, where you have multiple different ways to identify a given data point: whether it's stating that it's the number of I/O operations for a disk, but also saying that it was from a given server with a particular purpose and, you know, maybe the environment it is being deployed in. So I'm wondering how the metrics in Siri are identified, and whether it has support for adding that high cardinality for the metrics, or any sort of tagging capabilities?
[00:20:59] Unknown:
Well, at the moment, we are using the metric name as the identifier, so you must be careful choosing a good name. For example, we include all these things inside the name of the metric. Besides that, we also support dynamic grouping: you can create a group for a collection of series based on a regular expression. That way you can, for example, create a group of all your CPU metrics, or all your memory metrics, or just a customer or a location. With these groups, you can perform set operations, so you can combine them with everything that is allowed with sets, like intersection, difference, things like that. We are also working on another tag system, which allows you to tag individual series, and then you could use them just like groups.
We are still working on that right now. The current groups are dynamically updated: when you add a new series which matches your group, it will automatically be added to this group. So you have to choose a good metric name, but if you add a new metric where the name corresponds to a certain group, it will automatically be added, or dropped if you remove the series. This is more or less how Influx worked, and we came from Influx, so that's a little bit the reason why it works like this.
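The regex-based dynamic groups and set operations Jeroen describes can be illustrated with plain Python sets and the `re` module. The series names and patterns here are made up; SiriDB's group syntax differs, but the semantics are the same idea.

```python
import re

series = {
    "srv01.cpu.core0", "srv01.cpu.core1", "srv01.mem.used",
    "srv02.cpu.core0", "srv02.mem.used",
}

def group(pattern, all_series):
    """A 'dynamic group': every series whose name matches the pattern."""
    rx = re.compile(pattern)
    return {s for s in all_series if rx.search(s)}

cpu = group(r"\.cpu\.", series)
srv01 = group(r"^srv01\.", series)

# Set operations combine groups: intersection, difference, and so on.
assert cpu & srv01 == {"srv01.cpu.core0", "srv01.cpu.core1"}
assert srv01 - cpu == {"srv01.mem.used"}

# Groups are dynamic: a new matching series joins on re-evaluation.
series.add("srv03.cpu.core0")
assert "srv03.cpu.core0" in group(r"\.cpu\.", series)
```

Encoding dimensions in the name and intersecting regex groups is how a name-keyed store approximates the tag-based filtering of databases with native tag support.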
[00:22:22] Unknown:
And you mentioned briefly that you created your own query language for SiriDB. So I'm wondering what were the design considerations when you were planning out what that syntax would look like, and how well has that stood up as the database has started to gain more use?
[00:22:41] Unknown:
Well, we started very simple, just with select statements and things like that. And then our own monitoring product required certain expressions and aggregation methods. For example, what we really like is to combine aggregation methods together: sometimes we want to take the difference of a metric and then take the difference again and again, just to flatten the series out. So we want our query language to be able to perform that kind of thing. Another thing we also needed for Oversight, our monitoring product, is the capability of merging metrics together. For example, take the CPU you are monitoring: you take all the cores, group them together, and present them as a single metric. The query language should allow you to tell SiriDB that you want this. Maybe I'm explaining this a little bit roughly, but... No, that makes sense. And so Siri has been built with a strong focus on being used in the context
[00:23:55] Unknown:
of systems operations. Have you seen it being applied in other contexts or for other use cases?
[00:24:06] Unknown:
Well, yeah, early this year we made SiriDB open source, so now we are hearing about other projects. One of the things I heard about is that they use it for a weather system: they store all this weather information, and I believe they are sometimes even using Raspberry Pis to store this data on. SiriDB also runs on Raspberry Pis. So they are using it for that type of data, which is totally different from our use case. But in a sense, it's just time series data, so it's possible to use it for that purpose.
Another one I heard about was financial data. We actually, as a demo one time, scraped all the data from the Yahoo Finance site and put it into SiriDB. So these are other use cases, but you're right that we created it mainly for monitoring IT infrastructure systems.
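The two query-language capabilities Jeroen described earlier, chained aggregation (taking the difference repeatedly to flatten a series) and merging several series into one (such as summing CPU cores), can be sketched in plain Python. These helpers are illustrative only and are not SiriDB query syntax.

```python
def difference(values):
    """Successive differences of a series' values."""
    return [b - a for a, b in zip(values, values[1:])]

def merge(*series, op=sum):
    """Merge time-aligned series into one, e.g. summing all CPU cores."""
    return [op(vals) for vals in zip(*series)]

cumulative = [0, 10, 25, 45, 70]
assert difference(cumulative) == [10, 15, 20, 25]        # per-interval change
assert difference(difference(cumulative)) == [5, 5, 5]   # applied again

core0 = [10, 20, 30]
core1 = [5, 5, 5]
assert merge(core0, core1) == [15, 25, 35]               # cores as one metric
```

Composing such aggregations inside the query language saves clients from pulling raw points and post-processing them application-side.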
[00:25:03] Unknown:
And one of the things that's worth noting is that the database itself is written in C, which I imagine is why it was able to be deployed to Raspberry Pis, because it doesn't have the additional overhead of running on something like the JVM.
[00:25:17] Unknown:
Yeah, that's correct. It's all written in C. However, we have built some tools to connect to SiriDB. For example, we have a client available, and this client is written in Go. So I sometimes hear the misconception that people think it's written in Go, but it's actually only the prompt client which is written in Go; SiriDB itself is written in C. And I noticed that you also have client libraries for things like Python
[00:25:43] Unknown:
and Ruby for people who are using those languages as well, and also an HTTP plugin for being able to address it from things like Grafana, for building dashboards on top of it?
[00:25:56] Unknown:
Yeah. We mainly created this HTTP library, or add-on so to say, to let you use any programming language you want, because you should usually be able to connect through that API. But we certainly want to support native clients for as many programming languages as possible. We have them available for Python, for Go, for C and C++, and for Node.js. We would also like to add at least Java, which is at the top of the list. But of course we would like everyone to extend this to other programming languages as well, like Ruby and maybe PHP, things like that.
[00:26:43] Unknown:
And it's also worth talking about the process of open sourcing it because as you mentioned, when you first began work on it, it was closed source. And then at the beginning of this year, you released it. So I'm wondering what was your reasoning for releasing it to open source, and what was the reason for that particular timing?
[00:27:02] Unknown:
Now, the particular timing is that we felt the product was more ready, because before that time we were still in a sort of development process. We were using it internally, but it was not really ready to release yet. And we decided to make it open source mainly because we think it's nice for our customers to know that their data is stored in an open source database. For example, if you are a customer of our monitoring solution, it's nice to know that if something happens to our company, your data is still in an open source database, so you can run it yourself or do whatever you want. That was the main reason we decided to make it open source. And it's also something we wanted to try: our monitoring solution is closed source, and we liked the idea of making this open source and seeing where it goes.
[00:27:55] Unknown:
And what have been some of the most challenging aspects of building and maintaining the project?
[00:28:00] Unknown:
I think building the scalability was the most difficult, because we want to be able to scale on the fly, with no downtime involved while scaling. This gives the system, or the database so to say, a whole new state during this process, where it needs to know both the old and the new state it is in. So I guess that part was the most difficult feature to build. Right now it's mostly extending with new features, and I think that's simple, so to say, compared to the scalability which we have already created.
[00:28:41] Unknown:
And for somebody who wants to start using Siri, what does the deployment process look like, and what are some of the resources or environmental considerations that they should be thinking of?
[00:28:54] Unknown:
Yeah. Like we said, it's written in C, so it should compile on most systems. It doesn't ask for a lot of system resources: it has pretty low memory and pretty low CPU usage. But you should keep in mind that SiriDB does not really fit well when you have, for example, billions and billions of data points on a single metric. It's better used when you have, say, a couple of million series or metrics, each having a million records. That's better than only a few series with a billion records on a single metric, if you understand what I'm saying.
[00:29:42] Unknown:
And so it sounds like that's largely because, if you have your data spread across more different series, then you're able to scale that horizontally. Whereas if you have a smaller number of series that you're tracking metrics for, and a large volume of them, then because of the way the data gets balanced, it would all be constrained to a single host. So you would be constrained to having to scale vertically as opposed to being able to scale out. Is that accurate? Yeah, that's right. That's the main problem. I think we can solve this a little bit by maybe
[00:30:20] Unknown:
adding continuous queries to SiriDB. That's something we don't have right now. Maybe this can help a little bit in solving this issue, because then we would be able to at least aggregate your time series into fewer points. Because I guess that if you store billions and billions of records on a single metric, you don't want to query them all. So there might be a solution in, how do you say it in English, like, compressing this data? Yep. Into fewer points. I believe it's called continuous queries in other time series databases.
And I think it can help with this problem. But at the moment, SiriDB is better at having a lot of metrics: each with a million records, or a few more, is not a problem, but it shouldn't be a billion records on a single metric.
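The continuous-query idea mentioned here, aggregating a dense series into fewer points, is essentially time-window downsampling. A minimal illustrative sketch (the function name and default aggregation are invented, not a SiriDB feature):

```python
def downsample(points, window, agg=lambda vs: sum(vs) / len(vs)):
    """Aggregate (ts, value) points into one value per time window.

    Defaults to the mean per window; any aggregation function works.
    """
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % window, []).append(value)
    return [(start, agg(vals)) for start, vals in sorted(buckets.items())]

raw = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0)]
# One averaged point per 60-second window: four raw points become two.
assert downsample(raw, 60) == [(0, 2.0), (60, 15.0)]
```

Run continuously as data arrives, this kind of rollup keeps a single hot metric's point count bounded, which is exactly the relief described for the billions-of-points-per-metric case.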
[00:31:17] Unknown:
And given that the project is open source now, are there any particular areas of contribution that you are looking for help with?
[00:31:25] Unknown:
Yes. What I mentioned before is that we want to add more programming languages; we would like to have native clients, for example, for Java. It would be nice to get some support from the community to help build these connectors, so to say. Another thing is that I think SiriDB is also interesting for home automation projects, just because it runs on the Raspberry Pi and uses low resources. I think it's a good time series database in that area, and there are a lot of home automation projects. It would be nice to see connectors from these projects to SiriDB,
so it's easier to use SiriDB in that area.
[00:32:11] Unknown:
Are there any other topics that we didn't talk about yet that you think we should cover?
[00:32:16] Unknown:
No, I don't think so. I think you covered most of the things. Okay. Well for anybody who wants to follow the work that you're up to and keep up to date with Siri and the other things that you're working on, I'll have you add your preferred contact information to the show notes. And just for a final parting question, from your perspective, what is the biggest need that you see in the available tooling or technology for people who are working in the data management industry?
[00:32:45] Unknown:
I would like to see maybe another database which has a focus on different things, more like a subscription model, where you can subscribe to, for example, a metric and get updates back whenever a new value is received, instead of a database which you need to query all the time. And I don't think there are a lot of databases which do this in a very good way. Maybe I don't know them all, but I don't know any database which is doing this really well. Alright. Well, thank you very much for taking the time out of your day to join me and talk about the work you're doing with Siri. It's definitely an interesting project
[00:33:32] Unknown:
and one that I am likely to start experimenting with on my own. So thank you for that, and I hope you enjoy the rest of your day. Yeah, you too. Thank you, Tobias, for this interview, and I hope you enjoy your day too.
Introduction and Sponsor Messages
Interview with Jeroen van der Heijden
Introduction to Siri DB
What is Siri DB?
Landscape of Time Series Databases
Comparison with Other Databases
Server Architecture and Clustering
Failure Modes and CAP Theorem
High Cardinality and Tagging Capabilities
Query Language Design
Use Cases and Open Source Transition
Challenges in Building Siri DB
Deployment and Resource Considerations
Community Contributions and Home Automation
Final Thoughts and Contact Information