Summary
In this episode of the Data Engineering Podcast, Tobias Macey interviews Kacper Łukawski from Qdrant about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured datasets into valuable assets. He discusses the challenges of building data pipelines for unstructured data and how vector databases facilitate semantic search and retrieval-augmented generation (RAG) applications. Kacper delves into the intricacies of vector storage and search, including metadata and contextual elements, and explores the evolution of vector engines beyond RAG to applications like semantic search and anomaly detection. The conversation covers the role of Model Context Protocol (MCP) servers in simplifying data integration and retrieval processes, highlighting the need for experimentation and evaluation when adopting LLMs, and offering practical advice on optimizing vector search costs and fine-tuning embedding models for improved search quality.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data
- Introduction
- How did you get involved in the area of data management?
- LLMs are enabling the derivation of useful data assets from unstructured sources. What are the challenges that teams face in building the pipelines to support that work?
- How has the role of vector engines grown or evolved in the past ~2 years as LLMs have gained broader adoption?
- Beyond its role as a store of context for agents, RAG, etc., what other applications are common for vector databases?
- In the ecosystem of vector engines, what are the distinctive elements of Qdrant?
- How has the MCP specification simplified the work of processing unstructured data?
- Can you describe the toolchain and workflow involved in building a data pipeline that leverages an MCP for generating embeddings?
- helping data engineers gain confidence in non-deterministic workflows
- bringing application/ML/data teams into collaboration for determining the impact of e.g. chunking strategies, embedding model selection, etc.
- What are the most interesting, innovative, or unexpected ways that you have seen MCP and Qdrant used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on vector use cases?
- When is MCP and/or Qdrant the wrong choice?
- What do you have planned for the future of MCP with Qdrant?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Qdrant
- Kafka
- Apache Oozie
- Named Entity Recognition
- GraphRAG
- pgvector
- Elasticsearch
- Apache Lucene
- OpenSearch
- BM25
- Semantic Search
- MCP == Model Context Protocol
- Anthropic Contextualized Chunking
- Cohere
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey, and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data. So, Kacper, can you start by introducing yourself?
[00:00:59] Kacper Łukawski:
Of course. Hello. My name is Kacper Łukawski, and I'm a senior developer advocate at Qdrant. We are building a vector database that supports many applications related to large language models, but not only. And retrieval-augmented generation is probably the most typical use case nowadays.
[00:01:18] Tobias Macey:
And do you remember how you first got started working in data?
[00:01:21] Kacper Łukawski:
Yeah. Of course. I have a software engineering background. And in one of my previous jobs, I used to work as a software developer, and we started building big data pipelines at that point. That was probably around 2014 or '15. And we built a couple of projects in the automotive industry which were using Spark and all the Apache tools that were popular back then, like Kafka, Oozie, and many, many more. And that was a natural transition for me to start building these kinds of solutions, including not only data ingestion, but also data visualization and business intelligence.
[00:02:03] Tobias Macey:
And so now digging into the current frenzy around the applications of data, how to make use of it, how to use it to power these various AI applications, obviously, large language models have had a drastic impact on the utility and applications of unstructured datasets, which have largely been shoved off to the side and used for bespoke purposes or as training corpora for natural language processing tasks. But with the capabilities and the scale that large language models offer, we can now turn those into usable assets for various applications, whether that's business analytics or, more generally, language model applications.
And I'm wondering if you can just start by talking through some of the challenges that you're seeing teams face in building the pipelines that are necessary to be able to take that corpus of unstructured data and turn it into usable data assets.
[00:03:06] Kacper Łukawski:
Yes. Of course, there's been a massive impact on how we build data pipelines for unstructured data if we use LLMs. And I feel like one of the things that we still forget is that language models are not going to magically solve all the problems with our data, and they do not have any capabilities to fix it in any way. And, from my experience, there are many teams struggling with bringing this data in because they still don't understand the nature of language models, which might be making errors. It's not like a typical application where we write code and we can test it thoroughly. With LLMs, things are a little bit different, because we can figure out a way to process data using these models and then face some issues, because this is not going to work on all the cases that we have. And quite a typical enterprise case is that people have lots of scanned documents or PDFs, and they want to bring them somehow into their applications.
And there are various ways of how to do that. The selection of a proper large language model is key here, or a visual language model, because we want to interpret images. But still, there are challenges related to scalability and to the deployment of these models, especially if we work in an industry that can't just use proprietary SaaS-based tools. Then the teams start to struggle with setting all the pieces up.
[00:04:34] Tobias Macey:
And in terms of the destination of those unstructured assets into some usable data assets, what is the typical shape that you're seeing teams use as that destination point for unstructured sources, whether that's turning it into tabular data, extracting numerical data, or potentially turning it into graph representations using something like named entity recognition? And I'm wondering what are some of the common applications that you're seeing teams use those LLMs for in terms of that transformation?
[00:05:17] Kacper Łukawski:
Yes. So definitely GraphRAG is becoming popular. For example, we have just finished a case study with one of our users, and they built a pretty interesting system that was using LLMs to derive ontologies given some unstructured data. That was applied to some restricted domains like law and medicine. And they were actually building a pretty interesting system that was able to understand the relationships in the data, and they used a dual modeling approach. So not only do they have vector embeddings used for capturing the semantics of the data, but also graphs to capture the relationships between different entries. And I feel like this is kind of a typical scenario nowadays. Like, everyone is speaking about GraphRAG. But when it comes to all the other destinations, yeah, actually, LLMs simplified a lot of problems that we were dealing with in the past, like named entity recognition, text classification, and translation as well. So there are various applications. Obviously, we had some other methods in the past, like algorithms trained solely for a specific problem. Right now, LLMs became the de facto standard for solving all of these problems at the same time. And yet, in my experience, vector databases are typically the destination for all the data they process, because our users typically want to build some sort of search system that will power their agents or maybe just the search bar on the website. But this is actually the most interesting case for our users: how to finally start deriving some insights from the data that was not searchable in the past. So, yeah, I would say this is the most important application.
[00:07:01] Tobias Macey:
And then in terms of the modeling that is involved in the vector storage of these systems, obviously, vector databases as a category have seen massive growth in terms of their adoption and attention over the past two years because of the introduction of LLMs and the use cases around embeddings and RAG. But there are also vector extensions to other styles of database engines, one of the more popular ones being pgvector. And so then there's the consideration about what are the additional metadata fields or contextual elements that you want to collocate with these embeddings. And so, obviously, Qdrant is more of a document store.
pgvector is an extension to Postgres, which sits alongside relational data. And I'm wondering how you're seeing teams think about the design elements of what the broader context and utility is that they want to get out of the storage medium, beyond just the ability to have some means of storing these n-dimensional arrays?
[00:08:09] Kacper Łukawski:
Yeah. So first of all, I would distinguish databases from search engines, because that might be kind of confusing, and I wouldn't call Qdrant a document store. It's more like a search engine. So if we were looking for an analogy here, it's more like the Elasticsearch that we used for keyword or lexical-based search in the past. And this is just an alternative to that paradigm that can also capture the semantics through the vector embeddings. And, yeah, quite a typical way of storing these vectors is to also keep the input data that was used to create them somehow. And the idea is that if we speak about text, then we usually just put the particular chunk that was used to create a vector inside the metadata, because in Qdrant, every single point can have even multiple vectors and some sort of JSON-like metadata.
And this metadata can actually contain the original data that can be used to reproduce the process of creating the embedding. But in many cases, we just keep a reference to the original data, which is stored somewhere else. That might be your relational database, or sometimes a URL to a file, because vector search is actually not only about text; it can also make search over images, video, or audio data possible, which was impossible in the past. So, yeah, just to sum it up: the typical approach is to keep the original data close to your vectors, because obviously you will need it, as these vectors do not preserve all of the original information. And the typical approach is to use the vector for search, and then take the raw data and pass it somewhere else, like to the LLM if you build RAG, or maybe just expose the original documents to the user if you build plain search.
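To make the pattern concrete, here is a simplified sketch of the data layout Kacper describes: each point pairs an embedding with JSON-like metadata (the "payload") that carries the original chunk or a reference to it. This is an illustration only, not the actual qdrant-client API; all ids, vectors, and payload fields are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    """One searchable point: an embedding plus JSON-like metadata."""
    id: int
    vector: list[float]  # the embedding used for similarity search
    payload: dict = field(default_factory=dict)  # chunk text or a reference

points = [
    # keep the chunk text itself in the payload...
    Point(1, [0.12, 0.85, 0.33], {"text": "Qdrant is a vector search engine."}),
    # ...or just a reference to data stored elsewhere (file, relational row)
    Point(2, [0.91, 0.10, 0.42], {"source_url": "s3://docs/report.pdf", "page": 3}),
]

def retrieve_payloads(ids: list[int]) -> list[dict]:
    """After vector search returns matching ids, hand back the raw data."""
    by_id = {p.id: p.payload for p in points}
    return [by_id[i] for i in ids]

print(retrieve_payloads([2, 1]))
```

The payloads returned here are what would be passed downstream, to an LLM in a RAG pipeline or straight to the user in a plain search UI.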
[00:09:58] Tobias Macey:
And then once you do have your data in this embedded format, you have the vectors, you have some reference to the original data, then there's the challenge of figuring out what is the retrieval method that you want to use, where vector indexes play a substantial role. But to your point, with Qdrant being a search engine as well, I know it supports the BM25 index versus the Lucene index that Elasticsearch relies on. And so for people who already have some investment in search infrastructure, as they're starting to do that analysis of how they want to store their vector embeddings, whether they use the capabilities in their Elasticsearch or OpenSearch clusters versus migrating at least the vector pieces to Qdrant, or migrating all of their search infrastructure to Qdrant, how do they manage that evaluation, and what are the key decision factors that go into it?
[00:10:52] Kacper Łukawski:
Yeah. Obviously, that's a common question. And, well, there are many companies that invested a lot of resources and time into these kinds of search engines, and they eventually also added vector search capabilities. But the thing is that vector databases as a category were built solely to support dense embedding vector search, and they are just way more efficient, because the data structures that we use for dense vector search are totally different from the ones that we use for lexical search based on, for example, BM25. So there are different ways of how you can incorporate vector search into your existing retrieval pipeline. For example, you could obviously use the existing system and then just add these capabilities.
And there are some differences in how we implemented this vector search compared to these traditional systems, so to say. I remember in the past they were just not scaling that well, because they just had a single segment. So that was totally fine to use unless you were already dealing with millions of embeddings, because then you would have to store it all on a single machine, and the cost of running that would be just enormous. And all the vector databases were actually built with efficiency in mind. So we implemented that as the first functionality, and BM25, or sparse vectors, are just an addition to that; vector search is still the key component. And if you have a running search system and you would like to experiment with vector search, it's relatively easy to just build a service that will be connecting to both of the systems. So you will have the existing system that will be doing the lexical search with all the heuristics involved, and then you will have the most efficient vector search engine that will support the semantics, because retrieval is actually a very hard problem.
And sometimes you would be handling your queries in a completely different way depending on their length or their semantics. LLMs are also quite often being used for that. So LLMs might be used to classify the input to your search engine, and then they might also be used to make a decision about how we want to handle it. Traditional lexical search is just way better if we deal with lots of proper names or some identifiers of your products, when the exact match matters. And on the other hand, if you have questions which are long-tail queries, with lots of words, then it's better to just default to vector search, because that should handle this type of query way better.
So you will be building a service around your search stack. You won't be calling the search engine directly; there will usually be some preprocessing done. And in that case, you can just add another component to make sure you have the best system to support a specific scenario. You also mentioned pgvector, the extension to Postgres, which is also quite popular when it comes to vector search. Yeah, I know it's popular, and many people are happy with using it, but the thing is that the database is not the best place to put an additional workload related to search. If you have just thousands of documents, then that might be fine, because you won't even notice the difference. However, vector indexes are pretty resource intensive. They require lots of memory. And if you add this additional load on an existing database, like a relational database, then at some point you will see that the system starts to struggle with performance, because the majority of the resources will be consumed by the search.
Relational databases are designed to collect and store the transactional data, the data which is key for the business, like your customers, your products, and the transactions made between customers and products. Search engines are more about supporting the system, and they are not really the most critical part of the application. So that's why we use different systems to support search, and the same applies here. If you want to add vector search capabilities, then it's better to have a separate system that will be doing that effectively, efficiently, and without generating this additional load on your existing components.
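The query routing Kacper describes, exact-match-style queries going to lexical search while long natural-language queries default to vector search, can be sketched with a toy heuristic. The word-count threshold and identifier pattern here are hypothetical illustration values, not anything Qdrant prescribes:

```python
import re

# Matches identifier-like tokens such as "SKU-1234" or "RFC2616",
# where exact lexical matching beats semantic similarity.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-?\d+\b")

def route_query(query: str) -> str:
    """Decide which backend should handle a query (toy heuristic)."""
    words = query.split()
    if ID_PATTERN.search(query) or len(words) <= 3:
        return "lexical"  # product codes, proper names, short keyword queries
    return "vector"       # long-tail natural-language queries

print(route_query("SKU-1234 charger"))
print(route_query("what cable do I need to charge my laptop on a plane"))
```

In a real deployment this decision often sits in the preprocessing service mentioned above, and may itself be made by a small LLM classifier rather than a regex.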
[00:15:24] Tobias Macey:
And then another interesting aspect of these vector engines for search capabilities is that they can be used for more than just the RAG and context retrieval use cases that LLMs are making more popular. Their original formulation was actually as more of a semantic search capability. And so I'm wondering how you're seeing teams leverage Qdrant, and vector engines more broadly, beyond just the hyped-up use cases around RAG and agentic capabilities?
[00:15:59] Kacper Łukawski:
Sure. Yeah. You're absolutely right. When I joined Qdrant in 2022, LLMs were not that broadly adopted yet. ChatGPT was introduced a few months later, and that changed the game completely. Right now, I would say 80% of the use cases that we see are somehow related to RAG or agents. But our early adopters started implementing semantic search as an alternative to keyword-based search just because they saw it could handle some different scenarios that couldn't be solved with traditional means.
And for example, ecommerce had adopted semantic search even before the introduction of the language models. Imagine you were running an ecommerce business and you had lots of products with titles and descriptions, but they were all in English. And you also wanted to serve people who couldn't speak that language, and semantic search with multilingual embeddings became a pretty easy way to enable them to still use the system even though they do not speak the same language. On the other hand, there are also lots of people who can't really express their intent using just keywords, and you just don't want to ignore a huge part of your audience because they can't properly put together the keywords to find what they need. And in ecommerce, that's a key aspect. You want your users to be able to find what they want to buy. And semantic search enabled a different approach to search, and that was broadly adopted. But right now, we also see lots of people use vector search for different scenarios.
Maybe let me just briefly speak about the basics of vector search. In vector search, we have an embedding model, and this embedding model will be trained on some sort of data modality, like text, images, or videos, for example. And text is just the most typical problem that we solve. So if we have this model, it can take virtually any text and convert it into a fixed-dimensional vector, which is just a list of floats, some numbers. And these vectors have a useful property: if two different vectors describe a similar object or sample, they should be close to each other in that vector space, and we measure the similarity using, for example, cosine similarity.
So if the similarity is high, we assume that those observations are related to each other somehow. Okay. And since we have a similarity measure, like cosine similarity, we can also try to use that measure to detect samples which are out of the distribution of the data we want to support. And vector search is nothing new. This is actually the good old k-nearest neighbors algorithm, just approximated. And this algorithm is pretty versatile. It might be used for anomaly detection, for example, in the medical domain. Also, if you just want to experiment, you can see what the clusters in your data are. Or you could perform a classification just by a simple voting procedure, because you can select the most similar examples, which should be labeled already.
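The property described above can be shown with a few lines of code: embeddings are fixed-dimensional lists of floats, and cosine similarity scores how related two of them are. The vectors here are tiny toy values, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy "embeddings" for three pieces of text
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
invoice = [0.1, 0.9, 0.7]

print(cosine_similarity(cat, kitten))   # close to 1.0: related concepts
print(cosine_similarity(cat, invoice))  # much lower: unrelated
```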
So it solves classification and regression at the same time. This is actually a pretty good method to solve multiple different problems that were typically solved with separate means. And, obviously, that all might be combined with large language models too. Like, if you have a system that accepts natural-language queries from users and you see there is a new query sent to the application, and this query is pretty far away from all the past observations you had, that should also trigger a human in the loop, because maybe somebody is trying to perform a prompt injection attack and trying to use your LLM credits to maybe create some structured output and attack the system in that way. And if you already calculate the embeddings, you can also try to perform this anomaly detection using the same vectors. So this is a really versatile method, and we see many people use it not only for RAG or agentic purposes, but also to support these traditional problems that we had in machine learning in the past, right now just with vectors.
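A toy version of the k-nearest-neighbors voting procedure mentioned above: classify a new embedding by majority vote among its most similar labeled examples, and flag it as an anomaly when even the best match is far away. The vectors, labels, and similarity threshold are hypothetical illustration values:

```python
import math
from collections import Counter

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# already-labeled examples (toy embeddings of support tickets)
labeled = [
    ([0.9, 0.1, 0.1], "billing"),
    ([0.8, 0.2, 0.1], "billing"),
    ([0.1, 0.9, 0.2], "shipping"),
    ([0.2, 0.8, 0.1], "shipping"),
]

def classify(query: list[float], k: int = 3, anomaly_threshold: float = 0.5) -> str:
    """k-NN voting with an out-of-distribution check."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    top = ranked[:k]
    if cosine(query, top[0][0]) < anomaly_threshold:
        return "anomaly"  # even the closest example is dissimilar: human in the loop
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]

print(classify([0.85, 0.15, 0.1]))  # near the billing cluster
print(classify([0.0, 0.05, 1.0]))   # far from every labeled example
```

Production systems use the same idea with approximate nearest-neighbor indexes instead of the brute-force scan shown here.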
[00:20:36] Tobias Macey:
Now in terms of the work to be done as far as integrating the vector engine into a broader system context, and managing the data loading and transformation for creating the embeddings, maintaining them, and managing the retrieval, obviously, there's a lot of work that has gone into that, and I know that one of the areas of focus that you have is in building and maintaining the Model Context Protocol server for Qdrant. And I'm wondering if you can talk to some of the ways that the introduction of MCP as a protocol has simplified the work to be done, or how it is integrated into that overall flow of information, from managing your unstructured sources, loading them into Qdrant, and then also managing it for retrieval? And I know it also simplifies the work of using Qdrant as a store for contextual memory.
[00:21:31] Kacper Łukawski:
Yeah. Sure. So, surprisingly, I wouldn't recommend using the MCP server just to ingest the data into the system. There are many other tools that might be used for that, with Airflow probably being my preferred one, just because MCP servers are supposed to act as some sort of plugins so you can connect different tools to your LLMs. And it doesn't make much sense; you have more possibilities if you just use the SDK directly. Vector databases have a lot of optimizations that might be used to make ingestion cheaper, faster, or just more accurate. And in that case, if you just use the MCP server, you are losing this ability, or you still need to use both, so it doesn't make much sense. There are plenty of other tools that you can use out of the box, like Airflow, or some cloud tools if you prefer. However, MCP has enabled people to use the data ingested by your pipelines in various applications.
So there might be a very heavy ingestion pipeline that will be taking lots of unstructured data and sending it to Qdrant. And on the other hand, there will be lots of people connecting to the same Qdrant instance and using the ingested data for their own purposes. Because Qdrant here acts as a knowledge base, and this knowledge base may contain different things. That might be some sort of data which is specific to your business, or it might be a set of code snippets coming from different projects you created within your organization, depending on how you would like to use it. And MCP servers, at least our MCP server, are not designed to support the ingestion part. Of course, there is such a tool. You can use that tool if you, let's say, connect your MCP server with Claude Desktop and you would like to build a personal knowledge base that will just store all your memories.
But I strongly encourage you to use a different tool for the ingestion pipeline. This is just not the best way to integrate with all these tools, and that applies not only to our server, but also to many other ones. Surprisingly, we also modified our existing MCP server in a way that allows you to run it in read-only mode, so the LLM won't even be able to store anything inside your Qdrant collection. And I feel like that should be the preferred way of using it, except for some very niche use cases.
[00:24:04] Tobias Macey:
So in terms of the role of the MCP server in the overall system architecture, and the ways that data engineers should be thinking about the context management within Qdrant, what are some of the best practices that you have found around how to structure the data in a way that is conducive to the retrieval elements? Obviously, chunking is more of an art than a science currently. Embedding models are generally going to give you different results based on the particular domain that you're in. And I'm wondering how you're seeing teams think about that end-to-end flow, and some of the ways that the MCP server maybe reduces the barrier to entry for the consumer side to help with some of the evaluation work to be done?
[00:24:54] Kacper Łukawski:
Yes. That's a great question. So definitely, chunking is one of the problems we need to solve once we decide to implement vector search. And there are no easy answers here. Obviously, the default settings in all the popular frameworks are not the best ones. I mean, you can try to use the simplest means possible, like a set window size, and divide your documents into chunks of that size with some overlap between them so you do not lose the context. But in many cases, context is not only about a particular piece of text; it might be derived from, let's say, the documents we work with, like PDFs. They have some formatting, like headings, and adding these headings to your chunks usually helps to understand the overall context of that particular piece. And, on the other hand, if you work with code and you would like to create a system that would be searching over your code base, then a particular loop with some function calls does not say that much about the role of that particular piece of code in the whole application.
So in that case, it's better to keep the name of the class this particular method belongs to, the name of the method itself, and the parameters that are passed to it, along with the actual body of this particular method. And including the documentation strings can also help with that. I really like the piece from Anthropic, actually an article, not a paper, about contextualized chunking. They presented a pretty interesting idea of how to use an LLM to summarize the role of a particular piece of the document in the context of the whole document.
So LLMs are actually pretty good for that, especially if we deal with traditional PDF-like documents, because they can easily summarize the documents. And then, if we have a summary of the whole document, we can also clearly state the role of this particular chunk that we create. But, yeah, there are no easy answers, as I said, and that really depends on the data you have. In many cases, chunking is just an experiment that you need to conduct, and evaluation is key here. But the good news is that evaluation in information retrieval is nothing new. And if you are able to test your search pipelines, and I assume you do if you think about it seriously,
then you can use the same means to evaluate the quality of retrieval in the case of vector search, because you use the same metrics, the same tools, and nothing is new except for the paradigm that you use to search.
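The simplest strategy described above, a fixed window with overlap so context is not lost at chunk boundaries, can be sketched in a few lines. The window and overlap sizes are arbitrary illustration values; real pipelines tune them, often chunk by tokens rather than characters, and may prepend headings or LLM-generated context to each chunk:

```python
def chunk_text(text: str, window: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    return [text[i:i + window] for i in range(0, len(text), step)]

doc = "Vector search retrieves the nearest embeddings to a query vector."
for chunk in chunk_text(doc):
    print(repr(chunk))
```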
[00:27:40] Tobias Macey:
One of the interesting aspects of the current era that we find ourselves in is that software and data engineering for the past few decades has been very deterministic, based on standard practices that are provable using testing or theorem provers and that have standardized architectures. And it seems that the introduction of large language models has generalized the need for the adoption of, and expertise in, experimentation that data scientists have been dealing with for a long time now. And I'm wondering how you're seeing data teams in particular come to grips with that aspect of the work to be done, where they need to be more in the loop of experimentation, tracking the results of those experiments, being able to iterate quickly on versioning their different chunking strategies or deploying different embedding models, and managing the re-embeddings, because we're not at a point anymore where we can say, okay, this piece of the overall workflow is done, because the workflow is constantly changing as new capabilities and new models come out. And I'm just wondering how you're seeing that dynamic play out in data teams in particular.
[00:28:55] Kacper Łukawski:
Yeah. Evaluation is key. Definitely, data teams should be focusing on evaluating multiple pieces here. For sure, choosing the embedding model is an important component of that. I'm not a big fan of just testing out all the embedding models that exist, carefully watching the newly released models, and trying to use them. There are different ways of how you can improve the quality of search than just taking the best model that exists, according to the benchmarks, and trying to re-ingest all the documents you have. Let's be honest with that. If you have millions of documents, then creating the vectors for all these documents will take you a lot of time. Even if you can just scale up your environment, that would be expensive if you just take the biggest model that exists. And evaluation is key. Sometimes we can sacrifice some of the precision just for the sake of effectiveness and efficiency.
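To make the cost argument concrete, here is a back-of-envelope sketch. Every number in it (corpus size, tokens per document, API price) is an illustrative assumption, not a real quote from any provider:

```python
# Back-of-envelope estimate of what re-embedding a whole corpus costs.
# All numbers below are illustrative assumptions, not real prices.
num_docs = 5_000_000
avg_tokens_per_doc = 800
price_per_million_tokens = 0.10  # hypothetical USD rate for a hosted embedding API

total_tokens = num_docs * avg_tokens_per_doc
cost_usd = total_tokens / 1_000_000 * price_per_million_tokens
print(f"{total_tokens:,} tokens -> ${cost_usd:,.0f} to re-embed the whole corpus")
```

The dollar figure scales linearly with model size and price tier, and the wall-clock time scales with model size too, which is why swapping to the newest, biggest model is rarely a casual decision.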
So what I usually try to convince our users of is that they should just take the smallest model that does the job at an acceptable level. And I'm a big fan of using the small models. They are easy to fine-tune, and there are plenty of ways that you can increase the quality of search just after creating the embeddings. But, yeah, let me just summarize what the aspects are that we need to evaluate when we build retrieval-augmented generation or anything related to LLMs that has this vector search component. So, basically, choosing the embedding model, and choosing it based on some internal benchmarks, not the public benchmarks that we can easily find, like MTEB. MTEB is the most popular benchmark that presents the quality of retrieval for different embedding models. And the thing is that some companies can easily just train the models so they shine in these benchmarks, but none of these benchmarks will have the data that might be specific to your own business. And in that case, doing this evaluation on your own is key. But on the other hand, if we use LLMs in the whole pipeline, then we also need to evaluate the quality of the outputs of the LLMs, so there are just two models to evaluate separately.
Like, if we are confident in our retrieval phase, we are sure that the embedding model does its job. Then we also need to evaluate the quality of the outputs of the LLM. If it's struggling with answering the question, even though the context provided is just fine to answer it, then choosing a different one might be important. And this is a challenge, actually. Like, evaluating the retrieval is easy compared to evaluating the LLMs. There are various metrics, but there is no consensus yet on how to do it properly. And we typically just take a better LLM to evaluate the worse or the smaller one that we want to use. Or if we can use a SaaS tool, then we use it to evaluate the quality of an on-premise LLM.
So this is also a challenge. But, yeah, many experiments have to be done if you really want to build a system that will be able to work with your data, and definitely, it's not gonna be deterministic. I mean, LLMs might be working fine in 99% of the cases, but at some point, you also need to make sure that you are able to trace back all the errors that may occur and monitor the use of the LLMs in some way. Observability here is a really important component that many teams forget about.
[00:32:33] Tobias Macey:
In terms of the evolution of these systems, particularly when you're dealing with the embeddings as a corpus of context for a RAG model or an agentic use case, as new embedding models come out and as you change chunking strategies, you can potentially balloon the overall storage size of the vectors that you're storing. And I'm wondering how you're seeing teams manage the life cycle of those embeddings: understanding which ones are being used, when they can age out older embeddings, and just some of that overall management of the evolution of these systems without necessarily just letting costs grow unbounded.
[00:33:17] Kacper Łukawski:
Yeah. That's definitely important, but it's not something that you would get for free. Like, it's not built into any vector database that I know. It's basically something that you need to monitor on your own. But the thing is that if you have an application already running in production, then you rarely just experiment with new embeddings in that production environment. There is typically a process of evaluating them offline that you do on just a fraction of your data, on some ground-truth dataset that you are able to build. And then once you have proven that the new embedding model, for example, works well in that scenario, then you can just swap it in for the old one. But, yeah, the important thing is that the cost of running vector search might be high, because it is so memory intensive. That might be mitigated by different means, like various ways of optimizing the storage so it doesn't cost you a fortune to run semantic search. But in general, you rarely just experiment with dozens of different embedding models in production, because this is just too expensive to do. And if you have a production system that you would like to improve the quality of, you can experiment with some other techniques.
Like, for example, instead of just retrieving the context using this single vector embedding, you can also try to use a re-ranker. Just take some more candidates in this initial retrieval phase and then try to re-rank them so the better results pop up at the top of the results. And that's a pretty easy way that doesn't require you to change the structure of your collections in the vector database. And, yeah, there are various means that you can use in order to make it better. But, yeah, these experimentations are not done in a living system; it's more like research that has to be done beforehand.
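The oversample-then-rerank pattern described here can be sketched as follows; `vector_search` and `rerank_score` are hypothetical stand-ins (in practice the first would be an approximate-nearest-neighbor query and the second a cross-encoder or similar model):

```python
# Sketch of the oversample-then-rerank pattern: fetch more candidates with
# cheap vector search, then reorder them with a more expensive scorer.
# Both functions below are illustrative stand-ins, not a real API.

def vector_search(query, limit):
    # Stand-in for a real ANN query; returns (doc_id, text) candidates.
    corpus = {
        "a": "how to configure quantization in a vector database",
        "b": "recipe for sourdough bread",
        "c": "tuning vector search memory usage",
    }
    return list(corpus.items())[:limit]

def rerank_score(query, text):
    # Stand-in for a cross-encoder; here, simple token overlap with the query.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def search_with_rerank(query, final_k=2, oversample=3):
    # Retrieve final_k * oversample candidates, then keep the best final_k.
    candidates = vector_search(query, limit=final_k * oversample)
    ranked = sorted(candidates, key=lambda c: rerank_score(query, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:final_k]]

print(search_with_rerank("vector search memory"))  # ['c', 'a']
```

Note that nothing about the stored collection changes; only the query path gains an extra scoring pass, which is why this is a cheap experiment to run.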
And changing the embedding model is actually an important decision. And yet not that many teams do that very often.
[00:35:26] Tobias Macey:
As you have been building and iterating on the Model Context Protocol server for Qdrant, working with the Qdrant community, and coming to grips with the constant evolution in the space, what are some of the most interesting or innovative or unexpected ways that you're seeing the combination of MCP and Qdrant applied?
[00:35:47] Kacper Łukawski:
Well, I wouldn't say it's that surprising, but when people got excited about vibe coding, like, everybody started to build applications over the weekend, and the MCP servers might be pretty useful to perform something that I like to call grounded vibe coding. I mean, if you work in a company that has lots of different projects and you wanna keep up the standards, then you definitely just don't want to vibe code an application and let the LLM do whatever it wants, but you want to have, like, a knowledge base that will understand all the projects that you have. It may also contain, like, code snippets so you can reuse them, or maybe you have a very specific frontend framework that you use in order to keep all the applications you create consistent.
And actually that became a pretty standard way of using our MCP server. People use tools such as Cursor, Windsurf, Augment Code, or even VS Code nowadays, and combine them with Qdrant, which acts as a knowledge base for this grounded vibe coding. So they put all the code snippets, for example, here. And then if they ask their agent, the coding assistant, to create something brand new, it will start by trying to find some similar code snippets from different projects. So maybe it's not necessary to create it from scratch, but maybe we can just use some of the components. Or maybe it can just point out that there might be, like, an internal library within the company that you can use for that specific purpose.
So you don't reinvent the wheel. And that's not that surprising, but it's a pretty interesting area that we feel will just keep growing, because that's what the majority of our users are trying to do now. And surprisingly, I didn't really anticipate this huge success of the MCP. I thought it would only be adopted by Anthropic, even though we released this MCP server just a few days after it was announced. And then we were just astonished by how many people started to use it for different things, and by the fact that OpenAI and Google also started to incorporate this protocol in their products. And that was actually the moment we realized this is more important than we thought.
[00:38:08] Tobias Macey:
And in your work of helping to build and guide and interact with the community around Qdrant and experimenting with these various vector use cases, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:38:26] Kacper Łukawski:
I think many people still struggle with the cost of running vector search. And, surprisingly, not that many know that there are plenty of options for how to optimize it. I mean, methods such as quantization can help reduce the cost of running vector search greatly. Like, one surprising thing is that if you use a relatively high-dimensional model, let's say over a thousand dimensions, which is not that rare nowadays, then your model will typically be compatible with binary quantization. So you can reduce the memory footprint by up to 32 times and make it faster by up to 40 times. And, yeah, many people that started building vector search were just dealing with these problems. And when they enabled binary quantization on their collections, on their vectors, things started to behave way better. So this is a common challenge that I see in the community. And there's also not that much understanding of what vector search really is. I mean, people really think that vector search will magically solve their problems and just enable search over any data that they have. And, unfortunately, it's not that easy. Like, you really need to have a model that supports your data, and you can't just take the simplest one that's available out there. You really need to choose it wisely. And since I'm based in Poland and I have spoken to so many companies here in Poland, I realized there are just many people trying to use the default models that come with LangChain or any other framework that exists in that space. For example, they take OpenAI embeddings, because they were the default ones for a very long time, and try to use them for non-English data. That's okay in some cases. I think that works pretty well with German, but it's just not documented anywhere. There are just various options. Like, there are various model providers whose models officially support multiple languages at the same time.
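The 32x figure follows directly from the arithmetic: a float32 dimension takes 32 bits, while binary quantization keeps one sign bit per dimension, and distance becomes a cheap bit comparison. A toy sketch of the idea, not Qdrant's actual implementation:

```python
# Illustration of why binary quantization shrinks memory ~32x: each float32
# dimension (32 bits) collapses to a single sign bit, and distance becomes
# a fast Hamming comparison between bit vectors.

def binarize(vector):
    """Keep only the sign of each dimension as one bit."""
    return [1 if x > 0 else 0 for x in vector]

def hamming(a, b):
    """Number of differing bits - a cheap proxy for vector distance."""
    return sum(x != y for x, y in zip(a, b))

dims = 1024
float32_bytes = dims * 4          # 4 bytes per dimension
binary_bytes = dims // 8          # 1 bit per dimension, packed into bytes
print(float32_bytes / binary_bytes)  # 32.0 - the memory reduction factor

q = binarize([0.3, -1.2, 0.8, -0.1])
d = binarize([0.5, -0.9, -0.4, -0.2])
print(hamming(q, d))  # 1 - the two vectors differ only in the third dimension
```

In a real engine the binarized vectors are used for a fast first pass, often combined with rescoring against the original vectors to recover precision.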
But if you just take whatever exists as a default, then you will just come to the conclusion that vector search doesn't work. And that's a homework that everyone has to do: choosing the embedding model wisely. And I'm not even talking about the evaluation, like building ground-truth datasets and running these whole evaluation processes. It's more about just reading the model card on Hugging Face or on the model provider's website and seeing whether your particular language is supported by the model you selected. I can strongly recommend using Cohere models here if you deal with multilingual data, because they claim to support over a 100 languages at the same time with really great quality. But there are also multilingual open source models available out there, for example, in Sentence Transformers.
And this is quite a typical challenge. And on that note, we all know that fine-tuning LLMs is a pretty expensive thing, and not all the companies out there can afford to fine-tune their models for the very specific problem they have. But contrary to that, fine-tuning the embedding models is relatively easy and cheap. And even if you can't find the ideal model that would support all the cases you need to support, fine-tuning one is relatively easy, and that might be done on very limited hardware in actually no time. It doesn't require you to have, like, a cluster of GPUs. Maybe a single GPU with a good base embedding model will be enough to just adjust the model so it works better in your specific domain with your specific terminology.
And, yeah, that's actually something that many mature teams started to do at some point, when they started to struggle with the pretrained models that they had used from the very beginning. So I really encourage you to have a look at fine-tuning the embedding models, because there are lots of interesting materials on that, and it's not as complicated as it may seem at first glance.
[00:42:29] Tobias Macey:
And as people are designing their systems and evaluating different approaches to context management, vector storage, and vector search, what are the cases where you would say that MCP and/or Qdrant are the wrong choice?
[00:42:49] Kacper Łukawski:
So definitely MCP shouldn't be used for data ingestion if you really deal with lots of data. That's something we've already mentioned. But it's great if you just want to connect multiple clients to the same instance of your vector database. So, let's say, your CTO can use Claude Desktop and still search over the data easily and use that to extend the context of the prompts. So this is great. And on the other hand, your developers can also connect to the same collection if they use these AI coding agents. So that's definitely a good choice if you really deal with lots of different clients that would like to connect to that same Qdrant collection.
But on the other hand, there are many cases in which vector search is not the best choice. I think we have discussed that example already, but if you have a running search system, and if you see that the majority of your queries come from users that speak the same language as you do, and I don't mean a foreign language, but they use the same terminology as the people who created the datasets, and they can also express themselves in a concise way. For example, they can provide a very specific product identifier they are interested in buying, because you are providing a tool for domain experts. In that case, vector search based on dense embeddings does not make much sense. It's maybe better to just use the traditional lexical search. And, yeah, this is actually an edge case, but I also remember one of our users who just started to use Qdrant as if it was a regular database.
So they were not putting in any vectors, but they were mostly using it as if it was a MongoDB database. You can technically do it. That's not the best choice, because obviously a search engine is not something you would like to use as your primary data store. But let me think about it. I feel like we should definitely evaluate every single case individually, because if you work with search and if you see that there are some cases you can't support with the existing means, then vector search is usually a good alternative. And Qdrant is a really efficient vector search engine, so maybe we would be able to help improve the quality of the search results easily.
[00:45:18] Tobias Macey:
And as you continue to build and iterate on the Qdrant technology, as well as the MCP server for it, what are some of the things you have planned for the near to medium term, or any particular use cases that you're excited to explore?
[00:45:34] Kacper Łukawski:
Sure. So first of all, we started the MCP server as a template, just to show people how to build their own MCP servers that would connect to Qdrant. And to our surprise, people started using it as if it were just a regular tool. So we decided to go this dual way. So first of all, our MCP server is available as a regular Python library, so you can build your own MCP server that will connect to Qdrant, and you can add some additional functionality if you prefer to. And on the other hand, you can just use it as it is, as a gateway to your existing Qdrant instance. And one angle that we are currently exploring is related to code generation, this grounded vibe coding that I mentioned, or grounded coding with the use of AI agents. We actually want to create some separate MCP servers that will handle code search specifically.
So they will use embedding models that were trained on code, so they should handle code search way better than the traditional general-purpose embedding models that we used in this base MCP server. So that's definitely something that we are exploring. And, yeah, we are open to discussion. Definitely, we don't want to expose an MCP server that is just overloaded with different tools, so that, for example, you can manage your Qdrant instance or scale it up from the chatbot interface. We believe that's not the way to go, even though there are some other MCP servers that try to expose these administrative tasks to the LLMs.
And one thing that I'm currently having a look at is the support for different parts of the protocol in different tools. We haven't mentioned it yet, but MCP is actually not only simple tool calling. An MCP server can also have different resources or prompts, for example. So they might be used not only for tool calling, but maybe as a source of truth for the prompts, you know, that work in very specific cases. And I believe that might be pretty useful for code generation. However, the adoption is not so wide here. Like, many of the existing MCP clients focus solely on the tools. So they treat the Model Context Protocol as if it were just a fancy way of performing tool calling. Hence, I just believe that they should start to incorporate some additional parts of the protocol, like the possibility for the MCP server to call the LLM back, or just to ask some clarifying questions, because that's something that we definitely need in order to be able to support really various scenarios. So I believe that's the direction we'd like to go in the future, and code search is just the beginning of it.
[00:48:32] Tobias Macey:
Are there any other aspects of the work that you're doing on the MCP server for Qdrant, or Qdrant itself, or the overall space of vector engines that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:47] Kacper Łukawski:
I think there are different aspects of the Model Context Protocol, because the Model Context Protocol is not only about the tools, but also about resources, prompts, and the ability to query the LLM back. But I think we covered that already. So that would be it, probably.
[00:49:02] Tobias Macey:
Okay. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:49:18] Kacper Łukawski:
I'll be speaking about the applications of LLMs in terms of data management, because I feel like that's something I feel confident about. I feel that many companies can't just easily take the SaaS tools and bring their data into the LLMs so they can perform these processes. And, definitely, we still lack some hybrid cloud capabilities, so we can run products with the ease of cloud on our own premises. That's actually something that we've done at Qdrant. We have a hybrid cloud offering, so you can just bring your own Kubernetes cluster and run Qdrant using the UI that you would get with managed cloud. And on the other hand, we don't have access to your infrastructure at all. It's only, like, one-way communication, so we can scale it up, but we won't be able to see any of your data. And I feel that enabled the adoption of vector search for many of our customers, because they couldn't just use the managed cloud easily, and they didn't want to host the open source version on their premises, because they wanted to have the support that comes with the cloud offering. And hybrid cloud was actually a game changer.
I feel like not that many providers have this kind of capability, especially when we speak about large language models or embedding models, at least some of them. It's not that easy to bring them into any corporation for that reason. If we want to build data pipelines, then definitely that's something that we still miss.
[00:50:52] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Qdrant and on the MCP server for it. It's definitely a very interesting project, and it's definitely one that I'm seeing get a lot of adoption. I'm actually using it for some of my own use cases. So I appreciate all the time and energy that you and the rest of the team are putting into that, and I hope you enjoy the rest of your day. Thank you. Thanks for the invitation. That was a real pleasure. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and colleagues.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey, and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data. So, Kacper, can you start by introducing yourself?
[00:00:59] Kacper Łukawski:
Of course. Hello. My name is Kacper Łukawski, and I'm a senior developer advocate at Qdrant. We are building a vector database that supports many applications which are related to large language models, but not only. And retrieval-augmented generation is probably the most typical use case nowadays.
[00:01:18] Tobias Macey:
And do you remember how you first got started working in data?
[00:01:21] Kacper Łukawski:
Yeah. Of course. I have a software engineering background. And in one of my previous jobs, I used to work as a software developer, and we started building big data pipelines at that point. That was probably around 2014 or '15. And we built a couple of projects in the automotive industry, which were using Spark and all the Apache tools that were popular back then, like Kafka, Oozie, and many, many more. And that actually was, like, a natural transition for me to start building these kinds of solutions, including not only data ingestion, but also the data visualization and business intelligence parts.
[00:02:03] Tobias Macey:
And so now digging into the current frenzy around the applications of data, how to make use of it, how to use it to power these various AI applications, obviously, large language models have had a drastic impact on the utility and applications of unstructured datasets, which have largely been stuffed off to the side and used for bespoke purposes or used as training corpus for natural language processing tasks. But with the capabilities and the scale that large language models offer, we can now turn those into usable assets for various applications, whether that's business analytics, but more generally for language model applications.
And I'm wondering if you can just start by talking through some of the challenges that you're seeing teams face in building the pipelines that are necessary to be able to take that corpus of unstructured data and turn it into usable data assets.
[00:03:06] Kacper Łukawski:
Yes. Of course, there's been a massive impact on how we build data pipelines for unstructured data if we use LLMs. And I feel like one of the things that we still forget is that language models are not going to magically solve all the problems with our data, and they do not have any capabilities to fix it in any way. And, from my experience, there are many teams struggling with bringing in this data because they still don't understand the nature of language models, which might be making errors. It's not like a typical application where we write code and we can test it thoroughly. With LLMs, things are a little bit different, because we can figure out a way of how to process data using these models and then face some issues, because this is not going to work in all the cases that we have. And quite a typical enterprise case is that people have lots of scanned documents or PDFs, and they want to bring them somehow into their applications.
And there are various ways of how to do that. Like, the selection of a proper large language model is key here, or a visual language model, because we want to interpret images here. But still, there are challenges related to scalability and to the deployment of these models, especially if we work in an industry that can't just use proprietary SaaS-based tools. Then those teams start to struggle with setting all the pieces up.
[00:04:34] Tobias Macey:
And in terms of the destination of those unstructured assets into some usable data assets, what is the typical shape that you're seeing teams use as that destination point for unstructured sources, whether that's turning them into tabular data, extracting numerical data, or potentially turning them into graph representations using something like named entity recognition? And I'm wondering what are some of the common applications that you're seeing teams use those LLMs for in terms of that transformation.
[00:05:17] Kacper Łukawski:
Yes. So definitely graph RAG is becoming popular. For example, we have just finished a case study with one of our users, and they built a pretty interesting system that was using LLMs to derive ontologies from some unstructured data. That was applied to some restricted domains like law and medicine. And they were actually building a pretty interesting system that was able to understand the relationships in the data, and they used a dual modeling approach. So not only do they have vector embeddings used for capturing the semantics of the data, but also graphs to capture the relationships between different entries. And I feel like this is kind of a typical scenario nowadays. Like, everyone is speaking about graph RAG. But when it comes to all the other destinations, yeah, actually, LLMs simplified a lot of the problems that we were dealing with in the past, like named entity recognition, text classification, and translations as well. So there are various applications. Obviously, we had some other methods in the past, like algorithms trained solely for a specific problem. Right now, LLMs became the de facto standard for solving all of these problems at the same time. And yet, in my experience, vector databases are typically the destination for all the data they process, because our users typically want to build some sort of search system that will power their agents, or maybe just the search bar on the website. But this is actually the most interesting case for our users: how to finally start deriving some insights from the data that was not searchable in the past. So, yeah, I would say this is the most important application.
[00:07:01] Tobias Macey:
And then in terms of the modeling that is involved in the vector storage of these systems, obviously, vector databases as a category have seen massive growth in terms of their adoption and attention over the past two years because of the introduction of LLMs and the use cases around embeddings and RAG. But there are also vector extensions to other styles of database engines, one of the more popular ones being pgvector. And so then there's the consideration about what are the additional metadata fields or contextual elements that you want to collocate with these embeddings. And so, obviously, Qdrant is more of a document store, and pgvector is an extension to Postgres, which sits alongside relational data. And I'm wondering how you're seeing teams think about the design elements of what the broader context and utility is that they want to get out of the storage medium, beyond just the ability to have some means of storing these n-dimensional arrays.
[00:08:09] Kacper Łukawski:
Yeah. So first of all, I would distinguish databases from search engines, because that might be kind of confusing. I wouldn't call Qdrant a document store. It's more like a search engine. So if we were looking for an analogy here, it's more like Elasticsearch, which we used for keyword or lexical-based search in the past. And this is just an alternative to that paradigm that can also capture the semantics through the vector embeddings. And, yeah, quite a typical way of storing these vectors is to also keep the input data that was used to create them somehow. And the idea is that if we speak about text, then we usually just put this particular chunk that was used to create a vector inside the metadata, because in Qdrant, every single point can have even multiple vectors and some sort of JSON-like metadata.
And this metadata can actually contain the original data that can be used to reproduce the process of creating the embedding. But in many cases, we just keep a reference to the original data, which is stored somewhere else. That might be your relational database, or sometimes, like, a URL to a file, because vector search is actually not only about text; it can also make search over images, video, or audio data possible, which was impossible in the past. So, yeah, just to sum up: the typical approach is to keep the original data close to your vectors, because obviously you will need it, as these vectors alone do not let you recover the original information. And the typical approach is to use the vector for search, and then take the raw data and pass it somewhere else, like to the LLM if you build RAG, or maybe just expose the original documents to the user if you build plain search.
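The storage pattern described here can be sketched as a plain data structure: each stored point carries its vector plus a JSON-like payload holding the source chunk or a reference to it. The field names below are illustrative, not a fixed Qdrant schema:

```python
# A sketch of the point layout described above: the vector used for search,
# plus a payload that keeps the original chunk and a reference to its source.
# Field names and values here are illustrative examples only.
point = {
    "id": 42,
    "vector": [0.12, -0.45, 0.87, 0.03],  # in practice, hundreds of dimensions
    "payload": {
        "text": "The chunk of text this vector was created from.",
        "source_url": "s3://docs-bucket/report.pdf",  # reference to the original
        "page": 7,
    },
}

# After vector search returns this point, the raw payload text is what gets
# passed to the LLM (for RAG) or shown to the user (for plain search).
context_for_llm = point["payload"]["text"]
print(context_for_llm)
```

Keeping the chunk inline trades storage for convenience; keeping only `source_url` trades an extra lookup for a smaller collection, which is the choice being described.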
[00:09:58] Tobias Macey:
And then once you do have your data in this embedded format, you have the vectors, you have some reference to the original data, then there's the challenge of figuring out what the retrieval method is that you want to use, where vector indexes play a substantial role. But to your point, with Qdrant being a search engine as well, I know it supports a BM25 index versus the Lucene index that Elasticsearch relies on. And so for people who already have some investment in search infrastructure, as they're starting to do that analysis of how they want to store their vector embeddings, whether they use the capabilities in their Elasticsearch or OpenSearch clusters versus migrating at least the vector pieces to Qdrant, or migrating all of their search infrastructure to Qdrant, how do they manage that evaluation, and what are the key decision factors that go into it?
[00:10:52] Kacper Łukawski:
Yeah. Obviously, that's a common question. There are many companies that invested a lot of resources and time into these kinds of search engines, and they eventually added vector search capabilities as well. But the thing is that vector databases as a category were built solely to support dense embedding vector search, and they are just way more efficient, because the data structures we use for dense vector search are totally different from the ones we use for lexical search based on, for example, BM25. So there are different ways you can incorporate vector search into your existing retrieval pipeline. For example, you could obviously use the existing system and just add these capabilities.
And there are some differences in how we implemented vector search compared to these traditional systems, so to say. I remember in the past, they were just not scaling that well because they had a single segment. That was totally fine unless you were already dealing with millions of embeddings, because then you couldn't just store it on a single machine, and the cost of running that would be enormous. All the vector databases were built with efficiency in mind. So we implemented that as the first functionality, and BM25, or sparse vectors, are just an addition to that; vector search is still the key component. And if you have a running search system and you would like to experiment with vector search, it's relatively easy to build a service that connects to both systems. So you will have the existing system doing the lexical search with all the heuristics involved, and then you will have an efficient vector search engine supporting the semantics, because retrieval is actually a very hard problem.
And sometimes you would handle your queries in a completely different way depending on their length or their semantics. LLMs are also quite often used for that. So LLMs might be used to classify the input to your search engine, and then they might also be used to decide how we want to handle it. Traditional lexical search is just way better if we deal with lots of proper names or product identifiers where the exact match matters. On the other hand, if you have long-tail queries with lots of words, then it's better to default to vector search, because that should handle this type of query way better.
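The routing idea can be sketched as a toy dispatcher. The heuristics, the identifier pattern, and the three-word threshold are purely illustrative stand-ins for a real classifier:

```python
# Short, identifier-heavy queries go to lexical search; long natural-language
# queries go to vector search.
import re

def looks_like_identifier(query: str) -> bool:
    # Product codes / SKUs: tokens mixing letters and digits, e.g. "TX-4090B"
    return any(re.fullmatch(r"[A-Za-z]+[-_]?\d+\w*", tok) for tok in query.split())

def route(query: str) -> str:
    if looks_like_identifier(query) or len(query.split()) <= 3:
        return "lexical"   # exact match matters
    return "vector"        # long-tail natural-language query

print(route("TX-4090B"))                        # → lexical
print(route("what gift suits a six year old"))  # → vector
```

A production version of this router might itself be an LLM call that classifies the query, as described above.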
So you will be building a service around your search stack. You won't be calling the search engine directly; there will usually be some preprocessing done. And in that case, you can just add another component to make sure you have the best system to support a specific scenario. You also mentioned pgvector, the extension to Postgres, which is also quite popular when it comes to vector search. I know it's popular, and many people are happy using it, but the thing is that the database is not the best place to put an additional workload related to search. If you have just thousands of documents, that might be fine, because you won't even notice the difference. However, vector indexes are pretty resource intensive. They require lots of memory. And if you add this additional load to an existing relational database, then at some point you will see the system start to struggle with performance, because the majority of the resources will be consumed by the search.
Relational databases are designed to collect and store the transactional data, the data which is key for the business: your customers, your products, the transactions made between customers and products. Search engines, on the other hand, support the system; they are not the most critical part of the application. That's why we use different systems to support search, and the same applies here. If you want to add vector search capabilities, it's better to have a separate system that does that effectively, efficiently, and without generating additional load on your existing components.
[00:15:24] Tobias Macey:
And then another interesting aspect of these vector engines for search capabilities is that they can be used for more than just the RAG and context retrieval use case that LLMs are making more popular. Their original formulation was actually more of a semantic search capability. And so I'm wondering how you're seeing teams leverage Qdrant and vector engines more broadly beyond just the hyped up use cases around RAG and agentic capabilities?
[00:15:59] Kacper Łukawski:
Sure. Yeah. You're absolutely right. When I joined Qdrant in 2022, LLMs were not that broadly adopted yet. ChatGPT was introduced a few months later, and that changed the game completely. Right now, I would say 80% of the use cases that we see are somehow related to RAG or agents. But our early adopters started implementing semantic search as an alternative to keyword-based search, just because they saw it could handle scenarios that wouldn't be solvable with traditional means.
And for example, ecommerce adopted semantic search even before the introduction of the language models. Imagine you were running an ecommerce business and you had lots of products with titles and descriptions, but they were all in English. You also wanted to serve people who couldn't speak that language, and semantic search with multilingual embeddings became a pretty easy way to enable them to still use the system even though they do not speak the same language. On the other hand, there are also lots of people who can't really express their intents using just keywords, and you don't want to ignore a huge part of your audience just because they can't put together the right keywords to find what they need. In ecommerce, that's a key aspect. You want your users to be able to find what they want to buy. Semantic search enabled a different approach to search, and that was broadly adopted. But right now, we also see lots of people use vector search for different scenarios.
Maybe let me just briefly speak about the basics of vector search. In vector search, we have an embedding model, and this model is trained on some sort of data modality, like text, images, or videos, for example. Text is just the most typical problem that we solve. So if we have this model, it can take virtually any text and convert it into a fixed-dimensional vector, which is just a list of floats, some numbers. And these vectors have a useful property: if two different vectors describe a similar object or sample, they should be close to each other in that vector space, and we measure the similarity using, for example, cosine similarity.
So if the similarity is high, we assume that those observations are somehow related to each other. And since we have a similarity measure, like cosine similarity, we can also try to use that measure to detect samples which are out of the distribution of the data we want to support. Vector search is nothing new. This is actually the good old k-nearest neighbors algorithm, just approximated. And this algorithm is pretty versatile. It might be used for anomaly detection, for example, in the medical domain. Also, if you just want to experiment and see what the clusters in your data are, you could perform a classification just by a simple voting procedure, because you can select the most similar examples, which should be labeled already.
So it solves classification and regression at the same time. This is actually a pretty good method to solve multiple different problems that were typically solved with separate means. And obviously, that all might be combined with large language models too. If you have a system that accepts natural language queries from users and you see a new query sent to the application, and this query is pretty far away from all the past observations you had, that should also trigger a human in the loop, because maybe somebody is trying to perform a prompt injection attack and use your LLM credits, maybe to create some structured output and attack the system in that way. If you already calculate the embeddings, you can also try to perform this anomaly detection using the same vectors. So this is a really versatile method, and we see many people use it not only for RAG or agentic purposes, but also to support these traditional machine learning problems, now just with vectors.
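The properties described above can be illustrated with a toy example: cosine similarity between embedding vectors, nearest-neighbour lookup, and flagging an out-of-distribution query when its best similarity is too low. The tiny hand-written vectors and the 0.5 threshold are stand-ins for real model outputs and a tuned cutoff:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" for three past observations
corpus = {
    "a cat sat on the mat": [0.9, 0.1, 0.0],
    "kittens love sleeping": [0.7, 0.3, 0.1],
    "quarterly revenue grew": [0.0, 0.1, 0.9],
}

def nearest(query_vec):
    # Exact k-NN with k=1; vector databases approximate this at scale
    return max(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]))

text, _ = nearest([0.85, 0.2, 0.05])  # query close to the "cat" cluster
print(text)

# Out-of-distribution check: a query far from everything seen so far
best_sim = max(cosine([0.0, 0.9, 0.0], v) for v in corpus.values())
print("review" if best_sim < 0.5 else "ok")
```

The same similarity score drives retrieval, voting-based classification, and the anomaly flag, which is why the one set of vectors serves all of those uses.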
[00:20:36] Tobias Macey:
Now in terms of the work to be done as far as integrating the vector engine into a broader system context and managing the data loading and transformation for creating the embeddings, maintaining them, and managing the retrieval. Obviously, there's a lot of work that has gone into that, and I know that one of your areas of focus is building and maintaining the Model Context Protocol server for Qdrant. I'm wondering if you can talk to some of the ways that the introduction of MCP as a protocol has simplified the work to be done, or how it is integrated into that overall flow of information: managing your unstructured sources, loading them into Qdrant, and then also managing it for retrieval. And I know it also simplifies the work of using Qdrant as a store for contextual memory.
[00:21:31] Kacper Łukawski:
Yeah. Sure. So, surprisingly, I wouldn't recommend using the MCP server to ingest the data into the system. There are many other tools that might be used for that, with Airflow probably being my preferred one, just because MCP servers are supposed to act as a sort of plugin, so you can connect different tools to your LLMs. And it doesn't make much sense; you have more possibilities if you just use the SDK directly. Vector databases have a lot of optimizations that might be used to make things cheaper, faster, or more accurate, and if you just use the MCP server, you are losing that ability, or you still need to use both. There are plenty of other tools you can use out of the box, like Airflow or some cloud tools if you prefer. However, MCP has enabled people to use the data that you ingest with your pipelines in various applications.
So there might be a very heavy ingestion pipeline that takes lots of unstructured data and sends it to Qdrant, and on the other hand, there will be lots of people connecting to the same Qdrant instance and using the ingested data for their own purposes. Qdrant here acts as a knowledge base, and this knowledge base may contain different things. It might be some sort of data specific to your business, or it might be a set of code snippets coming from different projects you created within your organization, depending on how you would like to use it. And MCP servers, at least our MCP server, are not designed to support the ingestion part. Of course, there is such a tool. You can use it if you, let's say, connect your MCP server with Claude Desktop and you would like to build a personal knowledge base that stores all your memories.
But I strongly encourage you to use a different tool for the ingestion pipeline. This is just not the best way to integrate with all these tools, and that applies not only to our server, but also to many others. We also modified our existing MCP server in a way that allows you to run it in read-only mode, so the LLM won't even be able to store anything inside your Qdrant collection. I feel like that should be the preferred way of using it, except for some very niche use cases.
[00:24:04] Tobias Macey:
So in terms of the role of the MCP server in the overall system architecture and the ways that data engineers should be thinking about context management within Qdrant, what are some of the best practices that you've found around how to structure the data in a way that is conducive to the retrieval elements? Obviously, chunking is more of an art than a science currently. Embedding models are generally going to give you different results based on the particular domain that you're in. And I'm wondering how you're seeing teams think about that end-to-end flow and some of the ways that the MCP server maybe reduces the barrier to entry on the consumer side to help with some of the evaluation work to be done?
[00:24:54] Kacper Łukawski:
Yes. That's a great question. So definitely, chunking is one of the problems we need to solve once we decide to implement vector search. And there are no easy answers here. Obviously, the default settings in all the popular frameworks are not the best ones. You can try the simplest means possible, like setting a window size and dividing your documents into chunks of that size, with some overlap between them so you do not lose the context. But in many cases, context is not only about a particular piece of text; it might be derived from the document's structure. Say we work with documents like PDFs: they have formatting, like headings, and adding those headings to your chunks usually helps to convey the overall context of that particular piece. On the other hand, if you work with code and you would like to create a system that searches over your code base, then a particular loop with some function calls does not say that much about the role of that piece of code in the whole application.
So in that case, it's better to keep the name of the class this particular method belongs to, the name of the method itself, and the parameters that are passed to it, along with the actual body of the method. Including the documentation strings can also help with that. I really like the piece from Anthropic, actually, an article, not a paper, about contextual chunking. They presented a pretty interesting idea of how to use an LLM to summarize the role of a particular piece of the document in the context of the whole document.
LLMs are actually pretty good for that, especially if we deal with traditional PDF-like documents, because they can easily summarize them. And then, if we have a summary of the whole document, we can also clearly state the role of each particular chunk that we create. But, yeah, there are no easy answers, as I said, and it really depends on the data you have. In many cases, chunking is just an experiment that you need to conduct, and evaluation is key here. But the good news is that evaluation in information retrieval is nothing new, and I assume you do test your search pipelines if you think about search seriously.
Then you can use the same means to evaluate the quality of retrieval in the case of vector search, because you use the same metrics and the same tools; nothing is new except the paradigm that you use to search.
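The simplest chunking approach mentioned above, a fixed window with overlap plus the section heading prepended to each chunk, can be sketched like this. The window size, overlap, and heading are illustrative:

```python
# Fixed-window chunker with overlap; each chunk carries its section heading
# so the embedding still sees the surrounding context.
def chunk(text: str, heading: str, size: int = 40, overlap: int = 10):
    chunks, start = [], 0
    while start < len(text):
        piece = text[start:start + size]
        chunks.append(f"{heading}\n{piece}")
        if start + size >= len(text):
            break
        start += size - overlap  # slide the window, keeping some overlap
    return chunks

parts = chunk(
    "Vector search indexes embeddings for fast nearest-neighbour lookup.",
    "Chapter 2: Retrieval",
)
print(len(parts), parts[0])
```

Real pipelines would split on sentence or token boundaries and tune the sizes per corpus; the point is only that every chunk keeps a pointer to its context.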
[00:27:40] Tobias Macey:
One of the interesting aspects of the current era that we find ourselves in is that software and data engineering for the past few decades have been very deterministic, based on standard practices that are provable using testing or theorem provers, with standardized architectures. It seems that the introduction of large language models has generalized the need for the adoption of, and expertise in, experimentation that data scientists have been dealing with for a long time now. And I'm wondering how you're seeing data teams in particular come to grips with that aspect of the work to be done, where they need to be more in the loop of experimentation: tracking the results of those experiments, being able to iterate quickly on versioning their different chunking strategies, deploying different embedding models, and managing the re-embeddings. We're not at a point anymore where we can say, okay, this piece of the overall workflow is done, because the workflow is constantly changing as new capabilities and new models come out. I'm just wondering how you're seeing that dynamic play out in data teams in particular.
[00:28:55] Kacper Łukawski:
Yeah. Evaluation is key. Definitely, data teams should be focusing on evaluating multiple pieces here. For sure, choosing the embedding model is an important component of that. I'm not a big fan of just testing out all the embedding models that exist, carefully watching for newly released models and trying to use them. There are different ways to improve the quality of search than just taking the best model according to the benchmarks and re-ingesting all the documents you have. Let's be honest: if you have millions of documents, then creating the vectors for all of them will take you a lot of time. Even if you can scale up your environment, that would be expensive if you just take the biggest model that exists. And evaluation is key. Sometimes we can sacrifice some precision for the sake of effectiveness and efficiency.
So what I usually try to convince our users of is that they should take the smallest model that does the job at an acceptable level. I'm a big fan of using small models. They are easy to fine-tune, and there are plenty of ways you can increase the quality of search after creating the embeddings. But let me summarize the aspects that we need to evaluate when we build retrieval-augmented generation or anything related to LLMs that has this vector search component. Basically, choose the embedding model based on some internal benchmarks, not the public benchmarks that we can easily find, like MTEB. MTEB is the most popular benchmark presenting the retrieval quality of different embedding models, but the thing is that some companies can easily train their models so they shine in these benchmarks, and none of these benchmarks will have the data that is specific to your own business. In that case, doing the evaluation on your own is key. On the other hand, if we use LLMs in the whole pipeline, then we also need to evaluate the quality of the outputs of the LLM, so there are two models to evaluate separately.
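An internal retrieval benchmark of the kind described above boils down to standard metrics over a hand-labeled ground-truth set. A minimal sketch of recall@k, with made-up queries, result rankings, and relevant ids:

```python
# recall@k: what fraction of the human-labeled relevant ids appear in the
# top k results returned by the search system.
def recall_at_k(ranked_ids, relevant_ids, k):
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# query -> (ids returned by the system, ids a human marked relevant)
benchmark = {
    "waterproof jacket": ([3, 7, 1, 9], [3, 1]),
    "running shoes":     ([5, 2, 8, 4], [8, 4]),
}

scores = [recall_at_k(r, rel, k=3) for r, rel in benchmark.values()]
avg = sum(scores) / len(scores)
print(avg)
```

The same harness works unchanged whether the rankings come from lexical search, dense vectors, or a hybrid, which is the point made above: only the retrieval paradigm changes, not the evaluation.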
If we are confident in our retrieval phase, meaning we are sure the embedding model does its job, then we also need to evaluate the quality of the outputs of the LLM. If it's struggling to answer the question even though the context provided is fine, then choosing a different one might be important. And this is a challenge, actually. Evaluating the retrieval is easy compared to evaluating the LLMs. There are various metrics, but there is no consensus yet on how to do it properly. We typically take a better LLM to evaluate the worse or smaller one that we want to use, or if we can't use a SaaS tool, then we use it to evaluate the quality of an on-premise LLM.
So this is also a challenge. But, yeah, many experiments have to be done if you really want to build a system that will be able to work with your data, and definitely, it's not gonna be deterministic. I mean, LLMs might be working fine in 99% of the cases, but you also need to make sure that you are able to trace back all the errors that may occur and monitor the use of the LLMs in some way. Observability here is a really important component that many teams forget about.
[00:32:33] Tobias Macey:
In terms of the evolution of these systems, particularly when you're dealing with the embeddings as a corpus of context for a RAG model or an agentic use case: as new embedding models come out and as you change chunking strategies, you can potentially balloon the overall storage size of the vectors that you're storing. And I'm wondering how you're seeing teams manage the life cycle of those embeddings, understanding which ones are being used, when they can age out older embeddings, and just some of that overall management of the evolution of these systems without necessarily just letting costs grow unbounded.
[00:33:17] Kacper Łukawski:
Yeah. That's definitely important, but it's not something that you get for free. It's not built into any vector database that I know; it's basically something that you need to monitor on your own. But the thing is, if you have an application already running in production, then you rarely experiment with new embeddings in that production environment. There is typically a process of evaluating them offline on just a fraction of your data, on some ground-truth dataset that you are able to build. And then, once you have proven that the new embedding model works well in that scenario, you can swap it in. But, yeah, the important thing is that the cost of running vector search might be high, because it is so memory intensive. That might be mitigated by different means, various ways to optimize the storage so it doesn't cost you a fortune to run semantic search. But in general, you rarely experiment with dozens of different embedding models in production, because that is just too expensive to do. If you have a production system whose quality you would like to improve, you can experiment with some other techniques.
For example, instead of just retrieving the context using a single vector embedding, you can also try a re-ranker: take some more candidates in the initial retrieval phase and then re-rank them so the better results pop up at the top. That's a pretty easy approach that doesn't require you to change the structure of your collections in the vector database. There are various means that you can use to make it better. But these experiments are not done in a living system; it's more like research that has to be done beforehand.
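The retrieve-then-rerank pattern can be sketched with two stand-in scoring functions: a cheap first-stage score playing the role of the vector similarity, and a costlier second-stage score playing the role of a cross-encoder re-ranker. Both scorers and the documents are illustrative:

```python
def vector_score(query, doc):
    # Cheap first-stage score (stand-in for dense-vector similarity)
    return len(set(query.split()) & set(doc.split()))

def rerank_score(query, doc):
    # Pricier second-stage score (stand-in for a cross-encoder)
    return sum(doc.split().count(w) for w in query.split()) / (len(doc.split()) + 1)

docs = [
    "qdrant stores vectors and payloads",
    "vectors vectors everywhere in qdrant qdrant",
    "postgres stores relational rows",
]
query = "qdrant vectors"

# Stage 1: over-fetch candidates with the fast score
candidates = sorted(docs, key=lambda d: vector_score(query, d), reverse=True)[:2]
# Stage 2: reorder only the short list with the expensive score
best = max(candidates, key=lambda d: rerank_score(query, d))
print(best)
```

The collection schema never changes: only the query path gains a second, more expensive pass over a handful of candidates.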
And changing the embedding model is actually an important decision, and not that many teams do it that often.
[00:35:26] Tobias Macey:
As you have been building and iterating on the Model Context Protocol server for Qdrant, working with the Qdrant community, and coming to grips with the constant evolution in the space, what are some of the most interesting or innovative or unexpected ways that you're seeing the combination of MCP and Qdrant applied?
[00:35:47] Kacper Łukawski:
Well, I wouldn't say it's that surprising, but when people got excited about vibe coding, everybody started to build applications over the weekend, and MCP servers might be pretty useful to perform something that I like to call grounded vibe coding. I mean, if you work in a company that has lots of different projects and you want to keep up the standards, then you definitely don't want to vibe code an application and let the LLM do whatever it wants. You want to have a knowledge base that understands all the projects that you have. It may also contain code snippets so you can reuse them, or maybe you have a very specific frontend framework that you use to keep all the applications you create consistent.
And actually, that became a pretty standard way of using our MCP server. People use tools such as Cursor, Windsurf, Claude Code, or even VS Code nowadays, and combine them with Qdrant, which acts as a knowledge base for this grounded vibe coding. So they put all the code snippets, for example, there. And then, if they ask their agent, the coding assistant, to create something brand new, it will start by trying to find some similar code snippets from different projects. Maybe it's not necessary to create it from scratch; maybe we can just reuse some of the components. Or maybe it can point out that there might be an internal library within the company that you can use for that specific purpose.
So you don't reinvent the wheel. That's not that surprising, but it's a pretty interesting area that we feel will keep growing, because that's what the majority of users are trying to do now. And surprisingly, I didn't really anticipate this huge success of the MCP. I thought it would only be adopted by Anthropic, even though we released this MCP server just a few days after it was announced. And then we were astonished by how many people started to use it for different things, and by the fact that OpenAI and Google also started to incorporate this protocol into their products. That was actually the moment we realized this is more important than we thought.
[00:38:08] Tobias Macey:
And in your work of helping to build and guide and interact with the community around Qdrant and experimenting with these various vector use cases, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:38:26] Kacper Łukawski:
I think many people still struggle with the cost of running vector search. And, surprisingly, not that many know that there are plenty of options for optimizing it. Methods such as quantization can help to greatly reduce the cost of running vector search. One surprising thing is that if you use a relatively high-dimensional model, let's say over a thousand dimensions, which is not that rare nowadays, then your model will typically be compatible with binary quantization. So you can reduce the memory footprint by up to 32 times and make it faster by up to 40 times. Many people who started building vector search were dealing with these problems, and when they enabled binary quantization on their collections, on their vectors, things started to behave way better. So this is a common challenge that I see in the community. There's also not that much understanding of what vector search really is. People think that vector search will magically solve their problems and enable search over any data that they have. Unfortunately, it's not that easy. You really need to have a model that supports your data, and you can't just take the simplest one that's available out there. You need to choose it wisely. Since I'm based in Poland and I've spoken to so many companies here, I realized there are many people trying to use the defaults that come with LangChain or any other framework that exists in that space. For example, they take OpenAI embeddings, because they were the default ones for a very long time, and try to use them for non-English data. In some cases that works; I think it works pretty well with German, but it's just not documented anywhere. There are various options. There are model providers that officially support multiple languages at the same time.
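The 32x figure above follows directly from the arithmetic of binary quantization: each float32 dimension (4 bytes) collapses to a single bit, and similarity is then approximated with Hamming distance over the packed bits. A back-of-the-envelope illustration, with toy four-dimensional vectors:

```python
def binarize(vec):
    # Keep only the sign of each dimension, packed into an int bitmask
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    # Number of dimensions whose sign differs between the two vectors
    return bin(a ^ b).count("1")

dim = 1024
float_bytes = dim * 4      # float32 storage: 4 bytes per dimension
binary_bytes = dim // 8    # binary storage: 1 bit per dimension
ratio = float_bytes // binary_bytes
print(ratio)  # → 32

v1 = [0.3, -0.2, 0.8, -0.1]
v2 = [0.1, -0.4, 0.5, 0.2]  # signs differ only in the last dimension
print(hamming(binarize(v1), binarize(v2)))  # → 1
```

High-dimensional models tolerate this well because with a thousand-plus dimensions, the sign pattern alone still separates neighbours from non-neighbours; a real deployment would enable this via the database's quantization config rather than by hand.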
But if you just take whatever exists as a default, then you will come to the conclusion that vector search doesn't work. And that's a homework that everyone has to do: choosing the embedding model wisely. I'm not even talking about evaluation, like building ground-truth datasets and running the whole evaluation process. It's more about just reading the model card on Hugging Face or on the model provider's website and seeing whether your particular language is supported by the model you selected. I can strongly recommend using Cohere models here if you deal with multilingual data, because they claim to support over a 100 languages at the same time with really great quality. But there are also multilingual open source models available out there, for example, in sentence transformers.
And this is quite typically a challenge. On that note, we all know that fine-tuning LLMs is a pretty expensive thing, and not all companies can afford to fine-tune their models for the very specific problem they have. But in contrast, fine-tuning embedding models is relatively easy and cheap. Even if you can't find the ideal model that supports all the cases you need, fine-tuning one is relatively easy and can be done on very limited hardware in virtually no time. It doesn't require a cluster of GPUs. A single GPU with a good base embedding model will be enough to adjust the model so it works better in your specific domain with your specific terminology.
And, yeah, that's actually something that many mature teams started to do at some point, when they began to struggle with the pretrained models they had used from the very beginning. So I really encourage you to have a look at fine-tuning the embedding models, because there are lots of interesting materials on that, and it's not as complicated as it may seem at first glance.
[00:42:29] Tobias Macey:
And as people are designing their systems and evaluating different approaches to context management, vector storage, and vector search, what are the cases where you would say that MCP and/or Qdrant are the wrong choice?
[00:42:49] Kacper Łukawski:
So definitely, MCP shouldn't be used for data ingestion if you really deal with lots of data. That's something we've already mentioned. But it's great if you just want to connect multiple clients to the same instance of your vector database. So, let's say, your CTO can use Claude Desktop and still search over the data easily and use that to extend the context of the prompts. This is great. And on the other hand, your developers can also connect to the same collection if they use these AI coding agents. So that's definitely a good choice if you really deal with lots of different clients that would like to connect to the same Qdrant collection.
But on the other hand, there are many cases in which vector search is not the best choice. I think we have discussed that example already, but if you have a running search system and you see that the majority of your queries come from users who speak the same language as you do, and I don't mean a foreign language, but they use the same terminology as the people who created the dataset, and they can express themselves in a concise way, for example, they can provide the very specific product identifier they are interested in buying because you are providing a tool for domain experts, then in that case, vector search based on dense embeddings does not make much sense. It's maybe better to just use traditional lexical search. And, yeah, this is an edge case, but I also remember one of our users who started to use Qdrant as if it were a regular database.
So they were not putting any vectors in, but they were mostly using it as if it were a MongoDB database. You can technically do it, but that's not the best choice, because obviously a search engine is not something you would like to use as your primary data store. But let me think about it. I feel like we should evaluate every single case individually, because if you work with search and you see that there are some cases you can't support with the existing means, then vector search is usually a good alternative. And Qdrant is a really efficient vector search engine, so maybe we would be able to help improve the quality of the search results easily.
[00:45:18] Tobias Macey:
And as you continue to build and iterate on the Qdrant technology as well as the MCP server for it, what are some of the things you have planned for the near to medium term, or any particular use cases that you're excited to explore?
[00:45:34] Kacper Łukawski:
Sure. So first of all, we started the MCP server as a template, just to show people how to build their own MCP servers that would connect to Qdrant. And to our surprise, people started using them as if they were just regular tools. So we decided to go a dual way. First of all, our MCP server is available as a regular Python library, so you can build your own MCP server that will connect to Qdrant, and you can add some additional functionality if you prefer to. And on the other hand, you can just use it as it is, as a gateway to your existing Qdrant instance. And one angle that we are currently exploring is related to code generation, this grounded vibe coding that I mentioned, coding with the use of AI agents. We actually want to create some separate MCP servers that will handle code search specifically.
So they will use embedding models that were trained on code, so they should handle code search way better than the traditional general purpose embedding models that we used in this base MCP server. That's definitely something we are exploring. And, yeah, we are open to discussion. We definitely don't want to expose an MCP server that is just overloaded with different tools. So, for example, managing your Qdrant instance or scaling it up from a chatbot interface: we believe that's not the way to go, even though there are some other MCP servers that try to expose these administrative tasks to the LLMs.
And one thing that I'm currently having a look at is the support for different parts of the protocol in different tools. We haven't mentioned that yet, but actually MCP is not only simple tool calling. An MCP server can also expose different resources or prompts, for example. So they might be used not only for tool calling, but maybe as a source of truth for the prompts that work in very specific cases. And I believe that might be pretty useful for code generation. However, the adoption is not so wide here. Many of the existing MCP clients focus solely on the tools, so they treat the Model Context Protocol as if it was just a fancy way of performing tool calling. Hence, I believe they should start to incorporate some additional parts of the protocol, like the possibility for the MCP server to call the LLM back or just to ask some clarifying questions, because that's something that we definitely need in order to support really various scenarios. So I believe that's the direction we'd like to go in the future, and code search is just the beginning of it.
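To make the distinction concrete, the protocol separates these capabilities at the message level. The method names below come from the MCP specification; the tool name, resource URI, and prompt arguments are illustrative placeholders only:

```python
import json

# JSON-RPC 2.0 request envelopes for three distinct MCP capabilities.
def mcp_request(req_id: int, method: str, params: dict) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# 1. Tool calling: the part most clients support today.
call_tool = mcp_request(1, "tools/call", {
    "name": "qdrant-find",
    "arguments": {"query": "how do we paginate results?"},
})

# 2. Resources: server-provided data the client can read into context.
read_resource = mcp_request(2, "resources/read", {
    "uri": "docs://style-guide",
})

# 3. Prompts: server-curated templates, a "source of truth" for
#    prompts in very specific cases such as code generation.
get_prompt = mcp_request(3, "prompts/get", {
    "name": "code-review",
    "arguments": {"language": "python"},
})

print(json.dumps(call_tool))
```

A client that only implements the first message shape is treating MCP as fancy tool calling; the other two are where the prompt and context-sharing value lives.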
[00:48:32] Tobias Macey:
Are there any other aspects of the work that you're doing on the MCP server for Qdrant, or Qdrant itself, or the overall space of vector engines that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:47] Kacper Łukawski:
I think there are different aspects of the Model Context Protocol, because the Model Context Protocol is not only about the tools, but also about resources, prompts, and the ability to query the LLM. But I think we covered that already. So that would be it, probably.
[00:49:02] Tobias Macey:
Okay. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:49:18] Kacper Łukawski:
I'll be speaking about the applications of LLMs in terms of data management, because I feel like that's something I feel confident about. I feel that many companies can't just easily take the SaaS tools and bring their data into the LLMs so they can perform these processes. And, definitely, we still lack some hybrid cloud capabilities, so we can run products with the ease of the cloud on our own premises. That's actually something that we've done at Qdrant. We have a hybrid cloud offering, so you can just bring your own Kubernetes cluster and run Qdrant using the UI that you would get with the managed cloud. And on the other hand, we don't have access to your infrastructure at all. It's only one way communication, so we can scale it up, but we won't be able to see any of your data. And I feel that enabled the adoption of vector search in many of our customers, because they couldn't just use the managed cloud easily, and they didn't want to host the open source version on their premises because they wanted to have the support that comes with the cloud offering. And hybrid cloud was actually a game changer.
I feel like not that many providers have this kind of capability, especially when we speak about large language models or embedding models, at least some of them. It's not that easy to bring them into a corporation for that reason. If we want to build data pipelines, then definitely that's something that we still miss.
[00:50:52] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Qdrant and on the MCP server for it. It's definitely a very interesting project, and one that I'm seeing get a lot of adoption. I'm actually using it for some of my own use cases. So I appreciate all the time and energy that you and the rest of the team are putting into that, and I hope you enjoy the rest of your day. Thank you. Thanks for the invitation. That was a real pleasure. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and colleagues.
Introduction to Vector Databases
Challenges in Processing Unstructured Data
Applications of Large Language Models
Vector Databases vs. Search Engines
Evaluating Search Infrastructure
Beyond RAG: Semantic Search Applications
Integrating Vector Engines into Systems
Experimentation and Evaluation in Data Teams
Managing Embeddings and System Evolution
Optimizing Vector Search Costs
Future Directions for Qdrant and MCP