Summary
Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode, Andrey Korchak, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- Your host is Tobias Macey and today I'm interviewing Andrey Korchak about how to manage data in a fintech environment
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by summarizing the data challenges that are particular to the fintech ecosystem?
- What are the primary sources and types of data that fintech organizations are working with?
- What are the business-level capabilities that are dependent on this data?
- How do the regulatory and business requirements influence the technology landscape in fintech organizations?
- What does a typical build vs. buy decision process look like?
- Fraud prediction in e.g. banks is one of the most well-established applications of machine learning in industry. What are some of the other ways that ML plays a part in fintech?
- How does that influence the architectural design/capabilities for data platforms in those organizations?
- Data governance is a notoriously challenging problem. What are some of the strategies that fintech companies are able to apply to this problem given their regulatory burdens?
- What are the most interesting, innovative, or unexpected approaches to data management that you have seen in the fintech sector?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data in fintech?
- What do you have planned for the future of your data capabilities at Monite?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Starburst: ![Starburst Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/UpvN7wDT.png) This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics. Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
- Materialize: ![Materialize](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/NuMEahiy.png) You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access) today and get 2 weeks free!
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up to date. With Materialize, you can. It's the only true SQL streaming database built from the ground up to meet the needs of modern data products.
Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free. Your host is Tobias Macey, and today I'm interviewing Andrey Korchak about how to manage data in a Fintech environment. So, Andrey, can you start by introducing yourself?
[00:01:32] Unknown:
Sure. Thanks for having me here. So my name is Andrey. I'm the chief technology officer at the Fintech company Monite. We're working in the Fintech API space. Our company is kind of like AWS for those who want to create Fintech applications, but instead of supplying our customers with low-level components like databases or cloud services, we provide them building blocks like invoicing or accounts or a payments engine. So instead of writing all the business logic, you can plug all our APIs in, and you're good to go. That's me, and that's our product.
So, obviously, as a technical director, I'm responsible for everything we're doing in our company, and taking care of data is part of my responsibilities. I'm not actively writing data processing pipelines anymore, and I'm not managing data stores and data warehouses directly, but I know quite well how everything works, and I'm responsible for all the decisions we take.
[00:02:40] Unknown:
And do you remember how you first got started working in data?
[00:02:44] Unknown:
Sure. I have a solid background in software engineering, almost 24 years since I started writing code. My previous company, my previous startup, was an EdTech company, and we were doing natural language processing. Obviously, machine learning was involved, and I was doing data management, data processing, all these kinds of things. Now I'm working in the Fintech space, and we're not machine learning savvy, but we still have to process customer data, we still have to train some algorithms, and, obviously, I'm involved in all these things.
[00:03:24] Unknown:
And now getting into the Fintech specifics and some of the challenges around data in that space, I'm wondering if you can just start by giving a bit of a summary of the ways that data is used in Fintech and some of the particular challenges in that space pertaining to data.
[00:03:44] Unknown:
Sure. So our company is working with small and medium businesses, and the space is a bit different from, say, private banking or investment banking, so my examples will be relevant only for SMEs. We have a few challenges. Challenge number 1: we have to analyze and interpret quite a lot of financial documents. I'm talking about invoices, cancellation notes, purchase orders, receipts, contracts. So, basically, in our case, we have to do OCR, we have to do pattern recognition, and other things related to document processing. Second thing, we have to do document categorization.
So we have to be able to distinguish invoices from cancellation notes and purchase orders, and, surprise, those documents look very similar to each other, and telling these documents apart is indeed a very hard task. And third thing, we have to deal with compliance. In order to do this, we're doing fraud prevention, we do transaction monitoring, and other things that might be relevant to that topic. So these are the main data-related things we do in our company. Now, a few words about the data sources we have. Obviously, we have financial operations.
I'm talking about bank transactions or card payments. This type of data is, well, not extremely, but well structured. Yes, there are tiny differences like different data formats or different currency codes, but when you have a list of transactions made from a bank account, you can make pretty good guesses about where to find transaction descriptions or card numbers, and that's not very challenging. Then, documents. Financial documents come from customers, and there are all kinds of documents: PDF files, photos, scans, bank statements, receipts, and dealing with these docs is indeed very hard.
Now, we have some data that is not directly relevant to financial transactions, but this data is very helpful when you're doing finance management. We're talking about CRM records, information from marketing campaigns about marketing budgets, or customer support requests regarding some financial operations. And the last source of information we're dealing with is internal data. We're talking about logs, conversations with our customers, streams of events generated by the users, and other internal things. This data is not visible to our customers, but it plays a critical role when we operate our software and make business decisions.
[00:06:55] Unknown:
In terms of the application of data in the business context for fintechs, I'm wondering what are some of the key capabilities that are powered by data in a fintech context and some of the complexities that are involved in being able to bring data to bear for those problems?
[00:07:14] Unknown:
Sure. So, transactions and all the financial operations. As I mentioned before, this data is well structured, so it's kind of easy to deal with financial records. There is no need for sophisticated algorithms or machine learning because we already have all the numbers in place, so we can just do the math on these records. So it's not complicated, but it's crucial to provide to our customers the basic functionality related to financial transactions. Document management is something that's kind of hard to deal with because, as I mentioned before, there are tons of different documents. And in the modern world, people still use invoices or contracts printed on paper, and it's extremely hard to deal with these records. Before you'll be able to do something with these documents, you have to scan them, you have to verify them, and you have to categorize them. This is called preparatory accounting, and it's an extremely challenging area.
Obviously, it's pretty exhausting to deal with all these documents because sometimes they come in from different channels, from different countries. They may look different, and it's challenging to categorize these documents and assign them to the right department. So that's what we're doing. We're processing all the flows of information coming in on paper, via fax, via email, and we categorize and recognize these documents and send them in the right direction. And analytics is something that comes on top of transaction information and on top of documents.
So we have to be able to calculate the cash flow, we have to be able to identify gaps in the cash flow, we have to do some forecasting, and other things. So it's basically 3 major areas we're operating in: document management, transaction management, and analytics.
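To make the analytics piece concrete, here is a minimal sketch of the kind of cash-flow gap detection described above, written in pandas. The transaction schema, the monthly grain, and the safety-buffer threshold are illustrative assumptions for this sketch, not Monite's actual model.

```python
# Hypothetical sketch of cash-flow gap detection over a transaction ledger.
# Column names and the threshold are illustrative assumptions.
import pandas as pd

transactions = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "amount": [12_000.0, -9_500.0, 4_000.0, -8_000.0],  # positive = inflow, negative = outflow
})

# Aggregate to monthly net cash flow and a running balance.
monthly = (
    transactions
    .set_index("date")
    .resample("MS")["amount"]
    .sum()
    .rename("net_flow")
    .to_frame()
)
monthly["running_balance"] = monthly["net_flow"].cumsum()

# Flag months where the running balance dips below a safety buffer.
SAFETY_BUFFER = 1_000.0
monthly["cash_gap"] = monthly["running_balance"] < SAFETY_BUFFER

print(monthly)  # February ends up flagged: 2,500 inflow, then a -4,000 month
```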
[00:09:25] Unknown:
I know that in the Fintech sector, there are a lot of regulatory requirements. The risks of getting things wrong in the business are quite high because you can start to lose a lot of money very quickly. And, also, any errors in the data or in the application of data can lead to a loss of trust among the customers, which can also lead to a pretty substantial financial hit. And I'm wondering what are some of the ways that those regulatory and trust issues factor into the ways that organizations think about the collection and application of data?
[00:10:01] Unknown:
So as a fintech company, we have to obtain an ISO 27001 certificate. The certificate defines the way we process critical financial information. Well, not only financial: everything that is crucial for our customers, including scans of IDs, receipts, and transactions, all these things. In order to deal with all these things, we have to be ISO certified. By law, when you obtain an ISO certificate, you have to enforce some policies on engineers, on the DevOps team, and on all the managers. So access to the data is restricted, and only a few approved and certified engineers inside of the company can have access to it.
A couple of DevOps engineers and, obviously, analysts and managers can also have access to the data, but all the critical data fields are masked, removed, or anonymized. And we have data protection policies, backup policies, disaster recovery policies. So, basically, we have policies for everything. And we also have to make sure that we not only have these policies, but we're able to identify, let's say, data leakages or critical disasters happening in infrastructure. And we do practice: we run training exercises with engineers, trying to simulate disasters or an infrastructure cluster collapse and see what happens after.
Obviously, we have backups for backups, and it's not because we want this, but because we must have backups even for backups. And all the backups are encrypted and not accessible by third party companies; even AWS administrators won't be able to get access to our data. So, basically, all the best cybersecurity practices are in place. Also, we have multiple data centers, and we have separate cloud infrastructures there, so people from one data center cannot get access to the data residing in another data center. That's partially because we have to be compliant with GDPR and other regional regulations, but it's also yet another security measure, and it makes life quite miserable for hackers if they somehow get access to what we have.
Also, we have to be very cautious when we're choosing third party providers. As a fintech company, we have to do background checks, and we have to go through the documentation, and we have to be very cautious when we choose our service providers. They must be ISO 27001 certified because, otherwise, we won't be able to maintain our certification. They have to be regulated by local laws and regulations. And, also, when we're processing data with third party providers, we have to include some clauses in our terms and conditions, because without written consent we cannot do absolutely anything, and we have to inform our users that their data will be used by a third party machine learning company that will be doing, say, invoice processing.
Basically, working in Fintech forces you to sign and maintain lots of contracts and agreements, even with your own employees, with the DevOps team, with engineers. They cannot have access to literally anything unless they sign a contract, and that's pretty annoying. Dealing with those things definitely makes sense, but it slows the business down, because before we can take any actions, we have to make sure that we're good on the legal side. And before we can start working on a new feature, or before we can sign a contract with a new counterparty or vendor, our legal team has to review it, and it's basically just pretty annoying.
Apart from that, we are not very different from other software companies. We're just extremely heavily regulated, and we have backups for backups because we have to make sure that we won't lose any customer data.
[00:14:28] Unknown:
To that point of all of your service providers requiring ISO 27001 certification and the strict security requirements around the data and the ways that it's processed, how does that influence the typical build versus buy decision when you're figuring out what your architectural and system components are going to be and what the different data flows are that you're going to support?
[00:14:55] Unknown:
Okay, that's an interesting question, because when you're working in Fintech, you cannot build an MVP. Well, yes, you definitely can build software that does something for your customers, but at the same time, you have to be compliant, and you have to follow all these laws, rules, and regulations. That means you have to invest quite a lot of time into hiring and training people, into signing all these contracts, and establishing all these data protection policies. It means that in order to get into the Fintech space, you have to pay quite a lot for the entry ticket. Now, if you're a Fintech company, you probably will try to buy as many components, APIs, and services as possible, because they're usually cheaper than doing all these things in house. If you're a seed stage company, you cannot afford to have a compliance officer or a DevOps engineer available 24/7.
But you can pay 10, 15, 20 thousand for that, and that sounds like a reasonable price compared to the alternative, which is to hire everyone, train the people, find managers, write product specs, and establish all these data regulation policies. So, basically, even a simple invoicing solution, if you want to build it in house, might cost you a few million. An MVP of an invoicing engine doesn't seem like extremely sophisticated software, but it could be, because of these laws, rules, and regulations. So if you're an early stage company and you want to enter the Fintech space, you definitely have to consider service providers that already work in this space, because you don't have any other alternatives. Now, if you're a big company, at this point of time you already can afford to build something in house, and there are usually a couple of main reasons for that. Maybe you already have an experienced team and you have all the expertise that is required in order to build the things you want to build; in this case, yes, you can afford it. Another thing: maybe you already have an established business and you're already pumping significant volumes through your software. In this case, building your own solution will probably be cheaper than buying a third party solution, simply because of the volumes.
Or maybe signing a contract with a third party service provider contradicts your business strategy. Yes, building a new fintech vertical inside of your company will be expensive, but you have to make this choice in order to make your long term strategy successful. If you're not a big business and you don't have strategic goals like that, my suggestion would be to consider third party service providers for your Fintech vertical. Yes, it creates vendor lock-in, but the alternative can be being left out of the Fintech market, because, as I've said before, the entry ticket is extremely expensive.
[00:18:18] Unknown:
And going back as well to your point of requiring very rigorous backup capabilities, I'm wondering what are the orders of magnitude in terms of the size of the data that you're dealing with, and some of the ways that you have to think about managing backups, in particular being able to maybe use backup strategies that allow for just using compressed deltas as opposed to having to do multiple full copies of the data?
[00:18:47] Unknown:
So transactional data doesn't take lots of space, because we're just talking about numbers and very short strings of text. There is no problem with transactional data. And, usually, Fintech companies are not processing payments directly; usually, there is a second copy of the information always available on the side of the payment provider. That's the easy part. Now we're talking about the fun part: financial documents. Financial documents have significant sizes. A typical invoice could have a size of 1 megabyte on average. And some companies have, like, multipage long contracts, and the size of those documents can be up to, like, half a gigabyte, and, obviously, we have to take care of that.
And that's a place where we have to rely heavily on the cloud infrastructure, and S3 storage is here to save the day, obviously. But we cannot give direct access to the S3 buckets, so, usually, we have to store encrypted binary data inside of S3, and we have to stream data from those buckets to our customers. Now, talking about backups, we basically created a not very sophisticated but at the same time reliable system of storage of files in S3, basically using the cross-regional replication of AWS, obviously.
Also, we're making secure backups via other AWS services like AWS Glacier, which is a perfect solution for these kinds of things. But we have to make sure that those binary files are encrypted and not available outside of our private network. If our customers want to get access to binary data, we have to stream and decrypt these files on the fly directly to our customers.
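As an illustration of the storage pattern described here, below is a minimal boto3 sketch: documents encrypted at rest in S3 (assuming SSE-KMS, one common way to do this) and streamed back to the customer through the application rather than via direct bucket access. The bucket name and key alias are invented for the example.

```python
# Hypothetical sketch of the pattern described above: documents are written to S3
# with server-side encryption (SSE-KMS here, as an assumption), the bucket is never
# exposed directly, and the application streams the bytes back to the customer.
import boto3

s3 = boto3.client("s3")
BUCKET = "fintech-documents-eu"      # illustrative name
KMS_KEY_ID = "alias/documents-key"   # illustrative key alias

def store_document(doc_id: str, data: bytes) -> None:
    """Upload a document encrypted at rest with a customer-managed KMS key."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"documents/{doc_id}.pdf",
        Body=data,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=KMS_KEY_ID,
    )

def stream_document(doc_id: str, chunk_size: int = 64 * 1024):
    """Yield document chunks to the caller. S3 decrypts transparently for
    principals allowed to use the KMS key, so no key material reaches the client."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"documents/{doc_id}.pdf")
    for chunk in obj["Body"].iter_chunks(chunk_size=chunk_size):
        yield chunk
```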
[00:20:51] Unknown:
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte scale SQL analytics fast at a fraction of the cost of traditional methods so that you can meet all of your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
In terms of the application of data in the Fintech sector, it has been one of the longest-standing users of machine learning, in the context of things like fraud detection, particularly in banking contexts. And I'm wondering what are some of the primary ways that you're seeing ML being applied in the Fintech sector, and some of the ways that those ML requirements, as far as training and serving, influence the architectural design and the capabilities of data platforms for those types of applications?
[00:22:19] Unknown:
Well, obviously, after OpenAI released ChatGPT, everyone started worrying about their future in the business. I had some conversations with our product managers and other C-level managers, and I wasn't really worried at all, because, from my perspective, this rise of large language models and other AI technologies won't change the Fintech industry at all. I have a few reasons for that. First of all, machine learning has already been in Fintech for at least a couple of decades. Fraud prevention, that's one story, but machine learning algorithms are also widely used for credit scoring or for risk analysis.
These algorithms are quite old, and they were here even before deep learning and large neural nets, so banks have been using these algorithms for decades already. Plus, if we're talking about transactional data, as I mentioned before, this data is extremely well structured, so we don't have to run all these sophisticated algorithms on financial data, because we simply can do math. So modern machine learning won't change that much, because pretty much everything was done in this space before. Now, the big thing is optical character recognition. As I said before, we have to deal with documents that are coming in on paper, and we have to deal with stacks of receipts.
And deep learning and other machine learning algorithms definitely revolutionized that space, because the quality of OCR processing increased dramatically. If you remember, like, 10 years ago, there was a thing called Tesseract, and the quality of optical character recognition was pretty poor. The accuracy was pretty low, so you could expect, back in the day, something around 40 to 50%. Now, with AWS Textract, we can have accuracy up to 95 to 97 percent on some documents, so pretty much all the text is getting recognized properly.
And, also, the quality of preprocessing increased dramatically. So if you have a photo of a receipt that was taken in the dark, with flashlights and at the wrong angle, Textract will still be able to recognize and extract data from that photo. So OCR definitely reshaped the preparatory accounting space and the way SMEs are working with financial data and financial documents right now. Another thing that is definitely affected by the rise of modern machine learning technology, and here I'm talking about large language models: if you're a Fintech company, normally you have one customer support engineer per 1,000 customers. And if all these customers decide to write to that person at once, you are basically going to have problems, because the customer support team is usually not designed to handle big volumes. And that's what typically happens when you have major incidents in banking infrastructure: people immediately start texting and calling you, and your customer support center is not able to handle these requests.
So large language models decrease the operational cost of customer support, and the help desk team is one of the biggest drivers of expenses if you run a fintech company. So LLMs are potentially able to completely eliminate, well, not completely eliminate, but significantly reduce the size of the support team. It'll be possible to have, let's say, one agent serving 10,000 or 20,000 people. And it's something that will definitely improve over time, and we have big plans for that.
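For reference, the synchronous Textract call that the OCR discussion above refers to looks roughly like this. The API call is real; the file name is a placeholder, and production pipelines for large multipage PDFs would use Textract's asynchronous job APIs instead.

```python
# Minimal sketch of the AWS Textract flow described above: run OCR on a receipt
# photo and collect the recognized lines with their confidence scores.
import boto3

textract = boto3.client("textract")

with open("receipt-photo.jpg", "rb") as f:  # placeholder file name
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Keep only LINE blocks; PAGE and WORD blocks are also returned.
lines = [
    (block["Text"], block["Confidence"])
    for block in response["Blocks"]
    if block["BlockType"] == "LINE"
]

for text, confidence in lines:
    print(f"{confidence:5.1f}%  {text}")
```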
[00:26:47] Unknown:
Another major challenge in all data applications and data platforms, but in particular with Fintech because of the regulatory burdens, is that of data governance. And I'm wondering what are some of the ways that you're approaching that problem of data governance and ensuring that you have appropriate visibility, access control, and segmentation of data access, and some of the organizational and policy aspects that you've had to invest in to be able to ensure that you're doing data governance in a way that is compliant and fulfills the regulatory needs?
[00:27:25] Unknown:
Yeah. As I mentioned before, data access is extremely limited, and only a few trained people can get access to it. Also, every team and every vertical in our company has separate database clusters, so they can see only the data of their users, and only a few people in the company can have access to all databases in our company, including me and maybe an engineering manager and a couple of DevOps engineers. Now, with all this, we still have to deal with data analysis, and we have to give access to the data to analysts and managers.
So we have to offload all the information from the databases to the data warehouse, and before we send data to the data warehouse, we have to remove or mask or anonymize critical parts of the information. So, basically, analysts and managers can still analyze data without the risk of accidentally leaking some confidential information on the side. Also, all the configuration files and all the operations on database clusters are recorded in repositories, so we're a GitOps company. In order to do something with production or even staging data, you have to write your database queries in repositories, somebody is going to review them, and if the code is reviewed, then the database queries can be executed.
So, basically, code reviews are mandatory, especially for infrastructure changes, and everything should be recorded. That's maybe the first thing. Second, by law, we must store data for 5 to 10 years, and we have data retention policies defined and maintained on a permanent basis. So people are aware that they cannot delete any financial documents whatsoever. Even if they see a company that hasn't touched its data for a couple of years and think, okay, they have, like, a billion invoices there, maybe we can delete something.
They know they cannot, because maybe tomorrow someone will come and ask for a copy of the invoice that was issued, like, 4 years ago, and, oops, we're going to have problems if this data gets deleted. So those are the 3 major things: access is restricted; everything we do in terms of data management should be recorded and reviewed; and all data records should be stored for years.
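A small sketch of the masking step described above, applied before data is offloaded to the warehouse: direct identifiers are dropped or truncated, and join keys are replaced with salted-hash pseudonyms. The field names and the hashing scheme are assumptions for illustration, not Monite's implementation.

```python
# Illustrative sketch of the masking step described above: before rows leave the
# production database for the warehouse, direct identifiers are masked, removed,
# or pseudonymized. Field names and the salted-hash scheme are assumptions.
import hashlib

SALT = b"rotate-me-and-store-in-a-secret-manager"  # illustrative placeholder

def pseudonymize(value: str) -> str:
    """Stable pseudonym so analysts can still join on the field."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:16]

def mask_for_warehouse(row: dict) -> dict:
    clean = dict(row)
    clean["customer_id"] = pseudonymize(row["customer_id"])  # joinable pseudonym
    clean["iban"] = "****" + row["iban"][-4:]                 # keep last 4 only
    clean.pop("national_id", None)                            # drop entirely
    return clean

row = {"customer_id": "cus_42", "iban": "DE89370400440532013000",
       "national_id": "L01X00T471", "amount_eur": 129.90}
print(mask_for_warehouse(row))
```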
[00:30:14] Unknown:
Another aspect of that governance challenge is also on the side of discovery, because it's hard to do anything with the data if you don't know that you have it. And then once somebody finds a particular dataset that they want to work with, I'm wondering what the process looks like for being able to request and gain access to that data, and some of the audit controls that you have around how to ensure that they are doing what they say they were going to do and that the access is managed appropriately?
[00:30:46] Unknown:
There is bureaucracy, and we have to enforce the bureaucracy on employees because of all the certifications and compliance. So if you need to get access to data, you have to explain to the DevOps team and to me why you need it and what exactly you want to get from the database. You also have to specify the time span you want to have access to, and maybe specify the IDs of the customers or users you want to see in the database. And after that, and after me and the engineering manager and the DevOps team come to the conclusion that, yes, you indeed can have access to that data,
we'll manually execute the database queries, prepare the dump, and hand it over to one of the engineers or analysts. Obviously, all the critical information will be removed from that dump. So, let's say, IDs of the bank accounts or personal IDs will be completely removed. If the data is needed on a permanent basis, for instance if you're working on a new data model, then we have to prepare something for Snowflake. We have to create a data export process that will be sweeping off all the critical information and automatically pouring this data into Snowflake. But before we do this, we have to understand that we'll be dealing with some portions of data on a permanent basis. Otherwise, we just do a manual dump and give it to the engineer, making sure that there is no critical data inside of it.
[00:32:29] Unknown:
In your work in this space and working with customers to enable them to build financial applications, what are some of the most interesting or innovative or unexpected approaches that you've seen to data management in this Fintech sector?
[00:32:43] Unknown:
In the Fintech sector in general, I was surprised that some companies are building quite sophisticated proxies for database servers. These proxies are able to identify on the fly that there are credit card numbers stored in the database, and the proxies will automatically remove all the critical information right from your database query results. So you won't be able to get access to that data whatsoever, and you don't have to write any configuration files. Obviously, with ordinary proxy servers, you're writing a config saying, okay, there is a column called card_number, please remove all the digits from there and just leave the last 4 digits.
But you have to configure these rules manually, and it's extremely annoying. So I was surprised to see that machine learning can identify all these patterns in database columns quite well, and everything will be stripped away on the fly. That's a game changer, because preparing data for analytics or for product managers is an extremely annoying process, but we have to deal with it anyway.
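As a toy illustration of what such a proxy might do (not any specific product's behavior): scan outgoing values for card-number-shaped digit runs, confirm them with the Luhn checksum that payment card numbers satisfy, and redact everything but the last 4 digits on the fly.

```python
# Toy sketch of on-the-fly card-number redaction, as a proxy might apply it:
# find 13-19 digit runs, validate with the Luhn checksum, keep last 4 digits.
import re

CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_card_numbers(text: str) -> str:
    def _mask(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return "*" * (len(digits) - 4) + digits[-4:]
        return match.group()  # not a valid card number, leave as-is
    return CARD_RE.sub(_mask, text)

print(redact_card_numbers("charged to 4242 4242 4242 4242 yesterday"))
# -> charged to ************4242 yesterday
```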
[00:33:54] Unknown:
And in your own experience of working in this space and building a company focused on Fintech, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:34:05] Unknown:
We definitely had a challenge this year. It's not directly related to Fintech, but it's related to being an API-first business and an API-first company. It was surprising to me that dealing with multiple API versions on a massive scale is pretty tough, because we not only have to maintain multiple versions of APIs, but we also have different versions of data created by different API calls. And managing all these things inside an application was indeed very challenging. So we identified 3 major strategies. First of all, we were thinking about storing and managing data in different Kubernetes clusters with different versions of APIs.
It's obviously a pretty tough approach, because we have to maintain different branches of the code, and whenever you need to patch one cluster, you have to cherry-pick your commits from the repository and somehow patch the other clusters too. But at the same time, you're still going to end up with multiple versions of data created from multiple clusters inside the database, which is a nightmare. There is a second approach that is commonly used in software companies: you just create literal folders for different versions of your code. So let's say you have version 1, and now you have to create version 2. You basically copy-paste version 1 into a folder called version 2 and make changes there.
This code works with the same database. It looks like a very simple approach, but if you're dealing with, like, 10 or 15 different versions of APIs and updating them, that's extremely challenging. And we're an API-first company, and we're dealing with Fintech, so we have some customers using our APIs for years. We simply cannot kill old versions, because there is always a company using them, and we just cannot send them a letter saying, yes, we're going to switch off this version completely starting from January 1st, please switch. Well, we can't, because they pay money to us. So we have to support these multiple versions, and that was the approach we initially took: we created these multiple folders, and we had all these problems.
Then one of the engineers created a framework for API versioning. The philosophy behind this solution is quite simple: we always have the latest version of the data available in the database and the latest version of the business logic. Now, if you have to serve data to someone who is using a previous version of the API, this framework basically downgrades the data representation to the requested version. So let's say we have 5 versions in total, and the latest version is version number 5, so all data in the database and in the data warehouse is stored according to v5 specifications.
But if you have a customer using v4, then we basically downgrade their v5 back to v4. It's like a database migration process, but done backwards. And it turns out that it's an extremely efficient process. The number of lines of code dropped dramatically; we removed, like, hundreds and thousands of lines of code that were so hard to maintain, and we ended up with this solution. So we always store the latest version of data in our data system, designed according to the latest version of the tech specs we have, and we just migrate the data representations backwards to the requested version.
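A minimal sketch of the downgrade-chain idea described here: data lives only in the latest shape, and registered per-version functions rewrite the representation one step down until the client's requested version is reached. The field changes between versions are invented for illustration.

```python
# Hypothetical sketch of the downgrade-chain idea: data is stored in the latest
# (v5) shape, and each registered step rewrites it one version down until the
# client's requested version is reached. The version changes are invented.
from typing import Callable

Downgrade = Callable[[dict], dict]
DOWNGRADES: dict[int, Downgrade] = {}  # maps N -> function converting vN to v(N-1)

def downgrade_step(from_version: int):
    def register(fn: Downgrade) -> Downgrade:
        DOWNGRADES[from_version] = fn
        return fn
    return register

@downgrade_step(5)
def v5_to_v4(doc: dict) -> dict:
    # Pretend v5 split "counterparty" into two fields; v4 expects one field back.
    doc = dict(doc)
    doc["counterparty"] = f'{doc.pop("counterparty_name")} ({doc.pop("counterparty_id")})'
    return doc

@downgrade_step(4)
def v4_to_v3(doc: dict) -> dict:
    # Pretend v4 stored amounts in minor units; v3 clients expect a decimal string.
    doc = dict(doc)
    doc["amount"] = f'{doc.pop("amount_minor") / 100:.2f}'
    return doc

def to_version(doc: dict, latest: int, requested: int) -> dict:
    """Walk the chain latest -> requested, one migration at a time."""
    for version in range(latest, requested, -1):
        doc = DOWNGRADES[version](doc)
    return doc

invoice_v5 = {"counterparty_name": "ACME GmbH", "counterparty_id": "c_17",
              "amount_minor": 12990}
print(to_version(invoice_v5, latest=5, requested=3))
# -> {'counterparty': 'ACME GmbH (c_17)', 'amount': '129.90'}
```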
[00:37:48] Unknown:
API versioning is a hard challenge no matter what space you're in, but having to maintain it for an indeterminate amount of time definitely adds a significant amount of burden, and probably technical debt, to ensure that you're able to maintain that backwards compatibility.
[00:38:04] Unknown:
Sure.
[00:38:05] Unknown:
Yeah, that's definitely a very interesting solution that you found, being able to maintain the latest representation of the data but then translate it backwards. So that's interesting. And you mentioned that you keep the latest and the prior version. Because you have people using some indeterminate version of the API in perpetuity, do you have to maintain multiple versions and multiple downgrade steps for a particular data representation?
[00:38:37] Unknown:
Yeah. Basically, it's pretty hard to jump from v5 back to v1, so probably it's a good option to do a couple of migrations in the middle. That is also very helpful when you're doing debugging, because you can see all the changes between different versions. Yes, it's still a pretty annoying process, because you have to migrate data representations.
[00:39:02] Unknown:
And as you continue to build and iterate on the product that you're building at Monite and invest in this Fintech sector, enabling other businesses to build financial applications on top of your platform, what are some of the things you have planned for the near to medium term, as far as your data capabilities or your platform architecture at Monite, or some of the problems and projects you need to dig into?
[00:39:27] Unknown:
We have more of a long term strategic plan, because we're dealing with quite a lot of transaction information coming from SMEs in different countries. We came up with our ultimate goal, our North Star: we're eventually planning to kill all financial documents and OCR in the fintech space. There is the SWIFT protocol for financial transactions made between different banks, and before that, people used the telegraph in order to make money transfers. But right now, we live in the 21st century, and people still print invoices on paper and deliver these invoices to each other via physical mail. Our plan is to completely kill that industry. Obviously, it's kind of ambitious, but we're gonna take baby steps towards that goal, and it's probably gonna take 5 years at least.
But the idea is to completely kill all these PDF documents and scans and replace them with machine-readable messages, and these messages should have a very strict structure inside. So instead of OCRing your PDF files, you'll get access to all your financial records instantly, without having to manually adjust and verify them. I guess it's something that will probably eliminate thousands of jobs, because someone still has to scan and print and deliver these papers. But I guess it's a good thing to do, because, well, it's the 21st century; we definitely can do better than a bunch of PDF files and paper envelopes delivered by a mailman.
That's what's in store for us. But on the way to that goal, we're gonna keep expanding into different markets. It's a pretty challenging process, because we're an infrastructure company and we have spikes of traffic. When we onboard a new customer, we may start serving, let's say, 100k new users overnight. Usually, when you're doing B2C software, you don't see these kinds of traffic spikes very often; usually, you see the number of users gradually increasing over time. But if you have a platform and your customer just starts importing their users into your infrastructure, then, yes, there are spikes. And dealing with the spikes is a pretty hard process, because, yes, we have plans for everything.
We have all the infrastructure challenges modeled, and we're prepared for everything, but reality is, however, a bit different. And, yes, that's a long process of our system's evolution towards that goal.
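Purely as an illustration of the "strictly structured, machine-readable message" idea from the roadmap above, here is what such an invoice message might look like. The schema and field names are invented for this sketch and do not correspond to any existing standard or to Monite's format.

```python
# Illustrative only: a strictly structured, machine-readable invoice message in
# place of a PDF. The schema and field names are invented for this sketch.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class InvoiceLine:
    description: str
    quantity: int
    unit_price_minor: int  # minor currency units avoid float rounding

@dataclass(frozen=True)
class InvoiceMessage:
    invoice_id: str
    issuer_vat_id: str
    recipient_vat_id: str
    currency: str
    lines: tuple[InvoiceLine, ...]

    @property
    def total_minor(self) -> int:
        return sum(l.quantity * l.unit_price_minor for l in self.lines)

invoice = InvoiceMessage(
    invoice_id="INV-2023-0042",
    issuer_vat_id="DE123456789",
    recipient_vat_id="IT98765432109",
    currency="EUR",
    lines=(InvoiceLine("API subscription", 1, 49900),),
)

# The wire format is plain JSON: instantly parseable, no OCR required.
print(json.dumps({**asdict(invoice), "total_minor": invoice.total_minor}, indent=2))
```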
[00:42:26] Unknown:
Are there any other aspects of the work that you're doing at Monite, or the overall problem space of data management in a Fintech context, that we didn't discuss yet that you would like to cover before we close out the show?
[00:42:39] Unknown:
Yeah, there is one interesting topic. Currently, pretty much all governments across the globe are trying to introduce local standards for financial documents, and it seems like a nightmare, because even in the European Union, we now have an Italian invoicing standard, a Portuguese invoicing standard, and a German standard for financial documentation. So, basically, we have a zoo of data formats and standards, and some of the standards don't even have any explanation in English of how they work. So if you want to get connected with the Italian invoicing system, you have to go through a 200 or 300 page long document written in Italian explaining how the system works, and after that you have to give a call to the Italian authorities and say, look, we have a plan to get connected to your document exchange gateway.
So, basically, that's a big nightmare that is coming down on our heads, and I hope, like, in 5 to 10 years, people will abandon this idea, because right now it just multiplies the problems. Instead of just dealing with PDF files and financial records coming from different banking systems, we now have to deal with hundreds of different XMLs, JSONs, or binary formats. So it's just a very big challenge.
[00:44:13] Unknown:
Yeah. Is there at least any sort of common subset of information across those standards, or has every country decided on their own bespoke format that they are forcing people to comply with, so that you have to do a specific implementation for each one?
[00:44:32] Unknown:
So there is no single definite standard across different countries, and I guess governments don't have any intention even to start discussing these topics, because everyone says, okay, it's time to introduce our own standard and get some extra money from the taxpayers, because e-invoicing will give us better access to data and, hence, better access to taxes. So everybody is excited by the fact that they can get extra money from the taxpayers, but nobody is thinking about the consequences. So that's a problem.
[00:45:11] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:45:28] Unknown:
Well, definitely, in the Fintech space, we have to manually build all these data management solutions. As I said before, there are lots of restrictions and lots of rules that are enforced by local laws and regulations, and it would be absolutely awesome to see a database or data storage solution that would take care of all these things, because, roughly speaking, we spend, like, 25% of our tech team budget on dealing with this. Having a database that is already prepared to face all these Fintech challenges and that is already compliant with all these laws and regulations, that would be a game changer.
I guess it's not hard to build this technology from a technical perspective, but from the compliance perspective, it's a complete nightmare. Probably, in order to achieve that, someone needs to have, like, 10 skilled software engineers and 100 compliance managers and lawyers and product managers just to explain how it's supposed to work. I hope someone will create that solution in the future, because it's just so painful to deal with all these things manually.
[00:46:50] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share your experience and perspectives on data management in Fintech and the work that you're doing at Monite. It's definitely a very interesting and important problem domain, so I appreciate all of the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day.
[00:47:16] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introduction
Data Challenges in Fintech
Key Capabilities Powered by Data
Regulatory Requirements and Trust Issues
Build vs. Buy Decisions in Fintech
Data Backup Strategies
Machine Learning Applications in Fintech
Data Governance in Fintech
Innovative Approaches to Data Management
Future Plans and Goals for Monite
Challenges with Financial Document Standards
Biggest Gaps in Data Management Tooling