From a Thousand Undocumented Interfaces to a Data Mesh

The data mesh concept, as Zhamak Dehghani articulated it, describes a sociotechnical approach to data architecture built on four principles: domain-oriented decentralised ownership, data as a product, self-serve data infrastructure, and federated computational governance. The academic and consultancy literature on data mesh is now substantial. What is conspicuously absent from most of it is an honest account of what implementing data mesh actually looks like when you start not from a clean architecture but from the reality of a mature bank — one that has been accumulating systems, integrations and data flows for decades, and whose interface estate is large, partially undocumented, and in some cases understood only by people who left the organisation years ago.

This article is that account. It draws on the experience of building a data mesh architecture at a BaFin-regulated commercial real estate finance bank, starting from an interface estate that systematic discovery showed to number close to a thousand connections, most of which had no formal documentation. The lessons are specific, sometimes uncomfortable, and directly applicable to any established financial institution facing the same challenge.

Why Legacy Banking Data Architecture Is the Way It Is

Before describing the solution, it is worth understanding why the problem exists in the form it does. Established banks do not have poorly documented data architectures because of negligence. They have them because of rational, incremental decision-making over long periods of time. Each individual integration — the feed from the core banking system to the risk engine, the nightly batch to the regulatory reporting platform, the real-time push to the customer portal — made perfect sense when it was built. Nobody designed a tangled, undocumented mess. It emerged, one pragmatic decision at a time, over twenty or thirty years.

The result, in a mature bank, is typically a large number of point-to-point interfaces built on a mix of technologies and protocols spanning multiple generations: batch file transfers over SFTP, database-to-database connections, message queues, REST APIs of varying vintages, proprietary ERP integration frameworks, and some connections whose mechanism is genuinely unclear from the available documentation. Each interface carries implicit knowledge about data formats, business rules, transformation logic and timing dependencies that was understood by its creator and may or may not have been passed on. When the creator has left, that knowledge often goes with them.

In the institution where this work was carried out, the initial estimate of the interface estate was in the hundreds. Systematic discovery revealed the actual number was close to a thousand. This is not unusual for a bank of its age and complexity. It is simply what accumulation looks like at scale.

What Data Mesh Actually Means in a Regulated Banking Context

The four principles of data mesh require some translation before they can be applied in a regulated financial institution. Domain-oriented decentralised ownership is compelling in theory but immediately raises a question: what are the domains? In a commercial real estate finance bank, the answer is not obvious. The candidates — loan origination, risk and collateral management, treasury, finance and accounting, customer management, regulatory reporting — overlap in ways that resist clean separation. A loan object is simultaneously an origination artefact, a risk exposure, a collateral-secured asset, and a regulatory reporting unit. Deciding which domain owns it, and therefore which team is responsible for the data product built on top of it, is a governance decision with significant organisational implications.

Data as a product is philosophically attractive but practically demanding. A data product, in the mesh sense, is a unit of data that has an owner, meets defined quality standards, is discoverable, and can be consumed by other domains through a stable interface without direct coupling to the source system. Building data products from legacy sources — particularly from ERP systems like SAP, which have deep, complex data models that were designed for transactional processing rather than analytical consumption — requires significant investment in transformation and abstraction layers that the source system's owners are often reluctant to build and maintain.

Self-serve data infrastructure implies that domain teams can build and publish data products without requiring central platform team involvement for each new product. In a regulated environment, this collides with the governance requirements that regulators impose on data, particularly the data lineage, quality and access control requirements of BCBS 239 (the Basel Committee on Banking Supervision's principles for risk data aggregation and risk reporting), GDPR and DORA. Complete self-service is not achievable in a regulated context. What is achievable is a platform that enforces regulatory guardrails automatically, so that domain teams can move with reasonable speed within a constrained but well-defined space.
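As a minimal sketch of what an automatically enforced guardrail might look like, the following Python checks a data product specification before it is published. The class DataProductSpec, the function guardrail_violations, the classification labels and the four rules are all illustrative assumptions, not the platform used at the institution.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels; a real bank would use its own classification framework.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "strictly_confidential"}


@dataclass
class DataProductSpec:
    name: str
    domain: str
    owner: str                       # accountable person or role
    classification: str              # label from the classification framework
    feeds_regulatory_report: bool
    lineage_sources: List[str] = field(default_factory=list)  # upstream interface IDs
    contains_personal_data: bool = False
    retention_days: Optional[int] = None


def guardrail_violations(spec: DataProductSpec) -> List[str]:
    """Return every rule the spec breaks; an empty list means it may be published."""
    violations = []
    if not spec.owner.strip():
        violations.append("missing owner")
    if spec.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append("unknown classification")
    if spec.feeds_regulatory_report and not spec.lineage_sources:
        violations.append("regulatory data without documented lineage")
    if spec.contains_personal_data and spec.retention_days is None:
        violations.append("personal data without retention period")
    return violations
```

A publish pipeline could refuse any specification for which the returned list is non-empty. That keeps the central team out of individual product builds while the rules themselves stay non-negotiable.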

Federated computational governance — the principle that global standards and interoperability rules are set centrally, while implementation is distributed — is the data mesh principle most naturally aligned with the way banking regulators think about data governance. The tension, in practice, is between the mesh philosophy's preference for minimising central control and the regulator's expectation that there is a clear accountable party for the quality and integrity of regulatory data. Resolving this tension requires architectural choices that are specific to the regulatory context — choices that the standard data mesh literature does not address.

"You cannot build a data mesh from a legacy estate by applying the principles directly. You first need to know what you have. The discovery programme is not a precursor to the architecture work — it is the first and most important part of it."

The Interface Discovery Programme: Starting from Reality

The practical starting point for any data mesh initiative in a legacy banking environment is an honest, systematic inventory of the existing interface estate. This sounds straightforward. It is not.

In the institution where this work was carried out, the discovery programme proceeded through four overlapping phases, each of which revealed a different category of undocumented or incorrectly documented interface.

Systems of Record Mapping

Starting from the application landscape — every known system in the inventory — and working outward to map all documented connections. This phase typically recovers around 40–50% of the actual interface estate. The documented interfaces are generally the newer ones, the ones built by central teams following a methodology, and the ones that have been subject to regulatory scrutiny. The undocumented ones are older, built by individuals under time pressure, or inherited through acquisition.

Network and Infrastructure Analysis

Examining network traffic patterns, firewall rules, database connection logs and middleware configurations to identify connections that exist in the infrastructure but not in the documentation. This phase consistently produces surprises — scheduled jobs running on servers that nobody has updated the CMDB to reflect, database links between systems whose owners believed they were independent, API calls visible in gateway logs with no corresponding entry in the integration register.
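The core of this phase is a reconciliation between what the register says and what the infrastructure shows. The sketch below assumes both sides have already been normalised into (source, target, mechanism) records, which in practice is most of the work. The Connection type, the reconcile function and the system names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    source: str
    target: str
    mechanism: str  # e.g. "sftp", "dblink", "rest"


def reconcile(registered: Iterable[Connection], observed: Iterable[Connection]) -> dict:
    """Compare the integration register with connections seen in logs and configs."""
    registered, observed = set(registered), set(observed)
    return {
        # Seen in the infrastructure but absent from the register.
        "undocumented": sorted(observed - registered),
        # In the register but not seen; possibly decommissioned or infrequent.
        "not_observed": sorted(registered - observed),
        # Present on both sides.
        "confirmed": sorted(registered & observed),
    }


register = [
    Connection("core_banking", "risk_engine", "sftp"),
    Connection("sap", "reg_reporting", "sftp"),
]
seen_in_logs = [
    Connection("core_banking", "risk_engine", "sftp"),
    Connection("sap", "risk_engine", "dblink"),
]
result = reconcile(register, seen_in_logs)
```

Entries under "not_observed" deserve as much attention as the undocumented ones: a monthly or quarterly job will not appear in a short observation window, so absence from the logs is not proof that an interface is dead.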

Institutional Memory Interviews

Structured conversations with long-serving staff — particularly in operations, finance and risk functions — to surface interfaces that exist primarily as organisational knowledge rather than documented architecture. A substantial number of batch processes, data extracts and manual reconciliation workflows are only discoverable through this route. Identifying the right people to speak to, and asking questions in a way that makes it safe to reveal informal or undocumented processes, is a human and organisational challenge as much as a technical one.

ERP Deep Discovery

Separate, specialised analysis of the SAP/HANA estate. ERP systems in banks are simultaneously the most important source of authoritative financial data and the most complex to analyse. SAP's internal integration framework, the number of custom Z-transactions and reports that extract data to external systems, and the interdependencies between SAP modules create an interface sub-estate of significant scale and complexity that requires dedicated expertise to map accurately. This phase alone often doubles the interface count for institutions with a mature SAP implementation.

The output of this programme — a comprehensive, documented interface register — is not only the foundation for data mesh design. It is also a significant regulatory artefact. BCBS 239, the Basel Committee's principles for effective risk data aggregation and risk reporting, requires that banks can demonstrate the completeness and accuracy of their risk data from source to report. That demonstration requires knowing what the interfaces are, what data flows through them, and what transformations occur in transit. An interface register built for data mesh design purposes simultaneously satisfies a core BCBS 239 documentation requirement. This alignment of architectural and regulatory purposes — both pointing toward the same underlying need for documented, governed data lineage — is one of the most practically useful insights from this work.
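To illustrate why the register doubles as lineage evidence, here is a small sketch that walks a register backwards from a report to every interface feeding it. The Interface record, the upstream_lineage function and the sample entries are assumptions for illustration; a real register would carry far more attributes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interface:
    interface_id: str
    source: str
    target: str
    transformation: str  # what happens to the data in transit


def upstream_lineage(register: Iterable[Interface], report: str) -> List[Interface]:
    """Breadth-first walk from `report` back to its sources via inbound interfaces."""
    inbound = defaultdict(list)
    for itf in register:
        inbound[itf.target].append(itf)

    seen_systems = {report}
    queue = deque([report])
    lineage = []
    while queue:
        system = queue.popleft()
        for itf in inbound[system]:
            lineage.append(itf)
            # Visit each upstream system once, so cycles cannot loop forever.
            if itf.source not in seen_systems:
                seen_systems.add(itf.source)
                queue.append(itf.source)
    return lineage


register = [
    Interface("IF-1", "sap", "ods", "currency normalisation"),
    Interface("IF-2", "ods", "risk_engine", "exposure aggregation"),
    Interface("IF-3", "risk_engine", "reg_report", "template mapping"),
    Interface("IF-4", "crm", "portal", "none"),
]
path = upstream_lineage(register, "reg_report")
```

The result lists the interfaces nearest the report first, each with its recorded transformation, which is the shape of answer a source-to-report lineage question needs.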

The SAP/HANA Challenge: Your ERP Is Both the Problem and the Solution

In any bank that runs SAP as its core finance and accounting platform, the ERP system occupies a unique and uncomfortable position in the data mesh journey. It is almost certainly the most authoritative source of financial data in the organisation — the system of record for ledger entries, cost accounting, vendor management and financial reporting. It is also, typically, one of the hardest systems to expose as a clean data source for downstream consumers.

SAP's data model is optimised for transactional processing, not for analytical consumption. The tables that hold financial data are numerous, inter-related in non-obvious ways, and populated through processes that embed business logic in ABAP code that may not be well documented. Many banks have accumulated years of customisations — Z-tables, custom function modules, bespoke reports — that extract, transform and route data in ways that are not visible from the standard SAP documentation.

The practical implication for data mesh design is that SAP cannot simply be designated as a domain and expected to publish clean data products from its native structures. A data product built directly on SAP's raw tables will be brittle, hard to maintain, and dependent on expertise in SAP's internals that is increasingly scarce. The more sustainable approach is to build an abstraction layer — an Operational Data Store or equivalent — that sits between SAP and the data mesh, presenting a stable, well-documented interface to the mesh while absorbing the complexity of SAP's internal structure. This layer becomes the SAP domain's data product: the canonical representation of financial data that other domains consume.
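A rough sketch of what the abstraction layer does at the record level: it translates a raw line item into a canonical, documented shape so that consumers never see SAP field names. The field names used here (BUKRS, BELNR, GJAHR, BUZEI, HKONT, SHKZG, DMBTR) follow common SAP FI line-item conventions, but their use is an assumption for illustration; the LedgerEntry type and to_ledger_entry function are hypothetical and do not describe the institution's actual ODS.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    """Canonical, consumer-facing representation of one ledger line item."""
    company_code: str
    document_number: str
    fiscal_year: int
    line_item: int
    gl_account: str
    signed_amount: Decimal  # debit positive, credit negative


def to_ledger_entry(raw: dict) -> LedgerEntry:
    # Assumed convention: the amount is stored unsigned and the debit/credit
    # indicator ("S" for debit, "H" for credit) carries the sign.
    sign = {"S": Decimal(1), "H": Decimal(-1)}[raw["SHKZG"]]
    return LedgerEntry(
        company_code=raw["BUKRS"],
        document_number=raw["BELNR"],
        fiscal_year=int(raw["GJAHR"]),
        line_item=int(raw["BUZEI"]),
        gl_account=raw["HKONT"],
        signed_amount=sign * Decimal(raw["DMBTR"]),
    )


raw_row = {
    "BUKRS": "1000", "BELNR": "0100000042", "GJAHR": "2023", "BUZEI": "001",
    "HKONT": "0000400000", "SHKZG": "H", "DMBTR": "1250.00",
}
entry = to_ledger_entry(raw_row)
```

The point of the design is that the sign convention, the field names and the type conversions are absorbed in one place. If the source structures change, the mapping changes and the contract seen by other domains does not.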

A Practical Observation

In the institution where this work was carried out, the SAP/HANA integration was the single most resource-intensive component of the data mesh programme — consuming more design effort, more implementation time and more ongoing governance attention than any other domain. The complexity of the ERP estate is consistently underestimated in data mesh planning, and the institutions that plan for it explicitly — allocating dedicated SAP architecture expertise, building the ODS layer as a first-class programme deliverable, and securing SAP team ownership of the resulting data products — consistently deliver better outcomes than those that treat it as a standard integration task.

Domain Boundaries: The Organisational Decision Disguised as an Architecture Question

Defining the domain boundaries for a data mesh is presented in the literature as an architecture exercise — a matter of identifying bounded contexts, aligning with business capabilities, and drawing clean lines between areas of responsibility. In practice, in a bank, it is an organisational and political exercise in which the architecture question is secondary.

The reason is that data ownership in a bank is not currently aligned with any domain model. Data about a commercial real estate loan is created by the origination team, enriched by the risk team, priced by treasury, accounted for by finance, reported by the regulatory reporting function, and managed throughout its lifecycle by operations. When you propose that one of these teams becomes the "owner" of the loan data product, you are proposing a change to accountabilities, resource commitments and decision rights that the teams involved have strong views about.

The architectural principle that domains should be aligned with business capabilities is correct and useful. But the work of actually establishing domain ownership — identifying the domain owners, securing their commitment to the data product responsibilities the mesh model requires, and navigating the organisational boundaries between teams that have previously operated independently — is leadership and change management work. It cannot be delegated to the architecture function and it cannot be resolved by choosing the right tooling. It requires sustained senior sponsorship and a clear organisational mandate that the data mesh programme is a business transformation, not a technology project.

Governance in a Regulated Data Mesh

The federated governance model that data mesh advocates describe — where global interoperability standards and data quality rules are set centrally, but implementation and enforcement are distributed to domain teams — is the right model for a regulated environment. The difficulty is operationalising it.

In a BaFin-regulated institution, the central governance requirements that must be applied consistently across all domains include: data classification according to the bank's information classification framework (which determines who can access data and where it can be processed); data quality standards for regulatory data, including the accuracy, completeness and timeliness requirements of BCBS 239; lineage documentation for all data that flows into regulatory reports; access control and audit logging requirements under GDPR and BAIT; and retention and deletion obligations. These are not negotiable at domain level. They are constraints within which domain teams operate.

The practical implementation of federated governance therefore requires two things. First, a data governance platform — in the institution where this work was carried out, Collibra was used — that serves as the authoritative registry for data products, data assets, lineage, ownership, quality metrics and policy compliance. Collibra becomes the single source of truth for data governance across the mesh, enabling central oversight without requiring central teams to be involved in every data product's development and maintenance. Second, a set of technical standards for data products — covering data formats, API contracts, quality assertion patterns and lineage metadata requirements — that are enforced at the platform level so that compliance is a consequence of building on the platform rather than a separate review step.
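A quality assertion pattern of the kind such standards might mandate can be very small. The sketch below checks completeness and timeliness for a batch of records; the function names, the 99% completeness threshold and the 24-hour lag are assumptions chosen for illustration, not values taken from BCBS 239 or from the institution.

```python
from datetime import datetime, timedelta


def completeness(rows, required_fields):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(name) not in (None, "") for name in required_fields)
        for row in rows
    )
    return complete / len(rows)


def quality_failures(rows, required_fields, loaded_at, as_of,
                     min_completeness=0.99, max_lag=timedelta(hours=24)):
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    score = completeness(rows, required_fields)
    if score < min_completeness:
        failures.append(f"completeness {score:.2%} below {min_completeness:.2%}")
    if as_of - loaded_at > max_lag:
        failures.append("data older than permitted lag")
    return failures


rows = [
    {"loan_id": "L1", "exposure": 100},
    {"loan_id": "L2", "exposure": None},
]
failures = quality_failures(
    rows, ["loan_id", "exposure"],
    loaded_at=datetime(2024, 1, 1, 0, 0),
    as_of=datetime(2024, 1, 2, 6, 0),
)
```

If the platform runs checks like these on every publication and records the outcome against the data product in the governance registry, compliance evidence accumulates as a by-product of normal operation.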

The Collibra implementation itself is a significant undertaking. Populating a data catalogue for a bank with a thousand interfaces and dozens of systems is not a one-time exercise. It requires sustained effort from data stewards embedded in the business, tooling to automate lineage capture where possible, and governance processes that ensure the catalogue stays current as systems and interfaces evolve. Institutions that implement Collibra as a project deliverable — populating it once and declaring success — typically find it has become stale within eighteen months. Treating data catalogue maintenance as an ongoing operational responsibility, with clear ownership and KPIs, is essential to the long-term viability of the governance model.
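One simple KPI for catalogue maintenance is the share of entries reviewed within an agreed window. The sketch below assumes each catalogue entry has a last-reviewed date available as plain data; the function names and the 180-day window are hypothetical, and nothing here uses or represents the Collibra API.

```python
from datetime import date


def stale_assets(last_reviewed, today, max_age_days=180):
    """Names of catalogue entries not reviewed within the allowed window."""
    return sorted(
        asset for asset, reviewed in last_reviewed.items()
        if (today - reviewed).days > max_age_days
    )


def freshness_ratio(last_reviewed, today, max_age_days=180):
    """Share of entries reviewed within the window; 1.0 for an empty catalogue."""
    if not last_reviewed:
        return 1.0
    stale = stale_assets(last_reviewed, today, max_age_days)
    return 1 - len(stale) / len(last_reviewed)


catalogue = {
    "IF-0001": date(2024, 1, 10),
    "IF-0002": date(2023, 3, 1),
    "loan_exposures": date(2023, 12, 1),
    "gl_balances": date(2022, 11, 15),
}
today = date(2024, 3, 1)
overdue = stale_assets(catalogue, today)
ratio = freshness_ratio(catalogue, today)
```

Reporting the ratio per domain, with the overdue list routed to the responsible data steward, turns catalogue currency into something that can be owned and measured instead of assumed.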

What the Textbooks Do Not Tell You

Having worked through the full cycle of data mesh implementation in a legacy banking environment, I would offer a small number of observations to any architecture or data leader facing the same challenge. None of them is adequately addressed in the published literature I have seen.

Start with regulatory data, not business intelligence. The instinct in most data programmes is to demonstrate value quickly through analytics and reporting use cases — dashboards, self-service BI, management information. In a regulated bank, the more powerful starting point is regulatory data: the data flows that feed risk reports, the BCBS 239 data lineage requirements, the DORA data inventory. These use cases have mandatory regulatory deadlines, executive visibility, and documented requirements — all of which provide the project sponsorship and clarity of purpose that data mesh initiatives frequently struggle to maintain.

The interfaces you cannot account for are the most important ones to find. In every interface discovery exercise, there is a moment when the team believes it has found everything — and then another category of undocumented connection surfaces. The connections most resistant to discovery are typically the ones with the highest business criticality: the feed that produces the overnight risk report, the extract that populates the regulatory return, the batch job that nobody knows how to restart when it fails. Investing in discovery until you are genuinely confident the estate is complete — not just until you have found most of it — is worth the additional effort.

Domain ownership must be non-negotiable and non-reversible. The most common failure mode in data mesh initiatives is domain ownership that erodes over time. Domain teams accept the responsibility at programme inception, engage actively during design and build, and then gradually revert to their previous behaviours as competing priorities emerge. Preventing this requires that data product ownership is embedded in role descriptions, performance objectives and team operating models — not treated as a voluntary contribution to a central programme.

The mesh is never finished. A data mesh is not a target architecture that you build and then operate. It is a capability that you establish and then evolve continuously, as the business changes, as new data sources emerge, as regulatory requirements shift, and as domain teams develop the skills and confidence to build more sophisticated data products. Treating the initial implementation as the deliverable, rather than the capability to evolve the architecture, is the single most common cause of data mesh programmes that deliver less value than they promised.

Questions for Architecture and Data Leaders

Do you have a complete, current inventory of your interface estate — including interfaces discovered through network analysis and institutional memory, not just those in the formal integration register?
Have you completed a specific deep-discovery exercise for your ERP/SAP estate, with dedicated SAP architecture expertise rather than treating it as a standard integration domain?
Have you identified domain owners for each proposed data mesh domain, secured their explicit commitment to data product responsibilities, and embedded that ownership in their role accountabilities?
Is your data governance platform — whether Collibra or an equivalent — populated and maintained as an ongoing operational responsibility, with clear KPIs and ownership, rather than as a project deliverable?
Does your data mesh design satisfy the BCBS 239 lineage documentation requirements for risk data — and have you aligned the interface discovery programme with your BCBS 239 compliance obligations to avoid duplicating effort?
Are your federated governance standards for data products enforced at the platform level — making compliance a consequence of using the platform — or do they rely on domain teams voluntarily following documentation?
Have you secured the senior sponsorship and organisational mandate necessary to resolve domain boundary disputes at the level at which they actually need to be resolved?

Conclusion: The Mesh Is a Destination, Not a Shortcut

Data mesh is a genuinely powerful architectural concept for large, complex organisations that need to scale their data capability beyond what centralised approaches can sustain. In a regulated bank, it is also an approach that is well-aligned with the direction of regulatory travel — toward documented lineage, explicit data ownership, and governed data sharing rather than informal data proliferation.

What it is not is a shortcut. The legacy interface estate does not disappear because you have adopted a new architectural paradigm. The organisational dynamics that caused data ownership to be unclear do not resolve themselves because you have drawn domain boundaries on a diagram. The regulatory requirements for data quality and lineage do not become easier to satisfy because your data platform has a new name. The work of discovery, documentation, organisational alignment and platform governance must be done, seriously and completely, before the benefits of the mesh model can be realised.

The institutions that invest in that foundational work — that treat the interface discovery programme as the critical path it is, that resolve the domain ownership questions at the right organisational level, that build and maintain a governed data catalogue as a living operational asset — will find that data mesh delivers on its promise. Those that treat it as a technology project, selecting a platform and migrating workloads without addressing the underlying architecture and governance questions, will find that they have built a mesh in name while preserving all the problems of the centralised data warehouse they were trying to leave behind.

Written by Peter Pitkin · Senior IT Consultant & Enterprise Architect · Germany
Views reflect practical experience of data mesh architecture and implementation within a BaFin-regulated German financial institution.
