
Why Common Data Modelling is so important

By Chris Judd, Enterprise Integration Architect at Devoteam       

The words ‘common data model’ often elicit ire from the project and programme managers who hear them. The term is frequently associated with time-consuming and ultimately failed attempts to force an organisation and its aims into a model that proves inaccurate and ends up becoming a straitjacket.

This ‘jacket’ demands constant workarounds, where ‘fudging’ a solution becomes so systemic that the model is all but useless, making future attempts at modelling all the harder to justify.

There are, of course, examples where a model does do its intended job, but surely it is quicker and easier just to write request and response models by hand, service by service. After all, we have a programme to deliver, right?

Well, maybe not, and this article covers some thoughts on why.

Microservices and Agile Methods

Plenty has been written about microservices, positioned as the latest trend allowing businesses to break down monolithic, expensive legacy applications into small packets of functionality that can be exposed through management platforms (usually under API-first principles). These are typically delivered using modern agile methodologies, which allow several teams to produce dozens of microservices quickly.

What can be missing, however (often due to a reluctance to undertake any form of data modelling), is a suitable model that ties all these services together consistently. Without one, the services can descend into a mess of mismatched types and missing critical information, or require adapter services to plug holes, reducing the understandability, usability, efficiency and effectiveness of each service.

In short, a good data model plays a key role in digital transformation and is a central building block on which your journey is based.

Why people don’t model

Before we dive into why, let us start with the typical why-nots (see how many you have heard before) and provide some counterarguments.

  • The boil-the-ocean argument. We have all heard some version of this: ‘You’re going to spend ages developing a huge model and we only need a small fraction of it for this phase’. It is true that badly run modelling efforts can produce huge, unwieldy structures. It is equally possible, however, to produce a good data model that is focused on the delivery ahead (with a roadmap for completing the model) and that delivers value when needed (see the Roadmap section later on).
  • We can standardise it later; right now we just need to deliver. Yes, you can, but it is probably a bad idea. Integration and microservices live and die on how well their interfaces are defined. If an interface is defined poorly, the service will not be used (or will have to be redeveloped quickly if it is critical to the business). Even if you did come back to ‘standardise’ the interface later, the knock-on effect on clients and applications must be factored in, and the cost versus the benefit could simply be too great. We have all seen time-cost graphs like the one below play out in real life.

[Figure: Common Data Modelling - cost vs time graph]

This means it’s often cheaper, and easier, to lay the foundations for your integration first, with even a small amount of consideration for the future.

  • How does this fit in an ‘agile’ company? It seems very waterfall-driven. OK, this is a good one, and it is often used as an excuse to shorten delivery times by skipping certain steps. Agile methodologies give a business the ability to iterate quickly and deliver benefit in smaller but faster chunks; arguing that you are an agile company does not mean you will succeed despite skipping steps such as proper sprint-by-sprint analysis to lock down what is being attempted and what data is used. An agile data model that expands and is delivered alongside, or ahead of, each sprint is entirely possible, and it provides a stable basis on which interfaces can be defined consistently. A final point: a successful agile project requires a target state to be defined and understood per sprint, and developing a common data model ahead of each sprint helps provide that target state to the services being delivered within it.
  • But it does not align with a microservices / API-first mentality? Even microservices, and the APIs that expose them, need clear interfaces that align with business needs. Without a model that reflects how the business operates and wants to present itself to clients (externally and internally), how can you be sure at design time that a service will meet the aims set for it? Does that implementation actually agree with how your organisation sees a product, or worse, a customer?

We have all heard some version of at least one of these and the concerns are not entirely without merit, but with proper management and expectation-setting it is entirely possible to deliver a lean, focused common model that standardises service interfaces, enhances business understanding and delivers value.

However, let us assume this is not sufficient, that there are still concerns, and that we need to spell out what value a data model actually brings. I would suggest these points to start.

Why we should model

  • Modelling provides a consistent language for the entire business, free from technological constraints. Data modelling as part of the analysis phase (or even while reviewing the business requirements) means that common business terms, including what they mean, can be defined and agreed upon early, then used consistently across an entire programme (or even the enterprise). This is especially useful in typical integration work, where different systems and data sources need to work together despite being developed in different technologies, with different data sets, and even at different times under older technological drivers. Remember that integration services (micro or otherwise) can fill gaps by obtaining or deriving data as needed, so a legacy system with an existing model is no excuse for incompatibility with newly evolved business needs. What you’re aiming for is a model that is well defined, but loosely coupled.
  • Modelling provides an opportunity for the business to restructure how it sees itself. Completing a data model solidifies the direction in which the business would like to go and provides a tangible artefact as an active demonstration of its goals. An example might be a business deciding that a key aim is to engage its customer base more with the brand. A data model that links customers with marketing and satisfaction metrics provides that goal to the integration platform and helps shape how services are defined, used and thought about. This in turn drives new data through the business, allowing a better understanding of those customers, promoting better marketing and, eventually, an upturn in brand engagement. The data model, based on programme goals and aims, can drive those results if implemented correctly.
  • A data model can be split and used as a basis for an API-first design. A well-considered model enables a business to compete in an API-first economy. Each integration service and data source system will, at its core, only deal with a subsection of business needs. This may be customer profile data (names, addresses, etc.), billing (usage, last invoices) or similar. These parts of the model can be split off and offered to clients (external or internal) without having to provide the entire taxonomy, enabling users of the APIs to take the smaller model, interact with it using terms the entire business understands, and stay focused on a specific task, free from the underlying technology that enables that function (see the sketch after this list). For a counterexample, look at your favourite credit check service provider and some of the WSDLs it publishes, then tell me those interfaces are not tied to the technology rather than the goal, and are not harder to understand as a result.
  • A good data model enhances business intelligence. If you allow garbage into the system, the BI tool will have to work harder to find value. One of the major problems with integrating across systems is that each typically requires a different minimum dataset. All too commonly, one system allows updates with a minimal set of data (for example, just the customerId) while a second system requires a complete data object, creating different levels of detail and hurting data quality. This is compounded further when the dataset is shared between the two data stores, allowing one to remove detail from the other by virtue of having an older data model. Introducing a façade service with a clear, consistent model allows the business to define what the minimum dataset looks like, mandating a minimum data quality for information flowing between each part of the solution (the same sketch after this list shows one way to express such a minimum).
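To make the last two points concrete, below is a minimal sketch, in XML Schema, of how a fragment of a common model could be split off behind a façade. Every name in it (the example.com namespace, customerProfile, satisfactionScore, and so on) is hypothetical and invented purely for illustration; the pattern to note is that shared elements are defined once and referenced, a profile subset is exposed without the full taxonomy, and the minimum dataset is enforced by marking elements as required.

```xml
<!-- common-model.xsd: a hypothetical fragment of a common data model.
     All names are invented for illustration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/cdm/v1"
           xmlns:cdm="http://example.com/cdm/v1"
           elementFormDefault="qualified">

  <!-- Shared elements: defined once, referenced everywhere
       (well defined, yet loosely coupled) -->
  <xs:element name="customerId" type="xs:string"/>
  <xs:element name="postalAddress" type="cdm:postalAddressType"/>

  <xs:complexType name="postalAddressType">
    <xs:sequence>
      <xs:element name="addressLine" type="xs:string" maxOccurs="5"/>
      <xs:element name="postcode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A customer-profile subset, split off for an API-first service
       without exposing the entire taxonomy. The façade's minimum
       dataset is mandated here: customerId and at least one address
       are required; satisfaction scores are optional and repeatable. -->
  <xs:element name="customerProfile" type="cdm:customerProfileType"/>

  <xs:complexType name="customerProfileType">
    <xs:sequence>
      <xs:element ref="cdm:customerId"/>
      <xs:element ref="cdm:postalAddress" maxOccurs="unbounded"/>
      <xs:element name="satisfactionScore" type="xs:decimal"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

Any service fronted by a schema like this now rejects updates that fall below the agreed minimum dataset, whichever back-end system sits behind the façade.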

Agile Common Data Modelling

Accepting some or all of the points above, we can agree that a common data model will provide real value to a digital transformation programme (small or large), provided the concerns mentioned earlier are successfully mitigated.

How can we get started on the right foot?

Let us look briefly at an example of how a data model might be defined, built and made available within a commonly used agile pattern and its subsequent wider use.

The figure below shows, very simply, the core phases within a single agile iteration, where the main steps highlighted are Define, Build and Release.

[Figure: Common Data Modelling - agile Define / Build / Release cycle]

The same concept can be applied to the development of a data model, where we build an agile cycle that feeds subsequent development, with the activities listed below in each phase.

[Figure: CDM agile cycle, 4-8 weeks]

Define

  • Identify what is coming up in future sprints and the functionality it will likely need.
  • Agree upon the main entities needed to support the sprint and how much must be built on top of the already defined model.
  • Map these entities against the roadmap (see the section later on) to ensure we are not designing into a dead end.
  • Identify the key stakeholders and data sources so the final model is representative of the actual need.

Build

  • Build the model, with attributes as required, using best practice for things like naming standards (camelCasing) and XML conventions (complexTypes with element references).
  • Try out the new model frequently: fill it with data and ensure it can meet the requirements (see the sample instance after this list). Does it allow a customer to have multiple addresses, or several satisfaction scores?
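As an example of trying the model out, here is a hypothetical instance document, written against the sketch schema from the earlier section, that checks the model really does permit multiple addresses and several satisfaction scores:

```xml
<!-- A hypothetical test instance against the earlier sketch schema -->
<customerProfile xmlns="http://example.com/cdm/v1">
  <customerId>CUST-0001</customerId>
  <postalAddress>
    <addressLine>1 High Street</addressLine>
    <postcode>AB1 2CD</postcode>
  </postalAddress>
  <postalAddress>
    <addressLine>Unit 4, Example Business Park</addressLine>
    <postcode>EF5 6GH</postcode>
  </postalAddress>
  <satisfactionScore>4.5</satisfactionScore>
  <satisfactionScore>3.0</satisfactionScore>
</customerProfile>
```

If a realistic instance like this fails validation, the model needs another iteration before release.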

Release

  • Provide the model to the developers as an immutable object (you don’t want them changing the model for you), typically via a repository / Artifactory location, while they are defining their interfaces and before they start building.
  • Listen to feedback and allow the developers to guide you to the useful changes they see as required. You’re not going to get it right the first time, but release early enough in their cycle and you can collate feedback and refactor as needed.

Now, these modelling cycles should not take too long and often happen in parallel, as there will be views on which functionality needs to be delivered by when. An example organisation could be:

[Figure: example CDM agile modelling cycles, 4-6 weeks]

The Roadmap / Accelerators

A couple of final points to consider briefly are roadmaps and accelerators. Both are useful, but should be treated with care:

  • A roadmap for the data model is a high-level entity diagram created from business requirement specifications or other high-level documents. This diagram, deliberately light on detail, describes the high-level concepts and how they relate: for example, customer to address, product to package, service to subscription.

    This near-logical model (even though it is not set in stone) means designers can ensure the physical data model aligns with the goals set out by the business, with best practice in mind. It should help avoid designing a solution with dead ends (i.e., one that cannot be extended any further, making it harder to support evolving business needs without workarounds). Using this roadmap, it is possible to understand the ‘size of the prize’ and to schedule work and estimate effort for each area being delivered.

  • Predefined vertical-based data models (sometimes called accelerators or industry models) help an organisation by providing a starting point on which to develop its own solutions. A great example comes in the guise of the SID model / Frameworx, which is owned and curated by the TM Forum and used extensively by the telecommunications industry. The key point is that these accelerators should be treated as a starting point: while they may capture some of what you require, it is likely they will have over-complicated a concept that is simple within your business, making the model harder to use than it needs to be. If you intend to use one of these accelerators, the first thing to do is de-normalise the model: make it simpler, so it fits your needs, as sketched below. From there you can map the concepts found in business requirement specifications onto the simplified model and start personalising it for your own needs. The takeaway: don’t assume an accelerator requires no review against business concepts and aims; it is just a starting point, and some effort will be required to make it useful.
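As an illustration of that simplification step, the sketch below flattens a hypothetical over-normalised accelerator fragment into a type that matches how one particular business talks about customers. The names are invented and deliberately generic; they are not taken from SID / Frameworx.

```xml
<!-- Before: a hypothetical accelerator hides a simple idea behind
     indirection (party references, role specifications, validity
     periods) that this business does not need -->
<xs:complexType name="partyRoleType">
  <xs:sequence>
    <xs:element name="partyRef" type="xs:string"/>
    <xs:element name="roleSpecificationRef" type="xs:string"/>
    <xs:element name="validFor" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- After: a flattened customer type aligned with the business's own
     requirement specifications -->
<xs:complexType name="customerType">
  <xs:sequence>
    <xs:element name="customerId" type="xs:string"/>
    <xs:element name="fullName" type="xs:string"/>
    <xs:element name="segment" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```

Only when the structure reads naturally against your own requirement specifications is the accelerator pulling its weight.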

Devoteam’s Integration Strategy Proposition

Much of what we have discussed here is part of Devoteam’s integration strategy proposition. We can not only guide your organisation through producing a good common data model, but also run a collection of workshops, provide pointers and set your digital transformation modelling efforts off on the right foot, including showing you how to control and curate your data model so it stays active and useful for many years to come.

Final Thoughts

In this article we have touched upon a wide range of topics from why common data modelling is seen as a bad idea to why it is central to a successful programme of change. Along the way we have covered (briefly):

  • The impact of microservices on modelling
  • How agile methods do not preclude data modelling
  • Pitching data modelling to those who don’t see the value
  • How agile data modelling can be achieved
  • What to look out for in starting points

For those who want to ask questions or reach out for advice, Devoteam UK would be more than happy to provide our expertise. Please contact me at chris.judd@devoteam.com or at uk.info@devoteam.com.

About the Author

Chris Judd, Enterprise Integration Architect at Devoteam, has over a decade of experience in integration, data modelling and master data management. When he is not doing that, he is a busy father to twin daughters, a climber, former BASE jumper, current skydiver and wingsuit pilot.