Talk: Data on the Inside, Data on the Outside

When we move from a monolith to microservices we abandon integrating via a shared database, as each service must own its own data to allow them it to be autonomous. But now we have a new problem, our data is distributed. What happens if I need one service needs to talk to another about a shared concept such as a product, a hotel room, or an order? Does every service need to have a list of all our users? Who knows what users have permissions to the entities within the micro service? What happens if my REST endpoint needs to include data from a graph that includes other services to make it responsive? And I am not breaking the boundary of my service when all of this data leaves my service boundary in response to a request?

Naive solutions result in chatty calls as each service engages with multiple other services to fulfil a request, or in large message payloads as services add all the data required to process a message to each message. Neither scale well.

In 2005, Pat Helland wrote a paper ‘Data on the Inside vs. Data on the Outside’ which answers the question by distinguishing between data a service owns and reference data that it can use. In this presentation we will explain reference data, how it is classified, and how it should be implemented. We will include a discussion of using reference data from ATOM feeds, discrete messaging and event streams. We’ll provide examples in C#, Python and Go as well as using RMQ and Kafka.