Straight from the trenches: Engineering experiences with Microservices Architecture

Andrew Harmel-Law interviewed by Maurice Driessen

Earlier this month, while working on a proposal that turned out to be successful, I had the opportunity to talk with Andrew Harmel-Law about his personal experience of building a system based on a microservices architecture using an evolutionary approach. It made sense to us to share this experience with all of you software engineers. The focal point of the discussion was how the non-functional characteristics of a microservices architecture contribute to the business continuity and agility of his client, and how they help software engineers sustain a high-quality, complex solution while improving collaboration with the business.

Could you tell me something about your client and how your collaboration with the client evolved?
My client is a large mail and parcel fulfillment provider in the UK. We build and run an array of their services for them. We initially started with eBusiness: the public-facing websites and microsites. We used to interface with legacy backend systems provided by third parties, neither the client nor us. We recently won contracts to take over a good proportion of those legacy backend systems as well. The client has an integration gateway sitting in front of many backend systems, ranging from old mainframes to smaller .NET-based solutions. It is the typical estate of large and small business-critical systems one can expect in any large organization that has grown over time, with all kinds of integration patterns, from point-to-point to ESB.

Within the program, individual projects driven by the client are run on individual business cases. There has not been an overall integration program, and because of how projects are funded and scoped individually, we repeatedly need an integration between, say, microsite A and backend systems B and C. We are currently using a microservices-based architecture to address this challenge. As we went forward project by project, which is typical of the agile approach used to execute the projects, we found that microservices were reused between the front office and the back ends. Evolving the microservices is our current vision for addressing the integration challenge, as opposed to almost a decade ago, when we would have done a large-scale SOA analysis to try to identify services and agree on how to produce and consume them. With the current approach we see the benefit of each microservice evolving, and of individual microservices being added to the mix later on. Applying a microservices architecture has enabled us to develop a very large and complex system in a more modular and flexible way.

Although the client is driven by projects, which are not very agile, even within the scope of a fixed-price project that we run with the client we are required to accommodate last-minute changes arising from changing business needs. With microservices, the cost of those changes stays constant, whether they come early or late in the project. In a traditionally architected project, where you incur technical debt, the cost of such changes tends to increase towards the end of the project or after it.

From a business perspective, the evolution of the microservices has created a transparency that helped us make the business understand the complexity of any requested change to the evolving microservices ecosystem. This is because the business could grasp how the microservices map to their business (cf. Conway’s Law), and as a result could also understand why a change to the business would be a more or less complex change to the microservices. This is the complete opposite of the old days with the traditional monolith approach, where we as software engineers could only tell them about a “blob” of integration code and the challenge of changing it, which we could not make them understand, so the business simply had to trust us. As a result, the microservices architecture has contributed to improving the business agility of our client, to improving collaboration with the business in accommodating change, and to involving the business in the evolution of their system architecture.

What can you share with regard to the maintainability of your microservices solution?
The microservices architecture created a very maintainable code base. Each microservice has its own relatively small code base in a separate repository, the name of the microservice makes sense to both developers and the business, and the code is deployed and maintained as a physically separate executable component. The typical kind of technical debt we see in a microservice is a piece of business logic that somehow ended up in one service but, given the responsibilities of each service, belongs in another. Because the code base is small, these kinds of issues stand out and are easily identified.

In the old monolithic approach we would obviously also create the same modular code, but we would wrap it in a single executable with a relatively simple architecture and, in the end, a very complex code base due to the sheer size of the monolith. If someone, by honest mistake, created a class in the wrong location within the architecture, and this went unnoticed and others built upon it, the quality of the code would drop and the code would eventually become unmaintainable.

A microservices architecture also tends to have an evolutionary lifecycle. Because each microservice makes sense to both the product owner and the software engineers, they can have meaningful conversations about them. If they agree that a microservice needs to be split up, the development team just goes ahead and does it without over-analyzing the decision. After all, the product owner and software engineers are ultimately responsible for them. This is a result of the fact that a microservices architecture is very bare, open and explicit: a microservice is an individual component that exposes a real, explicit and documented interface, which may be consumed by a component developed by the person sitting next to you. All of this, however, requires software engineering discipline and craftsmanship, because if you don’t apply them, it won’t work. The design and engineering have to be done deliberately, because they can’t happen by accident.

Suppose we had created two versions of a microservice; we would have done this explicitly and for good reasons. Then, if we decide to retire the old version because it is deprecated, we can do so explicitly and get rid of the old version of the code as a whole. Compare this with removing code from a monolithic application: removal comes at a cost and therefore tends to be neglected. As a result the code base just keeps growing, because nobody removes unused code, and over time the solution’s maintainability decreases.

Because the deployment units in a microservices architecture are a lot smaller, you are likely to fix things faster when there is, for example, a security issue with a library used by the microservices in the solution. You can pull down all the affected microservices, make the changes required to fix the issue, and build and test all of them. If, for example, 7 out of 8 microservices pass the tests, you can deploy the 7 patched microservices back into production. If the remaining issue with the 8th microservice requires more time to fix, then that is regrettable, but the key message is that 7/8 of your system is already up and running again. Compared to a traditional monolith that is a great improvement, because with the monolith you would have to fix and test everything before you could make your solution available again.
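
To make the 7-out-of-8 scenario concrete, here is a minimal Python sketch of the decision it implies: redeploy every patched service whose tests pass and hold back the rest. The service names and the `passes_tests` callback are hypothetical placeholders, not part of the system described in the interview.

```python
from typing import Callable, List, Tuple


def plan_redeploy(
    services: List[str], passes_tests: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split patched services into those safe to redeploy and those held back."""
    deployable: List[str] = []
    held_back: List[str] = []
    for name in services:
        # Each service is built and tested on its own, so one failure
        # does not block the others from going back into production.
        (deployable if passes_tests(name) else held_back).append(name)
    return deployable, held_back


# Hypothetical names; "svc-8" stands in for the one service still failing.
services = [f"svc-{i}" for i in range(1, 9)]
deployable, held_back = plan_redeploy(services, lambda name: name != "svc-8")
print(f"redeploy {len(deployable)}/{len(services)} now, hold back {held_back}")
```

With a monolith the same check collapses to a single all-or-nothing result, which is the contrast drawn above.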

In a microservices-based solution, any discussion about removing an old version of a microservice from the solution environment requires explicit conversations and decisions. Migrating from an old version of a microservice to a new version will impact dependent service consumers, so it needs to be planned and organized explicitly. But you are removing a complete moving part (the old version of the microservice) from your solution environment as a whole and replacing it with another moving part, the new version. You don’t need to pick unused pieces of code out of a monolith; you just stop deploying the old version and it is gone.

How does this contribute to the business continuity of your client?
Well, we engineered our microservices to be stateless, and our persistence tier (where our cross-request data resides) is a combination of MongoDB, MySQL and Redis. This enables us to scale up or down elastically, deploy new microservices and retire old ones without an outage. We do have service outages, because our client has gating processes and a prescribed way of releasing software into production, but we have built our solution in the same way as Netflix and other companies do. We could deploy new versions of microservices and then use load balancers to redirect a fraction of the network traffic to the new ones to see whether they work. If we are happy with the result, we could direct some more traffic to them; if it goes badly and something blows up, we could redirect the traffic back to the old versions. We have used this technique in our test environments, giving us zero-downtime upgrades.
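
The traffic-shifting technique described above can be sketched in a few lines of Python. This is an illustration only: in practice the weighting lives in the load balancer, and the `CanaryRouter` class, its method names and the step size are assumptions made for this example rather than anything taken from the project.

```python
import random
from typing import Callable, Optional


class CanaryRouter:
    """Sends a configurable fraction of requests to the new version."""

    def __init__(
        self,
        old: Callable[[str], str],
        new: Callable[[str], str],
        canary_fraction: float = 0.0,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.old = old
        self.new = new
        self.canary_fraction = canary_fraction
        self.rng = rng or random.Random()

    def handle(self, request: str) -> str:
        # random() is in [0.0, 1.0): a fraction of 0.0 never picks the new
        # version and a fraction of 1.0 always does.
        if self.rng.random() < self.canary_fraction:
            return self.new(request)
        return self.old(request)

    def promote(self, step: float = 0.25) -> None:
        """Shift more traffic to the new version, capped at 100%."""
        self.canary_fraction = min(1.0, self.canary_fraction + step)

    def rollback(self) -> None:
        """Send all traffic back to the old version."""
        self.canary_fraction = 0.0


# Stand-in handlers for the old and new versions of a stateless service.
router = CanaryRouter(lambda r: "v1", lambda r: "v2", canary_fraction=0.1)
sample = [router.handle("GET /stamps") for _ in range(1000)]
print("share served by v2:", sample.count("v2") / len(sample))
```

Because the services are stateless, either version can serve any request, which is what makes shifting traffic back and forth safe.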

How do you actually deploy?
We are looking to become more mature here. Currently we package the microservices as executable jar files. They contain Netty, which is our HTTP listener, and Camel to do the pipeline processing. An HTTP client and Hystrix are used to call downstream services, MongoDB or Redis. These jar files are promoted through our testing environments, leveraging Puppet and Capistrano to automate our deployments and Jenkins and JMeter to run our smoke tests.

We are considering moving to Docker, because Docker works well in development as well as in production. Docker provides process isolation (“bulkheading”). Docker would also give us the opportunity to deploy to a cloud-based or PaaS platform. This can be done without code changes and with limited changes to our provisioning scripts, and would enable us to move to any private, hybrid or public cloud environment. This capability is inherent in the architecture.

What is your experience with regard to performance?
We need to spec everything for the Christmas season, because around Christmas people tend to send lots of things, and that is what our client takes care of. Obviously, that is when our systems get the highest load. Take for example the service that provides stamps. A user can send in a request for up to 200 stamps, and that request will turn into 10 calls per stamp to a set of sub-microservices, which adds up to 2,000 transactions. We can handle around 10 to 15 stamp requests a second, resulting in 20 to 30 thousand transactions a second overall within this solution. This is not massively high scale, but what we have proven is that we can scale horizontally almost linearly, because we do not require any coordination between the microservice instances. Our scaling bottleneck is actually our 5-node MongoDB cluster. Because providing stamps is basically like printing money, we obviously must keep track of what we sell to the client’s customers. The synchronized write across the 5-node MongoDB cluster, guaranteed to have been written to disk on at least 3 nodes, is the limiting step for our transactional capacity.
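
The throughput figures above follow from simple multiplication, spelled out here in Python. One assumption is made explicit: the 10 sub-microservice calls are read as being per stamp, which is the reading under which 200 stamps give 2,000 transactions. The constant names are invented for this sketch.

```python
STAMPS_PER_REQUEST = 200   # maximum request size quoted in the interview
CALLS_PER_STAMP = 10       # assumption: the 10 calls are per stamp
CLUSTER_NODES = 5          # size of the MongoDB cluster

transactions_per_request = STAMPS_PER_REQUEST * CALLS_PER_STAMP
print(transactions_per_request)  # 2000

for requests_per_second in (10, 15):
    total = requests_per_second * transactions_per_request
    print(requests_per_second, "requests/s ->", total, "transactions/s")
# 10 requests/s -> 20000 transactions/s
# 15 requests/s -> 30000 transactions/s

# A write acknowledged by 3 of 5 nodes is the smallest majority of the cluster.
majority = CLUSTER_NODES // 2 + 1
print(majority)  # 3
```

The stateless service tier multiplies with the number of instances, while every sale still has to wait for that majority write, which is why the database sets the ceiling.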

Everything we have constructed in our microservices landscape is very thoroughly non-functionally tested, specifically for throughput and scalability. We spend a lot of time tuning timeouts, thread pool sizes and connection pool sizes. We spend considerable effort putting logging and monitoring in place, so we can see what is actually going on in the system. Sometimes you want things to fail, and fail fast. The configuration is therefore set according to how we perceive the demand for our microservices, making sure we fail at the right point of actual demand. In our case that is simply because some of the backend systems our solution depends on are really slow and have limited capacity. So to cope with unforeseen issues we need to set timeouts lower, and we have to make sure there are no knock-on effects. Again, the good engineering practices that Capgemini is good at are brought right to the front. You can get very high throughput with a microservices architecture, but you need to check; you won’t get it for free. You need to make sure all your configuration settings are set up correctly. As long as you stay in control of your configuration you can basically scale linearly; however, knowing the bottlenecks in your system environment is key.
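
The idea of failing fast when a slow backend cannot keep up is what a circuit breaker provides. The team uses Hystrix for this on the JVM; the Python sketch below is not Hystrix's API but a simplified stand-in with made-up names (`CircuitBreaker`, `failure_threshold`, `reset_after_s`) that shows the mechanism and why the thresholds need tuning.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without touching the backend."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast: do not queue up behind a struggling backend.
                raise CircuitOpenError("backend marked unhealthy")
            # Cool-down elapsed: allow one trial call; a single further
            # failure reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def slow_backend() -> str:
    """Hypothetical backend call that always exceeds its timeout."""
    raise TimeoutError("backend too slow")
```

Setting the threshold or cool-down too high lets failures pile up behind the slow backend; setting them too low rejects requests the backend could have served. That is the kind of tuning described above.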

In lots of ways, with a microservices architecture we are doing SOA, an architecture Capgemini talked a lot about in the early 2000s but which at that time was SOAP-based. Nowadays, with a microservices architecture, we are evolving a REST-based SOA at a very high level.

What about the choreography of all these individual microservices? What does it take to make these individual microservices act as a fully fledged enterprise-level computing system?
In our solution we ended up with choreography services. Take the example of making a stamp. To make a stamp you need a tracking number, and tracking numbers are pre-allocated. You need to reserve a tracking number, get it, and use it to make the stamp. When you have created the stamp with the tracking number, you need to mark the tracking number as used. Finally, you need to make a barcode image of the tracking number and turn that into the label. That is our business process, which business people do understand. Our solution has services that sit at that level and marshal the top-level request. We shield consumers from these choreography services with what we call adapters, which are basically microservice session facades.

We expose our microservices to various consumers, each of which gets a distinct functional and non-functional flavor of our services: consumers such as Amazon, eBay and the client’s own website. These consumers do their own decoration and adaptation of the request, but in the end, when they need stamps, they call our choreography class with the actual request for stamps. The choreography class then calls down to the various resource classes: the resource microservices that make a tracking number, provide a barcode, etc. These are all responsible for timing out effectively, tidying things up when things go wrong, and making sure no mess is left behind.
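
The stamp flow described above (reserve a tracking number, make the stamp, mark the number as used, add the barcode) can be sketched as a small choreography function. Everything in this Python example is a hypothetical in-memory stand-in: the real resource microservices are called over HTTP, and the class and function names here are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


class TrackingNumbers:
    """In-memory stand-in for the tracking-number resource service."""

    def __init__(self, numbers: Iterable[str]) -> None:
        self.free: List[str] = list(numbers)   # pre-allocated numbers
        self.reserved: Set[str] = set()
        self.used: Set[str] = set()

    def reserve(self) -> str:
        number = self.free.pop(0)
        self.reserved.add(number)
        return number

    def mark_used(self, number: str) -> None:
        self.reserved.remove(number)
        self.used.add(number)

    def release(self, number: str) -> None:
        # Tidy up: hand an unused reservation back to the free pool.
        self.reserved.discard(number)
        self.free.insert(0, number)


def make_stamp(number: str) -> Dict[str, str]:
    return {"tracking_number": number}


def make_barcode(number: str) -> str:
    return f"|{number}|"   # placeholder for a real barcode image


def create_label(
    tracking: TrackingNumbers,
    stamp_service: Callable[[str], Dict[str, str]] = make_stamp,
    barcode_service: Callable[[str], str] = make_barcode,
) -> Dict[str, str]:
    """Choreograph the resource services for one stamp."""
    number = tracking.reserve()
    try:
        stamp = stamp_service(number)
    except Exception:
        tracking.release(number)   # leave no mess behind on failure
        raise
    tracking.mark_used(number)
    stamp["barcode"] = barcode_service(number)
    return stamp


pool = TrackingNumbers(["TN1", "TN2"])
print(create_label(pool))
```

If the stamp service fails, the reserved number goes back into the pool, which is one simple form of the tidying up that each service is responsible for.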

Have you seen any patterns evolve in your solution?
In our solution we see the following recurring pattern. We have a single microservice sitting in front of each data store, which, for example, hands out tracking numbers. We also typically have a management microservice for a data store to set up items in it, such as new tracking numbers. It is deployed separately, because there is no need to scale out this kind of service massively; a few instances will do fine. Typically there is also a reporting microservice, if the resource needs reporting on. Again, we have that as a separate service. So most logical resources in a system, such as tracking numbers, have these 3 types of microservices allocated to them. But you only need to scale the microservices that handle the publicly available requests, in our example the “pass out a tracking number” microservice. The other two types will not be hit with a heavy load; they just need to be available when there is demand for their service. This approach enables us to scale our system at a more granular level, the level of microservices, than traditional monolithic systems allow, which in turn enables us to use the available computing resources more effectively and efficiently and to scale the services elastically for a given load profile.
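
A rough way to see the benefit of this three-services-per-resource pattern is to size each service independently. The Python sketch below does that; the service names, load figures and per-instance capacity are all invented for illustration and are not measurements from the project.

```python
import math
from typing import Dict, Tuple


def instances_needed(peak_rps: float, rps_per_instance: float, minimum: int = 2) -> int:
    """Instances required for the peak load, never fewer than `minimum`."""
    return max(minimum, math.ceil(peak_rps / rps_per_instance))


# Hypothetical figures: (peak requests per second, capacity per instance).
services: Dict[str, Tuple[float, float]] = {
    "tracking-number-issuer": (1500, 100),    # public, heavily loaded
    "tracking-number-admin": (2, 100),        # management, rarely called
    "tracking-number-reporting": (5, 100),    # reporting, rarely called
}

plan = {name: instances_needed(peak, cap) for name, (peak, cap) in services.items()}
for name, count in plan.items():
    print(f"{name}: {count} instances")
```

Only the public-facing service grows with demand; the management and reporting services stay at the minimum needed for availability, whereas a monolith would have to scale all three functions together.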

Any last advice for our software engineering community members?
The biggest thing to remember about the microservices architecture is that people have to remember they are software engineers, and they don’t get a set of ready-made practices handed to them on a plate. You do need to be aware of good engineering practices when constructing a microservices-based solution, which makes a good case for Capgemini, because Capgemini is known in the market for the expertise of its engineers. But at its core, microservices work is fun; our engineers like creating these kinds of architectures.

_________________________________

If you want more information on our experiences with microservices, check out the engineering blog at http://capgemini.github.io/categories/index.html#architecture. A reading list Andrew suggested on this topic is shared on http://bit.ly/MicroservicesArchitecture.