Dive-In Microservices #2 – Patterns – Service discovery

With monolithic applications, services invoke one another through language-level method or procedure calls. This was relatively straightforward and predictable behavior. As application complexity increased, we realized that monolithic applications were not suitable for the scale and demand of modern software, so we moved towards SOA, or service-oriented architecture. The monoliths were broken into smaller chunks that typically served a particular purpose. But SOA brought its own caveats into the picture for inter-service calls: SOA services ran at well-known, fixed locations, which resulted in static service locations, hard-coded IP addresses, and reconfiguration issues with deployments, to name a few.

Microservices are easy; building microservice systems is hard

With microservices all this changes. The application typically runs in a virtualized or containerized environment where the number of instances of a service and their locations can change dynamically, minute by minute. This gives us the ability to scale our application dynamically in response to the load applied to it, but this flexibility does not come without its own share of problems. One of the main ones is knowing where your services are in order to contact them. Without the right patterns it can be almost impossible, and one of the first problems you will most likely stumble upon, even before you get your service out into production, is service discovery.

With service discovery, services register with a dynamic service registry upon startup, and in addition to the IP address and port they are running on, they will often provide metadata, like the service version or other environmental parameters, that can be used by a client when querying the registry. Popular examples of service registries are Consul and etcd. These systems are highly scalable and have strongly consistent methods for storing the location of your services. In addition, Consul has the capability to perform health checks on the service to ensure its availability. If the service fails a health check, it is marked as unavailable in the registry and will not be returned by any queries.
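To make this concrete, here is a minimal sketch of registering a service with Consul using the official Go client (github.com/hashicorp/consul/api). The service name, instance ID, address, port, metadata, and the /health endpoint are all illustrative assumptions, not part of any particular deployment:

package main

import (
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default address 127.0.0.1:8500).
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register this instance, including version metadata and an HTTP health
	// check; Consul marks the instance unavailable if the check fails.
	registration := &consul.AgentServiceRegistration{
		ID:      "catalog-1", // hypothetical instance ID
		Name:    "catalog",   // hypothetical service name
		Address: "10.0.0.5",
		Port:    8080,
		Meta:    map[string]string{"version": "1.2.0"},
		Check: &consul.AgentServiceCheck{
			HTTP:     "http://10.0.0.5:8080/health",
			Interval: "10s",
			Timeout:  "1s",
		},
	}
	if err := client.Agent().ServiceRegister(registration); err != nil {
		log.Fatal(err)
	}
}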

There are two main patterns for service discovery:

Server-side service discovery

Server-side service discovery is a microservice antipattern for inter-service calls within the same application; it is the method we used to call services in an SOA environment. Typically, there will be a reverse proxy which acts as a gateway to your services. The proxy contacts the dynamic service registry and forwards your request on to the backend services. The client accesses the backend services at a known URI, using either a subdomain or a path as a differentiator.

Server-side discovery eventually runs into some well-known issues, one of them being the reverse proxy becoming a bottleneck. The backend services can be scaled quickly enough, but the proxy itself must be scaled and monitored as well. It also introduces latency, which increases the cost of running and maintaining the application.

Server-side discovery also multiplies the potential failure patterns across downstream calls, internal services, and external services. With server-side discovery you also need centralized failure logic on the server side, which abstracts most of the API knowledge away from the client, handles failures internally, keeps retrying internally, and keeps the client completely at a distance until there is either a success or a catastrophic failure.

Client-side service discovery

While server-side service discovery might be an acceptable choice for your public APIs, for internal inter-service communication I prefer the client-side pattern. It gives you greater control over what happens when a failure occurs. You can implement the retry logic for a failure on a case-by-case basis, and this will also protect you against cascading failures.

This pattern is similar to its server-side partner; however, the client is responsible for the service discovery and load balancing. You still hook into a dynamic service registry to get the information for the services you are going to call. Because this logic is localized in each client, it is possible to handle the failure logic on a case-by-case basis.
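As a minimal sketch of the client-side pattern, the following queries Consul (again via github.com/hashicorp/consul/api) for healthy instances of a service and picks one at random, the simplest form of client-side load balancing. The "catalog" service name is an illustrative assumption:

package main

import (
	"fmt"
	"log"
	"math/rand"

	consul "github.com/hashicorp/consul/api"
)

// lookup returns the address of one healthy instance of the named service,
// chosen at random as a simple form of client-side load balancing.
func lookup(client *consul.Client, service string) (string, error) {
	// passingOnly=true filters out instances failing their health checks.
	entries, _, err := client.Health().Service(service, "", true, nil)
	if err != nil {
		return "", err
	}
	if len(entries) == 0 {
		return "", fmt.Errorf("no healthy instances of %s", service)
	}
	e := entries[rand.Intn(len(entries))]
	return fmt.Sprintf("%s:%d", e.Service.Address, e.Service.Port), nil
}

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	addr, err := lookup(client, "catalog")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("calling instance at", addr)
}

Because the lookup lives in the client, the retry and fallback behavior around it can differ per caller, which is exactly the case-by-case failure handling described above.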

Dive-In Microservices #1 – Patterns – Event Processing

Event processing is a model which allows you to decouple your microservices by using a message queue. Rather than connecting directly to a service which may or may not be at a known location, you broadcast and listen to events which exist on a queue, such as Redis, Amazon SQS, RabbitMQ, Apache Kafka, or a whole host of other options.

The message queue is a highly distributed and scalable system, and it should be capable of processing millions of messages, so we do not need to worry about it being unavailable. At the other end of the queue there will be a worker listening for new messages pertaining to it. When it receives such a message, it processes the message and then removes it from the queue.

Due to the asynchronous nature of the event processing pattern, failures need to be handled in a programmable way:

Event processing with at least once delivery

The first and most basic synchronization mechanism is to request delivery confirmation: we add the message to the queue and then wait for an ACK from the queue to let us know that the message has been received. Of course, we do not know whether the message has been processed, but receiving the ACK should be enough for us to notify the user and proceed. There is always the possibility that the receiving service cannot process the message, which could be due to a direct failure or bug in the receiving service, or because the message added to the queue is not in a format the receiving service can read. We need to deal with both of these issues independently; handling errors is discussed next.
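Here is a minimal sketch of publishing with an at-least-once guarantee, assuming a hypothetical Broker interface; real clients (RabbitMQ publisher confirms, the SQS SendMessage response) provide equivalent ACK mechanisms under other names:

// Package queue sketches publishing with at-least-once delivery.
package queue

import "time"

// Broker is a hypothetical abstraction over a message queue client.
type Broker interface {
	// Publish blocks until the broker ACKs receipt or the timeout fires.
	Publish(topic string, body []byte, timeout time.Duration) error
}

// PublishOrder retries until the broker ACKs receipt of the message.
// The ACK confirms receipt by the queue, not processing by the consumer,
// and each retry after a lost ACK may create a duplicate, which is why
// consumers need to be idempotent (discussed below).
func PublishOrder(b Broker, body []byte) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		err = b.Publish("orders", body, 5*time.Second)
		if err == nil {
			return nil
		}
	}
	return err
}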

Handling Errors

It is not uncommon for things to go wrong in distributed systems, and planning for failure is an essential factor in microservice-based software design. In the scenario above, if a valid message cannot be processed, one standard approach is to retry processing the message, normally after a delay. It is important to append the error to the message every time we fail to process it: this gives us a history of what went wrong, and it also tells us how many times we have tried to process the message. Once we exceed a retry threshold we do not want to continue retrying; we need to move the message to a second queue, called a dead letter queue, which we will discuss next.
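A sketch of this retry loop, with a hypothetical Queue interface and Message type; the retry threshold and backoff are illustrative:

// Package worker sketches consumer-side retry handling.
package worker

import "time"

const maxRetries = 5

// Message carries its own failure history, so every failed attempt is
// visible later when the message is inspected on the dead letter queue.
type Message struct {
	ID     string
	Body   []byte
	Errors []string // one entry appended per failed attempt
}

// Queue is a hypothetical abstraction over the broker operations we need.
type Queue interface {
	Requeue(m Message, delay time.Duration)
	DeadLetter(m Message)
}

// handle appends the error and requeues with a delay on failure, and
// moves the message to the dead letter queue once the threshold is hit.
func handle(q Queue, m Message, process func([]byte) error) {
	if err := process(m.Body); err != nil {
		m.Errors = append(m.Errors, err.Error())
		if len(m.Errors) >= maxRetries {
			q.DeadLetter(m) // give up, but keep the history for debugging
			return
		}
		q.Requeue(m, time.Duration(len(m.Errors))*time.Second) // grow the delay
	}
}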

Debugging the failures with Dead Letter Queue

It is common practice to remove a message from the queue once it is processed. The purpose of the dead letter queue is to let us examine the failed messages on it and assist with debugging the system. Since we append the error details to the message body, we know what the error is and we have the history if we need it.

Working with idempotent transactions

While many message queues nowadays offer at-most-once delivery in addition to at-least-once, the latter option is still the best for a large throughput of messages. To deal with the fact that the receiving service may receive a message twice, it needs to handle this in its own logic. One of the common methods for ensuring that a message is not processed twice is to log the message ID in a transactions table. If the message has already been processed, it will be disposed of.
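A minimal sketch of the transactions-table check using database/sql; the processed_messages table, its unique message_id column, and the Postgres driver are assumptions for illustration:

// Package dedupe sketches an idempotency check for at-least-once consumers.
package dedupe

import (
	"database/sql"

	_ "github.com/lib/pq" // assumed Postgres driver
)

// processOnce records the message ID before processing; if the ID is
// already present, the message is a duplicate and is simply disposed of.
func processOnce(db *sql.DB, messageID string, process func() error) error {
	// A unique constraint on message_id makes the insert a no-op for a
	// message we have seen before (ON CONFLICT DO NOTHING is Postgres syntax).
	res, err := db.Exec(
		`INSERT INTO processed_messages (message_id)
		 VALUES ($1) ON CONFLICT (message_id) DO NOTHING`, messageID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // duplicate: already processed, dispose of the message
	}
	return process()
}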

Working with the ordering of messages

One of the common issues when handling failures with retries is receiving a message out of sequence or in an incorrect order, which will end up producing inconsistent data in the database. One potential way to avoid this issue is to again leverage the transactions table and store the message dispatch_date in addition to the ID. When the receiving service receives a message, it can then check not only whether the current message has been processed but also whether it is the most recent message, and if not, discard it.
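Extending the previous sketch, the staleness check might look like the following; the entity_id and dispatch_date columns are again illustrative assumptions:

package dedupe

import (
	"database/sql"
	"time"
)

// isStale reports whether an incoming message is older than the latest
// message already processed for the same entity; stale messages are
// discarded rather than applied out of order.
func isStale(db *sql.DB, entityID string, dispatched time.Time) (bool, error) {
	var latest time.Time
	// COALESCE covers the first message for an entity ('epoch' is Postgres syntax).
	err := db.QueryRow(
		`SELECT COALESCE(MAX(dispatch_date), 'epoch'::timestamptz)
		 FROM processed_messages WHERE entity_id = $1`, entityID).Scan(&latest)
	if err != nil {
		return false, err
	}
	return !dispatched.After(latest), nil
}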

Working with atomic transactions

This is a common issue found when moving legacy systems to microservices. A database can be atomic: all operations occur, or none do. Distributed transactions do not give us the same kind of guarantee. When part of a database transaction fails, we can roll back the other parts of the transaction; with a message queue, we instead only remove the message from the queue if the processing succeeded, and when something fails, we keep retrying. This gives us a kind of eventually consistent transaction.
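A sketch of this ack-after-commit pattern, with a hypothetical Acker interface standing in for the queue client:

// Package atomic sketches eventually consistent message processing.
package atomic

import "database/sql"

// Acker is a hypothetical handle to an in-flight queue message.
type Acker interface {
	Ack() error // remove the message from the queue
}

// processMessage applies the message inside a database transaction and
// acknowledges it only after the commit succeeds; on any failure the
// message stays on the queue and is retried, giving eventual consistency.
func processMessage(db *sql.DB, msg Acker, fn func(*sql.Tx) error) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		tx.Rollback() // roll back the local change; the message stays queued
		return err
	}
	if err := tx.Commit(); err != nil {
		return err // no ack: the message will be redelivered
	}
	return msg.Ack() // success: safe to remove from the queue
}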

Unfortunately, there is no one-size-fits-all solution with messaging; we need to tailor the solution to match the operating conditions of the service.

Setting up the development environment for Kata Containers – proxy

Summarizing the information for setting up the development environment for my first project in Kata Containers: I have set up the dev environment for the proxy project.

First Things First

Install Golang as a prerequisite to development. Ensure you follow the complete steps to create the required directory structure and test the installation.

Get The Source

This guide assumes you have already forked the proxy project. If not, please fork the repo first. Once you have successfully forked the repo, clone it on your computer:

git clone https://github.com/<your-username>/proxy.git $GOPATH/src/github.com/<your-username>/proxy

Add the upstream proxy project as a remote to the local clone to fetch updates:

$ cd proxy
$ git remote add upstream https://github.com/kata-containers/proxy.git

The proxy project requires the following dependencies to be installed prior to the build. Use the following commands to install them:

$ go get github.com/hashicorp/yamux
$ go get github.com/sirupsen/logrus

Do the first build. This will create the executable file kata-proxy in the proxy directory.

$ make
go build -o kata-proxy -ldflags "-X main.version=0.0.1-02a5863f1165b1ee474b41151189c2e1b66f1c40"

To run the unit tests, run:

$ make test
go test -v -race -coverprofile=coverage.txt -covermode=atomic
=== RUN TestUnixAddrParsing
--- PASS: TestUnixAddrParsing (0.00s)
=== RUN TestProxy
--- PASS: TestProxy (0.05s)
PASS
coverage: 44.6% of statements
ok github.com/coolsvap/proxy 1.064s

To remove all generated output files, run:

$ make clean
rm -f kata-proxy

That is all for this time. I am working on setting up the development environment with the GoLand IDE. I will keep you posted.

Event Report – Expert Talks 2017

Expert Talks 2017 was my first participation in the Expert Talks conference held in Pune. The conference started a couple of years ago as an elevated form of the Expert Talks meetup series by Equal Experts, and this year's edition had a very good mix of content, including talks on a variety of topics such as blockchain, containers, IoT, and security, to name a few. This was the first edition of the conference with a formal CFP, which witnessed 50+ submissions from different parts of the country, out of which 9 talks were selected. This year the conference was held at the Novotel Hotel in Pune.


The conference started with a registration desk which was well organized for everyone registered to pick up their kit. Even for a conference scheduled on a Saturday, the attendance was quite noticeable. The event then kicked off with a welcome speech to all participants and speakers.


The first session, delivered by Dr. Pandurang Kamat on demystifying blockchain, was a very good start to the event, covering a much-anticipated and buzzed-about topic. He covered the ecosystem around blockchain in precise detail, using the most popular blockchain application, Bitcoin, as an example everyone could understand. He also gave an overview of open source frameworks like Hyperledger for blockchain implementations.


The following session, Doveryai, no proveryai (Russian for "trust, but verify") – an introduction to TLA+, delivered by Sandeep Joshi, was well received by the audience, as the topic was pretty unique in name as well as content. The session started a bit slowly, with the audience absorbing the details of TLA+ and PlusCal, and was well scoped, with some basic details and a hands-on demo. The model checker use case was well received after looking at real-world applications, and we had the first coffee break of the day after it.


Mr. Lalit Bhatt started well with his session on Data Science – An Engineering Implementation Perspective, which discussed the mathematical models used to build real-world data science applications and explained the current use cases in his organization.


Swapnil Dubey and Sunil Manikani from Schlumberger gave a good insight into their microservice strategy with containers, built on blocks like Kubernetes, Docker, and GKE. They also presented how they are using GCE capabilities to effectively reduce operational expenses.


Alicja Gilderdale from Equal Experts presented some history of container technologies and how they validated different container technologies for one of their projects. She also provided some insights into the challenges and lessons learned throughout their journey. The session wrapped up just in time for the lunch break.


Neha Datt, from Equal Experts, showcased the importance of the product owner in the overall business cycle in today's changing infrastructure world. She provided some critical thinking points on bridging the gap between the business, architecture, and development teams, and on how a product manager can be the glue between them.

Piyush Verma took the Data Science – An Engineering Implementation Perspective discussion forward with his thoughts on distributed data processing. He showcased typical architectures and deployments in distributed data processing by splitting the system into layers and defining the relevance, need, and behavior of each. One of the core attractions of the session was the hand-drawn diagrams incorporated into his presentation, which he had prepared as homework for the talk.


After the second official coffee break of the day, Akash Mahajan enlightened everyone on the most crucial requirement for today's distributed workloads living on public clouds: security. He walked everyone through the different requirements for managing secrets using a HashiCorp Vault example, while explaining the advantages and caveats of the approach.


The IoT, smart cities, and digital manufacturing discussion was well placed, applying most of the concepts learned throughout the day to real-world problems. Subodh Gajare provided details on IoT architecture and its foundations, with requirements related to mobility, analytics, big data, cloud, and security. He gave very useful insights into upcoming protocol advances and the usage of fog and edge computing in the smart city application of IoT.

It was a day well spent with some known faces and an opportunity to connect with many enthusiastic IT professionals in Pune.