Digital payments impacting everyday life – What must happen to keep them reliable?

Part 2

In Part 1 we discussed resiliency as the underlying principle that drives the design of mission-critical payment systems.

In this article we would like to discuss performance as one of the essential cornerstones for such systems.

Performance is closely associated with service expectations. This is not limited to the immediate consumer of the service, but usually goes all the way up to the edge where the actual users interact with the system.

When a cardholder swipes a card of a particular scheme and it does not work, it is immaterial whether the failure is at the periphery, at the center, or anywhere in between. The cardholder will immediately try a card from another issuer bank, another scheme, or both. This non-performance hurts the business of multiple players in the chain.

Nor are the expectations limited to the system's immediate and near-future needs: what if the business scales up rapidly?

How do we deal with such non-performance?

Dealing with it requires a set of approaches. Some of the key ones are listed below:

  • Preventive Measures: The service provider verifies the interface points with the immediate neighbors of the service and certifies the interface. This is part of the onboarding process, which also captures other details about the service subscriber, mostly about the quality of their business, and is usually handled as a business process.
  • Fault Tolerant Design: The system should not have a single point of failure, so that alternative paths and resources can be provisioned in case of a fault in one of the components. For example, in an active-active infrastructure deployment configuration, if one site becomes unresponsive, the load balancers immediately shift the traffic to the other site. This is part of the deployment architecture.
  • Intelligent Monitoring: All components of the system need to be monitored, and alerts with appropriate severity levels need to be raised in time. To give operators maximum time to react, the monitoring needs to be intelligent: the AI-aided profiling used in modern systems can detect deviations early through machine learning and correlation of events.
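As a rough illustration of the early-deviation idea in the monitoring item above, the sketch below flags latency samples that drift far from a rolling baseline and maps them to severity levels. It is a plain statistical check standing in for the machine-learning profiling the article mentions, not an implementation of it. The class name, window size, and thresholds are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class LatencyMonitor:
    """Flags samples that deviate sharply from a rolling baseline.

    Hypothetical helper: the window and z-score thresholds are
    illustrative assumptions, not values from a real deployment.
    """

    def __init__(self, window=30, warn_z=3.0, critical_z=6.0):
        self.samples = deque(maxlen=window)  # recent "normal" observations
        self.warn_z = warn_z
        self.critical_z = critical_z

    def observe(self, latency_ms):
        """Return 'ok', 'warning', or 'critical' for one new sample."""
        severity = "ok"
        if len(self.samples) >= 5:  # need a few points before judging
            baseline = mean(self.samples)
            spread = stdev(self.samples) or 1e-9  # avoid division by zero
            z = (latency_ms - baseline) / spread
            if z >= self.critical_z:
                severity = "critical"
            elif z >= self.warn_z:
                severity = "warning"
        # Only samples judged normal feed the baseline, so an ongoing
        # incident does not quietly become the new "normal".
        if severity == "ok":
            self.samples.append(latency_ms)
        return severity


monitor = LatencyMonitor()
for sample in [100, 102, 98, 101, 99, 100, 103, 97]:
    monitor.observe(sample)
print(monitor.observe(101))  # close to the baseline
print(monitor.observe(400))  # far above the baseline
```

A production system would correlate many signals (error rates, queue depths, per-issuer decline ratios) rather than a single latency series, but the principle of comparing live behavior against a learned profile is the same.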

To guard against such non-performance, mission-critical payment systems are also built with headroom: if the expected maximum load (transactions per second, concurrent users, or dynamic payload per transaction) is LMAX units, the system is designed to withstand 2*LMAX or more.
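The headroom rule can be tied back to the active-active example with a little arithmetic. The sketch below checks whether the surviving sites can still carry LMAX after one site is lost. The function names, the even split of capacity across sites, and the figure of 5,000 transactions per second are assumptions chosen for illustration.

```python
def design_capacity(lmax, headroom_factor=2.0):
    """Capacity the whole system is built for, in the same units as lmax."""
    return lmax * headroom_factor


def survives_site_loss(lmax, sites, headroom_factor=2.0):
    """True if the remaining sites can still carry lmax after one site fails.

    Assumes capacity is split evenly across sites (illustrative assumption).
    """
    if sites < 2:
        return False  # a single site is itself a single point of failure
    per_site = design_capacity(lmax, headroom_factor) / sites
    return per_site * (sites - 1) >= lmax


LMAX_TPS = 5000  # hypothetical expected peak load
print(survives_site_loss(LMAX_TPS, sites=2, headroom_factor=2.0))
print(survives_site_loss(LMAX_TPS, sites=2, headroom_factor=1.5))
print(survives_site_loss(LMAX_TPS, sites=3, headroom_factor=1.5))
```

Under these assumptions, two sites need the full 2*LMAX design capacity for either one to absorb the peak load alone, which is one way to read the 2*LMAX guideline.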

Now comes the interesting situation where the business succeeds and the load on the system increases rapidly. Handling this growth is the scalability requirement of the payment system.

Will the payment system scale up with the increase in business or would scalability become the proverbial Achilles' heel of the business?

A typical approach is to architect the payment system to scale both horizontally, where adding similar processing servers scales the system linearly, and vertically, where augmenting the processing resources of existing servers scales the system up. However, this is easier said than done.

The scalability needs to be supported by all the components of the system: online processing, offline processing, in-memory data stores, databases, connection end points, and much more. The approach to making each of these scalable differs, as the characteristics of individual components differ widely.

RS Software has not only developed robust, scalable payment systems but has also had the privilege of handling scale-up demands of account-based payment systems at an extremely high rate: 100% YoY for 5 years, against an expectation of 15% YoY. RS Software was also closely associated with the journey of scaling up one of the largest card payment rails in the world from 500 transactions per second to 55,000 transactions per second: 110x over 20 years, i.e., roughly 26.5% compounded YoY growth (each year at about 126% of the previous year's volume) sustained for 20 years.
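The growth figures above are compounding arithmetic, which is easy to check. The short sketch below derives the annual rate implied by 110x over 20 years and compares five years of 100% YoY growth with five years at the expected 15%. The function names are illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    return (end / start) ** (1 / years) - 1


def growth_multiple(rate, years):
    """Total multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years


print(f"{cagr(500, 55_000, 20):.1%}")      # annual rate for 110x in 20 years
print(f"{growth_multiple(1.00, 5):.0f}x")  # 100% YoY for 5 years
print(f"{growth_multiple(0.15, 5):.2f}x")  # 15% YoY for 5 years
```

Growing 110x in 20 years corresponds to about 26.5% growth per year, so each year's volume is roughly 126% of the previous year's. Doubling every year for 5 years yields 32x, against about 2x had growth stayed at 15%.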

In case of extreme scalability needs, we have seen vertical scalability fail as the server specifications needed become financially unviable; we have seen horizontal scalability hit a “scalability wall” where the communication overhead neutralizes the gains of adding more servers. We have also experienced situations where the in-memory datastore cluster hit performance challenges as the processing was internally single-threaded or was plagued by inefficient connection pooling. In certain situations, we had to move from software solutions to hardware solutions as the latency was just not acceptable.
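One common way to reason about such a "scalability wall" is the Universal Scalability Law, which models relative throughput of N servers as N / (1 + α(N − 1) + βN(N − 1)), where α captures contention and β captures the cost of keeping servers coordinated. It is used here only as an illustration of the effect described above, not as the model RS Software applied, and the coefficient values below are hypothetical.

```python
import math


def relative_capacity(n, contention, coherency):
    """Universal Scalability Law: throughput of n servers relative to one."""
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def peak_servers(contention, coherency):
    """Server count at which the modeled throughput peaks."""
    return math.sqrt((1 - contention) / coherency)


ALPHA, BETA = 0.03, 0.0005  # hypothetical coefficients, for illustration only
for n in (1, 8, 32, 44, 64, 128):
    print(n, round(relative_capacity(n, ALPHA, BETA), 1))
print("peak near", round(peak_servers(ALPHA, BETA)), "servers")
```

With these assumed coefficients the modeled throughput peaks at around 44 servers and then declines, so adding a 128th server leaves the system slower than it was at 44. In practice the coefficients would be fitted to measured load-test data.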

In such situations, RS brought in technology and infrastructure innovations to deal with extreme scalability needs so that the performance continued to meet the stated and even unstated expectations!

Thus, performance is a key expectation from payment systems. It has to meet the current needs of the business as well as the growth that every business expects. RS Software is well versed in building systems that meet these expectations and go beyond them when the business sees hockey-stick success, so that scalability is not found wanting in such positive scenarios.

To be continued – stay tuned for part 3.
