Digital payments impacting everyday life – What must happen to keep them reliable?

Part 3

In Part 1 we discussed resiliency as the underlying principle that drives the design of mission-critical payment systems. In Part 2 we discussed performance as one of the essential cornerstones of such systems.

In this article, we discuss scalability, one of the key desirable features of such systems. Although we briefly touched upon scalability in Part 2, it needs an in-depth treatment.

According to an article in the Business Standard (01-Sep-2022), based on data published by the National Payments Corporation of India (NPCI), Unified Payments Interface (UPI) transactions grew 85 percent by volume and 67.85 percent by value year-on-year (YoY). In August 2022, UPI crossed the 6-billion mark in monthly volume for the first time since its inception in 2016.

According to the Annual Report 2021 of Visa, the transaction volume stood at 164.7 billion, up from 140.8 billion in 2020, which is a year-on-year (YoY) rise of 16.97 percent.
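
The Visa growth figure above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are our own, and the doubling-time estimate assumes the growth rate stays constant, which real payment volumes need not do.

```python
import math


def yoy_growth_percent(previous: float, current: float) -> float:
    """Year-on-year growth, as a percentage of the previous year's value."""
    return (current - previous) / previous * 100


def years_to_double(growth_percent: float) -> float:
    """Years for volume to double, ASSUMING the same growth rate every year."""
    return math.log(2) / math.log(1 + growth_percent / 100)


# Transaction volumes in billions, as quoted from the Visa Annual Report 2021.
visa_growth = yoy_growth_percent(previous=140.8, current=164.7)
print(f"YoY growth: {visa_growth:.2f} percent")  # prints "YoY growth: 16.97 percent"
print(f"Years to double at this rate: {years_to_double(visa_growth):.1f}")
```

An estimate like this gives a rough idea of how far ahead capacity has to be planned.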

This rise in the volumes handled by such mission-critical systems calls for deliberate, concerted technology diligence across design, development, maintenance, planning, and upgrades.

Why build for scalability?

All businesses, including payments, plan for growth. Competition in the payment business tends to drive down margins, hence growth must be substantial for the business to succeed. Payments is a business of scale. Adding one more customer to the network – a consumer, an organization, or a merchant – should be possible at a negligible incremental cost. Therefore, a payment system must be built with scalability as an architectural principle from inception.

How do you design for scalability?

Let us look at some of the components that are needed to build a payment system, and how scalability would impact these components.

  • Online Components: These components interface with the external world. External systems invoke payments through these components, and the payment system needs to provide a response in real time. This form of invoking a request and waiting for a response in real time is called a synchronous process. To increase scalability, the system should be able to (a) increase the performance of each point of interaction, termed vertical scalability, and (b) support the addition of more points of interaction, called horizontal scalability. The processing could be stateless, that is, it does not need to look up any data elsewhere because all the information is available in the transaction it processes; or it could be stateful, where it needs to refer to data stored from past flows. Horizontal scalability for stateful execution is a challenge and needs to be resolved skillfully, because every added point of interaction must be able to reach the same stored state.
  • Offline Components: These do not come in the path of online execution; however, an online component often triggers an offline execution. Online components are designed for speed; offline components are not. It is typical to have buffers and queues between the two for speed adaptation. The offline components could be long-running or short-running processes. From the scalability perspective, interfacing a high-speed online component with a long-running offline process needs special care.
  • Batch Processing Components: We just saw that online components trigger offline processing. However, the invocation could also come from a batched set of transactions that is fired offline. Here, the process reads items from the batch files and invokes the processing offline. There are typically upper limits on the number of items in a batch and on the number of batches that can be processed. Scalability impacts both the batch size and the number of batches. Whether horizontal or vertical scalability serves this best depends on the structure of the processing.
  • Event Processing: Unlike the synchronous processes of online processing, events are asynchronous processes, that is, the process notifies the requester after it completes – the requester need not “wait” for the response. This saves resources. Asynchronous processes are an important tool for scaling systems because they use resources optimally.
  • Dashboard Components: Dashboards typically operate as near-real-time (NRT) processes. After a transaction is processed, it is expected to “appear” on the dashboard almost immediately. When online components are scaled up, the NRT components need to be scaled appropriately as well, to avoid complaints of “sluggish dashboards”.
  • Reporting Components: All systems are expected to generate human-readable reports that show how the business components and technology components are performing. These are typically purely offline; they are often fired by a scheduler (timer-driven invocation) and shared with recipients – directly or as a link to a shared copy – using email, message, or other mechanisms. As volume increases with scale, report performance can suffer. We have seen situations where report generation takes so long that, if the process fails, there is not enough time to rerun the reports for the business to take decisions in time, leading to a missed deadline! Scaling up reporting typically calls for intelligent preprocessing, should the need arise.
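
To make the buffering between online and offline components concrete, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: the class name `OfflineBuffer`, the `ACCEPTED`/`BUSY` responses, and the queue size are assumptions chosen for illustration, not a description of any real payment platform. The online path answers immediately and hands the slow work to an offline worker through a bounded queue; when the queue is full, the online path pushes back instead of waiting.

```python
import queue
import threading


class OfflineBuffer:
    """Bounded queue between a fast online path and a slower offline worker."""

    def __init__(self, max_pending: int = 100) -> None:
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain)
        self.processed = []  # ids handled by the offline side, in order

    def start(self) -> None:
        self._worker.start()

    def submit(self, txn: dict) -> str:
        """Online (synchronous) path: never blocks on the offline work."""
        try:
            self._queue.put_nowait(txn)
        except queue.Full:
            return "BUSY"  # backpressure: the caller may retry later
        return "ACCEPTED"

    def _drain(self) -> None:
        """Offline path: takes items off the queue at its own pace."""
        while True:
            txn = self._queue.get()
            if txn is None:  # sentinel value asks the worker to stop
                return
            self.processed.append(txn["id"])  # stand-in for slow processing

    def stop(self) -> None:
        self._queue.put(None)
        self._worker.join()


def run_demo(count: int = 10) -> tuple:
    buffer = OfflineBuffer(max_pending=100)
    buffer.start()
    responses = [buffer.submit({"id": i, "amount": 100}) for i in range(count)]
    buffer.stop()
    return responses, buffer.processed
```

Calling `run_demo()` returns the online responses together with the ids the offline worker processed. The bounded queue is the design choice that matters here: an unbounded one would hide an overloaded offline component until memory runs out.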

After teams build newer, scaled-up versions of the payment platform, they need to verify them. This process is called benchmarking. The challenge here is to pump in realistic transactions at a very high rate to simulate a future state, and to collect all operational statistics so that the kink points (bottlenecks) can be observed and refactored, making the platform ready for tomorrow.
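
A benchmarking harness can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: `process_payment` is a hypothetical stand-in for the system under test (here it only sleeps for a millisecond), and the transaction count and concurrency level are arbitrary. A real benchmark would replay realistic transaction mixes against an actual deployment.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def process_payment(txn_id: int) -> bool:
    """Hypothetical stand-in for the payment system under test."""
    time.sleep(0.001)  # simulated processing time, an assumption
    return True


def timed_call(txn_id: int) -> float:
    """Run one transaction and return its latency in seconds."""
    start = time.perf_counter()
    process_payment(txn_id)
    return time.perf_counter() - start


def percentile(sorted_values: list, pct: float) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def run_benchmark(total: int = 200, concurrency: int = 20) -> dict:
    """Fire `total` transactions from `concurrency` threads and collect statistics."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total)))
    elapsed = time.perf_counter() - started
    return {
        "count": len(latencies),
        "throughput_tps": total / elapsed,
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    for name, value in run_benchmark().items():
        print(f"{name}: {value}")
```

Tail percentiles such as p95 are usually more revealing than averages, because the kink points tend to show up first as a few slow transactions.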

Having said this, life is not always that scary. Constant innovation is underway in all these components and many more that have not been covered in this introduction. These technology upgrades make life easier for the technology team designing or upgrading payment systems.

If you would like to read more about “Designing a Payment System”, you can visit the blog of the same name authored by Gergely Orosz. He writes The Pragmatic Engineer Newsletter and is an advisor at who previously worked at Uber, Microsoft, Skype, and Skyscanner.

To be continued – stay tuned for Part 4.
