Luc Pâquet
Customer Challenges and Business Case
A long-standing client engaged Levio to architect, design, and implement a Real-Time Payment (RTP) solution. Because of aggressive time-to-market, resilience, and scalability requirements, the client wanted the solution to run in the public cloud and take advantage of the cloud provider’s managed services. The tight timeline called for a highly automated hybrid cloud development and delivery solution, and Levio’s team of Cloud and DevOps experts was brought in to deliver it. The RTP market in the U.S. was moving rapidly, and the project timeline had to be expedited to align with FedNow’s capabilities coming online.
Proposed Solution
The proposed solution had a significant number of non-functional requirements, including high reliability, resilience, scalability, and high availability. Meeting these requirements in a hybrid cloud solution required automation, collaboration, and multiple iterations: first proving out each solution as an MVP, then operationalizing it to production grade. A hybrid cloud solution was developed because integration with multiple on-premises enterprise tools and systems was necessary. Building the solution on-premises alone, without the support of cloud services, would have taken much longer. DevOps capabilities and automation were required to achieve velocity across the entire software development and delivery value stream and to meet the timelines. Together they allowed very quick development and validation iterations as well as automated security, compliance, and testing.
To begin, Levio implemented an Infrastructure as Code (IaC) practice: all cloud infrastructure and services were defined as code and provisioned with HashiCorp Terraform. IaC pipelines in Terraform Cloud then deployed the infrastructure to all four environments (Dev, UAT, Cert, Prod). Manual provisioning of infrastructure was not allowed, so every change to infrastructure code goes through review and approval pipelines.
CI/CD pipelines in GitLab on AWS orchestrate the build and testing stages of the 12 microservices, which are built into containers. After security scans (Fortify, Sonatype, and WebInspect) run on the build, the build artifacts are stored in AWS ECR. Once a build passes, AWS CodePipeline is triggered to deploy to AWS ECS Fargate through AWS CodeDeploy. A Blue/Green deployment strategy is used in every deployment. Once the new version of a microservice is deployed, it is opened to test traffic. AWS CodeDeploy lifecycle hooks, implemented as AWS Lambda functions, verify that the microservice is in a steady state before the route is switched to the new target group. AWS CodePipeline deployments are done in this manner into four environments (Dev, UAT, Cert, Prod) and in two regions in parallel for an active-active redundant configuration. This supports the solution’s high resilience requirements for multi-region Disaster Recovery (DR).
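The lifecycle hook step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project’s actual hook: the function names `wait_for_steady_state` and `run_hook`, the `probe` callable (standing in for whatever health check is run against the test traffic route), and the retry settings are all hypothetical. The only AWS call shown is CodeDeploy’s `put_lifecycle_event_hook_execution_status`, which a hook Lambda uses to report its result; the client is passed in so the logic can be exercised without AWS access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_steady_state(
    probe: Callable[[], bool],
    required_successes: int = 3,
    max_attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row."""
    streak = 0
    for attempt in range(max_attempts):
        # Any failed probe resets the streak of consecutive successes.
        streak = streak + 1 if probe() else 0
        if streak >= required_successes:
            return True
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return False


def run_hook(
    event: Dict[str, Any],
    probe: Callable[[], bool],
    codedeploy: Any,
    **wait_options: Any,
) -> str:
    """Check the new (green) task set and report the outcome to CodeDeploy.

    `event` is the payload CodeDeploy sends to a hook Lambda; `codedeploy`
    is assumed to be a boto3 CodeDeploy client (or a test double).
    """
    healthy = wait_for_steady_state(probe, **wait_options)
    status = "Succeeded" if healthy else "Failed"
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

In a real Lambda function, the handler would create the client with `boto3.client("codedeploy")` and pass it to `run_hook` together with a probe suited to the service. Reporting `Failed` causes CodeDeploy to stop the deployment, so production traffic stays on the existing target group.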
In addition to the IaC and CI/CD pipelines, a configuration management lifecycle was implemented using GitLab, SOPS, and AWS Parameter Store.
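One way these three tools can fit together is sketched below. The case study does not describe the actual flow, so the following is an assumption: configuration files live encrypted in GitLab, a pipeline step decrypts them (for example with `sops --decrypt`), and the resulting values are published to Parameter Store under a hierarchical path. The function names `flatten_config` and `publish`, the `/rtp/dev/payments-api` prefix, and the sample keys are all hypothetical; `put_parameter` is the real boto3 SSM call, and the client is injected so the sketch runs without AWS access.

```python
from typing import Any, Dict


def flatten_config(config: Dict[str, Any], prefix: str) -> Dict[str, str]:
    """Turn a nested config mapping into flat Parameter Store style paths."""
    params: Dict[str, str] = {}
    for key, value in config.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # Recurse so nested sections become deeper path segments.
            params.update(flatten_config(value, path))
        else:
            params[path] = str(value)
    return params


def publish(params: Dict[str, str], ssm: Any) -> int:
    """Write each parameter; `ssm` is assumed to be a boto3 SSM client."""
    for name, value in sorted(params.items()):
        ssm.put_parameter(
            Name=name,
            Value=value,
            Type="SecureString",  # assumption: all values are stored encrypted
            Overwrite=True,
        )
    return len(params)


# Hypothetical, already-decrypted configuration for one service.
example = {"db": {"host": "db.internal", "port": 5432}, "timeout_seconds": 15}
flat = flatten_config(example, "/rtp/dev/payments-api")
```

Keeping the flattening step free of AWS calls makes it straightforward to review in a merge request exactly which parameters a change would write before the pipeline publishes them.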
Integrations were required from AWS to the customer’s on-premises data center for data encryption (Voltage), security (Sonatype, WebInspect, Wiz.io), and logging (Splunk). These communications were also driven by the CI/CD pipelines via an AWS Transit Gateway and AWS Direct Connect Router connecting the AWS Cloud to the customer’s data center.
Change management was required for any change to the upper environments (Cert, Prod) and was implemented using AWS CodePipeline. The change request form for deployments into upper environments was built into the CD pipeline. As such, promotions from lower environments (Dev and UAT) to upper environments (Cert and Prod) are controlled via manual confirmation through an approval gate in AWS CodePipeline.
Once in production, the running microservices were monitored using the existing monitoring tool stack, Dynatrace and Splunk. Alerts on errors were configured to comply with customer requirements. Wiz.io is used to detect vulnerabilities in the deployed infrastructure. Levio also provided a post-production team to support, maintain, and enhance applications and pipelines, including SREs, Technical Support Engineers, and Cloud Developers.
AWS Services and Solutions Used
AWS Public Cloud was selected as the cloud provider for this project. As such, the following AWS Services were used to achieve a cloud-native, resilient, secure, and easy-to-operate solution:
- AWS native applications running in ECS Fargate
- AWS CodePipeline and AWS CodeDeploy are used to deploy the microservices
- AWS Parameter Store is used to store the configurations
- CI build pipeline artifacts stored in AWS ECR
- Separate pipeline for AWS account creation used to segregate deployment and application duties
- AWS ALBs used to direct traffic
- Active/Active Multi-Region DR strategy using AWS Route53 across two regions:
  - us-east-1 (N. Virginia)
  - us-east-2 (Ohio)
- Supplementary tools deployed on AWS Lambda
Third Party Applications and Solutions Used
- HashiCorp Terraform
- Maven
- Voltage (data encryption)
- Fortify, Sonatype, WebInspect, Wiz.io (security tech stack)
- Splunk (logging)
- Dynatrace (Observability)
- GitLab (Code Repo, CI)
- SOPS
Outcome and Results
The client’s Real-Time Payment offering was an overwhelming success. They were able to launch the service in beta on time and on budget. The main challenges of an aggressive timeline, transaction processing SLAs, high availability, and disaster recovery were all met.
The client’s RTP solution, currently composed of 12 microservices, is running on AWS in multiple regions. Four environments (Dev, UAT, Cert, Prod) support the application SDLC, and provisioning of infrastructure is 100% automated. An end-to-end DevOps toolchain supports builds, testing, and deployments, providing the velocity needed to deliver new features.
Success Metrics
Since the client’s RTP launch, the number of financial institutions on the platform has grown from 0 to 36 in 17 months, and more are being added regularly. In November 2023, an average of 2,860 transactions per day were processed. The average dollar amount moved per day is $1.28 million.
The transaction processing time SLA is 15 seconds, and the solution is well within the SLO of 10 seconds at an average transaction processing time of 4 seconds.
The solution uptime over the past quarter has been 100%. Transaction cancellations and failures are caused by external factors such as unauthorized accounts, invalid data being sent, or other integration points having issues.
The deployment frequency to Prod is twice a month. This is mostly limited by the strict certification process in the pre-production Cert environment. The lead time from code merge to deployment to Cert is 2 hours.
- 2,800+ daily processed transactions
- 4 seconds average transaction processing time
- $1.28M average dollar amount moved per day
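A few derived figures follow directly from the metrics above; the short calculation below works them out. It assumes the daily transaction count and the daily dollar amount refer to the same period, which the figures above do not state explicitly. The variable names are illustrative only.

```python
# Figures quoted in the Success Metrics section.
transactions_per_day = 2_860
dollars_per_day = 1_280_000
institutions_onboarded = 36
months_since_launch = 17
avg_processing_seconds = 4
slo_seconds = 10
sla_seconds = 15

# Average value of a single transaction (assumes both daily figures
# cover the same period).
avg_transaction_value = dollars_per_day / transactions_per_day

# Average onboarding pace of financial institutions.
institutions_per_month = institutions_onboarded / months_since_launch

# Headroom between the observed average and each latency target.
slo_headroom_seconds = slo_seconds - avg_processing_seconds
sla_headroom_seconds = sla_seconds - avg_processing_seconds

print(f"Average transaction value: ${avg_transaction_value:,.2f}")
print(f"Institutions onboarded per month: {institutions_per_month:.1f}")
print(f"Headroom vs SLO: {slo_headroom_seconds} s, vs SLA: {sla_headroom_seconds} s")
```

Under that assumption, the average transaction is worth roughly $448, about two institutions joined per month, and the average processing time leaves 6 seconds of headroom against the SLO and 11 seconds against the SLA.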