CI/CD pipelines make managing the lifecycle and deployment of a product easier and more consistent. Travis has tight integration with GitHub, which made it a perfect candidate for both internal and open source projects. For our customers and partners, it was a prerequisite for supporting and running their products on the Z platform.
Internally we had a Travis environment for x86 and ppc64le, but none for s390x. I was tasked with supporting Travis on Linux on Z, and so began a month-long dive into the Travis source code, learning it from end to end. Using a forked project with modifications for s390x, I deployed a Travis regression pipeline for the release of IBM Cloud Private (ICP) on Z, a Kubernetes offering by IBM.
My team lead and I built and ported the Travis codebase to `s390x`. Jay focused on the infrastructure, initially hooking OpenStack up to a z/VM hypervisor but ultimately settling on KVM. I focused on the provisioning tools that power the magic of Travis and its integration with GitHub. The codebase consisted of Chef cookbooks and custom bash scripts. After three weeks of effort we got our first working build, and the result was promising.
For the first time ever, our developers were able to modify their Travis pipelines and run their builds on Linux on Z. However, as they integrated the s390x builds, Travis on Z suddenly became a mission-critical service because of the way Travis and GitHub work together. If even one architecture failed to pass, Travis integrity checks prevented code from being merged into the main branch. Whenever Travis on Z had an outage, messages and alerts quickly started flooding in from concerned consumers with deadlines on the horizon. The existing solution was not sufficient for a rapidly growing user base.
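The gating behavior described above can be illustrated with a minimal Python sketch. This is not Travis or GitHub code; the function name `merge_allowed` and the status strings are hypothetical, chosen only to show why a single failing (or unavailable) architecture blocks every merge.

```python
from typing import Dict


def merge_allowed(results: Dict[str, str]) -> bool:
    """Return True only if every architecture reported a passing build.

    `results` maps an architecture name to a status string. Both the
    names and the "passed"/"failed" values are illustrative assumptions.
    """
    # An empty result set is treated as not mergeable: there is no
    # evidence that anything passed.
    return bool(results) and all(
        status == "passed" for status in results.values()
    )


# One failing architecture is enough to block the merge.
results = {"x86": "passed", "ppc64le": "passed", "s390x": "failed"}
blocked = not merge_allowed(results)
```

Under this model an outage on the s390x workers looks the same as a failing build, which is why availability of that one architecture mattered to everyone.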
Using load balancing via RabbitMQ, we stood up another OpenStack instance located in Hursley, United Kingdom, and replicated our port of Travis on Z to those systems. Since then we have been able to take independent outages without affecting the ever-growing demand for Travis on Z. However, we ran into issues with large bursts of builds, which caused the networking layer in OpenStack to intermittently fail to initialize. After days of debugging and testing we abandoned our initial strategy and pivoted to using LXD to run the builds. Ultimately this turned out to be a much better solution, resulting in a huge speed increase and better portability. Using Packer, I built Xenial and Bionic base images for Linux on Z and deployed a production configuration supporting 20,000+ builds a year.
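To show why a shared queue lets one site go down without dropping builds, here is a small simulation in plain Python. It does not talk to RabbitMQ; it only models round-robin dispatch of queued jobs to whichever consumers are online. The function `dispatch` and the site name `site_a` are hypothetical, and `hursley` simply stands in for the second site mentioned above.

```python
from collections import deque
from typing import Dict, Iterable, List


def dispatch(jobs: Iterable[int], sites: Dict[str, bool]) -> Dict[str, List[int]]:
    """Hand queued jobs to the online sites in round-robin order.

    `sites` maps a site name to True (online) or False (in an outage).
    This is a simplified model, not the real Travis worker protocol.
    """
    queue = deque(jobs)
    online = [name for name, up in sites.items() if up]
    if not online:
        raise RuntimeError("no worker site is online")

    assignments: Dict[str, List[int]] = {name: [] for name in online}
    turn = 0
    while queue:
        site = online[turn % len(online)]
        assignments[site].append(queue.popleft())
        turn += 1
    return assignments


# Both sites up: the work is shared.
both_up = dispatch(range(4), {"site_a": True, "hursley": True})

# One site in an outage: the other absorbs every job.
one_down = dispatch(range(4), {"site_a": False, "hursley": True})
```

With both sites online the four jobs are split evenly; with one site offline the remaining site receives all of them, so builds keep flowing.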
Once we moved to the LXD-based builds, I was able to automate the deployment of LXD and the Travis Worker on an Ubuntu Bionic host using Ansible. Still using the previous OpenStack environment, we could now provision infrastructure with Terraform and configure workers in tandem using Ansible.