
Background
Ola Cabs is the most successful cab aggregator in India, with more than 1 million rides per day across 169 cities. After growing rapidly in India, it expanded internationally to Australia and later to New Zealand and the UK. Ola has had an enormous impact on the unorganised private cab sector in Indian cities, and its growth has been fuelled by the lack of reliable public transportation infrastructure. Thousands of private cab operators moved their fleets to Ola’s platform, along with many individual drivers, helping the company reach a valuation in the billions of dollars.
Problem
Given the immense expansion of Ola’s customer base, the company needed a robust, well-performing codebase. After five years of fast-paced development in which maintenance had taken a back seat, cracks were starting to show, and Ola asked us to evaluate options for improving its technology.
We focused on Ola’s Android-based Driver Platform, analysing its codebase, architecture, and engineering processes. We learned that:
- Nearly all driver partners were using low-quality, white-label Android devices. The Android builds on these phones often behaved erratically, which made development difficult.
- Traditional engineering practices prevented engineers from building and releasing software rapidly, which was imperative given the vast growth of the business.
- A lack of streamlined release engineering meant that significant effort went into manually testing and verifying each release.
Process
We spent considerable time exploring various solutions and discussing them with key stakeholders, who had far more domain knowledge than we did. Eventually, we reached a fork in the road and had to choose between refactoring the existing code and rebuilding it from scratch.
Refactoring would have required many more consultants on the project, a luxury we did not have. Instead, we recommended that Ola rebuild the platform with its own team, adopting a modern architecture alongside radical improvements to its engineering process. To this end, we ran training workshops with Ola’s engineering team to introduce a fresh approach. We discussed several industry best practices and how they’d fit the team's unique requirements:
- A unidirectional data-flow architecture, similar to React/Redux on the web, makes the flow of data through the system simple to model. State changes only in response to explicit actions, which adds predictability and makes it easy to understand why and how something changed.
- An extensive test suite protects the stability of shipped software and reduces the effort of future refactoring. The bulk of the suite usually consists of unit and integration tests, with a thin layer of UI tests on top. A continuous integration (CI) server runs the tests against every change, so developers know exactly when something breaks.
- Trunk-based development removes the need for long-running feature branches that drift away from the mainline and rarely get integrated. It also prevents “merge hell” and the meetings it spawns, saving precious time. Combined with a feature-flagging system, which lets unfinished work be merged but kept switched off, trunk-based development increases the frequency of code deployments, a key indicator of high-performance teams.
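To make the first practice more concrete, here is a minimal sketch of unidirectional data flow in plain Java, with no Android dependencies. It is not taken from Ola’s codebase: the driver states (OFFLINE, ONLINE, ON_TRIP), the action classes, and the `DriverState`, `DriverReducer`, and `Store` types are hypothetical names chosen for illustration. Views dispatch actions, a pure reducer computes the next state, and the store notifies subscribers, so data moves in one direction only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical driver states for illustration; not from Ola's codebase.
enum DriverStatus { OFFLINE, ONLINE, ON_TRIP }

// Immutable snapshot of the (hypothetical) driver app state.
final class DriverState {
    final DriverStatus status;
    final String rideId; // null when no ride is assigned

    DriverState(DriverStatus status, String rideId) {
        this.status = status;
        this.rideId = rideId;
    }
}

// Actions describe what happened; they carry data but no logic.
interface Action {}
final class GoOnline implements Action {}
final class GoOffline implements Action {}
final class AcceptRide implements Action {
    final String rideId;
    AcceptRide(String rideId) { this.rideId = rideId; }
}
final class CompleteRide implements Action {}

final class DriverReducer {
    // Pure function: the next state depends only on the current state and the action.
    static DriverState reduce(DriverState state, Action action) {
        if (action instanceof GoOnline && state.status == DriverStatus.OFFLINE) {
            return new DriverState(DriverStatus.ONLINE, null);
        }
        if (action instanceof GoOffline && state.status == DriverStatus.ONLINE) {
            return new DriverState(DriverStatus.OFFLINE, null);
        }
        if (action instanceof AcceptRide && state.status == DriverStatus.ONLINE) {
            return new DriverState(DriverStatus.ON_TRIP, ((AcceptRide) action).rideId);
        }
        if (action instanceof CompleteRide && state.status == DriverStatus.ON_TRIP) {
            return new DriverState(DriverStatus.ONLINE, null);
        }
        // Actions that are not valid in the current state leave it unchanged.
        return state;
    }
}

// Single source of truth: state changes only through dispatch().
final class Store {
    private DriverState state;
    private final List<Consumer<DriverState>> listeners = new ArrayList<>();

    Store(DriverState initial) { this.state = initial; }

    DriverState getState() { return state; }

    void subscribe(Consumer<DriverState> listener) { listeners.add(listener); }

    void dispatch(Action action) {
        state = DriverReducer.reduce(state, action);
        // Subscribers (e.g. views) render from the new state; they never mutate it.
        for (Consumer<DriverState> listener : listeners) {
            listener.accept(state);
        }
    }
}
```

Because `reduce` is a pure function of its inputs, each state transition can be checked with an ordinary JVM unit test, without an emulator or one of the erratic devices described earlier. This is one way such an architecture can support the test-heavy approach in the second practice.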
Outcome
We gave Ola a well-documented case for how to proceed: many techniques and practices that are overkill at small scale become key factors for success in large organisations. Ola received an external perspective on how to increase the robustness of its platform, and we gained experience working with teams that maintain large and complex legacy codebases.