Continuous Integration in a Data Warehouse
March 28, 2012
By Chris Mills – Development Manager – Cobalt Intelligence Center of Excellence
Over the last two years, we have almost tripled the number of developers working on Intelligence applications at Cobalt, gone from 20 to over 70 production releases per year, removed our QA team, and improved production quality from roughly one “must fix” production issue per month to one every six months. All of this in the face of scalability challenges that come with rapidly increasing data volumes and processing complexity. Read on if you’d like to know more about how we’ve done this.
Cobalt’s engineering organization made a transition from “waterfall” to Agile development in 2009. At the time, testing of warehouse and reporting applications was an expensive process: highly manual and error-prone. Often the testing phase of a warehouse project would take more time than the design and implementation phases.
There were plenty of disadvantages to this approach, including:
- Inconsistent testing from release to release. The quality of the testing was influenced by the quality of documentation and handoffs from team to team, as well as human factors like system knowledge and attention to detail.
- All testing, even routine testing, was expensive because it was so manual and because it depended on coordination and communication across teams and environments.
- Testing was late in the process. By the time issues were found, developers were often days removed from having made their changes. Another round of handoffs and deployments from Dev to QA was required before issues could be resolved.
Cobalt’s transition to Agile was driven by a desire to provide our customers with incremental releases and improved support for the fast-paced worlds of online commerce and digital advertising. Monolithic releases were out, and smaller, more frequent releases, often with short lead times, were in. The existing testing approach was clearly incompatible with this, so the team began to pursue an automation strategy.
The system we have developed for Continuous Integration of all Database and ETL changes relies on a variety of technologies: Anthill for nightly builds, Ant for orchestration, SQLUnit for unit testing, and Informatica web services for remotely launching workflows. Test suites are managed via Ant scripts, which orchestrate the following tasks for each ETL workflow:
- Set up the testing environment with seed data.
- Ensure that any system-level preconditions for the ETL being tested are met.
- Execute the ETL.
- Execute a series of unit tests.
- Clean up the environment so that any data changes made as a result of these steps are removed.
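The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the Ant, SQLUnit, and Informatica setup described in this post: it assumes an in-memory SQLite database, and the table names (stg_orders, fact_sales_by_region) and the simple aggregation standing in for the ETL are hypothetical.

```python
import sqlite3
from contextlib import closing


def seed(conn):
    """Step 1: create hypothetical tables and load known seed data."""
    conn.executescript("""
        CREATE TABLE stg_orders (order_id INTEGER, region TEXT, amount REAL);
        CREATE TABLE fact_sales_by_region (region TEXT PRIMARY KEY, total REAL);
        INSERT INTO stg_orders VALUES (1, 'east', 100.0), (2, 'east', 50.0), (3, 'west', 75.0);
    """)


def check_preconditions(conn):
    """Step 2: refuse to run unless the target table starts out empty."""
    (count,) = conn.execute("SELECT COUNT(*) FROM fact_sales_by_region").fetchone()
    if count != 0:
        raise RuntimeError("target table must be empty before the ETL runs")


def run_etl(conn):
    """Step 3: stand-in for the real workflow (a hypothetical aggregation)."""
    conn.execute("""
        INSERT INTO fact_sales_by_region (region, total)
        SELECT region, SUM(amount) FROM stg_orders GROUP BY region
    """)


def run_unit_tests(conn):
    """Step 4: compare the ETL output with the expected rows; return failures."""
    expected = [("east", 150.0), ("west", 75.0)]
    actual = conn.execute(
        "SELECT region, total FROM fact_sales_by_region ORDER BY region"
    ).fetchall()
    if actual != expected:
        return [f"fact_sales_by_region: expected {expected}, found {actual}"]
    return []


def cleanup(conn):
    """Step 5: remove everything the run created, even after a failure."""
    conn.executescript("""
        DROP TABLE IF EXISTS stg_orders;
        DROP TABLE IF EXISTS fact_sales_by_region;
    """)


def run_suite():
    """Run the five steps in order and return the list of test failures."""
    with closing(sqlite3.connect(":memory:")) as conn:
        try:
            seed(conn)
            check_preconditions(conn)
            run_etl(conn)
            return run_unit_tests(conn)
        finally:
            cleanup(conn)


if __name__ == "__main__":
    failures = run_suite()
    print("PASS" if not failures else "\n".join(failures))
```

Putting the cleanup in a finally block is the important design choice in this sketch: the environment is restored whether the ETL succeeds, a test fails, or a precondition check raises, so one bad run does not contaminate the next.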
A central application (Anthill) controls the scheduling of test runs, provides an online reporting interface where results can be reviewed, and maintains a history of test runs. Test results are also delivered to the development team via email, and the team treats automated testing failures as the top priority in its daily work. At any one time the team will have multiple warehouse releases in flight, each of which gets its own Continuous Integration test runs set up in Anthill.
At the time of this writing, more than 10,000 tests are run under automation against various versions of Cobalt’s BI codebase. Database-level tests confirm that DB structures, indexes, grants, and other objects are appropriate after DB deployments. ETL tests confirm that processing rules and policies are enforced, and that dependencies between ETLs are accounted for. In the spirit of Test-Driven Development, new tests are added to the suite early in new projects rather than after coding is complete.
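To give a flavor of what a database-level structural test can look like, here is a minimal Python sketch. It is an illustration under stated assumptions rather than one of the SQLUnit tests mentioned above: it uses SQLite's PRAGMA table_info and PRAGMA index_list to inspect a deployed schema, and the table, column, and index names are hypothetical. SQLite has no grants, so that part of the checking is not shown.

```python
import sqlite3


def table_columns(conn, table):
    """Return {column_name: declared_type} for the given table."""
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def index_names(conn, table):
    """Return the set of index names defined on the given table."""
    return {row[1] for row in conn.execute(f"PRAGMA index_list({table})")}


def check_structure(conn, table, expected_columns, expected_indexes):
    """Compare the deployed structure with expectations; return failure messages."""
    failures = []
    actual_columns = table_columns(conn, table)
    for name, declared_type in expected_columns.items():
        found = actual_columns.get(name)
        if found != declared_type:
            failures.append(f"{table}.{name}: expected {declared_type}, found {found}")
    for missing in sorted(expected_indexes - index_names(conn, table)):
        failures.append(f"{table}: missing index {missing}")
    return failures


def deploy(conn):
    """Stand-in for a DB deployment (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
        CREATE INDEX ix_fact_sales_region ON fact_sales (region);
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    deploy(conn)
    problems = check_structure(
        conn,
        "fact_sales",
        {"sale_id": "INTEGER", "region": "TEXT", "amount": "REAL"},
        {"ix_fact_sales_region"},
    )
    print("PASS" if not problems else "\n".join(problems))
    conn.close()
```

Because the check returns a list of messages instead of stopping at the first mismatch, a single run reports every structural problem a deployment introduced.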
This automated testing “safety net” has enabled a number of major changes for the Intelligence product team, all of which have had a direct and very positive business impact. Our ability to execute thousands of tests against any change in an hour or two has shortened project turnaround time dramatically. Developers have an easy way to get near-real-time feedback on the impact of their changes, which has improved their efficiency. Production quality has improved through more test coverage, and because executing the tests via software ensures consistency.
Finally, testing has moved far enough upstream in the development process that the need for a separate testing team has been removed. Headcount that used to be allocated to a separate “QA” team is now fully devoted to Intelligence roadmap development.
We are now approximately two years into our automated testing initiative. The successes enjoyed by the DB/ETL team from Continuous Integration have spread to the rest of Cobalt’s Intelligence product stack. Team culture has evolved to the point that testing is an initial consideration for any new work, rather than an afterthought. We continue to learn and refine, but the initial project goals of improving quality and team velocity have been achieved. Our team of Intelligence developers did over 70 production releases last year. Even though we no longer have a separate QA team, our production quality is higher than ever.
In an era of “big data” and increasingly complex and prominent BI applications, the ability to rapidly evolve a data warehouse is more important than ever. The solution described here demonstrates not only that robust automated testing is possible in a BI environment, but also that it can bring similarly large business impacts to other organizations that follow a similar approach.
While automated testing is commonplace in the software world, we have found it to be quite rare in the data warehousing world. Heavy system integration, large data volumes, and the variety of technologies in a typical BI environment pose special challenges. We believe that the degree to which we have automated our testing process is unique, and something that other organizations seeking to improve quality and the pace of their BI development could learn from.
If you would like to share your own experiences with test automation in a warehouse setting, or if you’d like more detail on the above, please comment and we’ll get the conversation going!