The U.S. Army Corps of Engineers, Institute for Water Resources, Hydrologic Engineering Center (HEC) has been developing software for decades, and during that time, the technologies to develop, deliver, and support the software have profoundly changed. HEC's practices have evolved, and the collection of tools and team of developers have grown. Although HEC tools are considered industry-standard, HEC recognizes the need to improve the quality of code, development process, user and developer experience, and support for the U.S. Army Corps of Engineers’ mission and the greater hydrologic profession. As the demand for new features, bug fixes, technical support, and training increases, it becomes unsustainable to keep up with current demand without changing how business is done.
Christopher Dunn, the now-retired HEC Director, issued the DevOps Moonshot challenge in August 2020. The vision for Moonshot was a modernization of software development practices across HEC, driven by Center-wide collaboration towards adopting DevOps principles. DevOps emphasizes automation, collaboration, and continuous feedback, and its adoption requires heavy up-front investment in new tools and techniques. However, everyone sees the benefits: the development teams, customers, and the greater water resources community. DevOps has a noticeable impact on development timelines, as changes to the code make it into the end-user's hands quickly without sacrificing quality. It also requires a philosophical shift: automate what you can, simplify the process, and create value for customers more quickly.
To meet the challenge, the DevOps Steering Committee (DOSC) was formed to serve as the collaborative environment where HEC's software teams could set their goals and help each other meet them. Although HEC celebrated "mission accomplished" for the Moonshot goal in September 2022, that milestone did not mark the end of the journey. DevOps is about continuous improvement, and each incremental improvement has a payoff. HEC is not adopting DevOps just for the sake of it, but as a playbook for how to do software development better.
So, how did HEC get to where it is today, and where is it planning to go?
The Wild West Days - HEC-HMS Transformation, 2017-2019
Gregory S. Karlovits, DevOps Steering Committee Facilitator, recalls his journey through the transformation:
“Each of HEC’s software teams had different paths to our collective moon landing in 2022, but I can offer some perspective on the HEC-Hydrologic Modeling System (HEC-HMS) team’s motivations for adopting these technologies before the challenge was issued. I was fortunate to get involved in the DevOps journey early on. In February 2017, I joined the HEC-HMS team. Although I had several years of programming experience, I did not have software development experience, nor had I written code as part of a team before. Like most of the HEC team, I was an engineer who learned how to write code, not a computer scientist. The HMS team was growing quickly, and the processes in place for developing code, fixing bugs, building the software, delivering it to the field, asking for feedback, and documenting the software were not suitable for even our modest team size.
This became clear to me when I was tasked with investigating and repairing my first bugs in HMS. I had to figure out how to get access to the right version of the source code, get the project set up in an IDE (a code development environment), run HMS from the IDE so I could see the impact of my changes, test to make sure the fix was correct and I didn't break something else, then get my changes back into the master code. There was a pain point at every step, and it was clear to me that if more developers were going to be working in the codebase, all of these actions needed to be easier than they were. I knew there had to be a better way, but my lack of experience in real software development meant I could only throw stones, unless I learned what that better way was. I spent time on personal programming projects to learn and practice how these things were done in the Real World™ and try to figure out what we could do as a development team to make our own lives easier. It turns out that the developer experience is only one aspect of DevOps I was considering at that time.
The actual priority issue was getting bug fixes to the field quickly. A full release cycle for HMS was typically much longer than a year, and in the meantime, bugs reported by users were repaired and one-off builds of the software were sent for testing. The publicly available software still contained the bug. It was not until we posted a full release of HMS to the HEC website that the problem was solved for most users. This process needed to change. What could we do to get these fixes to the field faster? We repeatedly ran into several process bottlenecks that could not scale to a larger team working on more things in parallel and made getting a release out the door a stressful ordeal.
We tinkered and tested new processes and technologies to remove these barriers. Over a period of about a year, the HMS team made several changes to its development and release cycle. The biggest efforts follow:
• The team switched to a more modern IDE that made it easier to adopt several new technologies;
• Any HMS team member could use an automated build tool on their computer to generate a build of the software;
• We started using unit testing to validate that updated and new computational code produced the expected results;
• We moved the HEC-HMS source code to a more modern repository system that HEC hosted, and we switched to an industry-standard version control system to track changes to the code;
• The HMS User’s Manual and other standard documentation were posted online using a tool that let us update them continuously, which made it much easier to collaborate;
• Once the rest of the infrastructure was in place, we stood up a server-based automated build tool that generated software builds with no user intervention.
On the programming side, our changes lowered the barrier for new programmers to contribute to the HMS codebase, improved our beta testing program by letting us make more frequent beta releases with more of the features under development available, and reduced stress at release time by having better-managed code and an easier path to a final build of the software. However, the most impactful change we made was moving our documentation online (HEC-HMS Documentation). The process of updating those documents prior to release was an ordeal, and now we were able to continuously integrate our changes into the documentation. Furthermore, this change also let us move all the workshop materials from our training classes online for anyone to use (see HEC-HMS Tutorials and Guides). Now when we get ready for training classes, we can make small updates to a living document online and point students to the most up-to-date version, which we can change even as the class is underway. Online documentation has truly revolutionized how we deliver our reference and training materials to the field, and I cannot imagine a world without it.
The HMS team kept forging forward, making changes to the way we delivered our software to the field. We were not the only team making these strides, but teams were not working together on a common strategy for improving their processes. We were all in different places, considering different technologies, tackling different priorities, and, worst of all, not collaborating as thoroughly as we should have.
One question we (and likely other teams) kept facing was, "what happens if ‘it’ doesn't work?" "It" (a wide range of changes we were already making) was already working, but we were deviating from the norm in a big way, and there is always risk in early adoption. Several of us on the team spent personal time researching technologies and techniques and struggling through their implementation. We were personally invested in making our continuous integration/continuous delivery (CI/CD) experiments a success for HEC-HMS, and other teams were on the same journey. How could we come together to elevate the state of the practice at HEC?”
Going Center-Wide, 2019-2020
In 2019, the first seeds of the Center-wide DevOps movement were planted when Chris Dunn asked the former Water Management Systems Division Chief, Chan Modini, to draft a white paper on the state of software development at HEC, with the aim to get a baseline before we would start identifying goals and plans for the future.
The "Software Modernization Team", with representation across the Divisions at HEC, started meeting in August 2019 and used surveys to collate the practices and technologies in use across the Center. The baseline clearly showed that the software teams were all in different places in the adoption of DevOps technologies and processes, and the landscape was continuously changing.
HEC-HMS was not the only team adopting new technology and improving its processes across the Center; the decision for each team to move towards DevOps was entirely organic and grassroots. Some teams were better positioned to advance more quickly. They had more staff and funding to pursue it, while smaller teams lacked the resources to make big changes without help. The work of the Software Modernization Team identified that CI/CD technologies were available and being used in day-to-day work, and that there was in-house expertise to move the whole center forward.
Beginning in mid-2020, the HEC management team took a deep dive into The DevOps Handbook to get everyone on the same page about the question, "Why DevOps?" At the August 2020 HEC town hall, Chris stated his goal was for the Center to move to a DevOps environment within two years. If the entire Center were to incorporate DevOps into its software development processes, then collaboration, open communication, and tech transfer were the ways that teams could lift each other up to meet these goals.
The entire Center stood, at the bottom of the mountain, looking up.
Landing On the Moon, 2021-2022
The vision for Center-wide adoption of DevOps practices was to improve customer service and improve HEC products while reducing stress on developers. To accomplish this vision, five key strategic goals for DevOps adoption set the roadmap for software development teams to meet the mandate:
• Make software builds routine and stress-free;
• Invest in HEC team development skills in project management, programming, building software, and software testing;
• Improve timeliness and quality of HEC products by incorporating continuous integration/continuous delivery (CI/CD) practices into our workflows;
• Improve practices related to contractor development of code;
• Use data to drive decision making.
To help teams meet these goals, a cross-Center steering committee was formed. This relatively small leadership group was tasked with setting and monitoring implementation goals, fostering communication and collaboration, and most importantly facilitating tech transfer between the teams. The group's plan was to rely on internal knowledge and capability and share it freely with software teams to get everyone across the finish line.
Chris Dunn signed the charter for the DevOps Steering Committee (DOSC) March 17, 2021, and the kickoff meeting was held April 14, 2021. The above strategic goals were a good outline of "what" needed to happen, but the software teams and the DOSC needed to figure out the "how". Then, it was up to the teams to implement these practices and technologies in their workflow.
The DOSC identified several measurable goals to give the software teams some milestones in meeting the strategic vision:
• Host all source code locally using a common repository;
• Increase the number of contributors to the software’s code;
• Employ an automated build tool so anyone can build the software; use a CI/CD tool for delivery;
• Improve code quality with repository documentation and code review processes;
• Develop and implement automated tests;
• Expand public beta testing programs;
• Improve project management by using standardized issue tracking;
• Make software documentation more accessible.
The first strategic goal of automating software builds had a lot of dependencies. A software build involves taking the source code and all the libraries and other resources it depends on and getting it into a state where the software is ready for the end-user. For most programming languages this involves activities like compiling the code, linking to libraries, generating an executable file, and so on. The goal was to automate this process so that changes to the code could reach our users faster, with minimal intervention, and done the same way every time.
Automating software builds and meeting the other DevOps goals required teams to adopt modern software development technologies. HEC chose a collection of tools by Atlassian that met the requirements, integrated well with each other, and were cost-effective. Teams began moving their source code over to Bitbucket or GitHub, tracking issues and software development with Jira, and moving documentation online using Confluence. Most development teams chose JetBrains’ TeamCity as their build management and CI/CD tool and adopted build automation tools such as Gradle or Maven, so that new code commits, tracked using the Git version control system, automatically triggered software builds.
This technology stack had an up-front cost, and teams supported each other in adopting it. Teams pitched in through the DOSC to help each other and held ad hoc meetings to iron out wrinkles, and Darren Nezamfar of the HEC Information Technology staff was there every step of the way to get teams up and running. Teams that made this migration could trigger an automated build of their software with every new commit to the codebase via Git, run all the unit tests, produce a pass/fail report, and, if the tests passed, produce a new build. Anyone on the team could also manually set off a build at any time.
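For readers unfamiliar with this kind of workflow, the commit-triggered sequence described above (run the unit tests, report pass/fail, and package a build only on success) can be sketched in a few lines of Python. This is an illustration only, not HEC's TeamCity, Gradle, or Maven configuration: the names `run_pipeline`, `BuildReport`, `cfs_to_cms`, the commit ID, and the artifact file name are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class BuildReport:
    """Pass/fail record for one commit (hypothetical structure)."""
    commit: str
    test_results: List[Tuple[str, bool]] = field(default_factory=list)
    artifact: Optional[str] = None

    @property
    def passed(self) -> bool:
        # The build is good only if every test passed.
        return all(ok for _, ok in self.test_results)


def cfs_to_cms(flow_cfs: float) -> float:
    """Example computational code: cubic feet per second to cubic meters per second."""
    return flow_cfs * 0.3048 ** 3


def test_cfs_to_cms() -> None:
    """Unit test: 100 cfs is 2.8316846592 cubic meters per second."""
    assert abs(cfs_to_cms(100.0) - 2.8316846592) < 1e-9


def run_pipeline(
    commit: str,
    tests: Dict[str, Callable[[], None]],
    package: Callable[[str], str],
) -> BuildReport:
    """Run every test for a commit; package a build only if all of them pass."""
    report = BuildReport(commit=commit)
    for name, test in tests.items():
        try:
            test()
            ok = True
        except AssertionError:
            ok = False
        report.test_results.append((name, ok))
    if report.passed:
        # Packaging is a stand-in here; a real pipeline would compile and bundle.
        report.artifact = package(commit)
    return report


if __name__ == "__main__":
    # Hypothetical commit ID and artifact naming, for illustration only.
    report = run_pipeline(
        "abc1234",
        {"cfs_to_cms converts 100 cfs": test_cfs_to_cms},
        package=lambda commit: f"build-{commit}.zip",
    )
    for name, ok in report.test_results:
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
    print("artifact:", report.artifact)
```

The point of the sketch is the gate: a failing test leaves `artifact` empty, so a broken change never becomes a build that could reach users.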
To increase the number of contributors to our code, we had to make it easier to on-board new team members and set up better environments for developing and testing code. It also meant providing better standards and guidelines for contributing, a more robust code review process, and more testing, both in the automated sense and through public beta testing programs. This cultural shift resulted in more people at HEC contributing to HEC software code than ever before, and we are now better positioned to work with other developers inside and outside of USACE.
Software documentation, tech transfer, and field support got a lift from DevOps efforts as well. Teams began migrating their documentation online, which had an up-front cost of importing everything from the existing Microsoft Word® documents and then fixing the formatting, but the payoffs were immense: HEC software documentation receives hundreds of thousands of pageviews per year and is never out of date. Training materials were moved online as well, reducing preparation time for training classes and enabling the public to work through the same training materials they would get from an in-person course at HEC. Discourse was established as an online forum and knowledge base for gathering tech support questions from the field and making answers to past questions searchable. The amount of HEC software knowledge available to modelers across the Corps of Engineers and the world has never been higher.
Regular check-ins on the DevOps goals at ongoing DOSC meetings served two main purposes: to track progress and to ensure that no team was left behind. Teams facing a headwind had an opportunity to ask questions and get help to keep them on track to meet the two-year implementation goal. By August 2022, HEC's software teams had either met the goals, were on the way to completing them, or had a solid plan and support to finish soon.
A little bit like the dog that finally caught the car, we were left with the question of where to go next.
The Evolution of DevOps: What's Next?
Today, the DevOps Steering Committee within HEC has evolved from a smaller leadership group into a broad software developers' community. As software development capabilities continue to improve and in-house expertise grows, the DOSC tackles bigger issues facing HEC teams and sets bigger goals.
November 2023 saw a change in DOSC leadership with Gregory S. Karlovits taking over from Richard Nugent. The main job of the facilitator is to structure productive conversations among the HEC software teams, generate achievable goals, foster technology transfer, and create a sense of community in software development practice. The DOSC meets monthly and all HEC staff are invited and welcome to attend. Representatives from each HEC software team serve as voting members, should there be a need for a formal vote based on requirements in the DOSC charter.
The Committee now serves as an open forum for teams wrestling with questions about software development. Its most important function is to help every team meet DevOps goals. Every DOSC meeting begins with an open call for questions, which are either discussed during the meeting or added to the agenda for the next one. The greatest success comes from HEC software teams sharing what they know and helping each other reach individual goals. Many prickly questions become agenda items at the next meeting and the group has a facilitated discussion to sort them out. Recent discussions have covered topics like data interchange between HEC software, the state of HEC open-source projects, and the next generation of HEC tools.
Soon, the DOSC will set its five strategic goals for Fiscal Year 2025, which will serve as a compass to guide the group. The DOSC will take stock of where it is and be honest about what it can do better. HEC would not exist without its customers, so their needs must be a priority. The goals must also improve the experience for customers in a sustainable way and consider the health and welfare of the HEC team.
Date Taken: | 10.09.2024 |
Date Posted: | 10.09.2024 14:09 |
Story ID: | 482845 |
Location: | ALEXANDRIA, VIRGINIA, US |
This work, Hydrologic Engineering Center’s Quest for Sustainable Continuous Improvement, must comply with the restrictions shown on https://www.dvidshub.net/about/copyright.