Mention “delete” to almost anyone in IT Operations and they get twitchy. Deleting anything, particularly data, is risky. You won’t get reprimanded for leaving data in production storage. But you can certainly get reprimanded, and potentially lose your job, if you delete production data incorrectly, even if your intentions are worthy. So we err on the side of caution and succumb to the Fear Of Deleting Data (FODD).
While leaving data untouched in production storage might be considered safe, the continuous accumulation of data leads to bloated systems, decreased performance, poorer user experience, higher costs, and increased carbon emissions.
This is essentially a maintenance or technical debt issue. If systems are not maintained, your customers or employees will notice and start to complain. Or they will find other tools to use, creating shadow IT and shadow DevOps problems. Sometimes tools are so badly maintained that when the time comes to upgrade them, we discover it is no longer feasible. Upgrades of poorly maintained systems may fail, or take so long that the outage during the upgrade is unacceptable (e.g. a source control system takes a week to upgrade because it is four major versions out of date, the database has gone through three schema changes, and the data in the legacy system is large and has never been maintained).
The causes of this maintenance mess are the usual suspects:
- no guardrails on data entering the system
- no data maintenance policies, so once data gets in, it remains for the lifetime of the system
There is also the Storage is Cheap (SIC) fallacy. What is the issue with storing a few extra GBs or TBs? Storage is cheap, databases are fast. We have data lakes growing by tens or hundreds of TBs per day (e.g. weather systems, LLMs, etc.), so how can a few extra TBs matter?
Others say that it is compute that is expensive, not storage, so why worry about some extra storage? But as storage increases, so will compute. Searching a database with 0 records is faster than searching 1,000 records, which is faster than searching 1,000,000 records. Searching, indexing, and updating databases all take longer as the stored data grows. More data requires more CPU. While the relationship between storage and compute growth may not be linear, the two are certainly correlated.
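The scaling argument can be illustrated with a toy sketch: an unindexed search must examine every record, so the compute cost grows linearly with the data volume. This is an illustrative example, not a benchmark of any particular database:

```python
def unindexed_search(records, target):
    """Linear scan, as a database does without an index.
    Returns (found, comparisons_made)."""
    comparisons = 0
    for record in records:
        comparisons += 1
        if record == target:
            return True, comparisons
    return False, comparisons

# Searching for a value that is not present forces a full scan.
small = list(range(1_000))
large = list(range(1_000_000))

_, cost_small = unindexed_search(small, -1)
_, cost_large = unindexed_search(large, -1)

print(cost_small)  # 1000 comparisons
print(cost_large)  # 1000000 comparisons: 1000x the data, 1000x the work
```

Real databases use indexes to do better than a full scan, but indexes themselves consume storage and must be updated on every write, so the storage-to-compute coupling never disappears.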
So storage is not as cheap as we think. Increased storage increases the need for compute resources, which in turn increases costs and carbon emissions. DevOps tools can and will degrade when data volumes become too large.
“Delete Nothing” not only applies to data records; it also applies to many other aspects of DevOps tools, such as projects, plugins, users, custom fields, workflows, and screens. The same fears apply. Custom fields may be added to a tool, often for a single use case, product, or team. At some point the use case is no longer relevant, yet the customisation remains. The original reason for customisations is often forgotten, so FODD kicks in – if in doubt, leave the customisation there. Yet the existence of any of these things can and will affect the performance of your DevOps tools.
Finally, if you want a real data party, mix Delete Nothing with Data Explosion through Automation (Antipattern #1)! Not only will your data requirements grow, they will grow exponentially, and very quickly you will be forced to take action.
To manage this data overload effectively, it is crucial to establish clear data retention policies within your organization. You must understand how long each data type needs to be retained: source code, documentation, change and request tickets, monitoring and incident data, and binary artifacts. Without this understanding, data accumulates indefinitely due to the Fear of Deleting Data (FODD). Once data retention policies are established, automated data management solutions can be implemented. This, in turn, eliminates FODD.
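A retention policy only enables automation once it is captured in a machine-readable form. As a minimal sketch, the policy below maps data types to production and archive retention periods; the data types and durations are illustrative placeholders, not recommendations:

```python
from datetime import timedelta

# Hypothetical retention policy: how long each data type stays in
# production, and how long archives are kept before deletion.
# (None means: no archive stage, delete directly after production retention.)
RETENTION_POLICY = {
    #  data type            in production          in archive
    "source_code":        (timedelta(days=3650), timedelta(days=3650)),
    "change_tickets":     (timedelta(days=730),  timedelta(days=1825)),
    "incident_data":      (timedelta(days=365),  timedelta(days=1095)),
    "build_logs":         (timedelta(days=30),   None),
    "binary_artifacts":   (timedelta(days=90),   timedelta(days=365)),
}

def disposition(data_type, age):
    """Decide what happens to an item of the given type and age."""
    production, archive = RETENTION_POLICY[data_type]
    if age <= production:
        return "keep"
    if archive is not None and age <= production + archive:
        return "archive"
    return "delete"

print(disposition("build_logs", timedelta(days=40)))        # delete
print(disposition("binary_artifacts", timedelta(days=100))) # archive
```

Because the decision is pure policy lookup rather than human judgment, a scheduled job can apply it consistently, which is exactly what removes FODD from the process.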
However, it’s important to approach this issue with sensitivity and avoid using triggering terms such as “delete” or “deletion.” Instead, alternative terms such as “data maintenance” or “archival” should be used to mitigate fear and emotional responses.
The following steps can be taken to address the challenges effectively:
- Automate Archival: Data that is no longer required for regular use, but that should not yet be deleted, should be archived away from production systems. This improves production system performance and user experience while reducing the cost of fast-retrieval storage. Archived data should be created by automated service/system accounts and owned by product or service teams rather than individual users, and it should still be governed by data lifecycle and retention policies.
- Automate Cleanup: Data not required by retention policies can and should be “maintained” (i.e. archived or deleted). Data created by individuals and not owned by product or service teams is also a candidate for automated cleanup. Individuals come and go, and once they are gone, there is no one to ask what to do with the data attributed to them. So FODD kicks in, and the data remains. Automated processes that identify and remove unnecessary data help keep your DevOps systems clean and efficient.
- Sandbox Areas for Individuals: Establish sandbox areas that allow individuals to experiment and innovate. However, these spaces should have automated cleanup mechanisms in place to remove inactive data after a certain period.
- Non-Production Areas for Services: For frequent automated processes that generate large volumes of non-production data, such as temporary binary artifacts or build and test logs, use non-production data storage systems, then implement rolling data removal policies to keep these environments clean. This data should be much easier to maintain, with low risk of deleting sensitive information. Production artifacts, which may require long-term storage, should be generated and stored elsewhere and owned by product or service teams rather than individuals.
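The steps above can be sketched as a scheduled maintenance job that classifies items by ownership and age: team-owned data is archived rather than deleted, while individual sandbox data gets aggressive rolling cleanup. The item shape, function names, and thresholds are hypothetical; a real job would page through your DevOps tool’s API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical item record; a real job would fetch these from a tool's API.
@dataclass
class Item:
    name: str
    owner_type: str          # "team" or "individual"
    last_accessed: datetime

PRODUCTION_RETENTION = timedelta(days=365)
SANDBOX_RETENTION = timedelta(days=90)  # shorter window for sandboxes

def plan_maintenance(items, now):
    """Classify each item as keep / archive / cleanup per the policies above."""
    plan = {"keep": [], "archive": [], "cleanup": []}
    for item in items:
        age = now - item.last_accessed
        if item.owner_type == "individual":
            # Sandbox data: rolling cleanup once inactive beyond the window.
            bucket = "cleanup" if age > SANDBOX_RETENTION else "keep"
        else:
            # Team-owned data: move out of production, never silently delete.
            bucket = "archive" if age > PRODUCTION_RETENTION else "keep"
        plan[bucket].append(item.name)
    return plan

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
items = [
    Item("team-docs", "team", now - timedelta(days=30)),
    Item("old-team-reports", "team", now - timedelta(days=400)),
    Item("alice-scratch", "individual", now - timedelta(days=120)),
]
print(plan_maintenance(items, now))
# {'keep': ['team-docs'], 'archive': ['old-team-reports'], 'cleanup': ['alice-scratch']}
```

Note that the job only produces a plan; executing it (and notifying owners before archival or cleanup runs) is a separate, auditable step, which keeps the process reassuring for anyone still nursing a case of FODD.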
By implementing these solutions, organizations can effectively manage data growth, optimize system performance, improve user experience, and reduce costs and carbon emissions.