In this Expert View article, our expert Andrew Thompson discusses the use of automated “FinOps policies” to improve efficiency and cost control in large organisations’ cloud environments. These policies serve as financial guardrails, aligning cloud usage with financial objectives and promoting cost savings while empowering application teams to make informed decisions. The article highlights the benefits, considerations, and common mistakes associated with implementing such policies.
In large organisations, where cloud environments undergo frequent changes, maintaining high levels of efficiency in your cloud usage can be challenging. Deploying automated ‘FinOps policies’ to enforce minimum standards can drive significant improvements in cloud efficiency. These policies can be thought of as financial guardrails, allowing an organisation to enforce financial controls across the entire cloud environment whilst supporting application teams in maintaining best practices. By centralizing these efforts, organizations can achieve consistency, increase cost savings, and improve overall cloud efficiency, without impacting development speed. This is good for the environment, and it also reduces your cloud bill.
Most cloud engineers are familiar with the concept of automated security policies that run in their cloud environments. For example, there may be a rule that blocks public access to AWS EC2 instances hosted in non-production environments. FinOps policies extend this approach by incorporating financial controls. They provide a framework to control cloud spend and establish minimum standards for cloud efficiency. Similar to how security policies align with a business’s security requirements, FinOps policies align with financial objectives.
Depending on how your organisation operates, these FinOps policies and their framework may be created by the FinOps team themselves, or may be developed by a platform team with input from FinOps specialists. For this article we’ll assume the FinOps team themselves are handling the work, though there are no strict rules on this, and the work can be owned by whichever team in your organisation is the best fit.
While the policies themselves are valuable, their implementation goes beyond simple rule enforcement. When the FinOps team creates and implements policies in the right way, it fosters cost awareness among application teams. By clearly defining the minimum standards of cloud efficiency expected by the business, the FinOps team can educate and engage stakeholders about the financial impact of their actions. This empowers application teams to make informed decisions, optimize cloud resource usage, and contribute to cost savings. Additionally, by showcasing the tangible value of FinOps engineering to a wider audience within the company, the FinOps team can gain support and recognition for their efforts.
Below are examples of FinOps policies a business might deploy to reduce waste. This example list is not definitive, as every business will have different opinions on acceptable tolerance levels.
The problems these types of policies address cannot be targeted via ‘shifting-left’ and running analysis of the infrastructure when it is initially deployed. FinOps policies like these are designed to monitor the active cloud environment.
Why do I need Automated Policies?
Cloud computing offers development teams unprecedented agility and control over their infrastructure. It allows them to focus on innovation, solving critical organizational problems, and delivering value quickly. While some might argue that waste reduction actions should be the responsibility of the application teams, the reality for many large businesses is that, whilst in theory this would be great, in practice it does not happen. Too often, simple waste reduction is not handled due to competing priorities within the application teams, or because the application teams have not been given an easy way to detect resources that are not being used. To overcome this challenge, automated policies that target the minimum standards expected by a business are essential, whether they are used in ‘passive’ mode to supply data to application teams, or in ‘active’ mode to remove unused resources.
Automated policies enable organizations to be more opinionated in how their cloud accounts are governed. While cloud providers consider all systems to be “production,” businesses can differentiate non-production environments through these policies. By collaboratively working with application teams, organizations can define a set of automated guardrails that effectively balance lean non-production environments and the ability to deliver new features.
Benefits of Automated Policies:
- Lean and Efficient Non-Production Environments: Automated policies ensure that non-production environments maintain minimum best practices expected by the business, eliminating unnecessary waste and inefficiencies. These guardrails promote lean and optimized cloud resource usage, enhancing overall cost-effectiveness.
- Minimal Disruption to Development Teams: The objective of FinOps controls is not to hinder development teams’ work or maximize waste reduction. Instead, these policies provide a sustainable approach to reduce risks in cloud environments, by defining and automating the minimum required standards.
- Cost Reduction and Environmental Sustainability: By implementing FinOps policies and reducing waste in cloud environments, organizations can significantly reduce their cloud bills. This cost optimization not only benefits the bottom line but also contributes to environmental sustainability by minimizing unnecessary CO2 emissions. There is a growing requirement for corporations to report CO2 emissions, and implementing these policies can effectively contribute to reducing such emissions.
How do I create Automated Policies?
When designing and implementing automated policies, it is crucial to not only decide which policies to create but also to consider the capabilities of the framework that will run these policies. Equally important is the collaborative effort with application teams throughout the development of these capabilities. Overlooking the non-functional aspects can result in the policies failing to achieve the intended impact.
Considerations for the Policy Framework
When designing the framework that will run automated FinOps policies, several key considerations should be taken into account to ensure its sustainability and effectiveness. The framework must possess the necessary capabilities to handle the complexities of enterprise-level cloud environments, provide value to application teams, and offer observability features. Failing to address these considerations may result in brittle cost reduction policies that do not deliver the expected reductions in cloud waste.
The initial framework creation stage is a great opportunity to collaborate with application teams, giving them a say in how the framework will operate. These application teams are a customer of the framework…and they are software engineers. If the policy framework does not meet their requirements, you can be sure they will find a way to circumvent it, to safeguard their ability to keep delivering new features.
There are many different patterns for policy frameworks. Third party vendors, in-house platforms like a framework based on AWS Lambda functions, open-source tools like Cloud Custodian, and many other techniques all provide some form of this capability.
When deciding which pattern to use there are 2 key aspects to consider as a first step:
- “Does my organisation already have systems in place that offer automated policy techniques?”
  - For example, automated policy frameworks may already be in place for security or internal compliance projects, or your organisation may already pay for a 3rd party vendor tool that has this capability.
  - Leveraging existing frameworks will help speed up the Proof of Concept work and provide feedback on the value of your FinOps policies quickly.
- “Will my organization support and allocate resources for an internal policy framework project?”
  - While an open-source solution may seem suitable for your needs, it’s important to recognize that a policy engine is an application like any other. Even if the engine is open source, integrating the technology successfully will still require a significant amount of technical work.
If you already have an automated policy framework in place, consider whether the framework can answer the following questions. If it cannot, this is a sign that your framework may not be meeting expectations:
- Are all the policies functioning correctly?
- How much value is each policy providing us?
- How many resources have been opted out, or have not been opted-in, and which teams or business units do these resources relate to?
Many automated policy systems, out of the box, do not deliver this functionality, so it may need to be built into the system.
If an automated policy system lacks these capabilities, senior stakeholders may question the effectiveness of FinOps policies.
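As a rough illustration of the reporting these questions call for, the sketch below rolls up per-resource policy results into per-policy health, per-policy estimated value, and opt-out counts per team. The record shape (`PolicyRun`), the status values, and the policy and team names are all hypothetical; a real framework would populate them from its own logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PolicyRun:
    """One evaluation of one policy against one resource (hypothetical record shape)."""
    policy: str
    team: str
    status: str              # assumed values: "matched", "opted_out" or "error"
    est_daily_saving: float  # estimated saving in USD per day; 0 unless matched


def summarise(runs):
    """Roll run records up into per-policy health/value and per-team opt-outs."""
    per_policy = defaultdict(lambda: {"errors": 0, "est_daily_saving": 0.0})
    opt_outs_by_team = defaultdict(int)
    for run in runs:
        stats = per_policy[run.policy]
        if run.status == "error":
            stats["errors"] += 1          # is the policy functioning correctly?
        elif run.status == "matched":
            stats["est_daily_saving"] += run.est_daily_saving  # how much value?
        elif run.status == "opted_out":
            opt_outs_by_team[run.team] += 1  # who has opted out?
    return dict(per_policy), dict(opt_outs_by_team)


# Made-up sample data for illustration only.
runs = [
    PolicyRun("stop-idle-ec2", "payments", "matched", 12.5),
    PolicyRun("stop-idle-ec2", "payments", "opted_out", 0.0),
    PolicyRun("stop-idle-ec2", "search", "error", 0.0),
    PolicyRun("delete-unattached-volumes", "search", "matched", 3.0),
]
per_policy, opt_outs = summarise(runs)
```

Whatever tool runs the policies, keeping one record per policy evaluation makes all three questions a simple aggregation rather than a separate reporting project.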
The most effective way to understand all the features necessary for the FinOps policy framework is to have a dialogue with the application and business teams to discuss the requirements. Here are some examples of key topics that should be considered:
Addressing the Elephant in the room – Passive Vs Active Policies
Engineers on high-performing teams with a strong DevOps culture may be thinking there’s something not quite right here. With DevOps, a great culture for application teams is based around full ownership of their product and large amounts of autonomy. If you decide to go with ‘opt-out’ systems, and start automatically deleting application teams’ resources (albeit in non-production), this goes against the very culture you are trying to create.
For many businesses, using policies in passive mode to provide the required data to application teams will be the best fit.
However…many businesses find themselves some way from achieving DevOps Nirvana. For example, a business may find itself in a situation where exploding cloud spend has become a material risk to the business, or a business may outsource the development and maintenance of some applications running within their cloud accounts to 3rd party vendors. In these types of scenarios you may need to be a lot more opinionated in terms of how you enforce best practices.
For any new venture into automated FinOps policies, it is always best to start with passive policies, coupled with strong reporting capabilities. This will allow you to have discussions and make decisions on whether to switch policies into ‘active’ mode, using real data from your own cloud environment.
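The difference between the two modes can be sketched as a single policy function with a mode switch. This is a minimal illustration, not a real framework: the `Instance` fields, the 14-day idle threshold, the instance IDs, and the `stop` callback are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    environment: str   # e.g. "development" or "production"
    idle_days: int     # days since meaningful use; how this is measured is up to you
    opted_out: bool = False


def apply_policy(instances, mode="passive", max_idle_days=14, stop=None):
    """Report idle development instances; only call stop() when mode is 'active'."""
    findings = []
    for inst in instances:
        if inst.environment != "development" or inst.opted_out:
            continue  # out of scope, or the owning team has opted out
        if inst.idle_days < max_idle_days:
            continue  # still in use
        action = "reported"
        if mode == "active" and stop is not None:
            stop(inst.instance_id)
            action = "stopped"
        findings.append((inst.instance_id, action))
    return findings


# Made-up fleet for illustration only.
fleet = [
    Instance("i-dev-1", "development", idle_days=30),
    Instance("i-dev-2", "development", idle_days=2),
    Instance("i-dev-3", "development", idle_days=45, opted_out=True),
    Instance("i-prod-1", "production", idle_days=60),
]
stopped = []
passive = apply_policy(fleet)                                     # nothing is touched
active = apply_policy(fleet, mode="active", stop=stopped.append)  # stop() is invoked
```

Both modes produce the same list of findings, so the passive reports show exactly what the active mode would have done before anyone agrees to switch it on.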
What policies should be created?
To develop effective FinOps policies, it is crucial to base your decisions on real data. By analysing detailed billing reports, you can identify the resource types that contribute the most to your spend and usage. These areas should be the primary targets for automated policies. For instance, in some organizations, ‘EC2’ and ‘RDS’ may dominate the cost and usage charts, while in others, it could be ‘Lambda’ and ‘DynamoDB’, or other types of services.
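A first pass at this analysis can be as simple as ranking services by total spend. The sketch below assumes a simplified billing export with hypothetical `service`, `environment` and `cost_usd` columns and made-up figures; real billing reports carry many more columns and different names.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a detailed billing export (hypothetical columns and figures).
BILLING_CSV = """service,environment,cost_usd
EC2,development,420.00
EC2,production,900.00
RDS,development,310.50
Lambda,development,12.25
RDS,production,150.00
"""


def spend_by_service(csv_text):
    """Return (service, total_cost, share_of_total) tuples, largest spend first."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["service"]] += float(row["cost_usd"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(service, cost, cost / grand_total) for service, cost in ranked]


ranking = spend_by_service(BILLING_CSV)
for service, cost, share in ranking:
    print(f"{service:<8} {cost:>10.2f} USD  {share:6.1%}")
```

The services at the top of this ranking are the natural candidates for the first policies.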
The next step is to perform an analysis – hypothesize a policy and use the data to understand how much waste reduction could occur.
Let’s consider a simple hypothesis:
“Removing unused AWS EC2 instances from the Development environment will result in significant cost savings.”
Begin by examining the daily cost of the resources that you expect to impact, accounting for any applicable discounts.
- If the estimated cost savings appear substantial, delve deeper into the specific types of resources that will be affected. Remember that ‘EC2’ encompasses various aspects such as CI/CD systems running on EC2 instances, Auto Scaling Groups, EKS cluster worker nodes, and many more. Focus on the specific areas where you want the policy to be enforced and ensure you measure the potential cost savings within that dedicated area.
- If you find the estimated cost savings do not appear substantial, move on and look for other, more effective, policies to create.
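The hypothesis test above can be sketched as a small calculation: sum the discounted daily cost of unused resources, first for everything labelled EC2 and then for the specific area the policy would actually target. The inventory, the `role` labels, and the flat 25% discount rate are invented for illustration; in practice the unused flag and the discount would come from your own utilisation and billing data.

```python
def estimate_daily_saving(resources, discount_rate, in_scope):
    """Sum the discounted daily cost of unused resources that the policy would target."""
    total = 0.0
    for res in resources:
        if res["unused"] and in_scope(res):
            total += res["daily_list_cost"] * (1 - discount_rate)
    return total


# Hypothetical development EC2 inventory; "role" separates the different uses of EC2.
dev_ec2 = [
    {"id": "a", "role": "standalone", "unused": True, "daily_list_cost": 10.0},
    {"id": "b", "role": "standalone", "unused": False, "daily_list_cost": 20.0},
    {"id": "c", "role": "eks-worker", "unused": True, "daily_list_cost": 40.0},
    {"id": "d", "role": "ci-runner", "unused": True, "daily_list_cost": 8.0},
]

# Everything that looks unused, versus only the area the policy would enforce.
all_unused = estimate_daily_saving(dev_ec2, 0.25, lambda r: True)
standalone_only = estimate_daily_saving(
    dev_ec2, 0.25, lambda r: r["role"] == "standalone"
)
```

With these made-up numbers the headline figure is 43.50 per day, but only 7.50 of it sits in the standalone instances the policy would touch, which is the figure that should drive the decision.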
Adopting a data-driven approach is essential. Without this approach there is a risk of creating policies that may seem promising on paper but fail to have any material effect on your Cloud usage.
Common mistakes to watch out for:
Common mistakes businesses make when implementing automated policy frameworks fall into 2 categories:
- Poor communication with engineering teams
  - If a strong relationship with application teams is not created right from the start of the work, the central team creating the policies can easily find themselves in a bubble, building centrally deployed policies that are terminating resources without telling any of the resource owners.
  - At first the central team may pat themselves on the back as they measure how much money they have saved. Very quickly, however, the project will fail: the engineering teams will rightly withhold their support and collectively work to disable the central project before it does any more damage.
- Failing to agree how to measure the value of the work
  - The metrics by which to measure the project’s success must be agreed up front.
  - The team building out the automation must work to get agreement at a detailed level on this subject.
  - If this topic is left until you are presenting the success story to request additional funding, you can expect stakeholders to point out that your metrics do not tally with actual real monetary values, or that you turned off a set of resources that were covered by a reserved instance agreement, making no material difference to the cloud bill.
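The reserved instance pitfall can be made concrete with a deliberately simplified sketch that separates claimed savings from savings that actually reduce the bill. The `covered_by_commitment` flag and the figures are hypothetical, and the sketch assumes a covered resource saves nothing when stopped, which ignores cases where the commitment could be absorbed by other running resources.

```python
def realised_saving(stopped_resources):
    """Split claimed savings into the total claimed and the part that reduces the bill."""
    claimed = sum(r["daily_cost"] for r in stopped_resources)
    realised = sum(
        r["daily_cost"] for r in stopped_resources if not r["covered_by_commitment"]
    )
    return claimed, realised


# Made-up example: one stopped resource was already paid for by a commitment.
stopped = [
    {"id": "x", "daily_cost": 30.0, "covered_by_commitment": True},
    {"id": "y", "daily_cost": 12.0, "covered_by_commitment": False},
]
claimed, realised = realised_saving(stopped)
```

Agreeing up front which of these two numbers counts as the success metric avoids the awkward conversation described above.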
To ensure success in a large enterprise cloud environment, a FinOps policy framework must be opinionated and flexible enough to create a balance between cost control and developer productivity. The potential financial risks associated with wasted cloud spend can reach multi-million-dollar values, so relying solely on visualising data, and hoping the unused resources get cleaned up is not a sufficient level of control.
Establishing effective guardrails to mitigate these risks will allow organisations to capitalise on the advantages of cloud computing, while still maintaining financial security. Developing these guardrails collaboratively with the application teams will also strengthen the bridge between your finance and engineering groups, helping to raise cost awareness throughout your organisation.
If you’d like to know more, you can get in touch with Devoteam here to talk to me or one of our other experts.
Andrew Thompson, Principal Consultant
FinOps Devoteam UK