AWS Cost Optimization – Keeping an eye on unused resources on your account

AWS lets you scale up and down in a snap with no downtime, but to get the best return on your investment, it’s crucial to strike the right balance between performance and cost. Development and software teams can now instantly spin up the resources they need, with lightning-fast deployments, even for testing purposes. Innovation is no longer a pipe dream.

With the ease of resource deployment comes the responsibility of managing costs. Left unchecked, AWS spending can quickly skyrocket due to unused or under-utilized resources running in the account. The goal is to avoid paying for resources you’re barely using and instead take advantage of the ability to adjust capacity as your usage requires.

Whether your AWS implementation is new or long-running, there is likely waste: resources were overprovisioned, you have not moved to newer and cheaper resource types, or there is simply no financial control keeping you accountable for the overall bill.

By keeping tight control over your AWS bill, you can cut a substantial share of your spending and keep your hard-earned cash from going up in smoke.

So how do you keep an eye on all the resources popping up in the account and separate the used resources from the unused or underutilized ones? Here are some approaches that can help.

Have a Centralized Deployment Policy:

One of the most common underlying factors behind “cost waste” is orphaned resources that keep running because anyone can deploy infrastructure unchecked. In other words, everyone with the right permissions can log in to your AWS accounts and provision costly resources. By restricting who or what is allowed to create your resources, you limit this occurrence and can set rules for the tags these resources must have (project, cost centre, owner, etc.), making it easier to identify and monitor your spending.
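To illustrate the tagging rule, here is a minimal sketch of a compliance check. It assumes you already have an inventory of resources as plain dictionaries; the tag keys, the `id` and `tags` field names, and the function names are hypothetical and only mirror the examples mentioned above.

```python
from typing import Dict, List

# Hypothetical tag policy, based on the example tags mentioned in the text.
REQUIRED_TAGS = ("project", "cost-centre", "owner")


def missing_tags(tags: Dict[str, str], required=REQUIRED_TAGS) -> List[str]:
    """Return the required tag keys that are absent or empty."""
    return [key for key in required if not tags.get(key, "").strip()]


def untagged_resources(resources: List[dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource id to the tags it is missing."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource.get("tags", {}))
        if missing:
            report[resource["id"]] = missing
    return report


if __name__ == "__main__":
    # Illustrative inventory; in practice this would come from your own tooling.
    inventory = [
        {"id": "i-0001", "tags": {"project": "shop", "cost-centre": "42", "owner": "ana"}},
        {"id": "i-0002", "tags": {"project": "shop"}},
    ]
    print(untagged_resources(inventory))
```

A report like this could be run on a schedule or as a pipeline step, so that untagged resources are flagged before they show up as an unexplained line on the bill.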

Otherwise, a developer can easily log in to the console and spin up expensive instances and services. You end up with limited visibility into what is being provisioned daily and only get notified through billing alarms after the change has been made. A search for the resources that caused the cost spike then follows, but by then it is too late, as the bill has already grown.

Many of our clients have asked us to introduce individual cost reporting and chargeback at the software development team, project, or business unit level, right down to the detail of cost going up or down with a change in a software commit. This has proven to be an effective strategy to make everyone cost-aware and always strive to make the best technical and cost decisions.
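As a rough sketch of what per-team cost reporting can look like, the snippet below sums cost line items by a tag. The line-item shape (`cost` and `tags` fields) and the `owner` tag key are assumptions for illustration, not the format of any specific AWS billing export.

```python
from collections import defaultdict
from typing import Dict, Iterable


def cost_by_tag(line_items: Iterable[dict], tag_key: str = "owner") -> Dict[str, float]:
    """Sum cost per tag value; items without the tag land under 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        group = item.get("tags", {}).get(tag_key) or "untagged"
        totals[group] += item["cost"]
    # Round for reporting so floating-point noise does not leak into the output.
    return {group: round(total, 2) for group, total in totals.items()}


if __name__ == "__main__":
    items = [
        {"cost": 10.0, "tags": {"owner": "team-a"}},
        {"cost": 2.5, "tags": {"owner": "team-a"}},
        {"cost": 4.0, "tags": {}},
    ]
    print(cost_by_tag(items))
```

The "untagged" bucket is a deliberate choice here: it makes the cost of resources nobody has claimed visible instead of silently dropping it.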

Action Item: Instead of allowing infrastructure to go up without proper visibility, rely on a common entity or department to undertake infrastructure provisioning. Ideally, infrastructure should only be provisioned via a deployment pipeline (as an additional best practice, you can have deployments triggered by code merges to branches that have been peer-reviewed by the infrastructure management team). This way, only infrastructure that a team has vetted gets provisioned, and it’s easy to shut down resources once the work is done without leaving any resource straggling behind unused.

How to innovate and control costs?

If you are concerned that “speed to market” will suffer because developers are delayed by infrastructure requests, consider creating one or more additional sandbox accounts and giving developers unrestricted access to them. You can configure Lambda functions to wipe these accounts daily, meaning no resources will be left running for long periods.
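A scheduled cleanup function along these lines could look like the sketch below. It is deliberately limited: it only handles running EC2 instances, reads only the first page of results, and uses a 24-hour age threshold that is an assumption for illustration. The function names are hypothetical; the client calls (`describe_instances`, `terminate_instances`) are the standard boto3 EC2 client methods.

```python
from datetime import datetime, timedelta, timezone


def instances_to_terminate(reservations, now, max_age=timedelta(hours=24)):
    """Return ids of instances launched more than max_age before now."""
    expired = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            if now - instance["LaunchTime"] > max_age:
                expired.append(instance["InstanceId"])
    return expired


def cleanup(ec2_client, now=None):
    """Terminate old running instances using a boto3-style EC2 client."""
    now = now or datetime.now(timezone.utc)
    # Sketch only: a real cleanup would paginate and cover more resource types.
    response = ec2_client.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    expired = instances_to_terminate(response["Reservations"], now)
    if expired:
        ec2_client.terminate_instances(InstanceIds=expired)
    return expired


def lambda_handler(event, context):
    """Entry point for a scheduled Lambda running in the sandbox account."""
    import boto3  # imported here so the helpers above can be tested without it
    return {"terminated": cleanup(boto3.client("ec2"))}
```

Passing the client into `cleanup` keeps the selection logic testable with a stub, which matters for a function whose whole purpose is to delete things.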

AWS Nuke allows you to completely wipe an AWS account and can be automated, which makes it a good fit for this setup. Caution: as the name and its disclaimer suggest, AWS Nuke is very destructive and should be used only on a sandbox account where no production data is kept.

Provision via Infrastructure as Code:

Tying into the previous point, adopting Infrastructure as Code (IaC) will allow you to control what is being provisioned and keep provisioning a standardized practice.

With IaC, only approved resources are created, and you are protected against drift introduced manually or by error. For example, if a previously running m5.large is manually upgraded to an m5.2xlarge, re-applying the IaC definition restores the configuration defined in code.
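The idea behind drift detection can be sketched as a comparison between the configuration declared in code and the configuration actually running. IaC tools provide their own mechanisms for this (for example, Terraform's plan step or CloudFormation's drift detection); the snippet below is only a conceptual illustration, and the attribute names in it are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        if desired.get(key) != actual.get(key):
            drift[key] = (desired.get(key), actual.get(key))
    return drift


if __name__ == "__main__":
    # Declared in code vs. what was changed by hand in the console.
    declared = {"instance_type": "m5.large", "monitoring": True}
    running = {"instance_type": "m5.2xlarge", "monitoring": True}
    print(detect_drift(declared, running))
```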

When combined with tools that validate best practices, IaC enforces a standard, secure configuration, reducing the risk of configuration drift and ensuring resources are deployed consistently, which in turn helps cost optimization. Additionally, because the stack or script is responsible for creating, maintaining, and removing resources, IaC reduces human errors such as accidentally missing a resource during infrastructure deletion.

Action Item: Adopt an Infrastructure as Code policy within the organization. Multiple Infrastructure as Code options are available, such as CloudFormation and Terraform, and more developer-oriented options, such as CDK and Pulumi, are becoming technically mature products.

Have a Clear Decommissioning Policy Setup:

Even with visibility into what gets provisioned and the ability to remove all resources efficiently, focused cleanup cannot happen unless there is a clear decommissioning policy stating when a resource needs to be terminated, upgraded, or cost-optimized.

In large organizations with multiple development teams and projects, it becomes a massive operational overhead for account managers to chase team leads and project managers to ask when development resources can be terminated.

Action Item: Consider creating and enabling a Decommissioning Policy for resources. A decommissioning policy provides a systematic approach to managing AWS resources, reducing costs, and improving the overall efficiency and security of resource utilization. It helps ensure that resources are being used effectively and optimally, improving the general management of resources.

There are many ways this can be achieved, and every organization will do this differently, but the basic principles are the same. Agree on how resources can be identified, how long they are required, and when they can be deleted. You can use AWS tags or create custom scripts that record each created resource and its Time to Live (TTL) in DynamoDB. The scripts can also handle automatic deletion when the TTL expires.
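As a minimal sketch of the TTL record idea, the snippet below builds a record per resource and selects the ones whose TTL has passed. The attribute names (`resource_id`, `owner`, `expires_at`) and function names are hypothetical. The expiry is stored as epoch seconds, which is the format DynamoDB's own TTL feature expects; writing the record to a table and deleting the actual resource are left to your own scripts.

```python
from datetime import datetime, timedelta, timezone


def build_record(resource_id: str, owner: str, created_at: datetime, ttl_days: int) -> dict:
    """Build an item describing a resource and its expiry as epoch seconds."""
    expires_at = created_at + timedelta(days=ttl_days)
    return {
        "resource_id": resource_id,
        "owner": owner,
        "expires_at": int(expires_at.timestamp()),
    }


def expired_resources(records, now: datetime):
    """Return the ids of resources whose TTL has passed at the given time."""
    now_epoch = int(now.timestamp())
    return [r["resource_id"] for r in records if r["expires_at"] <= now_epoch]


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = [build_record("i-0001", "ana", created, ttl_days=7)]
    print(expired_resources(records, datetime(2024, 1, 9, tzinfo=timezone.utc)))
```

Note that DynamoDB's native TTL only removes the item from the table, so a cleanup script still has to delete the underlying AWS resource before or when the record expires.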

Resoto is a third-party inventory discovery and management tool that can read information about the resources provisioned in your AWS account and build a list of them. One nifty feature is the ability to clean up expired resources: simply add a tag with the expiration date to a resource, and Resoto can automatically delete it.

Get Help:

AWS account management is a massive ask: new services and features are released constantly, and it becomes a hassle to manage them manually. This is why multiple options on the market do precisely that. You can use open source tools such as Servian’s Auto Cleaner, a serverless application that can check for unused resources and remove them on your behalf. Or you can opt for managed services such as Nops.io, developed by AWS Advanced Technology Partner Nops, an automated FinOps platform that helps customers reduce their AWS costs by up to 50% on auto-pilot with offerings such as instance class optimization, unused resource deletion, and spot instance recommendations.

In conclusion, reducing cloud costs by removing unused resources is a crucial step in optimizing cloud infrastructure. Although you can use the tools above, every AWS environment is deployed differently, so make sure you also do your own research and a deep dive into your environment to find the best cost-optimization opportunities. By staying vigilant and making regular efforts to optimize their cloud infrastructure, organizations can reduce costs, increase security, and improve their overall performance and efficiency.