30% or more
You probably significantly overpay (by 30% or more) for cloud infrastructure and don’t even know it. Cloud services are complex and cloud billing is complicated and opaque. It may seem like cloud providers have you over a barrel (which is only partially true). Finally, cloud consumption and billing are difficult to predict. All of this means that cloud costs can be hard to manage or even to accurately identify. This is a big enough problem that there are consulting companies specializing in just minimizing cloud costs for clients.
Executives estimate that at least 30 percent of their cloud spending is wasted.
— Forbes / State of the Cloud Report 2020
I’d like to share a few practical, actionable tips and suggestions that can help you lower your cloud infrastructure costs. Most can be implemented in-house, by yourself, without the need to hire external help. If you have any questions, anything needs further clarification or if I missed mentioning something, please don’t hesitate to comment or contact me on Twitter or LinkedIn.
First I’ll cover some general principles, concepts, and culture related to managing IT infrastructure and development that result in lower cloud bills. In part 2 I’ll list some specific, practical actions that can save you a few dollars here and there.
Part 1 — General principles
Your cloud provider is your friend
Your cloud provider wants you to succeed. For them, successful clients mean long-term business, so they will want to help you with whatever they can. So, don’t be shy and talk to your cloud account manager. Tell them about your project, your needs. They will be in a very good position to recommend the best technology. They are also the people who will explain to you in detail how the billing works, what you should expect to pay, and what each line item means. Moreover, they will be able to provide you with lots of resources about the technology you are using, so you get the most mileage of every dollar you spend with them. Sure, there’s a risk they will try to upsell/cross-sell you more services, but educating yourself about the billing, terms and conditions, the technology, and its applications reduces information asymmetry and gives you more bargaining power.
Grow organically / Right-size
The only way to future-proof your IT infrastructure is to stay flexible (which I’ll cover later). Over-engineering, over-sizing, and “pre-optimizing” will only cost you more money and will never work for two simple reasons: 1) business requirements change very quickly and 2) technology changes even faster. Trying to design solutions and architecture that will be optimized 12 or even 6 months from now is a waste of time. Focus on what is needed right now, optimize for the current business process (meaning use the right tool for the task at hand) and stay flexible.
The most trivial example of over-sizing is buying too large servers that are constantly underutilized. The most prominent example of overengineering is using Kubernetes. It’s complicated, eats up a lot of resources (both server resources and talent), and is an overkill for at least 90% of companies. Unless you’re Netflix, you’re probably better with a cloud-native container service (Amazon Elastic Container Service or Azure Container Service).
Another activity in the right-sizing category is managing the cloud auto-scaling rules for your services and servers. Monitor your cloud infrastructure continuously and keep the rules updated, so they match the actual profile of the demand. Pay attention to overutilization limits and overcharges.
Some tips in the “right tool for the task at hand” category include:
- Pick the right solution to store data. Cold storage (like Amazon Glacier) should only be used as… cold storage, for backup. Write once and (hopefully) don’t ever read. If you use it for writing and reading, it will get very expensive.
- Use object storage (Blob or S3) where possible, for storing photos, videos, other large files for example. It’s cheaper than using block (disk) storage, which needs to be used to store your server image.
- Leverage content delivery network (CDN, like CloudFront or Cloudflare) for serving your static websites. They are almost free then, because they are served from the CDN and local browser cache, so you don’t pay for bandwidth and your server doesn’t do any work either.
Optimize the code and architecture
Hardware engineers say that all the efficiency gains from Moore’s law will be wasted by the software engineers… Optimized code and architecture will probably save you the most money in the long term. The differences in cloud operating costs between a good and a bad code/architecture can be huge: we’ve seen a 2x — 5x change in costs.
This is a subject worth a separate article, if not a book, so just very briefly, here are some simple, actionable suggestions. First, use garbage collection in the software you develop to make sure the applications manage memory efficiently. Manage CPU processes on the servers — kill dead or idle processes. You’ll need less RAM and CPU to run your services, so it will cost less. Second, avoid deleting database entries. Delete and rewrite takes more transactions than update, so will cost you more in IO activity. Finally, if possible, move all your servers into the same zone. The traffic between them will be cheaper, hopefully free.
Stay flexible and automate
Automation is probably the second biggest cost saver. It mainly reduces the internal resources (time of the engineers) spent on infrastructure. The cost savings are not immediately visible in your cloud bills but are rather reflected in productivity gains (deployment speed, performance, reliability, etc).
Automating your IT infrastructure management means introducing scripts, rule-based actions, and applications to assist the developers, DevOps, or system engineers with their tasks and preferably eliminate routine, tedious tasks. The big cloud providers offer good management automation tools under the umbrella of Infrastructure as Code (IaC). These tools, like Amazon’s CloudFormation, do a great job handling infrastructure provisioning and management. A piece of code (instead of a human operator) can launch a server with a specific service at the right moment, execute for as long as it’s needed, and then kill it to conserve resources and save costs.
Containerization is another great way to increase flexibility and automation. Putting your software into containers means that they can be launched on any infrastructure and the software will work the same. The same container can be moved around different servers, VMs, or clouds. Containers also scale quickly and can be orchestrated/ managed automatically. All of this means more efficient use of resources = lower costs. And remember, Kubernetes is one of the container orchestration solutions, not a containerization technology. Docker (the most popular containerization technology) containers can be managed by many different orchestration solutions, including native cloud solutions, Kubernetes, and others (for example Docker Swarm).
Once you have containers and IaC in place, you can work on “flattening the curve” of your cloud consumption by moving routine and predictable tasks around. Backup, migration, reporting activities, or any processes you can control can be scheduled when there’s usually a dip in the demand for your services (late at night? over the weekend?), or at least not during the peak (don’t do anything extra on a black Friday if you’re running an e-commerce website…).
Finally, if possible, don’t use proprietary solutions. Instead of buying a branded database service from a big cloud provider, buy a vanilla server from them and launch an open-source database inside a container on this server. This will be cheaper and will give you much more flexibility and control over your infrastructure. You’ll be able to migrate between clouds easier and use multi-cloud infrastructure in a much more convenient way.
Part 2 — Practical tips
Here are some very specific, maybe obvious aspects of your cloud infrastructure that can be managed to help eliminate some of the IT costs immediately.
Pay attention to your bill and turn off what you don’t use
Go through your bill line by line and see if you are using everything you pay for. It sounds obvious, but we still find a surprising amount of unused VMs and services that no one is using. Someone may have created a test environment 6 months ago that no one is using, there’s an old static website running on a server, things like that. So, make a backup of anything that you can use in the future (like the test environment or the website) and kill the server/service. You’re always able to restore it from the backup if needed. This part also includes deleting unused IP addresses (some cloud providers charge you when you don’t use them).
Keep in mind that some auxiliary services are not going to be deleted when you delete the main service. If you delete a server pay attention to and manually get rid of things like disk, snapshot, static IP, or a Windows server license.
Consolidate services and use one larger server
One bigger server is cheaper than the sum total of small servers of the same total size. If you can, use one bigger server instead of several smaller ones separate for each service. This will work great especially for services that constantly underutilized the servers, like static websites. You can probably launch tens of static websites on one small server instead of using separate servers for each website.
Use reserved instances
On-demand pricing is great because you only pay for what you actually use. But, chances are there is a baseline, a minimum consumption level for your cloud infrastructure. It’s much cheaper to use the reserved instances to cover this baseload and only pay for on-demand “peaks”. Here’s a simple illustration of this concept. Additional savings can be generated if you pre-pay for the reserved cloud services.
Every price is negotiable. The potential savings depend on your bargaining power, but I’m almost certain that every cloud provided will offer you some sort of a discount when you threaten to leave. I was once offered a 50% discount off a $3 server when I told them I want to close my account! It was a small hosting company, but even the big cloud providers are more flexible now. A few years back AWS was a de-facto monopoly, but right now Microsoft is breathing down their neck, Google is closing the distance and there’s a number of other formidable competitors with deep pockets (=aggressive pricing) including Alibaba Cloud, Oracle, IBM, and Tencent Cloud.
What usually works best is calling the competing cloud providers (if you’re with AWS, call Azure and GCP) and asking them for a quote for your current setup. Chances are it will be notably lower (20%-25%). Then take this quote and show it to your AWS guys, ask if they can match it. In most cases, they will offer you a discount and if you have the patience, you iterate to get the best results (i.e. take the new, discounted AWS prices to Azure and so on).
I hope you’ll be able to use some of these suggestions and lower your cloud bills. Please let us know (directly or on our Twitter) if you have any questions or would like to know more about a specific cloud cost-related subject.
I’d like to thank my friend and Djuno co-founder Moe Sayadi for sharing his knowledge and experience about cloud infrastructure for the purpose of this post.
Djuno develops AI that helps you take back control over cloud costs.
Djuno AI is a light touch tool that predicts server utilization, identifies seasonality, and provides cost-saving tips and recommendations. It’s free and doesn’t require any registration.
You can check it out here: http://ai.djuno.io/