A FinOps Journey
I’d like to share my story, because I didn’t choose FinOps; it chose me. It probably chose you, too.
At the beginning of 2022 the greater Engineering org at my employer underwent a reorg. Part of this reorg smushed our platform Ops team and our platform Eng teams together, and the one person who didn’t fit into this new layout was our FinOps engineer. I was leading our Data team at that point in time, and much of what the FinOps engineer did was (to me) data work, so I agreed to take them on as a report.
I got FinOps certified and set about understanding our FinOps practice. My process of understanding involved what must have been too many questions, requests for documentation, and requests that they stop merging their own MRs. So only a handful of weeks after I became manager of the FinOps function, our only FinOps headcount quit.
This left me fully holding the bag for our FinOps function – at that time encompassing production workloads on all 3 major US cloud vendors with a combined yearly spend in the low 8 figures. This was summer 2022.
My most immediate and obvious problem was visibility. This same problem in a different role led me to found the data practice at my employer, so I had a simple framework for what needed to happen – data needed to be
- procured,
- made understandable, and
- made available.
In short – and my point of view on this has not shifted very much – the first, biggest part of FinOps is a bog-standard business operational problem with a bog-standard business intelligence (BI) solution.
The first expression of a theme
I had written up a long-term FinOps plan of action by that point – the company’s first as far as I’m aware. It was pretty simple and laid out my thesis, even though I didn’t put flashing lights announcing it as such:
The true power to optimize costs lies within the Engineering org.
By nature, engineers are mostly data-literate and curious, and they like to optimize things because optimizing is part of the craft. FinOps is just a performance optimization problem, and the profiling data is provided by our cloud bills. What most companies fail to do is translate the language of Finance and the AWS CUR into the terminology of whatever the company actually builds. A good profiler doesn’t help the engineer by speaking machine language (the actual system-level calls being executed by the OS); it points to the line(s) of code that are problematic.
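To make the profiler analogy concrete, here is a minimal sketch of that translation step. It assumes CUR line items have already been loaded as dictionaries. It also assumes resources carry a cost allocation tag with the hypothetical key `service`; the CUR exposes user-defined tags as `resourceTags/user:<key>` columns. The `SERVICE_NAMES` mapping and its values are hypothetical as well.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical: the tag key and the tag-value -> product mapping are
# illustrative assumptions, not taken from any real CUR or company.
TAG_COLUMN = "resourceTags/user:service"
COST_COLUMN = "lineItem/UnblendedCost"
SERVICE_NAMES = {
    "svc-ingest": "Event ingestion API",
    "svc-search": "Search backend",
}


def cost_by_service(line_items):
    """Sum unblended cost per engineering-facing service name.

    Untagged or unmapped spend goes into its own bucket so it stays
    visible in the report instead of silently disappearing.
    """
    totals = defaultdict(Decimal)
    for row in line_items:
        name = SERVICE_NAMES.get(row.get(TAG_COLUMN, ""), "(unallocated)")
        # Decimal avoids float rounding when summing many small costs.
        totals[name] += Decimal(row[COST_COLUMN])
    # Biggest spenders first, like a profiler's hottest functions.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = [
    {TAG_COLUMN: "svc-search", COST_COLUMN: "12.50"},
    {TAG_COLUMN: "svc-ingest", COST_COLUMN: "3.25"},
    {TAG_COLUMN: "", COST_COLUMN: "7.00"},
    {TAG_COLUMN: "svc-search", COST_COLUMN: "1.00"},
]
for name, cost in cost_by_service(rows):
    print(f"{name}: ${cost}")
```

The output names things an engineer recognizes ("Search backend"), not billing codes. Keeping "(unallocated)" as a visible line item is a deliberate choice, because untagged spend is usually the first thing worth chasing.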
With this framing, the problem was simple – I just needed to transform our bills into reporting that Engineering would find interesting. I already had our cloud bills in our warehouse; literally the first thing I did when starting the data team was write a data pipeline in Python to ingest years’ worth of AWS CUR bills from S3 into our BigQuery-backed warehouse.
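For anyone wanting a starting point, here is a minimal sketch of the transform step of such a pipeline. It is not the original pipeline. It assumes legacy CUR files delivered as gzipped CSV, keeps a handful of columns under snake_case names, and emits newline-delimited JSON, which is one of the formats BigQuery load jobs accept. Two pieces are deliberately left out: fetching the object from S3 and running the BigQuery load job. The column selection and the `resourceTags/user:service` tag key are illustrative assumptions.

```python
import csv
import gzip
import io
import json

# Illustrative subset of legacy CUR columns, renamed to snake_case.
# Real CUR files carry many more columns; pick the ones you report on.
COLUMNS = {
    "lineItem/UsageStartDate": "usage_start",
    "lineItem/ProductCode": "product_code",
    "lineItem/UsageType": "usage_type",
    "lineItem/UnblendedCost": "unblended_cost",
    "resourceTags/user:service": "tag_service",  # hypothetical tag key
}


def cur_gzip_to_ndjson(raw: bytes) -> str:
    """Convert one gzipped CUR CSV file into newline-delimited JSON.

    `raw` is assumed to be the object body already downloaded from S3.
    Cost stays a string so the warehouse can cast it to NUMERIC
    without float rounding.
    """
    text = gzip.decompress(raw).decode("utf-8")
    lines = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        # Missing columns (e.g. a tag nobody has used yet) become "".
        record = {new: row.get(old, "") for old, new in COLUMNS.items()}
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)
```

Keeping the transform as a pure function of bytes makes it easy to test locally before wiring up S3 and the warehouse.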
I intend to cover nuts-and-bolts specifics like how to build a data pipeline in Python in this blog, because these are incredibly valuable skills in our context. Knowing a tiny bit of Python and a little bit more SQL will make you absolutely unstoppable and invaluable in your organization. (It will also boost your cachet with the Engineering org.)
Tooling
I spent the rest of 2022 demoing various FinOps tools. I knew that months of data normalizing and report building lay ahead, and I didn’t want to build it myself. I didn’t consider it to be to my employer’s long-term advantage to build cost reporting tools that I would understand deeply but that a theoretical new hire would never have seen before.
I’ll cut to the chase because this is getting long – I went through more than one sales cycle with the Big Names in FinOps tooling, got quotes, and got laughed out of the room by my superiors, because they all charged based on a percentage of our total cloud bill. I ultimately decided to do what I could with what I had: complete access to all the data in the warehouse and a couple of tools in the company toolshed, namely dbt and our BI dash.
Dear Vendors, an aside on percentage-based pricing
I get it. Your solution will save us so much money that we’d be stupid not to spend 3% to save 15%, and you’re right. There’s also the fact that, for what your solutions do under the hood, a percentage of cloud spend is a reasonable proxy for the amount of data that’ll be ingested, and therefore it will correlate with your COGS for servicing a customer. But you don’t want cost-plus pricing on your COGS; you want to charge for value… I get it! But…
The small-to-midsize shop is in a weird place in the FinOps tooling market. I suppose a mega-globo corp like Target or Citibank can handle these procurement issues in a variety of ways, and a few million dollars is basically a rounding error for them anyway. At the other end of the spectrum, shops with < $1MM annual spends are likewise going to be able to cough up $15k a year to help them with this problem.
For us, at say $20MM a year, a 2% quote comes out to $400k a year. A YEAR. I don’t know about you, but that costs more than our two-person FinOps team does, and it doesn’t come with someone to set it up and operate it. Therefore, I am quite turned off by FinOps tools on the market right now, and this blog will have some strong opinions on building these skills and tools yourselves.
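The arithmetic behind that complaint is small enough to sketch. This is a minimal illustration using the rates quoted above. The $750k spend is a hypothetical sub-$1MM shop, chosen only because 2% of it lands on the $15k figure. Rates are in basis points so everything stays in integer dollars.

```python
def percent_fee(annual_spend: int, rate_bps: int) -> int:
    """Yearly vendor fee in dollars for a percentage-of-spend quote.

    Rates are basis points (200 bps = 2%), keeping the math in integers.
    """
    return annual_spend * rate_bps // 10_000


def net_savings(annual_spend: int, fee_bps: int, savings_bps: int) -> int:
    """Dollars saved per year after paying the tool's own fee."""
    return annual_spend * (savings_bps - fee_bps) // 10_000


# $750k is a hypothetical small shop; $20MM is the figure from the post.
for spend in (750_000, 20_000_000):
    print(f"${spend:,} spend -> ${percent_fee(spend, 200):,} per year at 2%")
```

The vendor pitch holds up on its own terms: paying 3% to save 15% nets 12%. The sticking point is that the fee grows linearly with spend, while the work of setting up and operating the tool does not.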
In conclusion
If you’ve made it this far, thanks. I don’t precisely have a plan here with this blog, but I think it’s far more important that I just get started, and journaling a bit about the path to here feels like an important thing to get down in print.
Future posts will be more nuts and bolts. Thanks.