A somewhat respectable colleague and former mentor of mine recently pointed out that if I thought something was a kludge, it was really bad. That's because, he reasoned, "you work in DevOps. Nothing is more of a kludge in software engineering than DevOps." I thought this was an amusing criticism. Do I think it's true? It probably is. Does it oversimplify? Maybe not.
I find DevOps engineers to be unique in the software startup world. We are often some of the grumpiest software engineers you'll come across. When other software engineers are celebrating a big release going live, we're a bit more tentative. We'll join in for one drink in the name of team building. Mostly it's to take the edge off the encroaching sense of dread.
When the new feature goes live, it breaks in ways no one predicted. Usually it's a configuration that worked fine in staging. Occasionally, it's a missing database index. Sometimes it's application code that someone fresh out of school wrote... boy, is that person going to have a rough hangover tomorrow... Unless you released on a Friday. Then there is no finding that dev until Monday at lunch.
Take a step back from this story to see all the systems DevOps ties together to do this job. The software is probably live on a node, server, or container of some kind. This node probably has configuration files and multiple sets of software running. This setup gets its software changes through some kind of deploy system. The deploy system is probably closely coupled to the software framework (RoR, LAMP, Java...). If something goes wrong, you need important stats on your system easily available. Better yet, if a stat goes out of bounds, you should be alerted.
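To make that last point concrete, here is a minimal Python sketch of the "alert me when a stat goes out of bounds" idea. The metric names, the limits, and the `Bounds` and `out_of_bounds` helpers are all made up for illustration; a real setup would hand this job to whatever monitoring tool you already run.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Acceptable range for one stat (hypothetical helper)."""
    low: float
    high: float


def out_of_bounds(stats, bounds):
    """Return the sorted names of stats whose value falls outside its bounds.

    Stats without a configured bound are ignored rather than flagged.
    """
    alerts = []
    for name, value in stats.items():
        limit = bounds.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            alerts.append(name)
    return sorted(alerts)


if __name__ == "__main__":
    # Illustrative numbers only.
    current = {"cpu_percent": 97.0, "disk_free_gb": 40.0}
    limits = {
        "cpu_percent": Bounds(0.0, 90.0),
        "disk_free_gb": Bounds(10.0, float("inf")),
    }
    for name in out_of_bounds(current, limits):
        print(f"ALERT: {name} is out of bounds ({current[name]})")
```

Even this toy version hints at the trouble ahead: someone has to decide what the bounds are, and someone has to keep them current.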
Software seems simple, right? Pick your project, write your code, deploy, watch, and profit! <- Right there you see the first problem. The DevOps infrastructure is... sadly... an afterthought. Nothing is quite so simple and elegant as getting your prototype up and running on your laptop. That's a long way from a web service that can handle tens of thousands of simultaneous users.
There is a metaphor I like to use for software development: building construction. The humble prototype often has the complexity of a nice new camping tent. It took a while to figure out how everything fit together, but it's done now and seems like a novel place for two people to spend a night. As time goes on, you'll probably want more. You'll probably want heating, and a kitchen, and plumbing of all kinds. One would never actually try to morph this tent into an actual house, but that's effectively what happens with software.
You could imagine the process: Let's add a better roof, something that looks like a covered parking spot. Good improvement. Add walls from corrugated fiberglass. Fantastic product! It lets light in and keeps the elements out! Well, maybe not... not if you have ambitions of actually building a house. In this example you are increasing the complexity of your current situation and taking time and resources away from your ability to build a real house.
The assembly of a working DevOps infrastructure goes through an analogous process. Your software is running on an EC2 instance! But where do you send your metrics? Let's use Graphite! It's open-source (free), and you just plug in a database. You can even use your existing database!
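For a sense of how low the barrier to entry is, Graphite's Carbon daemon accepts a plaintext protocol: one line per data point in the form `path value timestamp`, conventionally on TCP port 2003. A minimal Python sketch follows. The host, the port default, the metric name, and the `graphite_line` and `send_metric` helpers are assumptions for illustration, not anything from a real deployment.

```python
import socket
import time


def graphite_line(path, value, timestamp):
    """Format one data point for Carbon's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point to a Carbon listener.

    host and port are assumed defaults; adjust for your own install.
    """
    if timestamp is None:
        timestamp = time.time()
    line = graphite_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; requires a Carbon listener to be running.
    send_metric("web01.requests.count", 42)
```

A dozen lines and you have metrics. That is exactly why it feels like a good idea at the time.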
Your user base is growing and now you need two servers. Can't use that fancy AMI snapshot, because that would stop your service and you only have one server. So you manually build a new one, but you are careful to get an AMI snapshot. Time goes on and you've made a dozen manual changes to your servers. Enter the configuration system. Your options range from a bash script to Ansible or Chef, or maybe even moving everything into Docker containers!
So you go middle-of-the-road and build things with Ansible. Then your project grows, and you have multiple environments and servers that are really similar but have many complex differences. You realize that maybe Chef makes a lot of sense for your situation. But you really want containers, too. Why not both? Maybe skip the Chef part and figure out how to have Ansible play nicely with containers.
In the meantime, that Graphite service you set up... it's totally overwhelmed. And the new product person hates the graphing interface and wants to triple your data points. What now? Grow your system... maybe add some homegrown metrics aggregators, and use Cassandra to store your data. All this works nicely for a while, but then you realize you don't have the time or resources to maintain this metrics system.
So you look to third parties for your metrics needs. New Relic, CloudWatch, SignalFx, Datadog, Instrumental... Each one unique in feature set and price point. No two alike! You pick one, your product person loves it, but your engineers can't recreate their old graphs. Meanwhile it seems like some data isn't working the same... Who knew the accountant was using data in the metrics system to balance the quarterly books!?
Slapdash that metrics system back into acceptable condition. It's similar to hooking up the kitchen drain to the vanity sink drain. Bond it together with duct tape. A little plaster of Paris to stop the leaky part. Hang some wallboard to cover it. Looks fine! Seems to work... but your dry-rot problems are just beginning... Your in-sink disposal will not have the bandwidth you want.
Shoddy plumbing aside, even the most quality-driven engineer will find their limits in this realm. The requirements will change, the urgency of problems will shift, and sometimes the best solution you can deliver is a gradual step, kinda like that corrugated fiberglass wall. Good improvement! Fantastic product! Love me some corrugated fiberglass.
Herein lies the kludge: there are a million ways to build it, and none of them is "wrong." The priorities and resources of your project will drive the current DevOps problem and will also drive the DevOps solution. Eventually enough time and solutions accumulate. Eventually you have collectd sending data to statsd, which sends data to three different locations... No amount of duct tape will ever save you.
