How I think about software development, pt. 1, the software side

This is a highly opinionated exposition of my thoughts and advice as they pertain to software development generally, developed mostly in the time period from 2018 to 2025. It comes from my experience writing medium-to-large (but not massive - I've never worked on a >million-line codebase, for example) projects, both by myself and in teams. Work in teams was generally in the web-development space, whereas independently I do work in tools, games and graphics (as well as web). I am far from an industry veteran - but for this seven-year period I have programmed nearly constantly, often for 10-14 hours a day.

Trust but verify pt. 1

All pieces of ostensible wisdom, advice, positive or negative feedback you ever receive - including this - may be false or unfounded. Treat it as such. This is doubly true if it is not backed up by some evidence or concrete reasoning.

The Software Side

Software development is engineering

Philosophy is fine, but when doing engineering, prefer science.

I've worked with people who have spent hours of time in meetings discussing the 'philosophy' (their words, not mine) of some piece of code. These people tend to be active in code reviews, going back and forth on how variables should be named, which files which functions should be in, or talking about the 'elegance' or 'cleanliness' of a solution. They also seldom solve actual problems well.

There is a part of me that is this person. I like talking about philosophy, and meta-questions about the task-at-hand. I also think aesthetics actually do play a role in software development. But if they are where you start from, before you know how to solve engineering problems, and solve them well, you will produce garbage. You will waste time in meetings - hours, maybe - talking about nothing in particular. You will waste the time of people kind enough to humor you, or naive enough to think your 2-page Confluence document on why we should update our prettier config to do linewrapping at 80 columns instead of 120 is worth reading, let alone worth writing.

In the least optimistic case, someone might enjoy the fact that engaging with you relieves them of their more-important duties temporarily - a temptation many of us feel, at a time when our attention is so often fragmented, and when the Windows operating system often ships with literal advertisements.

An example of non-engineering

I recall an argument I had with a friend about a project I was embarking on - a web-application to play a board game.

At the time I was in my first year of college, and I was absolutely awful at programming. On the other hand, for the first time in my life I was doing well in school. I learned Java, Object-Oriented Programming, and some basic 'design patterns'. I took things seriously. Because I was positively reinforced by my grades, peers and instructors, and worked extensively on my own time to do extra, more ambitious things (relative to the typical first-year), I probably thought I knew some things that others didn't. This was largely false. I did not know how little I knew about how to write good software.

The debate centered on the relationship between the game state and the 'player' data model. Of course, both of these were Object-Oriented classes. I was strongly advocating that these two classes should store references to each other, in a circular manner. The reasoning was 'convenience' and 'elegance'. I was likely imagining how 'nice' it would've been to type player.state or state.activePlayer instead of some alternative.

What is abundantly clear to me now, is that nearly everything I was doing, nearly everything I was thinking about was wrong. Not even that I had made the wrong choice between some set of available options - I was thinking in completely wrong terms from the beginning, which prevented me from even discerning available options.

This was going to be a networked application. How were clients and the server going to communicate? Over which protocols? Do we need to be real-time? Were there going to be multiple servers? How many? Where are they physically geolocated? How big is the market for this product? How many people could reasonably be expected to use it concurrently? If you had asked me any of these questions - I would not have had an answer for you. If I did - I would have been making it up on the spot, having never before thought of it.

You see, the program I was writing with the two Object-Oriented classes interacting in a circular-referential manner - it was a NodeJS program that did not use the network. I wasn't solving the actual problem - almost at all. I had nearly zero conception of how the code I was writing related to the actual product I wanted to produce.

Because the code I was writing had nearly zero connections to anything actual - anything that does anything, in reality - I had no metrics, no measure by which to evaluate which decision would be better or worse than any other. The only thing I had to go by was what I was taught from lectures and books and online resources, which trended towards Object-Oriented thinking - modeling entities in your program as classes, and as abstract entities. The poison of OOP had seeped into my brain, and still to this day I am embarrassed to admit that I occasionally waste time on asking myself stupid questions like "what file should this utility function go in?", when the answer is largely inconsequential.

The few things you could evaluate architecturally were bad. You should avoid circular references in your data models whenever possible, because circular references (usually) make nothing important easier, and some important things harder (serialization, for example). At the time, I didn't have a sophisticated enough understanding of memory to know that this was true, or why it was true.

The only positive thing you could say about what I did at this stage of the project was that I at least wrote code. Given my education, it wouldn't have been unexpected for me to start with UML diagrams instead.


The project I'm referring to did eventually produce a working multiplayer web-application - after several complete ground-up rewrites, as about every 8-16 months during the process of working on it part-time I had some kind of epiphany that completely upended my understanding of what it means to write software that is good, and I would realize in horror that I had made assumptions in the previous implementation that were unacceptable to me. Some of these assumptions weren't strictly unacceptable, in that the project could have been finished without the rewrite, and probably finished much sooner. But the quality of the end product would've undoubtedly been much worse.

I'm mostly happy with it now.

I'm not advocating for complete rewrites in this article. I'm hoping, rather, to articulate the better alternative to the bad decisions, incorrect assumptions, and shoddy foundations that caused me to want to do multiple rewrites of my entire codebase in the first place. What could I tell that person seven years ago to help them?

Admittedly - it's a bit of an assumption that such a person would have listened to my advice. I do not know if I would have. Dear reader, I hope you find some of what follows to be useful to you.


Beating my first-college-year approach is a fairly low bar. I think one better starting point than the UML diagrams and the real-world data-modeling I was taught is something like: articulating your problem clearly, in plain language, and attempting to solve it in a ground-up manner, from first principles, using the ethic of an engineer - which I take to mean something like making reasonable-to-good use of available resources to achieve your ends.

What can we say about the ethos of a software engineer that wants to do work in this way?

Requirements and Context

Something I glossed over is the bit about 'articulating your problem clearly'. The thing is - it's always hard until you figure out what to do. Mike Acton has some good thoughts on this point. Especially on a team, the process of determining requirements - that is - the specification of the problem itself - is an iterative and gradual process. It's also hard. So, what does that iteration look like?

First - start anywhere that strikes you as reasonable. How reasonable your starting point actually is will vary with your experience in the domain at hand. Don't have a lot of experience? Fine - accept that. You can't know what you don't know, but you must know something about the problem, otherwise you wouldn't have it in your head. The only rule is to stick to reality.

I've said to be 'reasonable', and 'reality-based', but what would it mean to ask a question about the problem that is not based in reality? Most of the kinds of questions I was asking and debating in my first-year of college would qualify.

Unfortunately, a rigorous definition evades me - but it has something to do with disputable assumptions buried inside the premise of the question. Something like:

"Should this member variable be private, public, or protected?"

Is an example of a poor question, because there are no software engineering problems that depend on this question being answered, except for those that artificially constrain you so as to require it. That question might be valid to consider if you are forced to program in Java, for example. But - you can in principle solve any software engineering problem without even considering this question, because there exist programming languages without even the concept of public, private, and protected. The question would actually be incoherent to an Odin programmer, for example.

Note that you can challenge this question, like: "Should I be putting access modifiers on my variables at all?", or "Should I be using Java/whatever language for this problem?". This is another indicator to me that the question itself is not sufficiently deep or important.

Consider an alternate question, like:

"Where are my customers physically located?"

You cannot easily pick apart this question in a similar way - imagine asking: "Well, what if my customers aren't in physical space?" or "Do we really want customers?". It seems obviously and irreducibly relevant to any software that is built to be used by people. This is a better question.

If you can solve the problem (and solve it well) without even considering the question asked, it's probably a bad question. If considering a question leads you to solving the problem better, relates to the data you will be manipulating or storing, or is otherwise essential to solving the problem, it's probably a good question.

Hard questions, unanswerables

Looking back, one 'challenge' to a question was: "Should I be using Java/whatever language for this problem?".

If you've engaged in conversation/debate with programmers about basically anything, you'll be familiar with the experience of an 'unanswerable'. In this case, I don't necessarily mean literally unanswerable, I mean that the process of answering a question consumes your resources as a company/team/individual - your time and energy. Answering some questions consumes more resources than answering others.

Being reality-based means being willing to shelve questions like this in some circumstances. Sometimes you pick a sub-optimal answer, and go forward with that, because improving your understanding is too costly, and the potential gains too small. If everyone on your team only knows how to program in Java, debating the programming language the project should be written in is probably a waste of time.

First-pass at defining your context

Let's take any known, based-in-reality information, and interrogate it. Let's assume it is a web application: How will clients and the server communicate, and over which protocols? Does it need to be real-time? How many servers will there be, and where are they physically located? Where are the customers physically located? How big is the market for this product? How many people could reasonably be expected to use it concurrently?

The list goes on and on. Once they are asked, to yourself and/or your team, the process of answering begins. For some of these questions, answers will come quickly, and the answer will not change for a long period of time - maybe ever. For others, the answer may change on a weekly or daily basis.

Engaging in the process of answering these questions by yourself and/or as a team improves your team's conception of the working context. This improves your collective understanding of what you are all doing. Having at least some answers to these kinds of questions gives you the 'what' in the answer to the question of 'what are we making?'

A poor team doesn't ask such questions, or work on improving their collective understanding, or produce novel questions to ask, or challenge requirements.

So, assuming you have an understanding - even if it's tentative - about what you are making (and you are committed to continuing to interrogate your own understanding over time by asking good context-driven questions) how should you make it?

Non-pessimization

This term was used occasionally by Casey Muratori to describe a kind of optimization which I would summarize as programming in such a way that minimizes useless, wasteful, or redundant work. You can also think of it as the default, most reasonable way to program if you have the physical resources/tangibles of your system in mind - processor speed/time, memory and disk usage, network time, source-code size, etc. I consider this to be intrinsic to the ethos of a good engineer.

This is in contrast to true 'optimization' work, where one may envision a wizened greybeard, hunched over, programming in assembly and counting clock cycles, to produce a 5% faster triangle rasterization routine than what the compiler does for him by default. This is the straw man often attacked by those unsympathetic to arguments for performance-oriented programming. The important thing about this image is that it is not what most of us have to do, or should do, to become better programmers.

Many programmers (web programmers, in particular - even ones with decades of experience) write code with virtually no concern for, or awareness of, the aforementioned tangibles. This is fundamentally understandable and easy to do. I'm sympathetic to that disposition. Code is barely tangible itself - it is extremely lightweight. It takes up space on your disk, but it's often a negligible amount of space from the perspective of your available disk space.

The effects of the code you write can vary. If you write some code, re-run your program, and the program gets 50 milliseconds slower, will you notice it? If you aren't vigilant, you might not! But 50ms is an astronomically large amount of time for a modern CPU. The entire universe of a AAA multiplayer video game like Call of Duty must be simulated/updated and rendered in just over 16 milliseconds. This includes things like outputting sound and handling incoming/outgoing network packets - problems facing web applications are in some ways a sub-set of problems facing networked games.

If you do not exercise vigilance with respect to the relevant metrics for your program, you will compound losses that feel small individually (even though we see that 50ms is a lifetime for a CPU, we must admit that 50ms tends to pass fairly quickly for a human being in most - but not all - cases), but can add up to a great disaster. I must emphasize: if a single thing takes 50 milliseconds in Call of Duty - the game drops multiple frames and feels terrible to the end user - especially if these dropped frames cause them to miss a shot, or get killed in the game.

These issues crop up in all kinds of applications, not just games. It is infuriating to try and type in a text-input box where each keystroke takes 50ms to register. This is one situation where 50ms can actually feel slow to a human - which exposes one of the many fallacies in the idea of architecting for 'human time' performance, as if time is always experienced by humans in the same way, in all situations.

Navigating the menu of a 'smart' TV where each press of the little gummy remote buttons takes 50ms, if you are lucky, to reflect on screen is a horrible experience.

In the case of Virtual Reality (VR) applications, 50 milliseconds of delay means not surpassing 20 frames-per-second - not too far off from the state-of-the-art performance of VR displays from over 3 decades ago, and squarely in what Thomas Piantanida from SEGA, a forerunner in VR at the time, called the 'barfogenic zone'. 50ms in this case can mean physical illness.

If you can sometimes miss a slow-down of 50ms in a system you are developing, and this difference occasionally makes all the difference, what can we do? We've described non-pessimization, but one valid summary could be 'not doing dumb stuff' - which isn't very useful. Can we try and be more prescriptive?

Speed of light analysis

One time I was working on improving the load-time performance of a web-app for my employer. I had remarked that the framework we were using ships a whopping 1.6mb icon-font with the initial bundle, which is included regardless of whether you use it or not. A co-worker, who happened to be a founder and the CEO of the company, turned his chair to face me and asked (paraphrasing): "I'm pretty bad with file-sizes, what's considered a big file?".

It's a good question.

Anyone who's done this kind of bundle-size 'optimization' work probably nearly fainted when they heard 1.6mb. But why? I have tons of files on my computer much bigger than that.

The actual answer is that a single measurement is not sufficient to indicate whether something is slow, fast, big or small. Sizes and speeds are relative. Daniel Lemire (who happens to have taught a class the aforementioned CEO once attended) calls this the fallacy of absolute numbers.

We need a value to compare our measurement to. John Carmack likes to look to physics for an answer - he calls this approach 'speed of light analysis'. I like this term - I say 'speed of light' to refer to the smallest, or best, or fastest a thing could possibly be. Performance is typically thought of as being about speed, but note that any tangible could be evaluated like this - we are currently doing this with data size.

So, in the case of the icon font from above, how bad is 1.6mb, really? Well - as stated - this font ships even if you don't use any of the icons. This was the case for us. The 'speed of light' in this case is 0 megabytes, and our actual solution produces a size of 1.6 megabytes, which gives us a waste percentage of 100%.

This is unfortunately not uncommon in software today. You might expect that most engineering produced by today's software teams is <50% waste, but it's much closer to 90%, or higher. At the beginning of his course on performance-aware programming, Casey Muratori outlines what he calls the '5 multipliers', the 5 things that contribute most to software performance today. Number one on the list is waste - which he defines as work done by the computer that did not need to be there to solve the problem at all.[^1]

You should always keep in mind what the 'speed of light' is for the system you are developing.

There is a chance that this approach sounds a bit too hardcore to you. It is true - you will never produce a solution that actualizes the speed of light for your problem. So, its utility isn't as a goal-post you are expected to hit - it's about orienting you in the right direction. You should move towards a solution that approximates it, and make note of how decisions you make impact your orientation towards this technically unattainable goal.

How do you know if you're moving in the right direction?

Measurement and statistics

You should, insofar as it is possible, render or output metric data on every single run of your program. If you are programming a real-time application, the milliseconds-per-frame measurement should be rendered continuously on your display. This should be done even when the task of the day is not strictly performance related.

This applies to your data more generally. Whatever it is that you are working on, data relating to the problem should be measured, rendered and/or outputted. There always is something, because all computers do is transform data from one form to another.

In a real-time context, consider visualizing your data with an exponentially-weighted moving average, if it can vary per-frame. Data that fluctuates per frame can be hard to reason about in real-time without some kind of smoothing like this.[^2]

You might want to also record dropped frames. A dropped frame is a frame that runs slower than your target. If you're trying to render a frame in ~16.6ms, which is the number you'll have to hit to render 60 frames a second, a dropped frame is a frame that runs slower than 16.6ms. Being aware of dropped frames helps you catch worst-case performance bugs that a moving-average for frame time might miss.
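As a rough sketch of tracking both of these at once (assuming a browser and requestAnimationFrame; the names and the 16.6ms budget are illustrative, not from any particular codebase):

// illustrative only: track frame time and count dropped frames against a 16.6ms budget
const FRAME_BUDGET_MS = 16.6;
let last_time = performance.now();
let dropped_frames = 0;

function frame(now) {
    const frame_ms = now - last_time;
    last_time = now;
    if (frame_ms > FRAME_BUDGET_MS) {
        dropped_frames += 1; // a frame that ran slower than the target
    }
    // ... update and render here, and draw frame_ms / dropped_frames on screen ...
    requestAnimationFrame(frame);
}
requestAnimationFrame(frame);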

Consider outputting your data to a CSV for visualization in a spreadsheet and periodically analyzing it. 'Eyeball statistics' isn't really a thing, so actually running some statistical analysis on your data is often a good idea. The truth is that many of our problems as software engineers are so pedestrian that 'just eyeballing it' is sufficient, but you shouldn't forget that more tools are available to you. You don't have to have a particularly sophisticated understanding of statistics to do this (I don't) - simply knowing the min, max, and average of your data, even just some of the time, can be insightful.
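A minimal sketch of the simplest version of this (the variable and file names are made up):

// collect samples during a run, then summarize and dump them as CSV
const samples = []; // push frame_ms (or request durations, etc.) into this as the program runs

function summarize(xs) {
    if (xs.length === 0) return { min: 0, max: 0, avg: 0 };
    const min = Math.min(...xs);
    const max = Math.max(...xs);
    const avg = xs.reduce((a, b) => a + b, 0) / xs.length;
    return { min, max, avg };
}

console.log(summarize(samples));
// in Node: require("fs").writeFileSync("samples.csv", "sample_ms\n" + samples.join("\n"));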

Regardless of your specific problem, rendering, or visualizing your data is one fool-proof method to find bugs you didn't even know you had.

For web development, you don't have access to the innermost loop of the browser; reasoning about real-time performance becomes harder because of the number of additional layers between you and the GPU. But you can still profile your code with the web performance APIs, like performance.now().
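For instance, a minimal sketch of timing a suspect chunk of work with those APIs (the function and data being measured here are hypothetical):

// time a suspect piece of work with the web performance APIs
function rebuildSearchIndex(docs) { /* hypothetical expensive work stands in here */ return docs.length; }
const documents = [];

const start = performance.now();
rebuildSearchIndex(documents);
console.log(`rebuildSearchIndex took ${(performance.now() - start).toFixed(2)} ms`);

// performance.mark()/measure() record the same thing in a way browser profilers can display:
performance.mark("index-start");
rebuildSearchIndex(documents);
performance.mark("index-end");
performance.measure("rebuild-search-index", "index-start", "index-end");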

If you bundle your code, analyze your bundle on every build. Esbuild has a pretty good visualizer, if you happen to use that.

One challenge that is particular to networked applications is that the data required to run the program is not necessarily on the hard-disk of the user - stuff has to be downloaded. Sometimes continuously, sometimes mostly or all up-front. Sometimes HTTP is all that is used - connections are established, data is exchanged, and then the connection is severed - and sometimes you have a real-time component with a persistent connection, whether through Server-Sent-Events, WebRTC, or WebSockets. Network speed varies, so minimizing the size of the data you send over the network is important. 200 kilobytes of data takes a full second to be sent over a typical slow 3G internet connection.
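To return to bundle analysis for a second - with esbuild specifically, one way to do this (a sketch using its JS API; the entry point and output paths are made up) is to emit a metafile on every build and print esbuild's analysis of it:

// build script sketch: record what ended up in the bundle, and print esbuild's analysis of it
import * as esbuild from "esbuild";

const result = await esbuild.build({
    entryPoints: ["src/app.js"], // hypothetical entry point
    bundle: true,
    minify: true,
    outfile: "dist/app.js",
    metafile: true, // records every input file and its contribution to the output size
});

console.log(await esbuild.analyzeMetafile(result.metafile));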

Network availability is also a concern, so minimizing the number of network requests is also worthwhile. Minimizing requests may ease the process of migrating to a PWA. Caching layers, whether via Javascript objects and arrays, or the IndexedDB API, can be valuable in minimizing unnecessary requests.
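A minimal sketch of the simplest version of that idea - an in-memory cache in front of fetch (the URL in the usage comment is a placeholder):

// naive in-memory cache in front of fetch: a repeated request for the same URL never touches the network
const cache = new Map();

async function cachedFetchJson(url) {
    if (cache.has(url)) {
        return cache.get(url); // cache hit: no request made at all
    }
    const response = await fetch(url);
    const data = await response.json();
    cache.set(url, data);
    return data;
}

// usage: const user = await cachedFetchJson("/api/user/42");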

It should be noted that a network request is a kind of 'outside world' piece of code: a necessary point of failure. You cannot guarantee that a network request will succeed. Because of this, minimizing network activity helps to improve the reliability of your system as well as its performance. A smaller number of things that can 'go wrong' also makes error handling and recovery simpler.

A similar approach is taken in well-written programs that have to deal with manual memory management - the operating system can always refuse a request to allocate memory on your behalf (perhaps the system is out of memory!), and so a strategy often taken is to make a smaller number of individually larger memory allocations up-front, and then hand out this memory to your program yourself, saving you the hassle of having to go through the operating system in most cases. In practice, memory allocation rarely fails even when you call the operating system every time, but it's still much better to approach the problem this way, as you can make your allocators much faster than the OS's with only a couple hundred lines of code.
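To illustrate the shape of that strategy in JavaScript terms (a toy sketch, not how a real allocator in a systems language would look), you can grab one large ArrayBuffer up-front and hand out pieces of it yourself:

// toy 'allocate once up-front, sub-allocate yourself' sketch
const ARENA_SIZE = 64 * 1024 * 1024;        // one big 64MB allocation, made a single time
const arena = new ArrayBuffer(ARENA_SIZE);
let offset = 0;

function arenaAllocFloats(count) {
    const bytes = count * Float32Array.BYTES_PER_ELEMENT;
    if (offset + bytes > ARENA_SIZE) {
        throw new Error("arena exhausted"); // the single place where 'allocation failed' has to be handled
    }
    const view = new Float32Array(arena, offset, count);
    offset += bytes;
    return view;
}

function arenaReset() {
    offset = 0; // 'free' everything at once, e.g. at the end of a frame or a request
}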

To return to the web for a moment - Web.dev has a good series of articles on the details of web performance. My favourite article on web performance is "Making the world's fastest website, and other mistakes" by Taylor Hunt, which is as amusing as it is helpful.


Some things to consider:

Trade-offs

I believe that some people think that this approach to software development, as articulated thus far, is too hands-on, dirty, or significantly too far along some diminishing-returns curve. They may imagine a kind of cost-benefit analysis, weighing the effort of actually engaging with the problem they have (in contrast to their previous strategy of engaging with abstractions surrounding the problem) against what is, in their mind, the minimal benefit to the product they make and sell.

I expect that those of that disposition would have clicked off this article some time ago. But I should address this idea more directly - some people call this 'premature optimization' (though that's not what was meant by the authors of that quote originally), some call it 'programmer cycles' vs. 'machine cycles'. How valid are these ideas?

Firstly, some things are just better than other things. But, trade-offs do exist. In my experience, the common programmer will trade performance/other tangibles away for virtually every single other metric available (including those that are never measured). I'll use 'performance' going forward as a stand-in for any tangible of your choice.

Such trade-offs are actually reasonable in some circumstances; however, many programmers will make them nearly 100% of the time, and the loss in 'readability', 'flexibility', 'aesthetics', etc. is often hard or impossible to quantify, whereas we know all our applications are getting slower and slower - we can measure it. In a lot of cases, it is plain as day to see.

There are lots of common programming practices that exhibit this kind of sloppiness. Did you know that the Agile Software Development methodology, adopted by countless software companies, has no conclusive evidence that it helps across any axis?

There isn't enough empirical work done on things like readability, flexibility, and other virtues some programmers will mention. It's not unreasonable to think that, some significant percentage of the time, when someone sacrifices performance for another metric they didn't measure, they are just getting the worst (or at least close to the worst) of both worlds.

If you find yourself making the tradeoffs above, try this exercise, taking 'readability' as an example - Maybe I don't want to write the more performant code because it's not 'readable'. Ok:

Insofar as is possible "improvements should be measured in tangibles - power, heat, dollars, etc". I would add that costs should also be measured in this way.

It's worth reiterating that sometimes - perhaps even often - it is genuinely good to trade off these 'low-level' concerns for personal or professional time, allowing you to solve problems faster (but also solve them worse). The primary point is that there is a trend in software development that leans too far in this direction, and which has only begun to reverse in the last few years.

In my view, Casey Muratori, whom I've mentioned a few times already, and the 'handmade' community which originally spun off from his work are largely responsible for this cultural shift[^3]. It will be a wonderful day when the pendulum swings in the other direction - hopefully not too far.

Dependencies, Abstraction, and resistance to change

Aim for the minimum set of constraints that will solve the problem you have, and argue up from there.

Don't find an off the shelf solution and chop away at it until it does a decent job - diagnose the problem, and solve exactly that, and nothing else.

It takes more effort to remove cruft than to add it; It takes more work to defuse a bomb than arm one; It's easier to make a mess of your home than clean it up.

Because of these things, I think embracing a kind of conservatism/resistance to 'additive' change is useful. Anytime someone wants to add a substantial dependency or increase the day-to-day friction of development, they should have to fight for it. If you do the opposite (where, say, any engineer at any time can run npm install ... on anything they want), by the middle to the end of a large project it will feel like swimming through molasses to get work done, and removing the molasses from the pool may take months or years.

You should adopt an attitude that is hostile to waste. You should be vigilant at deleting unnecessary functions, classes, database entities, lines of code, outdated or irrelevant comments - anything and everything that is not necessary, you should try to get rid of. If your programming environment, tools, or existing code structure has made making these deletions/refactorings more difficult, then that is one sign that these things are not what you want to be using.

This doesn't mean don't ever use other people's tools or code - you just have to be discerning. Most problems only come from larger and/or dynamic dependencies. Copying code snippets, or small-ish files into your project to solve a problem you have isn't usually cause for major concern. To this end - prefer using library source code directly and not through a package manager.

Trust but verify pt. 2

Verification doesn't have to mean being an asshole. Trust your peers, but verify their claims, if the fact-of-the-matter is consequential. Sometimes things change with time, and old assumptions become no longer true, despite being memetically repeated. Verifying doesn't just mean reading popular opinion. If feasible, test it yourself.

Around the year 2018, I investigated the performance of C-style for loops vs. the array method forEach in the V8 Javascript engine. All benchmarks I found on the internet were decisive - for loops won. I made my own benchmarks, and was able to replicate the findings of various articles and jsfiddles. In 2022, I happened to re-run these benchmarks, and the tables had turned. To my surprise, forEach now out-performed for in V8 in the exact same benchmarks I had run myself 4 years prior.
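If you want to test something like this yourself, here is a rough sketch of the kind of micro-benchmark I mean (not the exact code I ran - the array size is arbitrary, and the numbers you get will vary by engine and engine version, which is exactly the point):

// rough micro-benchmark sketch: C-style for loop vs. Array.prototype.forEach
const data = Array.from({ length: 1_000_000 }, (_, i) => i);

function bench(label, fn) {
    const start = performance.now();
    fn();
    console.log(`${label}: ${(performance.now() - start).toFixed(2)} ms`);
}

let sum1 = 0;
bench("for loop", () => {
    for (let i = 0; i < data.length; i++) sum1 += data[i];
});

let sum2 = 0;
bench("forEach", () => {
    data.forEach((x) => { sum2 += x; });
});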

How relevant is it that you do or do not believe me? How would you verify what I am telling you?

Outro

The focus of this article has been on the ethos of a software engineer, solving their software engineering problem. But in reality, a lot of the problems we encounter are messy, human problems that relate to team size, communication, process, confidence and emotions. In a future article, I aim to discuss (in a hopefully more terse manner) an approach to communication and leadership, and these so-called 'soft skills' that are very important for engineers to have.


[^1] I've never seen someone actively take it, but there is a highly pessimistic variant of this approach. One could instead compare their measurements to the speed or size that their system needs to exhibit in order to not collapse under its own weight - the speed or size requirements it has before it fails to work, and the product and its owning company itself implode. I would be interested to see if there's anyone who actually enacts this kind of thing in their work.

I personally don't recommend it because I think this would necessarily lead to putting the worst-possible thing into the world that still technically works - and the idea of every single practical tool or piece of engineering we can buy being the worst-possible version of itself that was still commercially viable is pretty depressing.

On the other hand, there's a lot of software out there that already feels close to that to me now - maybe it wouldn't matter if Adobe, for example, took this philosophy seriously, but I still wouldn't want it to be pervasive.

[^2] Here's pseudo-code for calculating such a rolling average of milliseconds-per-frame (always prefer milliseconds-per-frame to frames-per-millisecond or frames-per-second):

// run this code every frame:
num_frames_in_last_second += 1;
if ((current_time - last_time_measured) > one_second) {
    // alpha values close to 0 cause the rolling average to favor recency over history; the reverse is true for values close to 1
    const alpha = 0.25;
    average_ms_per_frame = alpha * average_ms_per_frame + (1.0 - alpha) * (one_second_in_milliseconds / num_frames_in_last_second);
    num_frames_in_last_second = 0;
    last_time_measured = current_time;
}

[^3] - https://caseymuratori.com/