The Internet is awash with questions and potentially pseudo-scientific answers surrounding the cost of developing code. Short of looking at your bank statements, how does one - especially from afar - attempt to estimate the cost of developing software? Is it viable to even try calculating it, and is it usable for anything but humble-bragging?
In other words (so search engines can find this): what does a line of code cost?
This little piece will hopefully serve as an introduction to the caveats and gotchas surrounding this question, as well as some form of answer to it.
The most frequently used (and arguably worst!) method of estimating cost is to look at Lines of Code. Somewhere, in a reddish gnome's beard, there exists a notion that it is possible to take the number of lines of code, multiply by a few bucks (anything from $0.20 to $100 per line has been suggested), and end up with a number. And while it is true - the bit about ending up with a number - it exhibits a clear logical fallacy: that since there are N lines of code, it must have required writing exactly N lines of code to get there. In reality, a line of code represents any and all stages of development:
- Getting the idea for the code
- Figuring out how to construct the line
- Writing the first iteration of the line
- Someone is wrong on the Internet!
- Trying the code
- Found a bug, whoops! Better correct
- This line is using camelCase instead of snake_case, boo (quality assurance, compliance)
- The line is ready, ship it!
Development advances with each step until a setback is hit and we return to a previous step. The cost, however, always accumulates, even through setbacks. Where we are only tells us a fraction of where we've been, but it helps.
Depending on how you develop (single developer, in-house collaboration, online collaboration etc), and where you develop (hobby, company, FLOSS foundations etc), some stages may not be there, or additional stages may apply, but the basic assumption that each line of code represents any and all stages of development at once still holds. Thus, a line of code may very well cost 20 cents, or it may cost a hundred bucks. We cannot tell just by looking at the number of lines; we have to examine the process that got us there.
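To put a number on how little the naive method pins down, here is a minimal sketch. The function name and the 10,000-line project size are made up for illustration; only the $0.20 and $100 per-line rates come from the text above. Amounts are kept in whole cents to avoid floating-point noise.

```python
def naive_loc_cost_cents(lines_of_code, cents_per_line):
    """Naive estimate: number of lines times a flat per-line rate (in cents)."""
    return lines_of_code * cents_per_line


LINES = 10_000  # hypothetical project size, for illustration only

low = naive_loc_cost_cents(LINES, 20)       # $0.20 per line
high = naive_loc_cost_cents(LINES, 10_000)  # $100 per line

print(f"${low / 100:,.2f} to ${high / 100:,.2f} (spread: {high // low}x)")
```

The same code-base comes out 500 times more expensive at one end of the suggested range than at the other, which says more about the chosen rate than about the code.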
A better (albeit still not perfect) way of measuring the development cost is to look at the recorded changes to a line: how many times a line has been changed (added, removed, modified), as that at the very least encapsulates some of the stages described above, releasing among them (the degree to which these stages are represented may vary). This is also known as Hits of Code, Edited Lines of Code, Line Changes and so on. Whereas Lines of Code will tell us we have, let's say, 10,000 lines of code, there may have been 500 revisions accounting for 50,000 line changes. Thus, on average, every line was written once and then revisited four times and edited to some degree. We can't tell from this metric just how big the change-set was - is it just a typo fix, or was something fundamentally wrong - but we inch our way closer to a more comprehensive estimate of cost: ten fresh lines of code are, ceteris paribus, cheaper than ten lines that have been modified five times over. So far, so good... but we also need to address how we came about making the changes we made.
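As a rough sketch of how line changes could be counted in practice, assuming the history lives in git: the output of `git log --numstat --pretty=format:` lists added and deleted line counts per file, which can simply be summed. The function names and the sample text below are made up for illustration. Note that git reports a modified line as one deletion plus one addition, so this counts edits more generously than a dedicated tool might.

```python
def count_line_changes(numstat_text):
    """Sum added and deleted lines from `git log --numstat` style output.

    Each data row is "<added>\t<deleted>\t<path>". Binary files are
    reported with "-" in both count columns and are skipped here.
    """
    added = deleted = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separators or commit header lines
        a, d, _path = parts
        if not (a.isdigit() and d.isdigit()):
            continue  # binary file ("-") or anything unexpected
        added += int(a)
        deleted += int(d)
    return added, deleted


def changes_per_line(total_line_changes, current_lines):
    """Average number of recorded changes per line of current code."""
    return total_line_changes / current_lines


# Made-up sample: two text files and one binary file across two revisions.
SAMPLE = "3\t1\tsrc/main.py\n-\t-\tlogo.png\n\n10\t0\tREADME.md\n"

print(count_line_changes(SAMPLE))       # (13, 1)
print(changes_per_line(50_000, 10_000))  # 5.0, the example from the text
```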
An auxiliary bit of data that will help us in establishing the cost of code is the collaboration methods and constraints used, if any. If you're working on something all by yourself, you likely do not need to factor in additional costs (in fact, you should probably subtract some from your total), but if you work at a company or for a FLOSS foundation, there will be things in the development cycle that add costs to the development:
- Who decides what goes into the code-base?
- How is it decided? Synchronous or asynchronous decision making?
- How fast is the feedback loop?
- How are quality, compliance and legality assured?
- How is the software shipped?
As collaboration in a project grows and asynchronous constraints enter the development, cost increases, but the increase tapers off as the constraints are learned.
Typically, development speed, from fastest to slowest, is hobby -> company -> FLOSS org, with FLOSS organizations being noticeably slower and more costly in terms of person-hours. The two primary factors are asynchronous decision making and FLOSS-stamping, that is to say, the quality assurance and brand-name quality that come with developing under the umbrella of an open source organization. Thus, the cost is higher, but justified in the end by a greater guarantee that what is being produced is proper and passes all (or most) checks. Companies can be placed anywhere in the speed/cost space between hobby projects and FLOSS orgs; it will vary from company to company, but they are generally speaking still cheaper and faster than FLOSS orgs, as the chain of command is usually clearer and communication constraints are more relaxed.
Another very important factor that comes into play is experience, both in terms of code development in general and in terms of knowledge of the specific software and its libraries/designs, as well as how turnover ratios affect the overall experience levels over time. We should factor in at least the following:
- How well-versed are people in the language/system being used?
- How experienced are they in the specific problem we are solving?
- How does turnover, if such exists, affect the overall experience levels over time?
- What is the (estimated) relation between these and the cost of development?
There are also intrinsic human risk factors to consider, such as tacit knowledge being lost when people leave a software development project, which may raise the cost by more than was originally thought.
In this illustration, the overall hourly cost decreases as knowledge is gained, and increases slightly with every new person joining, only to skyrocket when the original author leaves, leaving the rest clueless about certain aspects of the code, thus increasing cost even more.
My friends know this very well; I like to throw equations around. So let's make one for code development costs!
We'll assume the following to be true:
- The cost of a line of code (or a revision thereof) depends on the complexity of the line and the complexity of the change.
- Lines of Code and revision complexity are directly affected by experience levels.
- Constraints are affected by experience levels, but the speedup gained tapers off as some constraints cannot be completely removed.
- The various stages of development happen at different speeds (and thus have different costs).
Thus, to calculate the cost, in person-hours, for i revisions, each containing a number of changed (added, removed, modified) lines of code, the total cost of development could be written as:
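What follows is only an illustrative sketch of such a formula: every symbol in it is an assumption made for this example, chosen to mirror the four assumptions above. For revision r, let L_r be the number of changed lines, k_r their complexity, E_r > 0 the experience level behind the change, S_r the set of stages the revision passed through, w_s the person-hours per unit of work at stage s, and kappa(E) a constraint overhead that shrinks with experience but never drops below a floor kappa_min.

```latex
C_{\text{total}}
  = \sum_{r=1}^{i} \frac{L_r \, k_r}{E_r} \,
    \bigl(1 + \kappa(E_r)\bigr) \sum_{s \in S_r} w_s ,
\qquad
\kappa(E) = \kappa_{\min} + (\kappa_0 - \kappa_{\min}) \, e^{-\lambda E}
```

Here kappa_0 is the overhead for someone with no experience of the constraints and lambda controls how quickly they are learned; neither is a measured quantity.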
Naturally, you can't just put this into your calculator and get a result. The intention is to illustrate the numerous factors and the complexity involved in calculating the cost of development. It would, of course, be very interesting to see some of these variables be replaced with actual figures, even for a select project or two, and estimate the applicability of this towards other projects in general. Let's, for now, leave that as an exercise for the reader.
There are a handful of existing models that can be used to quickly get a very rough estimate of what code development may have accrued in terms of potential financial cost. The most notable ones are COCOMO, COSYSMO, COQUALMO and the revised COCOMO II model. They provide some measure of repeatability and uniformity, and can as such be used to compare across projects in very general terms. If you have bank statements from your employer (in the case of a company software project), I would still advise you to use those instead, as the CO*MO models make many assumptions and have a lot of knobs and dials whose purpose most people would not know.
As part of our work with Snoot and later contributions to Apache Kibble, we at Quenda have done extensive modelling and calculations using these models, as well as the model described in the previous chapter. The results for mature code-bases (10+ years on average) generally vary between 4 and 25 person-minutes per line modification, or between 15 and 90 person-minutes per line of current code, for functional code at certified FLOSS organizations, depending on language and complexity of code (we did not count web sites and other such elements, but we did count documentation and comments). These calculations include most of the aspects mentioned in this article, as well as some that we haven't gotten around to (legal, provenance, infrastructure, branding etc). For companies, this cost would likely be halved, and for hobbyists or consultants the cost could go down even further. Some will suggest 10 lines of code is enough per day, some will say 500. We will say: it depends on code complexity, expertise, development stages and extrinsic constraints.
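To see what those ranges mean in person-hours, here is a small sketch. The function name is made up, and the 10,000-line, 50,000-change project is the example used earlier in this article rather than a measured one; the minute ranges and the "halved for companies" factor are the ones quoted above.

```python
# Ranges quoted above, in person-minutes, for mature code-bases at FLOSS orgs.
MINUTES_PER_LINE_CHANGE = (4, 25)
MINUTES_PER_CURRENT_LINE = (15, 90)


def effort_range_hours(count, minutes_range, scale=1.0):
    """Return (low, high) person-hours for `count` lines or line changes.

    `scale` is a rough multiplier, e.g. 0.5 for the "likely halved"
    company case mentioned in the text.
    """
    low, high = minutes_range
    return (count * low * scale / 60, count * high * scale / 60)


# Example project from earlier: 10,000 current lines, 50,000 line changes.
by_lines = effort_range_hours(10_000, MINUTES_PER_CURRENT_LINE)
by_changes = effort_range_hours(50_000, MINUTES_PER_LINE_CHANGE)
company = effort_range_hours(10_000, MINUTES_PER_CURRENT_LINE, scale=0.5)

print(f"per current line: {by_lines[0]:,.0f} to {by_lines[1]:,.0f} person-hours")
print(f"per line change:  {by_changes[0]:,.0f} to {by_changes[1]:,.0f} person-hours")
print(f"company estimate: {company[0]:,.0f} to {company[1]:,.0f} person-hours")
```

Even with the process accounted for, the high end of each range is around six times the low end, which is why the remaining factors (complexity, expertise, stages, constraints) still matter.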
We hope this introduction to code development costs has piqued your interest in exploring the matter further, and hopefully dissuaded you from using a pocket calculator and back-of-the-napkin math for estimating development costs. While it is possible to extrapolate to larger projects from smaller but similar ones, we ask that you at least consider the following variables when estimating costs:
- The development stage (how often have we come back to this problem/function?)
- The development complexity (what language, problem, and level of refinement are we working with?)
- The developer experience (how knowledgeable are we in this field? What are the barriers to knowledge?)
- The development constraints (is this sync or async development? What needs to happen before we can ship the code? How fast is the feedback cycle?)
- The risk factors and turnover (what could happen that would be considered a setback?)
With this data in hand, you'll hopefully be better equipped to make estimations that aren't complete garbage.
As a wrap-up segue to the footnotes: this little article took around six hours to make, and consists of 235 lines, if you go by a strict 70 character limit. So that's roughly one finished line every 90 seconds.
- Ceteris paribus: economists love to say this. It roughly translates as "all other things being equal" and shields us from unknown variables :)