Measuring Software Development
By CSONGOR FAGYAL
There are general concerns about the measurability of software development. The reason behind these concerns is usually doubt: doubt about the validity, meaningfulness, or even existence of the KPIs that should be used. In this post we will address these doubts. But before doing that, let’s start by explaining what Gitential does.
The Idea Behind Gitential
Software development lacks control and predictability.
- Despite all the efforts in the IT industry, measuring software development is still challenging. There are no standard KPIs, it is hard to get an overview of the development team’s work, and without a transparent process it is hard to optimize it.
- Evaluations are done manually, which is time-consuming, subjective, and incomplete.
Software Development is a process and it needs to be optimized.
- The software development process is at the centre of the business. The revenue of companies depends largely on how efficient this process is.
- Measuring is unavoidable if you do not want to risk decisions based on gut feeling and opinion. To be effective, optimization has to rely on objective, data-driven insights.
- It is not our goal to evaluate software as a finished product: whether it fits the market, whether the best architecture was used, and so on. Finished software is a state. We at Gitential are interested in the process that leads to that state. We help you analyze whether you are doing everything right throughout the process, and Gitential provides actionable insights that help you improve.
- Since we are working with a process that is quite complex and that varies from project to project and team to team, a few absolute metrics simply cannot tell whether the process is good or bad. Instead, our goal is to measure as much as we can and display as many correlations as possible, so that you can find what is relevant to you in a given situation and what you can tune.
- In software development, time is the biggest cost, so the most important thing to measure is how that time is used during the project. In other words: who does what, and when. We have found that in most development projects there is far too little objective information available about this.
- In the end, software itself is nothing more than its source code. To be objective, measurements must therefore be based on changes to the source, so the only thing we use in our measurements is the evaluation of source code changes in repositories.
- Coding is a creative process, not just punching in code. A developer’s days are inherently hectic: they include research, meetings, debugging, solving technical issues, and so forth. This means you cannot (and should not) compare one hour, or even one day, of a developer’s work to another and expect meaningful data. However, statistically – when the time window examined is long enough – anomalies diminish and the data becomes meaningful.
- Besides time, there should be repeatable, consistent measurements that assess the quantity and quality of the source code created. While a simple count of lines of code can be misleading, our research has shown that with proper analysis that takes into consideration complexity, verbosity, churn, and other source code properties, it is now possible to quantitatively measure the normalized code volume created by developers.
Addressing Common Doubts
“Software development is so complex that it is unmeasurable.”
This is just a myth. Once we identify what we can (and should) measure and which correlations tell the most important stories, many aspects of development become much more transparent with the help of the latest technology and the results of the research we have done. You just have to ask the right questions. Try Gitential, and you will see!
“Software development is a form of art. How do you measure art?”
We believe this is a mystification of software development, usually voiced by developers themselves. Gitential is created by developers, so we get it: programming is a creative process that is hard to understand, compare, and analyze. However, complexity in itself does not make the result art. The evaluation of art is subjective and depends on time, personal taste, and so on. Software, on the other hand, fulfills a specification that can be tested and validated, and it has an associated time and cost. These are very exact things.
Also note: you do not measure art, you evaluate it. On the other hand, Gitential does not evaluate software as a product: instead it measures the process through which it was created. Our goal is to optimize this process.
“To analyze how complex the work is that a developer does, the software should understand the source code – what the software does – and that is impossible.”
That is just a false premise. Many types of analysis are possible without actually “understanding” what the code or the project is doing. For example, just by looking at the indentation of the code, its complexity (how easy it is to understand or write) can be estimated; this is called whitespace complexity. Parts of the code that are often refactored can be found (hotspot detection). The code can be tokenized and the number of operations counted, and so on.
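As a rough illustration of the whitespace idea, the sketch below scores a piece of source text using nothing but its indentation. The function names `indentation_depth` and `whitespace_complexity`, the tab width of 4, and the choice of summary values are assumptions made for this example; this is not a description of how Gitential computes its metrics.

```python
def indentation_depth(line: str, tab_width: int = 4) -> int:
    """Width of the leading whitespace, counting a tab as tab_width columns."""
    depth = 0
    for ch in line:
        if ch == " ":
            depth += 1
        elif ch == "\t":
            depth += tab_width
        else:
            break
    return depth


def whitespace_complexity(source: str, tab_width: int = 4) -> dict:
    """Summarize the indentation of all non-blank lines in a source text.

    Deeper indentation usually means more nesting, which is used here as a
    language-agnostic proxy for complexity (an assumption of this sketch).
    """
    depths = [
        indentation_depth(line, tab_width)
        for line in source.splitlines()
        if line.strip()  # ignore blank lines
    ]
    if not depths:
        return {"lines": 0, "total": 0, "max": 0, "mean": 0.0}
    return {
        "lines": len(depths),
        "total": sum(depths),
        "max": max(depths),
        "mean": sum(depths) / len(depths),
    }


if __name__ == "__main__":
    flat = "a = 1\nb = 2\n"
    nested = "for x in xs:\n    if x:\n        print(x)\n"
    print(whitespace_complexity(flat))
    print(whitespace_complexity(nested))
```

The nested snippet scores higher than the flat one even though both are short, which is the point: the score reflects structure rather than line count, and it needs no parser for the language being analyzed.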
“Writing one line of HTML is not worth as much as writing one line of Java, so how do you compare a developer programming in language X to another one that is writing in language Y?”
First of all: you don’t. Why would you do that? When building a house, you do not compare the length of pipes a plumber installs to the square feet of the area a painter paints. So you should compare people who do similar tasks: compare one HTML developer to another one, one UI team to another one, a past period of the project to the current period, or even to an industry average. Use measurements relative to your project, repository or team.
Also, when assessing someone, use more meaningful metrics, such as time spent on the project, churn, productive code written, etc.
It is possible, to a degree, to compare the work of developers by factoring in complexity, language verbosity, and so forth (also see the paragraph above), but it is better not to base your assessment of someone simply on the lines of code written. That is just one metric, and it does not tell the whole story.
“You analyze source code changes, but we very rarely commit to the source code repository. Will that distort the measurements?”
Yes, it probably will. We apply some heuristics to handle this case, but the truth is that Gitential has already identified a problem with your development: your commit frequency is not following best practices. Please fix that. 🙂
“Sometimes a trivial problem, for example finding a typo, can take hours. As a developer, I feel like I worked hard, yet the system detects I did little work because of this.”
The answer is in the question: “sometimes”. This happens to every developer: sometimes we stumble upon small issues that take far too much time to solve. We make mistakes, we learn, we find bugs. However, this does not happen all the time. Statistically, these anomalies disappear. (Once again: use big enough time frames when evaluating.) If a developer spends most of his or her time on finding typos, we are sorry, but that is a problem; it should be addressed, and we will detect it.
“How do you measure the time I spend on coding? You are not sitting beside me, you only see my source code.”
You are right in the sense that the exact time spent on coding could only be measured by watching what you are doing every second. However, we did spend a great deal of time on creating heuristics that can estimate the time used for coding based on commit sizes, commit frequency, and so on. Try Gitential, and you will see that the estimate is reasonably accurate. If you find it’s not, let us know and we’ll fix it!
For optimization, this number does not need to be exact in every case, but statistically it should be reliable and consistent – and it is.
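To make the idea of such a heuristic concrete, here is a minimal sketch that estimates coding time from commit timestamps alone. It is a simplified, generic approach and not Gitential’s actual algorithm: the function name `estimate_coding_time`, the two-hour session gap, and the 30-minute lead-in credited before the first commit of each session are all assumptions chosen for illustration, and commit sizes are ignored entirely.

```python
from datetime import datetime, timedelta


def estimate_coding_time(
    commit_times,
    max_gap=timedelta(hours=2),            # assumed: longer gaps end a session
    session_lead_in=timedelta(minutes=30), # assumed: work done before a session's first commit
):
    """Estimate coding time for one developer from commit timestamps.

    Consecutive commits closer together than max_gap are treated as one
    working session and the gap between them counts as coding time. Each
    session is additionally credited with session_lead_in.
    """
    times = sorted(commit_times)
    if not times:
        return timedelta(0)

    total = session_lead_in  # lead-in for the first session
    for previous, current in zip(times, times[1:]):
        gap = current - previous
        if gap <= max_gap:
            total += gap              # same session: count the time in between
        else:
            total += session_lead_in  # new session: only credit the lead-in
    return total


if __name__ == "__main__":
    commits = [
        datetime(2024, 1, 8, 9, 0),
        datetime(2024, 1, 8, 9, 40),
        datetime(2024, 1, 8, 10, 10),
        datetime(2024, 1, 8, 15, 0),  # after a long break: new session
    ]
    print(estimate_coding_time(commits))
```

For a single day the result can easily be off, which is why such an estimate only becomes useful when it is aggregated over a long enough time window.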
“Programming is not just coding. There are meetings, communication, research… How do you measure and evaluate that time?”
We don’t. What we can do though, is let you enter the time that is spent on a project in total, and compare that with the estimated coding time. This will also show you the overhead time that is not spent on coding, which is an important metric.
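The arithmetic behind this comparison is simple and can be sketched as follows. The function name `overhead_ratio` and the decision to cap the coding estimate at the reported total are assumptions made for this example.

```python
def overhead_ratio(total_hours: float, estimated_coding_hours: float) -> float:
    """Share of the reported project time that was not spent on coding.

    total_hours is the manually entered time spent on the project;
    estimated_coding_hours is the coding time estimated from the repository.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    # An estimate can overshoot the reported total; cap it so the ratio stays in [0, 1].
    coding = min(max(estimated_coding_hours, 0.0), total_hours)
    return (total_hours - coding) / total_hours


if __name__ == "__main__":
    # 40 hours reported on the project, 26 of them estimated as coding time.
    print(f"{overhead_ratio(40, 26):.0%} overhead")
```

Tracking this ratio over several periods shows whether meetings, communication, and research are taking a growing or shrinking share of the project time.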
“Our projects are so different: one is a web service in Java for a client, the other is an internal project using embedded C. It seems like these cannot be compared. How could I use Gitential here?”
Once again: products can be different, but the processes cannot. The goal of measuring software development is to have a tool for optimization. If you still want to compare trends, time usage, productivity, etc., these are project (and language) agnostic metrics. Or just compare similar projects, teams, and workflows.
“Will you say a developer is ‘worse’ simply by detecting that they produce less code than others? Maybe they just have a harder task.”
We are not saying things like good or bad, worse or better. We only show what we measure. The conclusions will be drawn by the evaluator: you. We only give you a tool to find possible problems.
It’s also worth noting that while this case (a harder task) is possible, you should ask which is more likely: that the developers were given a harder task, or that they are less competent? (Find this out by analyzing other work done by the same developers.)
“Similarly to the previous question: what if a developer is writing smarter, denser code? Do you detect less work in that case? That would not be fair.”
First of all: denser code can be detected and reflected in code volume (which is not the same as lines of code).
Also see the previous paragraph: we are not saying somebody is better or worse than others based on just one metric. This is not how you should evaluate your developers, either. Look at trends, other metrics, and all aspects of measurements.
“You speak a lot about this ‘process’ and its ‘optimization’. But what does that really mean?”
The process is the creation of software, in a project by a team of developers, in the most efficient way: basically on time and on budget.
To achieve this optimal process, there are many things that should work: you should have competence; proper staffing; sharing of workload; fast and efficient communication; proper planning; test coverage; periodic but not overwhelming refactoring; teamwork; common code ownership; low bus factor; quality and maintainable code, etc.
The problem is that it is very hard to take care of all of these factors, especially when a team is overloaded with technical problems and is just trying to make things work. Things are rarely optimal, and the process can break here and there. As a result, problems stay hidden and are often only identified when they escalate, causing cost overruns, delays, and so on.
It is also hard to optimize the process because each problem and each team is slightly different. Gitential helps with this by measuring as much as possible, so that you can see what is happening in real time and detect when something is about to go astray before it actually does. By measuring everything in real time, you can easily see the results of your actions and tell whether you have successfully fixed a problem or just caused another one.
In conclusion, if you entertain the idea of introducing Gitential, or in general the measurement of development of your project, consider the following:
Focus on what you want to achieve. Measurement is not an end in itself; it has a goal: the optimization of the development process, which is achieved through transparency.
Do not obsess over the differences between projects, developers, and languages: the process is the same and most differences can be resolved by the algorithms Gitential uses.
Optimization is local: you want to make a team, a project or even a single developer better.
Everyday anomalies factor out statistically. Do not worry about spikes during the process: those are natural and will not hurt your overall measurements.
Ask the right questions. Before trying to measure something (that you might think is unmeasurable), first consider why you want to measure.
Do not forget that measurements are just the basis for analysis. It is up to you to see the correlations and make the decisions and adjustments. You work with your developers, so you have information that measurements might not reflect.