written on 03/13/2021
In my years as a software engineer, I have probably looked at hundreds of codebases. Too many to count. I struggled a lot with understanding where the relevant code is most of the time. Normally, asking for help what I should look for and guidance in tickets will bring me forward. Slowly and surely I will understand what the code is doing. And you will too. Some people are better at this and some people will be slow. No shame. Most code is complex. But I found a simple tool that will make it easier for you. It is called code-complexity and you can use it as the following code snippet shows:
1npx code-complexity . --limit 20 --sort ratio2# You can also use --filter '**/*.js' to use glob patterns to filter files
It will return an output like the following:
This will show the biggest and most changed files. The likelihood that these files are crucial for understanding the application is quite high. Read through them and understand them. What this data means in detail will be explained in this blog article now.
In this chapter, I will explain to you the concepts of complexity and churn when it comes to code. It is the baseline to understand the technique we are using here to improve your understanding of a codebase.
Complexity can be defined in different ways. The level of nesting of functions is normally used as a measurement of how complex code is. Code with small functions and composed behavior is normally more readable and easy to understand. So we could say that complex code also consists of a few functions that are far nested and it is mostly true. Nesting is hard to track though so we could find another metric somehow.
With long functions normally there comes large files as well. People tend to put everything into one file if they also put a lot into one function. So in theory we could take the lines of code as a measurement as well. There are a lot of utility packages out there that solve this problem. One of these tools is called sloc. It will output the number of lines of code within a file. But do not use it directly. The tool I mentioned before includes this by default.
So in conclusion we can say complex files are either super nested or super long. One of these things normally comes with the other so that’s great to hear because analyzing the length of a file tends to be easier than nesting.
Churn is a bit more complicated to explain. But let us start somewhere. A churned file is a file that has a lot of changes. But what does this mean?
A lot of changes to a file happens when yeah, a lot of people have changed the file. But how can someone measure that? The git history is telling us how often a file was checked in. So we can make sure with that how likely a file is to be changed. Normally this means files of this type are the main point of the application. A problem that occurs though is that often there are configuration files included here, but you can simply exclude them for this analysis.
Now, after learning what complexity and churn mean, we can focus on the combination of them. Files that normally charge a lot but are also really complex should be normally refactored. And most of the time, with that, it is natural that these files might be the core of the application. The basic logic is written in them directly or in files related to that. So let us check how we can analyze that further.
My technique to check the files in detail is quite simple. I first look over the file and check what the exported functions are called. Ideally, I write them down. Internal functions are firstly not important to understand. Once I have an overview of all the exported functions I foremost check if there are any unit tests. If the functions have parameters as well, then I will try to write them down as well. With TypeScript or Flow types, this gets, even more, easier to get an overall feeling of the structure.
Unit tests are a good first approach to see how the functions are working. To understand functions you probably just need to look at the input, the function name, and what it is returning. In most of the cases, types even support you with that, and unit tests will show you edge cases for the function and how it can be used. So that’s mostly enough to understand the function. At least if you know the programming language. If you want to get deeper into the function feel free to, but you do not have to do that. Why? Explained in the next chapter.
Understanding a function in detail can be important. But during onboarding, a lot of other things are more important. You will not be able to understand every bit of the application within a short time frame, but understanding the core parts should give you a track of where the core logic of the application is executed.
With that knowledge, you can jump into the first issues for you. Ideally, the team has prepared smaller tasks in the codebase to give you a nice onboarding experience. If that is not the case, ask your manager or senior engineers in your team if any of the current issues is suitable for you. Make sure to transmit your gained knowledge of the codebase though so they understand your knowledge level.
A good idea for the first issue is also to do pair programming with other software engineers from the team. Make sure to tell them that you want to type mostly and they should be more of supervisors so you learn how to navigate the codebase by yourself. Because of that guided onboarding or easier tickets, you do not have to jump into details. The details of the code will be discovered now during the implementation phase of fixing bugs or adding features. The more tickets you will do the more you learn about the codebase in detail. But look back at churn and complexity because it can change over time.
Having to work on the code base now will also involve another bigger thing: Debugging. With your first tasks, you will probably learn already how to run the application locally, run unit tests, and integration or E2E tests if these exist. These become vital once you implement the feature because adding tests will make sure your application is working as expected. Often these tests cover a lot of code though and are kind of abstract. In these cases, you have to learn to debug your code. Because most of the tests are being run in a Node.js environment we will have a quick peek into how to debug Node.js-based applications. Most engineers use
debugger keyword, nevertheless, it is a bit tricky to run your test suite and have a nice debugger experience because within Node.js it is a bit difficult to spawn a browser instance’s developer tools and connect it to the program. Another option would be to use your IDE or Editor to connect a debugger supported by your coding user interface. For example, Visual Studio Code supports debugging Node.js applications directly in the IDE. A guide on how "Node.js debugging in VS Code" can be found here.
Debugging is an art in itself. You should get comfortable using breakpoints and what the debugging functions "step over" and "step into" mean. These will be extremely helpful when debugging nested functions.
In this chapter, I will go through some codebases with this technique to explain where the main core of the application is and how the process mentioned above can help you to get familiar with the code base quicker.
The first step, of course, is to clone the repository to a local folder and then run:
1npx code-complexity . --limit 20 --sort ratio
This will output the following table:
As you can see, there are a lot of unrelated files that could be filtered out like the compiled folder but for an initial analysis, this is enough.
We can see multiple directories being important here:
If we get a task we would at least know already where to look for related code. Initially, I would try to understand the
packages/core files to understand what they are doing. Understand the tests if they exist and then you should have a good grasp of what Blitz is doing.
React.js is a frontend framework that almost every web developer knows by now. What most people do not know is how the codebase is structured and what are the core parts. So let us have a look at it.
1npx code-complexity . --limit 20 --sort ratio
Running the command will lead to the following result:
What we can see here is that two sub-packages are probably the most interesting to understand:
Understanding React Fiber and how react-dom's partial renderer is working will give you a good idea about React's architecture. A good thing about the code within React is that it is well documented with comments even though it is complex at first.
Venom is a library to interact with Whatsapp. You can send messages via this library and do many more things. It is a bit more practical because on such applications you will work mostly in your day-to-day job. So let us run our usual command:
1npx code-complexity . --limit 20 --sort ratio
What we can see here is that there are these directories which are from importance:
As we can see from the
src/lib directory, the files included are automatically generated. Ideally, we can filter them out but for now, let us look at the other files.
We can see that
src/api/layers/retriever.layer.ts are not complex but have a lot of changes. So every time a feature is added or deleted these files are touched. These are the core files of the application and understanding them will give you a good grasp of how the codebase is structured and what you should focus on.
This technique of analyzing a codebase originally came from a book that handles refactoring large codebases via a process: Software Design X-Rays by Adam Tornhill. It is a great book and teaches you a lot of ways to structure your code and what parts are worth refactoring. A great book. I think every software engineer should have read it at some point because it will help them to understand a codebase differently. With working on a project, people will get familiar with different parts of the software and of course, they will have their special "area" of code where they are super comfortable. If this code is good and understandable is another question though, that this book tries to answer.
Based on the refactoring efforts, we can also use the knowledge to see which parts of the application are important. Hopefully, I explained this in this blog article to you.
With that, you can analyze a software project as well and come to the same conclusions as mentioned in the blog article.
You might also like
The Work Experience in a Software Engineer Resume
The work experience in your resume is the part with the most impact to get hired. Writing this section perfectly will lead to more responses and interviews. See these tips to get to the next level.
Let your Projects section shine
Project section on top of your experience or below in your resume? These and many other questions regarding your projects section will be explained in this blog post. Improve your CV with these simple tips that helped me to get interviews at the biggest tech companies in the world.