Are You Reading Code the Way You Learn a Natural Language?

Is learning programming perhaps the same as learning a new "natural" language?

When I started the Ruby on Rails Tutorial 5 years ago, I ran into a lot of unfamiliar words that were explained by other unfamiliar words. It was my first time learning programming since I had studied the C language in a training course 15 years earlier, and I soon got confused. Because each unfamiliar word led to new unfamiliar words, it felt like an infinite loop. Searching Google or Wikipedia for every one of them and memorizing them all while working through the tutorial was too troublesome.

As a countermeasure, I made a kind of dictionary, a makeshift glossary, by copying and pasting the sentences that contained unfamiliar words. When an unfamiliar word appeared again, I looked it up in my glossary instead of searching Google or Wikipedia again. Although the glossary did not define each word strictly, the context, such as the situation in which the word was used or its relationship with other words, was enough for me to follow the tutorial.

This approach can be seen as looking up a word in a dictionary and writing down or picking out an example sentence that uses it. Isn't that similar to the way we learn our native language or a foreign language at school? Yet what I was studying was a tutorial about a framework written in a programming language.

At the time, this experience raised a question for me: "Is learning programming almost the same as learning a language?"

Basic grammar and vocabulary

After finishing the Ruby on Rails Tutorial, I improved my skills by reading textbooks and the official Ruby documentation. Many programming textbooks, especially those for beginners, explain the grammar of one programming language. The official Ruby documentation, on the other hand, describes for each class, such as Array, a large number of methods along with their arguments, return values, and usage. This documentation is effectively a vocabulary list of the Ruby language. Looking back, I had learned Ruby the way a beginner learns a foreign language: starting from basic grammar and vocabulary.

Should we spend more time reading code?

The next step to mastering a programming language is code reading

In my experience of learning English as a foreign language, the step after basic grammar and vocabulary is reading. So I thought I should focus on code reading when it came to programming.

I tried reading the source code of feedbin, a Rails app. As I kept reading, the code became easier for me to understand, and my reading speed gradually increased as my understanding grew. I also met new vocabulary: methods that feedbin defined on its own.

After reading about 40% of feedbin's source code, I realized that code reading had improved my skills just as I had expected. It helped me feel much more confident in my programming ability. As a side effect, I was writing code better and faster than before. Code reading must be a wonderful way to master programming!

What about coding? Isn't it important?

Of course, there is no doubt that coding is very important for mastering programming. Many programmers say "Just write a lot of code!" or "Write code every day." On the other hand, few advocate code reading, and there are very few textbooks about it. For example, "The Art of Readable Code" is a great book, but it is about writing code, not reading it. I'm still looking for a book about code reading.

So I want to argue that we should spend more time reading code as well as writing it. In the process of mastering programming, we should learn a programming language the same way we learn a natural language, whether native or foreign.

Let's read code more!

To become a better programmer than I was yesterday, I decided to read the whole source code of a famous OSS project. I picked Gatsby because it surprised me with its speed and it is my favorite static site generator. This blog is built with Gatsby. This code reading project should also help me understand the philosophy of Jamstack more deeply.

I expect to understand Gatsby's internal system and software architecture, including what the official documentation does not cover, and to become more skilled at software design. Reading a huge amount of code may also lead to discoveries beyond what I can imagine now.

Additionally, this project is a big and intriguing challenge for me. Can I finish reading Gatsby? What troubles will I face? What tips will help me overcome those troubles in code reading? And the first question about this project is whether reading the "whole" source code of an OSS project is instructive at all. Or is it enough to read only some parts of the code?

7 days' progress in reading a huge amount of source code

Measure the number of lines

Before reading any code, I measured the size of the whole source code in the Gatsby repository. The repository contained over 400k lines of code. Because plugins accounted for most of it, I started with the package "gatsby". This package had 91,006 lines in version 3.2.0-next.0.
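The post does not say how the lines were counted. As a rough sketch, assuming that a line count simply means every line (blank lines and comments included) in JavaScript and TypeScript files outside node_modules, a count could be taken like this. The extension list, the skipped directories, and the name count_source_lines are assumptions for illustration only.

```python
import os
from pathlib import Path

# Assumptions (not from the post): which extensions count as source code and
# which directories to skip. Adjust both for the repository you measure.
SOURCE_EXTENSIONS = {".js", ".jsx", ".ts", ".tsx"}
SKIPPED_DIRS = {"node_modules", ".git"}


def count_source_lines(root: str) -> int:
    """Count all lines, blank lines and comments included, in source files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if Path(name).suffix in SOURCE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += sum(1 for _ in f)
    return total
```

Calling count_source_lines on a checkout of the package directory should give a figure in the same range as the one above, but the exact number depends on which files and extensions are included.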

Read official documents

In my experience, code reading goes faster if I understand the overall picture of the source code, or at least a summary of it. Fortunately, Gatsby has rich official documentation, so I read both the Reference Guides and the Conceptual Guides.

It took 26 hours to read all 34 articles in the two guides. You may think that spending 26 hours just reading documents is a waste of time.

However, if 26 hours of document reading boosts my code reading speed by 50% or more, for example from 80 to 120 lines per hour, I can reach the end of the 91,006 lines much earlier than I could without it. Even if you can read 1,000 lines per hour, it will take about 91 hours to finish; at 1,500 lines per hour, about 61 hours. Even in such cases, reading the official documents to understand the overall picture is helpful, as long as the OSS project has enough documentation.
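The arithmetic behind this argument can be made explicit. This is a small sketch using the numbers above; reading_hours and break_even_speed are hypothetical helper names, and the sketch assumes the reading speed stays constant for the whole package.

```python
TOTAL_LINES = 91_006   # lines in the "gatsby" package (version 3.2.0-next.0)
DOC_HOURS = 26         # hours spent reading the Reference and Conceptual Guides


def reading_hours(lines, lines_per_hour):
    """Hours needed to read `lines` at a constant speed."""
    return lines / lines_per_hour


def break_even_speed(lines, base_speed, doc_hours):
    """Speed needed after reading the docs for the doc hours to be exactly recovered."""
    time_budget = reading_hours(lines, base_speed) - doc_hours
    if time_budget <= 0:
        return float("inf")  # reading the docs alone takes longer than reading the code
    return lines / time_budget


without_docs = reading_hours(TOTAL_LINES, 80)              # about 1,137.6 h
with_docs = DOC_HOURS + reading_hours(TOTAL_LINES, 120)    # about 784.4 h
print(f"80 lines/h, no docs:     {without_docs:,.1f} h")
print(f"26 h docs + 120 lines/h: {with_docs:,.1f} h")

for base in (80, 1_000, 1_500):
    needed = break_even_speed(TOTAL_LINES, base, DOC_HOURS)
    print(f"base {base:,} lines/h: docs pay off above {needed:,.0f} lines/h")
```

The break-even speed shows how much of a speed-up the 26 hours must buy: about 2% for a reader at 80 lines per hour, but about 40% at 1,000 lines per hour and about 75% at 1,500 lines per hour.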

How much progress did I make in 7 days?

Finally, I began to read Gatsby's code. Seven days have now passed, and my progress is as follows.

  • The number of lines I read is 1,747. It is only 1.9% of the package "gatsby".
  • The average speed of code reading was 106 lines per hour.

Calculating backward from this progress, it will take another 842 hours to finish reading the package "gatsby". Of course, my reading speed should increase as I understand Gatsby more deeply, so the actual time should be shorter than 842 hours. But after facing this data, I'm not sure I can reach the end of the 91,006 lines. I'm in trouble!
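The estimate above can be reproduced from the measured numbers. This sketch assumes that the average of 106 lines per hour covers all the reading time so far and that both the speed and the hours read per day stay constant; under those assumptions it also yields the implied reading time so far and the days left at the same pace. The name reading_forecast is hypothetical.

```python
def reading_forecast(total_lines, lines_read, lines_per_hour, days):
    """Extrapolate from the current pace, assuming speed and daily hours stay constant."""
    hours_spent = lines_read / lines_per_hour          # implied reading time so far
    hours_left = (total_lines - lines_read) / lines_per_hour
    hours_per_day = hours_spent / days
    return {
        "progress": lines_read / total_lines,
        "hours_spent": hours_spent,
        "hours_left": hours_left,
        "days_left": hours_left / hours_per_day,
    }


forecast = reading_forecast(total_lines=91_006, lines_read=1_747, lines_per_hour=106, days=7)
print(f"progress:    {forecast['progress']:.1%}")      # 1.9%
print(f"hours spent: {forecast['hours_spent']:.1f}")   # 16.5
print(f"hours left:  {forecast['hours_left']:.0f}")    # 842
print(f"days left:   {forecast['days_left']:.0f}")     # 358
```

At the implied pace of roughly 2.4 hours per day, the remaining 842 hours would take close to a year, which puts the 1.9% figure into perspective.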

New tips and insights about what code reading is and how we read code

Seven days of trial tell me that it is very difficult to finish reading the whole source code of the package "gatsby", the core of Gatsby.

By the way, you may wonder what "reading code" means. No one would accept merely opening a file and scanning all the code with their eyes as "reading code". So what exactly is "reading code"? What should change in us after reading code? And what proves what we have achieved by reading it?

Through these 7 days of trial, I gained new tips and insights about what code reading is and how we read code.

So far, my standard for having read code is whether I can make a list of the functions in a file and add explanations that describe the purpose of each function and how it achieves it, along with its arguments and return values.

The list below is an example of what I call a function list. Each entry pairs a function or method with its explanation.

  • createFileContentHash: Receives strings that are parts of file paths, creates and updates a hash for each file when the file is updated, and then returns the hashes.
  • createPluginId: Receives the plugin name as a string and creates a plugin ID with createNodeId.
  • resolvePlugin: Receives the plugin name and the root path as strings, resolves the root path of the plugin, and then returns the resolved path, plugin name, ID, and hashes.
  • loadPlugins: Receives the root path of the plugin as a string, resolves paths and other data with processPlugin, and then returns an array that includes the internal plugins, the plugins in gatsby-config.js, and the plugins "gatsby-plugin-page-creator" and "gatsby-plugin-typescript".
  • processPlugin: Receives the absolute root path of the plugin as a string, detects subplugins recursively, resolves their paths with resolvePlugin, and then returns the paths, ID, and hashes together with pluginOptions.

In my opinion, such a function list can prove that I have read the code of these functions, even if I later forget the details.
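Building the empty skeleton of such a list can be partly automated, leaving only the explanations to write by hand. The sketch below is hypothetical and regex-based: it only recognizes function declarations and functions assigned to const, let, or var in JavaScript or TypeScript, and it misses class methods, object-literal methods, and many other forms. A real tool would more likely use a proper parser.

```python
import re

# Hypothetical, regex-based detector for a few common JS/TS function forms.
NAME = r"([A-Za-z_$][\w$]*)"
FUNCTION_PATTERNS = [
    # [export] [default] [async] function[*] name(  or  name<T>(
    re.compile(
        r"^[ \t]*(?:export[ \t]+(?:default[ \t]+)?)?(?:async[ \t]+)?"
        r"function\b[ \t]*\*?[ \t]*" + NAME + r"[ \t]*[<(]",
        re.MULTILINE,
    ),
    # [export] const|let|var name[: Type] = [async] (params)[: Type] =>  or  = function
    re.compile(
        r"^[ \t]*(?:export[ \t]+)?(?:const|let|var)[ \t]+" + NAME
        + r"[ \t]*(?::[^=\n]*)?=[ \t]*(?:async[ \t]+)?"
        r"(?:function\b|\([^)]*\)[ \t]*(?::[^=\n]*)?=>|[A-Za-z_$][\w$]*[ \t]*=>)",
        re.MULTILINE,
    ),
]


def find_functions(source: str) -> list[tuple[int, str]]:
    """Return (1-based line number, name) for each detected function, in source order."""
    found = []
    for pattern in FUNCTION_PATTERNS:
        for match in pattern.finditer(source):
            line_no = source.count("\n", 0, match.start(1)) + 1
            found.append((line_no, match.group(1)))
    return sorted(found)


def function_list_skeleton(source: str) -> str:
    """Render an empty function list: one tab-separated row per function."""
    rows = ["function or method\texplanation"]
    rows += [f"{name}\t" for _, name in find_functions(source)]
    return "\n".join(rows)
```

The output is tab-separated, so it could be pasted into a spreadsheet with the explanation column left empty for the reader to fill in.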

For my project, reading all the code in the package "gatsby" amounts to making function lists for all the files in the package and filling in every explanation. The job is a huge burden because of the 91,006 lines...

If the trial so far were a feasibility study, I would have to judge it a failure. The fact that I progressed only 1.9% in 7 days reminds me of this again.

Should I give up here? Or...

Develop a booster to read a huge amount of code!

I need to boost my code reading speed because the road to the end of the 91,006 lines is too long. So I decided to develop a tool for code reading as a booster.

As I said above, I believe code reading is essential for improving programming skills. The tool I began developing is meant to support not only this code reading project but also my everyday development.

To develop a tool for code reading, what features do I need to boost my speed?

I usually read and write code with Atom, sometimes Vim, and I have also used VS Code. They are very useful, but disappointingly, text editors are tailored for writing code rather than reading it. It is true that most text editors have features that support code reading, such as syntax highlighting, linters, and split panes. Highlighting matching pairs of brackets may help as well. Nevertheless, I feel uncomfortable with these editors when reading a huge amount of code.

The first trial is below. This feature paints the background of each function definition. If it runs when I open a file, I can grasp the overall composition of the code in the file at a glance. It also makes listing functions easier. And it reduces the pressure of facing a lot of code, because I can see that most function definitions are not so large and are therefore easy to read.

automated coloring function
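The post does not show how the coloring is implemented. As a rough, hypothetical sketch of one step behind it, the code below assumes the first line of each function definition is already known (for example, from a function list), finds where each definition ends by counting brackets, and marks those lines with a gutter marker as a stand-in for a background color. The bracket counting is naive: it does not skip strings, comments, template literals, or regex literals.

```python
OPENERS, CLOSERS = "([{", ")]}"


def definition_end(lines, start):
    """Return the 1-based line where a definition starting at 1-based `start` ends.

    Naive bracket counting: brackets inside strings, comments, template literals
    and regex literals are counted too, and a bracket-free arrow such as
    `x => x` is over-painted, so treat the result as a guess.
    """
    depth = 0
    opened = False
    for index in range(start - 1, len(lines)):
        for ch in lines[index]:
            if ch in OPENERS:
                depth += 1
                opened = True
            elif ch in CLOSERS:
                depth -= 1
        # Stop at the first line where every bracket opened so far is closed again.
        if opened and depth <= 0:
            return index + 1
    return len(lines)  # unbalanced input: paint to the end of the file


def function_ranges(source, starts):
    """Map each known start line to an inclusive (start, end) range of lines."""
    lines = source.splitlines()
    return [(start, definition_end(lines, start)) for start in starts]


def paint(source, starts, marker="| "):
    """Prefix every line inside a function definition with `marker`."""
    painted = set()
    for first, last in function_ranges(source, starts):
        painted.update(range(first, last + 1))
    blank = " " * len(marker)
    return "\n".join(
        (marker if number in painted else blank) + line
        for number, line in enumerate(source.splitlines(), start=1)
    )
```

Counting parentheses and square brackets as well as braces lets the sketch handle multi-line parameter lists, destructured parameters, and one-line arrow functions such as const b = (n) => n * 2.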

Another feature is drag and drop (TBD). It would let me move chunks of code and gather them in one place, so that I can read a series of related code step by step. Code is easier for me to understand when I can read it in sequence like this.

drag & drop function

I will use this tool while reading Gatsby and keep developing it. When I find new tips about code reading, I will try to implement new features for them. I'm looking forward to building an app with these code reading features.

Read the whole source code of a huge OSS

My code reading project is "Read the whole source code of Gatsby". It is just the beginning of the road to the end of the 91,006 lines. This challenging project is based on the assumption that learning programming is the same as learning a new "natural" language, and that we should spend more time reading code as well as writing it.

I will accomplish this project by reading a huge amount of code, learning new tips for code reading, and developing a tool that helps me read and understand code quickly and comfortably.