Gatsby Initialization ( 1/2 ) | Look Through Code Reading

I began reading the source code of Gatsby as a project. The goal is to finish reading the whole code of Gatsby, or more precisely, the package gatsby. Here is what I experienced and understood while reading the function initialize in Gatsby.

The sequence from the command line interface gatsby build to the function initialize

Gatsby is basically a static site generator. Its core function is to build static sites made of HTML and CSS files. So I decided to read the source code of the command line interface gatsby build as a starting point.

The command line interface gatsby build is defined in the package gatsby-cli, not gatsby. However, the build process itself is implemented as the function build in /src/commands/build.ts of the package gatsby.

The function build invokes the function bootstrap asynchronously after setting up some traces. According to the official documentation on the build process, the Gatsby build has two phases: first the bootstrap process, then the build process. From this I inferred that the function bootstrap is a core function of the bootstrap process.

In /src/bootstrap/index.ts, bootstrap simply invokes 13 functions in order. The first function invoked is initialize.

The sequence to reach the function initialize is below.

gatsby build
-> /src/commands/build.ts : build
  -> /src/bootstrap/index.ts : bootstrap
    -> /src/services/initialize.ts : initialize

The function initialize has 7 steps divided by activityTimer

The function initialize has 540 lines. Because it is a huge function, I scanned the whole code of initialize before reading it line by line. I found that the calls to activityTimer divide the whole function into 7 sections. For example, the third section is below.

activity = reporter.activityTimer(`onPreInit`, {
  parentSpan,
})
activity.start()
await apiRunnerNode(`onPreInit`, { parentSpan: activity.span })
activity.end()

Whatever activityTimer actually did, it was helpful at that point to understand that the function initialize was composed of 7 steps. The string passed as the first argument of activityTimer turned out to correspond to the build logs. The string "onPreInit" of the third section appears on the third line of the build logs below.

success open and validate gatsby-configs - 0.062 s
success load plugins - 0.915 s
success onPreInit - 0.021 s
success delete html and css files from previous builds - 0.030 s
success initialize cache - 0.034 s
success copy gatsby files - 0.099 s
success onPreBootstrap - 0.034 s
success source and transform nodes - 0.121 s
success Add explicit types - 0.025 s
success Add inferred types - 0.144 s
success Processing types - 0.110 s
success building schema - 0.365 s
success createPages - 0.016 s
success createPagesStatefully - 0.079 s
success onPreExtractQueries - 0.025 s
success update schema - 0.041 s
success extract queries from components - 0.333 s
success write out requires - 0.020 s
success write out redirect data - 0.019 s
success Build manifest and related icons - 0.141 s
success onPostBootstrap - 0.164 s
⠀
info bootstrap finished - 6.932 s
⠀
success run static queries - 0.166 s — 3/3 20.90 queries/second
success Generating image thumbnails — 6/6 - 1.059 s
success Building production JavaScript and CSS bundles - 8.050 s
success Rewriting compilation hashes - 0.021 s
success run page queries - 0.034 s — 4/4 441.23 queries/second
success Building static HTML for pages - 0.852 s — 4/4 23.89 pages/second
info Done building in 16.143999152 sec

In the build logs, the bootstrap process has 21 steps from "open and validate gatsby-configs" to "onPostBootstrap". The build process has 6 steps from "run static queries" to "Building static HTML for pages".

Checking the strings of the arguments against the above build logs, initialize covered the first 7 steps below.

success open and validate gatsby-configs - 0.062 s
success load plugins - 0.915 s
success onPreInit - 0.021 s
success delete html and css files from previous builds - 0.030 s
success initialize cache - 0.034 s
success copy gatsby files - 0.099 s
success onPreBootstrap - 0.034 s

So, the function initialize accounts for one third of the bootstrap process (7 of 21 steps). An overview of these 7 steps can be inferred from the log messages, and the official documentation explains each step briefly.

These 7 steps are literally the initialization of Gatsby as a static site generator before it builds HTML and CSS files.
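To make the relation between activityTimer and the log lines concrete, here is a minimal sketch of the pattern. It is not Gatsby's reporter: the function activityTimer below is a hypothetical stand-in that I wrote for illustration, the helper runStep and the fake clock are my own assumptions, and the steps are synchronous for simplicity even though the real ones are awaited.

```typescript
// Hypothetical stand-in for a reporter activity; NOT Gatsby's implementation.
interface Activity {
  start(): void
  end(): void
}

// Log lines are collected in an array so the sketch needs no console API.
const logLines: string[] = []

// Creates a timer for one named step. `now` is injectable for deterministic demos.
function activityTimer(name: string, now: () => number = Date.now): Activity {
  let startedAt = 0
  return {
    start(): void {
      startedAt = now()
    },
    end(): void {
      const seconds = (now() - startedAt) / 1000
      // Same shape as the build log lines: "success <name> - <seconds> s"
      logLines.push(`success ${name} - ${seconds.toFixed(3)} s`)
    },
  }
}

// Runs one step between start() and end(), like each section of initialize.
function runStep(name: string, step: () => void, now?: () => number): void {
  const activity = activityTimer(name, now)
  activity.start()
  step()
  activity.end()
}

// A fake clock that advances 21 ms per reading (assumption for the demo only).
let fakeTime = 0
const fakeClock = (): number => (fakeTime += 21)

runStep(`onPreInit`, () => {}, fakeClock)
// logLines[0] is now "success onPreInit - 0.021 s"
```

Seen this way, each of the 7 sections is one start/end pair, and the first argument is simply the label that ends up in the log.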

Next, I read the code of initialize in detail and identified the core function of each step as a clue to understanding it.

( 1 / 7 ) open and validate gatsby-configs

The first step "open and validate gatsby-configs" had 4 processes. As the log message said, the core process was to open and validate gatsby-config.js files. The core function of this process was getConfigFile.

In many cases, each process has a core function that carries out its task. The table below gives a summary of each process and its core function, if one exists.

| process | summary | core function |
|---|---|---|
| 1 | Open and validate gatsby-config.js files | getConfigFile |
| 2 | Get flags properties and set them to environment variables | handleFlags |
| 3 | Enable loading indicator of "query on demand" | - |
| 4 | Load Themes | loadThemes |

The function getConfigFile not only opens gatsby-config.js files, but also validates them with the checks below.

  • Does a gatsby-config.js exist in the root of a user's Gatsby project?
  • Is the file name the same as "gatsby-config.js"?

The latter validation includes a nice trick for catching typos. To detect a typo in the file name, Gatsby measures the Levenshtein distance with the library fastest-levenshtein and reports an error about the typo. I think this is a user-friendly message because a user can fix the file name immediately.
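The idea of typo detection by Levenshtein distance can be sketched as follows. This is my own illustration, not the code of getConfigFile: I implement the distance by hand instead of calling fastest-levenshtein so that the example is self-contained, and the names findLikelyTypo and MAX_DISTANCE, as well as the threshold value, are assumptions.

```typescript
// Classic dynamic-programming Levenshtein distance
// (minimum number of insertions, deletions and substitutions).
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = []
  for (let j = 0; j <= b.length; j++) prev.push(j)

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + cost // substitution (or match)
        )
      )
    }
    prev = curr
  }
  return prev[b.length]
}

const EXPECTED_NAME = `gatsby-config.js`
const MAX_DISTANCE = 2 // assumed threshold, chosen for this sketch

// Returns the first file name that is close to, but not equal to, the expected name.
function findLikelyTypo(fileNames: string[]): string | undefined {
  for (const name of fileNames) {
    const d = levenshtein(name, EXPECTED_NAME)
    if (d > 0 && d <= MAX_DISTANCE) return name
  }
  return undefined
}

const typo = findLikelyTypo([`package.json`, `gatsby-confg.js`, `README.md`])
// typo is "gatsby-confg.js" (one missing character, so distance 1)
```

A small threshold keeps unrelated files such as package.json from being reported while still catching a missing or swapped character.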

By the way, the log messages and the official documentation don't explain any of the processes except the first. Code reading can uncover hidden details like these.

As for the last 3 processes, it was easy to find the official documents by searching. Flags were introduced in v2.28 to handle environment variables in gatsby-config.js. Query on Demand is a feature similar to lazy loading in development. Themes are plugins that can improve on Gatsby starters by being separate from Gatsby sites and updatable on their own.

Knowledge often helps us understand code

When I didn't understand the code for these features, I looked up the official documents above. The background knowledge helped me understand the code well. The point of this tip is how early we can notice that we lack the knowledge behind the code. So far, I recommend searching the official documents as soon as you get stuck on a piece of code. You can often find a detailed explanation or a release note on the official site.

( 2 / 7 ) load plugins

The second step is loading plugins. The core function is literally loadPlugins.

| step | summary | core function |
|---|---|---|
| 1 | load plugins and validate options and Gatsby APIs | loadPlugins |

Inside the loadPlugins function

Though this step just invokes the function loadPlugins in the code of the function initialize, loadPlugins can be divided into 8 processes. Each process has a core function. The list is below.

| process | summary | core function |
|---|---|---|
| 1 | create an array including all the plugins | normalizeConfig |
| 2 | validate options of all the plugins | validateConfigPluginsOptions |
| 3 | load browser API, node API, SSR API from files | getAPI |
| 4 | load plugins in gatsby-config, internal plugins and other plugins | loadPluginsInternal |
| 5 | flatten plugins with nests into an array | flattenPlugins |
| 6 | identify which APIs each plugin exports | collatePluginAPIs |
| 7 | distinguish types of bad exports API and report errors | handleBadExports |
| 8 | detect multiple replaceRenderers and report warnings | handleMultipleReplaceRenderers |

The core process above is to load plugins with the function loadPluginsInternal. Before loadPluginsInternal, Gatsby validates options of plugins and loads Gatsby APIs. After loadPluginsInternal, bad exports APIs are removed.

"load plugins" means that the data about each plugin is transformed into the IPluginInfo type (below) and that the plugins array holds all of these IPluginInfo objects.

export interface IPluginInfo {
  /** Unique ID describing a plugin */
  id: string

  /** The absolute path to the plugin */
  resolve: string

  /** The plugin name */
  name: string

  /** The plugin version (can be content hash) */
  version: string

  /** Options passed to the plugin */
  pluginOptions?: IPluginInfoOptions
}
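Process 5 in the table, flattening nested plugins into one array, can be sketched with a simplified version of this type. This is only my reading of the idea, not the code of flattenPlugins: the types PluginSketch and PluginOptionsSketch are reduced local types, and the assumption that nested plugins live under pluginOptions.plugins is mine.

```typescript
// Simplified local types for the sketch. That nested plugins are stored in
// `pluginOptions.plugins` is an ASSUMPTION, not taken from the interface above.
interface PluginOptionsSketch {
  plugins?: PluginSketch[]
  [option: string]: unknown
}

interface PluginSketch {
  id: string
  name: string
  pluginOptions?: PluginOptionsSketch
}

// Depth-first walk: each plugin is emitted first, then its nested plugins.
function flattenPluginsSketch(plugins: PluginSketch[]): PluginSketch[] {
  const flattened: PluginSketch[] = []
  const visit = (plugin: PluginSketch): void => {
    flattened.push(plugin)
    const nested = (plugin.pluginOptions && plugin.pluginOptions.plugins) || []
    for (const sub of nested) visit(sub)
  }
  for (const plugin of plugins) visit(plugin)
  return flattened
}

// Example input: one plugin that carries a sub-plugin, plus a plain plugin.
const nestedPlugins: PluginSketch[] = [
  {
    id: `1`,
    name: `gatsby-transformer-remark`,
    pluginOptions: { plugins: [{ id: `2`, name: `gatsby-remark-images` }] },
  },
  { id: `3`, name: `gatsby-plugin-sharp` },
]

const flatNames = flattenPluginsSketch(nestedPlugins).map(p => p.name)
// flatNames: gatsby-transformer-remark, gatsby-remark-images, gatsby-plugin-sharp
```

After flattening, later processes such as collating APIs can loop over a single array without caring how the plugins were nested in the config.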

The loaded plugins are below.

  • plugins in gatsby-config.js
  • dev-404-page, load-babel-config, internal-data-bridge, prod-404, webpack-theme-component-shadowing, bundle-optimisations (implemented in the package gatsby)
  • gatsby-plugin-page-creator
  • gatsby-plugin-typescript
  • default-site-plugin

default-site-plugin is composed of gatsby-x files in the root of a user's Gatsby project like gatsby-node.js or gatsby-browser.js.

The validation of plugin options checks the type of each property with the library joi. The function collatePluginAPIs validates APIs. It identifies exports that are not among the APIs defined for gatsby-node.js, gatsby-browser.js and gatsby-ssr.js, and then reports them as bad exports.
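The check for bad exports boils down to comparing what a plugin exports against a list of known API names. Here is a rough sketch of that comparison. It is not the code of collatePluginAPIs: the function collateExportsSketch and its return shape are hypothetical, and knownNodeAPIs is a deliberately abbreviated list.

```typescript
// Abbreviated list of Node API names for the demo; the real list is much longer.
const knownNodeAPIs: string[] = [
  `onPreInit`,
  `onPreBootstrap`,
  `sourceNodes`,
  `createPages`,
  `onPostBootstrap`,
]

// Hypothetical result shape: exports split into known APIs and bad exports.
interface CollatedExports {
  valid: string[]
  bad: string[]
}

// Splits the names a plugin file exports into valid APIs and bad exports.
function collateExportsSketch(exportedNames: string[], knownAPIs: string[]): CollatedExports {
  const result: CollatedExports = { valid: [], bad: [] }
  for (const name of exportedNames) {
    if (knownAPIs.indexOf(name) !== -1) {
      result.valid.push(name)
    } else {
      result.bad.push(name)
    }
  }
  return result
}

// A gatsby-node.js that exports `createPage` by mistake (instead of `createPages`).
const collated = collateExportsSketch([`createPages`, `createPage`, `onPreInit`], knownNodeAPIs)
// collated.valid: createPages, onPreInit
// collated.bad: createPage
```

Once the bad exports are separated like this, a later process can decide how to report each of them.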

How can we better read code spread over multiple files?

There is just one line that invokes loadPlugins in the function initialize, but I must have read 20+ functions and over 900 lines of code spread across several files. It was hard to understand what each function does and to piece them together into a single sequence of processes, even though each function is not so complicated.

In any case, I had to open multiple files and go back and forth between them. I felt my memory had a limit on how many functions, variables and types it could retain at the same time. So a tool like a text editor should support my memory by showing multiple parts of the code on one screen. In my opinion, panes can do this, but tabs cannot.

-> Next ( 3 / 7 ) onPreInit