Gatsby Initialization ( 2/2 ) | Look Through Code Reading

The previous post explained Gatsby initialization up to the point where plugins are loaded. The next step is onPreInit.

( 3 / 7 ) onPreInit

As the official documentation says, the onPreInit API runs as soon as plugins are loaded. The function apiRunnerNode handles Gatsby Node APIs such as onPreInit.

step | summary | core function
1 | call the onPreInit API implemented in gatsby-node.js | apiRunnerNode

When apiRunnerNode receives the string onPreInit as an argument, it finds the onPreInit API in the gatsby-node.js of each loaded plugin and calls it.
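The lookup-and-call idea can be sketched in a few lines. This is a simplified stand-in, not Gatsby's actual apiRunnerNode: the function name runNodeApi, the plugin object shape (name, options, nodeExports) and the two plugins are all assumptions made for illustration. The real function does more, for example handling asynchronous hooks.

```javascript
// Hypothetical, simplified stand-in for an API runner; not Gatsby's actual code.
// Each "plugin" here is an object holding the exports of its gatsby-node.js.
function runNodeApi(apiName, plugins, args = {}) {
  const results = []
  for (const plugin of plugins) {
    const hook = plugin.nodeExports[apiName]
    // Skip plugins that do not implement the requested API.
    if (typeof hook !== "function") continue
    results.push({ plugin: plugin.name, result: hook(args, plugin.options) })
  }
  return results
}

// Two made-up plugins: only the first one implements onPreInit.
const plugins = [
  {
    name: "plugin-a",
    options: { greeting: "hello" },
    nodeExports: {
      onPreInit: (_args, options) => `${options.greeting} from plugin-a`,
    },
  },
  {
    name: "plugin-b",
    options: {},
    nodeExports: { onPreBootstrap: () => "bootstrapping" },
  },
]

const preInitResults = runNodeApi("onPreInit", plugins)
console.log(preInitResults)
```

Passing the API name as a string is what lets one runner serve every Node API: the same call with "onPreBootstrap" would pick up plugin-b instead.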

Judging whether to read in more detail or skip

In this step, I can't know the concrete process that the onPreInit API runs, because users or plugins implement it in gatsby-node.js. I thought that reading more of the apiRunnerNode code would give no more information about the onPreInit step. At that point, it was enough to understand that apiRunnerNode handles Gatsby Node APIs.

When reading code, it is sometimes important to ask whether the code is closely related to the topic we want to understand. If it is not, we had better skip reading it in detail. Otherwise we often get lost in information that we don't need immediately.

( 4 / 7 ) delete html and css files from previous builds

Gatsby deletes html and css files from previous builds. There is no core function. If the command gatsby develop calls the function initialize, this step is skipped.

This step and the next are the kind of initialization I imagined before reading.

( 5 / 7 ) initialize cache

After deleting html and css files, the function initialize checks and deletes the cache, and then creates empty directories. There is no core function. Instead, this step can be divided into the 4 processes below.

process | summary | core function
1 | create a hash to check for updated plugins | (createHash)
2 | check the new hash against the hash stored in Redux | -
3 | delete the .cache directory and the files in it | -
4 | create an empty .cache directory and public/static directory | (ensureDir)

The function createHash is a method in the Node.js module crypto.

It is impressive how Gatsby checks for updated plugins before deleting the cache. The first process creates a hash of the version numbers of all installed plugins, the site's package.json, gatsby-config.js and gatsby-node.js. The second process checks the new hash against the old hash stored in Redux.

In other words, Gatsby deletes the cache during initialization if

  • any plugin is updated.
  • any of package.json, gatsby-config.js and gatsby-node.js is modified.
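The two conditions above come down to comparing one hash. Here is a minimal sketch of that comparison using createHash from the Node.js crypto module. The helper names computeCacheHash and shouldDeleteCache, the input shape and the choice of sha256 are assumptions for illustration, not what Gatsby itself uses.

```javascript
const crypto = require("crypto")

// Hypothetical helper: hash the plugin versions together with the contents of
// the site's files. The input shape is an assumption made for this sketch.
function computeCacheHash(pluginVersions, fileContents) {
  const hash = crypto.createHash("sha256")
  hash.update(JSON.stringify(pluginVersions))
  for (const content of fileContents) {
    hash.update(content)
  }
  return hash.digest("hex")
}

// Returns true when the stored hash no longer matches, so the cache is stale.
function shouldDeleteCache(oldHash, newHash) {
  return oldHash !== newHash
}

// Stand-ins for the contents of package.json, gatsby-config.js and gatsby-node.js.
const files = [
  `{"name":"my-site"}`,
  `module.exports = { plugins: [] }`,
  `exports.onPreInit = () => {}`,
]
const storedHash = computeCacheHash(["gatsby-plugin-a@1.0.0"], files)

// Same inputs: the hash matches and the cache is kept.
const unchangedHash = computeCacheHash(["gatsby-plugin-a@1.0.0"], files)
console.log(shouldDeleteCache(storedHash, unchangedHash)) // false

// A plugin was updated: the hash differs and the cache is deleted.
const updatedHash = computeCacheHash(["gatsby-plugin-a@1.1.0"], files)
console.log(shouldDeleteCache(storedHash, updatedHash)) // true
```

Hashing everything into a single value means only one string has to be stored and compared, however many plugins and files are involved.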

( 6 / 7 ) copy gatsby files

This step has 3 processes. Many files and directories in the package gatsby are copied into the .cache directory of a user's Gatsby project. The remaining processes prepare for loading the Gatsby SSR API and Gatsby Browser API.

process | summary | core function
1 | copy files in the package gatsby into the .cache directory of a user's Gatsby project | -
2 | create subdirectories of .cache | -
3 | create files to load gatsby-ssr.js and gatsby-browser.js in each plugin | -

The copied files are located in the /cache-dir/ directory. There are 30+ files and 4 directories. For example, default-html.js is shown below.

import React from "react"
import PropTypes from "prop-types"

export default function HTML(props) {
  return (
    <html {...props.htmlAttributes}>
      <head>
        <meta charSet="utf-8" />
        <meta httpEquiv="x-ua-compatible" content="ie=edge" />
        <meta
          name="viewport"
          content="width=device-width, initial-scale=1, shrink-to-fit=no"
        />
        {props.headComponents}
      </head>
      <body {...props.bodyAttributes}>
        {props.preBodyComponents}
        <div
          key={`body`}
          id="___gatsby"
          dangerouslySetInnerHTML={{ __html: props.body }}
        />
        {props.postBodyComponents}
      </body>
    </html>
  )
}

HTML.propTypes = {
  htmlAttributes: PropTypes.object,
  headComponents: PropTypes.array,
  bodyAttributes: PropTypes.object,
  preBodyComponents: PropTypes.array,
  body: PropTypes.string,
  postBodyComponents: PropTypes.array,
}

After copying, Gatsby detects the plugins that have gatsby-ssr.js and/or gatsby-browser.js. Each plugin creates api-runner-ssr.js and/or api-runner-browser-plugins.js to load the Gatsby SSR API and Gatsby Browser API. Then all the paths to these files are written into api-runner-ssr.js and api-runner-browser-plugins.js of the user's Gatsby project.
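To make the idea of "writing paths into a generated file" concrete, here is a rough sketch of detecting which plugins ship gatsby-browser.js and generating a module that requires them. It is not Gatsby's code: the helper buildPluginLoaderSource, the plugin object shape (name, resolve) and the format of the generated module are assumptions, and the sketch runs in a temporary directory.

```javascript
const fs = require("fs")
const os = require("os")
const path = require("path")

// Hypothetical helper: build the text of a module that lists every plugin
// shipping the given file (for example gatsby-browser.js).
function buildPluginLoaderSource(plugins, fileName) {
  const entries = plugins
    .filter(plugin => fs.existsSync(path.join(plugin.resolve, fileName)))
    .map(plugin => {
      const name = JSON.stringify(plugin.name)
      const filePath = JSON.stringify(path.join(plugin.resolve, fileName))
      return `  { name: ${name}, plugin: require(${filePath}) },`
    })
  return `module.exports = [\n${entries.join("\n")}\n]\n`
}

// Two made-up plugins in a temporary directory; only plugin-a has gatsby-browser.js.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "loader-sketch-"))
const pluginA = path.join(root, "plugin-a")
const pluginB = path.join(root, "plugin-b")
fs.mkdirSync(pluginA)
fs.mkdirSync(pluginB)
fs.writeFileSync(
  path.join(pluginA, "gatsby-browser.js"),
  "exports.onClientEntry = () => {}\n"
)

const loaderSource = buildPluginLoaderSource(
  [
    { name: "plugin-a", resolve: pluginA },
    { name: "plugin-b", resolve: pluginB },
  ],
  "gatsby-browser.js"
)

// Write the generated module into a stand-in for the site's .cache directory.
const cacheDir = path.join(root, ".cache")
fs.mkdirSync(cacheDir)
const loaderPath = path.join(cacheDir, "api-runner-browser-plugins.js")
fs.writeFileSync(loaderPath, loaderSource)

// Loading the generated file gives access to each plugin's exported APIs.
const loaded = require(loaderPath)
console.log(loaded.map(entry => entry.name))
```

Generating a file with static require calls, instead of looking plugins up at run time, is one plausible reason for the extra step: code that is bundled for the browser cannot scan the file system.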

This is more complicated than handling the Gatsby Node API. I don't understand the reason yet, but I expect to find out as I continue reading. My guess at this point is that the extra complexity comes from when the Gatsby SSR API and Browser API are called, such as after the build or in the browser.

( 7 / 7 ) onPreBootstrap

As in the onPreInit step (3/7), apiRunnerNode searches for the onPreBootstrap API in plugins and calls it.

return value of the function initialize

The return value is simple, as shown below: store and workerPool. store is the Redux store, which holds the whole state tree during build. workerPool is a jest-worker object.

const workerPool = WorkerPool.create()

return {
  store,
  workerPool,
}

Summary of the function initialize

At the beginning of my Gatsby code reading project, I read the function initialize in detail. initialize is called by the function bootstrap. When building a static site, Gatsby calls initialize in the sequence below.

gatsby build
-> /src/commands/build.ts : build
  -> /src/bootstrap/index.ts : bootstrap
    -> /src/services/initialize.ts : initialize

The function initialize does the following:

  • open and validate gatsby-config.js files
  • load plugins
  • delete html and css files
  • delete cache
  • copy gatsby files
  • call Gatsby Node APIs (onPreInit and onPreBootstrap)

These processes account for one third of the bootstrap process during build.

Summary of code reading

In the end, I read more than 2,800 lines of code to understand the function initialize, though initialize itself has just 540 lines.

I found many challenges in code reading through this experience, and I can now update my tips for code reading.

Knowledge often helps us understand code

Knowledge and background helped me understand the code well. The point of this tip is how early we can notice that we lack the knowledge behind the code. I recommend searching the official documentation as soon as you find code you don't understand. You can often find a detailed explanation or a release note on the official site.

Panes, not tabs, let us read code spread over multiple files better.

Our memory can only retain so many functions, variables and types at the same time. To read code spread over multiple files, a tool such as a text editor should support our memory by showing multiple parts of the code on one screen. In my opinion, panes can do this and tabs cannot.

It is sometimes important to ask whether the code is closely related to the topic we want to understand. If it is not, we had better skip reading it in detail. Otherwise we often get lost in information that we don't need immediately.

Making lists of functions prevents me from getting confused by too much information in the code.

Making lists of functions is one of my tips for code reading. In the course of code reading, I had to organize a lot of information in order to add explanations to the lists of functions. The list below shows the core functions that loadPlugins includes.

process | summary | core function
1 | create an array including all the plugins | normalizeConfig
2 | validate options of all the plugins | validateConfigPluginsOptions
3 | load browser API, node API, SSR API from files | getAPI
4 | load plugins in gatsby-config, internal plugins and other plugins | loadPluginsInternal
5 | flatten nested plugins into an array | flattenPlugins
6 | identify which APIs each plugin exports | collatePluginAPIs
7 | distinguish the types of bad API exports and report errors | handleBadExports
8 | detect multiple replaceRenderers and report warnings | handleMultipleReplaceRenderers

As a result, I can understand the sequence in which functions are called and the data that the functions create, transform or just delete. This routine of making lists of functions lightens the burden on me when understanding code, and I can keep a clear head without getting confused by too much information in the code.

Scanning the whole code of a function or a file helps me break a difficult problem into parts.

I tried to read the function initialize line by line from the beginning. But I soon got confused and overwhelmed, because I couldn't see where the function ended and therefore didn't understand what each part of the code meant. As a countermeasure, I scanned the whole code of initialize several times. This approach finds the parts I understand easily rather than the parts I don't. The more parts of the code I understand, the better I can focus on the others.

Additionally, scanning the whole code can reveal patterns in it. In the case of initialize, I found that activityTimer was called repeatedly, which divided the code into 7 parts. It was a great discovery for me. I could read the code of initialize part by part and understand it more easily than before.
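The repeated pattern looks roughly like the sketch below, where every step is wrapped in a timer that is started and then ended. The helper createActivityTimer, its start() and end() methods and the step names are made up for this sketch and only imitate the shape of the pattern; they are not Gatsby's implementation.

```javascript
// Hypothetical stand-in for a timer object; written only for this sketch.
function createActivityTimer(name, log) {
  let startedAt = null
  return {
    start() {
      startedAt = Date.now()
      log.push(`start: ${name}`)
    },
    end() {
      log.push(`end: ${name}`)
      // Elapsed time in milliseconds since start() was called.
      return Date.now() - startedAt
    },
  }
}

const log = []
const stepNames = ["load plugins", "onPreInit", "initialize cache"]

for (const stepName of stepNames) {
  const activity = createActivityTimer(stepName, log)
  activity.start()
  // ...the work of the step would run here...
  activity.end()
}

console.log(log)
```

Each start and end pair marks the boundary of one step, which is why spotting the repeated calls is enough to split a long function into parts.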

What's Next

The next phase is sourcing data, type inference and building the GraphQL schema. My code reading project is approaching the core of Gatsby, a React-based static site generator with GraphQL. Let's go!