How webpack decides what to bundle

Sometimes webpack feels like a black box.

You write the webpack config, run webpack and out comes a bundle with unreadable code.

You know exactly what code you wrote, but you don’t know exactly what ends up in the bundle.

Does webpack just take everything in node_modules and put it in the bundle? Or is it smarter than that?

What about unused code, like functions that are not called? Are those included in the bundle, or is webpack smarter than that?

Let’s take back control by looking inside that black box to learn how webpack works under the hood.

Everything starts with the entry file

When you run webpack from the command line, the first thing it does is look up the entry file in your webpack.config.js file:

module.exports = {
  entry: './src/index.js',
  // the rest of your webpack config is here
};

In this case, the entry file is './src/index.js'. Webpack opens that file and searches it for every import and require statement. For example:

import React from "react";
import MyComponent from "./MyComponent";
const _ = require("lodash");

In this case, it finds three statements: one that imports from react, one from ./MyComponent, and one from lodash.
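To make the scanning step concrete, here is a minimal sketch of collecting the imported module names from a file's source text. It is a simplification: webpack parses each file into a syntax tree rather than using a regular expression, and the findDependencies function is a made-up name used only for illustration.

```javascript
// Simplified sketch: collect module specifiers from import and require statements.
// Webpack itself parses the file into an AST; this regex is only an illustration.
function findDependencies(source) {
  const pattern =
    /import\s+(?:[\w*\s{},$]+\s+from\s+)?["']([^"']+)["']|require\(\s*["']([^"']+)["']\s*\)/g;
  const specifiers = [];
  for (const match of source.matchAll(pattern)) {
    // Group 1 is set for import statements, group 2 for require calls.
    specifiers.push(match[1] || match[2]);
  }
  return specifiers;
}

const entrySource = `
import React from "react";
import MyComponent from "./MyComponent";
const _ = require("lodash");
`;

console.log(findDependencies(entrySource));
// [ 'react', './MyComponent', 'lodash' ]
```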

Webpack builds a dependency graph used internally

The next step is for webpack to check everything that those three modules import in turn. It does this recursively until it has covered every imported file in the app.

Every time webpack finds a new module, it runs the module through the loaders that the webpack config defines for that type of file. Then it adds the output from the loaders to a dependency graph. If we visualize the dependency graph, it looks something like this:


The root node of the dependency graph is the entry point index.js. Then you can see the dependencies it has: lodash, react and MyComponent. In this graph, you can also see that MyComponent has a dependency on memoizee.

Now all modules that are used in your app are included in the dependency graph. 

Your project probably has many installed dependencies in the node_modules folder that should not be included in your client-side JavaScript production bundle. Examples are the devDependencies that are used for testing and building. Other examples are dependencies used only by your backend, like express (if you are working on a full-stack app).

These unused dependencies will not be included in the dependency graph because they are never referenced from any of the client-side modules.
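The recursive walk, and the reason unreferenced packages never make it into the graph, can be sketched roughly like this. The installedModules object is a hypothetical stand-in for the files on disk, with express and jest playing the role of installed but never imported packages. Real webpack also resolves file paths and runs loaders, which this sketch leaves out.

```javascript
// Hypothetical project: each key is a module, each value lists what it imports.
// "express" and "jest" are installed but never imported by client-side code.
const installedModules = {
  "./src/index.js": ["react", "lodash", "./MyComponent"],
  "./MyComponent": ["memoizee"],
  react: [],
  lodash: [],
  memoizee: [],
  express: [],
  jest: [],
};

// Walk the imports starting from the entry file, visiting each module once.
function buildDependencyGraph(entry, modules) {
  const graph = new Map();
  const visit = (name) => {
    if (graph.has(name)) return; // already processed; also guards against cycles
    const dependencies = modules[name] || [];
    graph.set(name, dependencies);
    dependencies.forEach((dependency) => visit(dependency));
  };
  visit(entry);
  return graph;
}

const dependencyGraph = buildDependencyGraph("./src/index.js", installedModules);

console.log([...dependencyGraph.keys()]);
// [ './src/index.js', 'react', 'lodash', './MyComponent', 'memoizee' ]
console.log(dependencyGraph.has("express")); // false
```

Only modules reachable from the entry file end up as keys in the map, so nothing has to filter out express or jest explicitly.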

Webpack shakes out dead code from the dependency tree

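The next step for webpack is to apply tree shaking to “shake out” the dead code from the dependency graph. In webpack 4’s production mode, webpack marks the exports that are never imported, and the minifier then removes the marked code from the bundle. The built-in ModuleConcatenationPlugin helps here: it merges modules into a single scope (scope hoisting), which makes unused code easier for the minifier to detect.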

Dead code is code that is never used in your app. Functions that are never called. Exports that are never imported.

So you have a dependency tree with all your modules in it. You can think of the parts of the modules that you have imported into your app as the fresh green leaves on the tree. The other parts that are not used – the dead code – are the brown dead leaves. Webpack takes this tree with a steady hand and shakes it up really good. All the old brown leaves fall off. What’s left are the beautiful green healthy leaves!

OK, enough metaphors. Let’s look at some code to really understand this.

react-router-dom is a widely used library that does client-side routing. I took a look at the source code on GitHub and found that it exports all of the following functions:

export { default as BrowserRouter } from "./BrowserRouter";
export { default as HashRouter } from "./HashRouter";
export { default as Link } from "./Link";
export { default as NavLink } from "./NavLink";
export { default as MemoryRouter } from "./MemoryRouter";
export { default as Prompt } from "./Prompt";
export { default as Redirect } from "./Redirect";
export { default as Route } from "./Route";
export { default as Router } from "./Router";
export { default as StaticRouter } from "./StaticRouter";
export { default as Switch } from "./Switch";
export { default as generatePath } from "./generatePath";
export { default as matchPath } from "./matchPath";
export { default as withRouter } from "./withRouter";
export { default as __RouterContext } from "./RouterContext";

But your app might only use BrowserRouter and Route like this: 

import { BrowserRouter as Router, Route } from "react-router-dom";

When tree shaking is applied, webpack only includes the code for BrowserRouter and Route (plus whatever those two depend on internally) and “shakes out” all other exports. Your bundle will not include the full react-router-dom library, but only a subset of it. That’s awesome because it keeps the bundle size down!

For tree shaking to work, the dependency must use ES6 modules and you must import them with import statements (not require).
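On the configuration side, a sketch of the relevant settings might look like the following, assuming webpack 4. Production mode already enables these optimizations on its own, so they are spelled out here only to show which switches are involved.

```javascript
// webpack.config.js (sketch for webpack 4)
module.exports = {
  // Production mode already turns on the optimizations below;
  // they are listed explicitly only to make them visible.
  mode: "production",
  optimization: {
    usedExports: true, // mark exports that nothing imports
    minimize: true, // let the minifier drop the code marked as unused
    concatenateModules: true, // scope hoisting via ModuleConcatenationPlugin
  },
};
```

In development mode these optimizations are off by default, so unused exports normally stay in a development bundle.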

The final step is for webpack to seal the graph and create the bundle

Next, webpack walks the graph, seals it, and then applies plugins. Plugins are defined in the webpack config file, and there are also internal plugins that webpack calls automatically without any configuration from you. Most of the internal work in webpack is actually implemented as plugins.

The difference between a plugin and a loader is that a loader can only transform a single file just before it’s added to the dependency graph. Plugins are much more flexible. Plugins can work with multiple files, do bundle optimizations, code splitting and many other things.
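To illustrate the difference, here is a rough sketch of both shapes. The names versionLoader and ListAssetsPlugin are invented for this example, and fakeCompiler is only a stand-in so the code runs without webpack installed. In a real project the loader function would be exported from its own file, and webpack would pass its own compiler object to apply.

```javascript
// A loader is a function: it receives the source of one file and returns
// the transformed source. This hypothetical loader replaces a placeholder.
function versionLoader(source) {
  return source.replace(/__VERSION__/g, '"1.0.0"');
}

// A plugin is an object with an apply method. It taps into hooks on the
// compiler and can see every file of the build at once. This hypothetical
// plugin records the names of all assets that are about to be emitted.
class ListAssetsPlugin {
  constructor() {
    this.assetNames = [];
  }

  apply(compiler) {
    compiler.hooks.emit.tap("ListAssetsPlugin", (compilation) => {
      this.assetNames = Object.keys(compilation.assets);
    });
  }
}

// Stand-in for webpack's compiler, only so the sketch runs without webpack.
// It calls the tapped callback immediately with two made-up assets.
const fakeCompiler = {
  hooks: {
    emit: {
      tap(pluginName, callback) {
        callback({ assets: { "main.js": {}, "vendor.js": {} } });
      },
    },
  },
};

const plugin = new ListAssetsPlugin();
plugin.apply(fakeCompiler);

console.log(versionLoader("const v = __VERSION__;")); // const v = "1.0.0";
console.log(plugin.assetNames); // [ 'main.js', 'vendor.js' ]
```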

The final step is that webpack writes the bundle to the path defined in the output section of the webpack config.
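For reference, a minimal config with both an entry and an output section could look like this. The dist folder and the bundle.js file name are assumptions chosen for the example, not something webpack requires.

```javascript
// webpack.config.js (sketch; the dist folder and bundle.js name are assumptions)
const path = require("path");

module.exports = {
  entry: "./src/index.js",
  output: {
    path: path.resolve(__dirname, "dist"), // folder the bundle is written to
    filename: "bundle.js", // name of the generated file
  },
};
```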

Webpack is a super powerful tool and there is much more to learn if you want to master it.