On July 15th, we discussed the pleasure and pain of monorepos and React with Salem Hilal from Etsy. We became acquainted with Salem because his article about Etsy, Monorepos, and Kevin was featured in the Enterprise React Newsletter.
A monorepo is an architectural strategy whereby all source code resides in a single source code repository. A monorepo strategy is the opposite of the One-Repo-Per-Project strategy commonly implemented. There are good reasons to use both strategies. Watch the recording for more information.
The Lay of the Land
About 200-300 people work on the code held in Etsy's monorepo. Across the organization, there are some other repositories with special purposes, but most of what you know about Etsy comes out of this single repository.
Generally speaking, developers have access to all of the code in the monorepo. Thus, developers can alter almost anything. Naturally, this could create chaos, and from time to time, it does. We had an interesting conversation about this. The notes are below under the “Etsy Development Philosophy” heading.
Etsy desires consistency across code. Gaining consistency is a challenge because there are 200+ developers contributing to the code base. Etsy helps to guide consistency by the appropriate use of git hooks, linting, Prettifying, and automated inspection of code. An example of a linting rule is a guard against trailing commas. Internet Explorer 11 does not allow trailing commas, and IE 11 is on the support path for Etsy. While Etsy no longer needs to worry about what syntaxes browsers support thanks to tools like Babel, they still use linting to detect issues at commit-time.
The development standards at Etsy evolve. TJ asked how Etsy would handle a situation where the standards change but the section of code being worked on is not affected. Would the developer have to fix additional parts of the source code? What is the responsibility of the developer?
Salem explained that the commit hook knows the previous linting status of the files and will only force new failures as hard stops. Thus, the developer would only have to fix the new linting errors and not be responsible for bringing all of the code up to the current standard. This approach favors individual developer productivity over total code base consistency.
Another guest, Dan Skaggs, said the approach used in his monorepo was to fail the commit for any errors regardless of the date the rule became effective. This approach puts the onus on the developer to bring all source code up to the latest standard before successfully committing. Both methods have merit, and the right one depends on which tradeoff you want to accept.
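To make the difference between the two policies concrete, here is a minimal sketch of how a commit hook might decide which lint errors block a commit. It is not Etsy's or Dan's actual tooling: the LintError type, the function names, the file names, and the rule names are all hypothetical, and a real hook would load the recorded baseline from wherever the previous lint status is stored.

```typescript
// Hypothetical shape of a lint error; not taken from any real tool.
type LintError = { file: string; rule: string; line: number };

// A stable key for an error. Line numbers are left out on purpose so that
// unrelated edits that shift code up or down do not make old errors look new.
// Limitation of this sketch: a second violation of an already-known rule in
// the same file is not detected as new.
const errorKey = (e: LintError): string => `${e.file}::${e.rule}`;

// Baseline policy: block only on errors that are absent from the previously
// recorded lint status.
function newFailures(baseline: LintError[], current: LintError[]): LintError[] {
  const known = new Set(baseline.map(errorKey));
  return current.filter((e) => !known.has(errorKey(e)));
}

// Strict policy: every current error blocks the commit, old or new.
function strictFailures(current: LintError[]): LintError[] {
  return current;
}

// Made-up example data: one pre-existing error and one newly introduced error.
const baseline: LintError[] = [
  { file: "cart.js", rule: "no-var", line: 10 },
];
const current: LintError[] = [
  { file: "cart.js", rule: "no-var", line: 12 },
  { file: "cart.js", rule: "comma-dangle", line: 30 },
];

const blockingBaseline = newFailures(baseline, current);
const blockingStrict = strictFailures(current);
```

Under the baseline policy only the comma-dangle error stops the commit, while the strict policy also makes the developer fix the older no-var error before committing.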
To Rebase or not to Rebase
Etsy aims for a linear flow of source code. If source code branches are used, they are used for a very short time and a specific purpose. An Etsy developer’s workflow is to use git rebase often to ensure the local copy is brought up to the state of the remote copy. Rebasing frequently helps the developer stay in sync with any added rules or git commit hooks that might cause rework. I found this interesting because I specifically recall being told to never rebase my repos. Around 12:07 there was a conversation about rebasing, prompted by a viewer question about monorepos and merge conflicts.
The advantage of a monorepo is simplicity in permissions, building, code access, and sharing. A disadvantage of a monorepo at Etsy scale is physical system limitations. As the number of files grows, so does build time. Also, it can be very overwhelming to have to check out an entire codebase just to fix a single character in a single file. That said, every architectural choice involves tradeoffs, and for Etsy the benefits of a monorepo outweigh the costs.
One of our developers, Jalen Massey, asked Salem about the disadvantages of a monorepo. Salem said the main issue with monorepos is that they do not enforce isolation. Without isolation enforcement, because modules are accessible to all, a module can get extended beyond its original scope. Also, there are physical issues in a monorepo because so many files are in the same spot. Large repos take more download time, more CI/CD time, and more memory, among other costs. Physical limitations are the main impetus behind Etsy creating Kevin.
Etsy Development Philosophy
At 35:48, I asked Salem a question about how the processes get enforced at the company. With so many hands in so many parts of their applications, there is a wide surface area for issues. Salem explained that Etsy has a strong “No Blaming” policy where mistakes are expected from time to time and a high value is placed on understanding the nature of the mistake and learning from it. This environment sounds super supportive, one where developers can be their best, continually learning and innovating.
Etsy places a high value on deploying quickly. This means they make the necessary investments in automated testing, tooling, deployments, and so on. Having an optimized infrastructure and environment means that developers operate efficiently and ship faster, and it means that reverting bugs and deploying patches are similarly efficient.
We discussed the importance of automated checking to take both the time and the social pressure off of code reviewers. The group felt that automated checks made it easier for developers to work on their code and handle linting and style rule errors without consuming reviewers' time or creating interpersonal friction.
Lastly, in this section we covered how to update teams on new features. As the monorepo gains new features, it is important to inform the developer community about what is available. Dan Skaggs also brought up a good point about using this information to manage upwards – informing senior management about the progress, milestones, and new capabilities. He sends out a newsletter to the technical executive staff at his company and has seen the interest in his internal newsletter grow, as executive staff want to be informed about the latest developments.
Kevin and Webpack
Around 50:00, we discussed the build tool migration at Etsy and the creation of Kevin – an open source project that makes dealing with huge repositories and webpack a lot easier. Building the entire monorepo at once would require 50-60 GB of available RAM. This amount of memory is required to hold all of the files in memory for the incremental build workflow. Few machines are running around with an extra 50 GB to spare!
Kevin works by splitting up a monorepo into regions. Think of a region as a collection of user interfaces that serve a common purpose. Each bit of code belongs to one and only one region. Regions are handled as individual units by the Kevin process. Rather than process the entire monorepo each time, Etsy can process the code base region by region, reducing the physical and technical requirements. Salem wrote a comprehensive description of the Kevin build process on the Code As Craft website. It’s worth your time to read it and understand this tool.
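The region idea can be sketched in a few lines. This is only an illustration of the "one and only one region" rule and is not Kevin's API or configuration format: the Region type, the path prefixes, the region names, and the file names below are all assumptions made up for the example.

```typescript
// Hypothetical region definition: a name plus the path prefixes it owns.
type Region = { name: string; prefixes: string[] };

// Returns the single region that owns a file. Throws if the file matches
// zero regions or more than one, since each file must belong to exactly one.
function regionFor(file: string, regions: Region[]): string {
  const owners = regions.filter((r) =>
    r.prefixes.some((p) => file.startsWith(p)),
  );
  if (owners.length !== 1) {
    throw new Error(`${file} matches ${owners.length} regions, expected exactly 1`);
  }
  return owners[0].name;
}

// Groups files by region so each group can be built as an independent unit
// instead of loading the whole repository at once.
function partition(files: string[], regions: Region[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const file of files) {
    const name = regionFor(file, regions);
    groups.set(name, [...(groups.get(name) ?? []), file]);
  }
  return groups;
}

// Made-up regions and files, for illustration only.
const regions: Region[] = [
  { name: "buyer", prefixes: ["src/buyer/"] },
  { name: "seller", prefixes: ["src/seller/"] },
];
const groups = partition(
  ["src/buyer/Cart.jsx", "src/seller/Listing.jsx", "src/buyer/Search.jsx"],
  regions,
);
```

Rejecting files that match no region or several regions keeps the partition unambiguous, which is what lets a build tool work on one region at a time with only that region's files in memory.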
Etsy and Open Source
For the final portion of the interview, we talked about Etsy and Open Source. Starting at 58:31, we went over how Etsy uses open source as well as contributes to open source software. I was impressed by the sophisticated approach Etsy uses. Staff can contribute to outside projects and can even open source tools built at Etsy. There is an internal group responsible for ensuring open source is done correctly, licensing and legal requirements are met, and the project has a responsible and available sponsor. Etsy is a good citizen of the technology ecosystem and I admire their commitment to being producers, consumers, and enablers of quality open source software.
Salem offered to answer questions about Kevin and the material in the interview. The recording of the interview is here. We’ll make sure he knows about comments there. You can interact with him on Twitter as well.
If you are interested in Kevin, take it for a test drive. The project would welcome contributions, extensions, or constructive criticism.