
Terrible Ideas #2: Single Repo or Monorepo on Git

Updated on November 02, 2019 at the 16th hour

DISCLAIMER: All views are my own; do not draw any conclusions about my associates.

  • TL;DR Only do small MonoRepos on Git (a.k.a. MultiRepo). Use a MetaRepo if you want one big repo with "partial" checkouts.


    I used to work at a company that did single repo (not monorepo) on TFS (This Fucking Subversion?). This was a terrible experience: checking out the repo burned 1-2 hours, and the repo itself ballooned from 30-40 GB when I started to 60-70 GB by the time I finally got to stop using it. Microsoft itself had moved on, transitioning its own work to Git. A network failure meant you could not work on any file you had not already checked out ("Time to go home"). Transient failures would corrupt your local state. The build pipeline was tightly coupled because projects referenced common libraries directly instead of consuming a published version of each library. It was a pile of general bad practices that accumulated through neglect and not thinking about the consequences.

    I swore off .NET, never again. C# the language is OK. Honestly, if I were the CTO, I would have initiated mass firings, a freeze on feature development and a move to Java. Not all at once, obviously, but mindsets need to shift in order to be nimble, hire the best and worry less about small fires. End digression.

    This one guy (dude has a fucking dinosaur laptop!!! Prehistoric! 😂) said this development style got them "to a billion". Oh, puleeze, that is bull. The company was carrying the teams using this style. Other parts of the company embraced better tooling like Rust and Git 🐣. I would fire people like this, who hold on so dearly to the past. Move to the future or go somewhere that wants to stick to bad shit forever.

    Mono is life, El Goog and FB are doing it

    Eventually, y'know, after all my bringing it up with the CEO, we got to the point of transitioning to Git. I did bring up the bad infrastructure in a small group chat with the CEO, because I had to work with this shit. Anyway, the same guy researched MonoRepo vs MultiRepo, looked at Google and FB and declared we must move to a mono repo. 😂😅 C'mon, you've got to be kidding. We must do it because Google and FB are doing it? Those same articles and papers clearly state that they required partial checkouts to accomplish this feat, and they don't use Git for this exact reason. Y'know, Git isn't that complicated, so research its data model and limitations and see whether it could handle a company-wide "MonoRepo". Nah, Google and FB are doing Mono and Mono is so popular nowadays. I brought up the case of the build system having to pull in this big mono and was dismissed with "oh well, we do it today". It's like ...wtf... we want to move away from that. This is a religious, emotion-driven decision and arguably a fireable one. I would/will fire people who I don't trust or who just want to argue.

    Someone else did their homework on GitHub and made a final decision to do MultiRepo. Good job to that person. Happy ending there.

    Why not `git clone Big Bad Mono`?

    Git has a practical limit on repo size before operations slow to a halt. GitHub definitely has size limits: it rejects individual files over 100 MB and recommends keeping repositories small. A standard clone pulls every file in the repository along with its entire history, so imagine pulling every file that you do not need and imagine your disk space going to shit because of the giant history. Imagine the build systems having to pull big bad mono just to build one project.

    * The full-checkout requirement alone should tell you not to do big bad mono with Git. Google and FB do partial checkouts with other VCSs.
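    To put a rough number on "pulling every file that you do not need", here is a small Python sketch. The project names and sizes are made up for illustration, and it only counts the working tree; the history a full clone also downloads would make the ratio worse.

```python
from __future__ import annotations

# Hypothetical monorepo layout (made-up numbers): top-level project -> size in GB.
HYPOTHETICAL_REPO_GB = {
    "billing": 4.0,
    "frontend": 12.0,
    "reporting": 6.0,
    "shared-libs": 3.0,
    "legacy": 35.0,
}


def needed_fraction(layout: dict[str, float], needed: set[str]) -> float:
    """Share of the full working tree that a build of `needed` actually uses."""
    total = sum(layout.values())
    used = sum(size for name, size in layout.items() if name in needed)
    return used / total


if __name__ == "__main__":
    total_gb = sum(HYPOTHETICAL_REPO_GB.values())
    share = needed_fraction(HYPOTHETICAL_REPO_GB, {"billing", "shared-libs"})
    # Prints: full clone 60 GB, billing build uses 12% of it
    print(f"full clone {total_gb:.0f} GB, billing build uses {share:.0%} of it")
```

    In this made-up layout, a fresh build agent for billing downloads 60 GB to use 7 GB of it.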

    If you really desire big bad Mono, then use Subversion.

    There's a really good talk focused on getting yourself out of trouble in Git. It touches on Git's data model, and it gave me ideas on how I could manipulate a Git repository to make it do what I wanted.
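    One piece of Git's data model worth internalizing: every object is named by a hash of its content. A minimal Python sketch of how Git derives a blob's ID, assuming the default SHA-1 object format (repositories using the newer SHA-256 format hash differently):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a blob (file content) in a SHA-1 repository.

    Git hashes a header of the form "blob <size in bytes>" plus a NUL byte,
    followed by the raw content. File names are not part of the blob; they
    live in tree objects.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match: echo 'test content' | git hash-object --stdin
    print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
    # The well-known ID of an empty file
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

    Identical content always gets the same ID, so Git stores each distinct version of a file once; every version that was ever committed still lives in the history, which is where a big repo's clone size comes from.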

    Git users tend not to know how it works, so they can mangle the history quite a bit. Imagine doing that with 20-50 other unpredictable users who can force push/pull, merge in only their own changes, etc. "Oh, my changes went in, so I'm good." Git gives you a lot of power, but most people don't use all of it.
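    Why does mangled history hurt so much? A commit's ID is a hash over its tree, its parent IDs, its author/committer lines and its message, so rewriting one commit changes the ID of every commit after it. The sketch below hashes minimal commit objects to show this; the author identity and timestamp are made-up fixed values, and `build_history` is a hypothetical helper, not a Git API.

```python
from __future__ import annotations

import hashlib

# Git's well-known ID for an empty tree (SHA-1 object format).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Made-up, fixed identity and timestamp so the output is deterministic.
IDENT = "Example <example@example.com> 1572710400 +0000"


def commit_id(tree: str, parent: str | None, message: str) -> str:
    """Hash a minimal commit object the way Git does: '<type> <size>\\0' + body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {IDENT}", f"committer {IDENT}"]
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def build_history(messages: list[str]) -> list[str]:
    """Hypothetical helper: chain commits linearly and return their IDs."""
    ids: list[str] = []
    parent: str | None = None
    for message in messages:
        parent = commit_id(EMPTY_TREE, parent, message)
        ids.append(parent)
    return ids


if __name__ == "__main__":
    original = build_history(["init", "add feature", "fix bug"])
    rewritten = build_history(["init (amended)", "add feature", "fix bug"])
    # Only the first message changed, yet every later ID changes as well.
    print([a == b for a, b in zip(original, rewritten)])  # [False, False, False]
```

    The same thing happens in a real repository after an amend or rebase: the rewritten commits get new IDs, and once that is force pushed, everyone else's copies of the old commits have diverged.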

    So, `git clone meta-repo`?

    Not necessarily, but it is an option if you have a penchant for Git, want one root repo, and don't mind having to update both the sub repo and the root repo.

    The meta repo is a repository of repositories, so it is effectively MultiRepo with a metadata root.
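    A meta repo mostly boils down to a manifest in the root that lists the child repos and where to clone them from. A minimal sketch, assuming a JSON manifest shaped like `{"projects": {"<folder>": "<git url>"}}` in the spirit of the Meta tool's `.meta` file (check the tool's own docs for its exact format); the folder names and URLs are placeholders, and the commands are only built, not run:

```python
from __future__ import annotations

import json
import shlex

# Placeholder manifest; folder names and URLs are hypothetical.
EXAMPLE_MANIFEST = """
{
  "projects": {
    "web": "git@example.com:acme/web.git",
    "api": "git@example.com:acme/api.git"
  }
}
"""


def clone_commands(manifest_text: str) -> list[list[str]]:
    """Build one `git clone <url> <folder>` command per child repo, sorted by folder."""
    projects = json.loads(manifest_text)["projects"]
    return [["git", "clone", url, folder] for folder, url in sorted(projects.items())]


if __name__ == "__main__":
    for cmd in clone_commands(EXAMPLE_MANIFEST):
        # Printed only; pass `cmd` to subprocess.run(cmd, check=True) to actually clone.
        print(shlex.join(cmd))
```

    Each child stays an ordinary repo with its own history and its own commits; the root repo only tracks the manifest.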

    It solves the "discoverability" problem and the "atomic" commits problem. Personally, if two projects live in the same repo but have no relation to each other, discoverability is not a good reason to keep them together. Get proper repo search tools.
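    Discoverability does not need one giant repo. As a rough illustration, assuming every repo is cloned side by side under one directory, a few lines of Python can search all of them at once; a real setup would use a proper code search tool, and the folder and file names here are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def search_repos(root: Path, needle: str) -> list[tuple[str, str]]:
    """Return (repo, relative path) for every text file under `root` containing `needle`.

    Each immediate subdirectory of `root` is treated as one cloned repository.
    `.git` directories are skipped; files that are not valid UTF-8 are ignored.
    """
    hits: list[tuple[str, str]] = []
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        for dirpath, dirnames, filenames in os.walk(repo):
            dirnames[:] = [d for d in dirnames if d != ".git"]  # prune Git internals
            for name in filenames:
                path = Path(dirpath) / name
                try:
                    text = path.read_text(encoding="utf-8")
                except (UnicodeDecodeError, OSError):
                    continue
                if needle in text:
                    hits.append((repo.name, path.relative_to(repo).as_posix()))
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "billing" / "src").mkdir(parents=True)
        (root / "billing" / "src" / "pay.py").write_text("def charge(): ...\n", encoding="utf-8")
        (root / "web").mkdir()
        (root / "web" / "app.ts").write_text("charge();\n", encoding="utf-8")
        print(search_repos(root, "charge"))  # [('billing', 'src/pay.py'), ('web', 'app.ts')]
```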

    * I haven't seen a good reason for atomic commits across a large number of repos unless you have coupled everything, which is a terrible experience for everyone! If a library is that widely used, it is time to publish it, pin the version and update in a safe manner. There is no good reason for people to update unless there are security, compliance or incompatibility problems.
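    To make the coupling cost concrete, here is a small Python sketch with a hypothetical dependency graph. When projects reference a common library directly, a change to it forces a rebuild of every transitive consumer; `rebuild_set` (a made-up helper) walks the reverse edges breadth-first to find them.

```python
from __future__ import annotations

from collections import deque

# Hypothetical graph: project -> libraries/projects it references directly.
DIRECT_REFS: dict[str, set[str]] = {
    "billing": {"common"},
    "reporting": {"billing", "common"},
    "frontend": {"reporting"},
    "search": set(),
}


def rebuild_set(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Everything that must rebuild when `changed` changes, including transitive consumers."""
    # Invert the edges: library -> projects that reference it.
    consumers: dict[str, set[str]] = {}
    for project, deps in graph.items():
        for dep in deps:
            consumers.setdefault(dep, set()).add(project)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for consumer in consumers.get(queue.popleft(), set()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


if __name__ == "__main__":
    print(sorted(rebuild_set(DIRECT_REFS, "common")))  # ['billing', 'frontend', 'reporting']
```

    With `common` published and pinned instead, the same change rebuilds nothing until a consumer deliberately bumps its pinned version, which is the safe, opt-in update path.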

    There are some tools that facilitate and manage a meta repo:

    * Meta Tool (https://github.com/mateodelnorte/meta)


    I'm not big on meta repos, but it is good to be aware of them as another way to organize things. They are nice for small teams that have back-end and front-end engineers working together. At some point, depending on how the product grows, I think things will naturally split off into separate repos, where each does not need "atomic commits" or arbitrary discoverability across front-end and back-end code.

    What's the lesson here?

    Don't abuse Git: use MultiRepos with Git and save yourself a lot of future pain. Otherwise, use Subversion or some other VCS that supports partial checkouts.

    Some people will stick to the past due to lack of knowledge or just plain old willful and blissful ignorance. If they cannot change their mindset quickly enough, then the transition will be arduous and probably fruitless.

    Dump TFS.

    Don't get stuck in the mentality and cargo-culting of a practice or tool that popular tech companies use. I don't care if FB or Google uses this or that. It is great to know what they use in order to evolve your own solutions. Also, they don't tell you much about their current environment until a tool is mature, and by then it will probably be deprecated within a few months or years, so if you want to peddle what they do as if it were current, well, that's your reputation.

    * Ex. Google moved from pull to push telemetry as they scaled up.
