Performance and files full of code

April 30, 2022

Linus’s Famous Law: “Given enough eyeballs, all bugs are shallow.”

Linus’s Less Famous Law: “Files grow.”

One of the hardest things programmers do is keep code readable. Splitting code up into separate files is a basic tool for that.

And yet, crazy as it sounds when you say it out loud, our tools pressure us not to.

PHP

My first software job was for a company called AdMob. We used PHP. I know.

I needed to optimize some very hot code. I was a philosophy Ph.D. dropout, so I had very little idea what I was doing. The internet’s advice for optimizing and scaling PHP was “use memcached!”, but we were way beyond that.

I eventually figured out that if I combined all of our code into a single file, I could get double-digit speed-ups. It turned out that the overhead of statting, opening, and parsing each new file was substantial. I wrote a simple preprocessor, everyone groaned unhappily, and we moved forward.
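
That preprocessor is long gone, but in spirit it did something like the following sketch (written in Go purely for illustration, with made-up paths, and grossly simplified; a real version would also have to handle includes and requires):

package main

import (
    "os"
    "path/filepath"
    "strings"
)

// Concatenate every .php file under src/ into one big file,
// stripping the per-file <?php ... ?> tags so the result still
// parses as a single script.
func main() {
    var out strings.Builder
    out.WriteString("<?php\n")
    err := filepath.WalkDir("src", func(path string, d os.DirEntry, err error) error {
        if err != nil || d.IsDir() || filepath.Ext(path) != ".php" {
            return err
        }
        src, err := os.ReadFile(path)
        if err != nil {
            return err
        }
        body := strings.TrimSpace(string(src))
        body = strings.TrimPrefix(body, "<?php")
        body = strings.TrimSuffix(body, "?>")
        out.WriteString(body + "\n")
        return nil
    })
    if err != nil {
        panic(err)
    }
    if err := os.WriteFile("combined.php", []byte(out.String()), 0o644); err != nil {
        panic(err)
    }
}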

JavaScript

Today I was looking into a bit of JavaScript that runs as part of our build process. (I care a lot about build time. For many years I was the de facto watchdog of the Go toolchain’s speed and have spent countless hours working on it.) Poking around a bit, I found that a substantial chunk of time was spent opening and parsing JavaScript files containing a bunch of data.

“No way,” I thought. So I ran an experiment.

I created two versions of the same data, an NxM map. One put all of it in a single JavaScript file, like this:

module.exports={"k0":{"j0":0},"k1":{"j0":0}};

The other spread the data across N small JavaScript files, stitched together by a top-level index:

module.exports={"k0":require("./sub/k0"),"k1":require("./sub/k1")};

where, e.g., sub/k0 is:

module.exports={"j0":0};

Then, instead of N=2 and M=1 as above, I set N=1000 and M=100 and ran both.
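
Generating the two layouts is mechanical. Here is a sketch of a generator, in Go because that’s what I reach for; all the file names are my own invention:

package main

import (
    "fmt"
    "os"
    "strings"
)

const (
    n = 1000 // top-level keys; one file per key in the split layout
    m = 100  // entries under each key
)

// inner builds the {"j0":0,...} object shared by both layouts.
func inner() string {
    parts := make([]string, m)
    for j := range parts {
        parts[j] = fmt.Sprintf(`"j%d":0`, j)
    }
    return "{" + strings.Join(parts, ",") + "}"
}

func main() {
    if err := os.MkdirAll("sub", 0o755); err != nil {
        panic(err)
    }
    single := make([]string, n) // layout 1: one big file
    split := make([]string, n)  // layout 2: an index plus n small files
    for i := 0; i < n; i++ {
        k := fmt.Sprintf("k%d", i)
        single[i] = fmt.Sprintf(`"%s":%s`, k, inner())
        split[i] = fmt.Sprintf(`"%s":require("./sub/%s")`, k, k)
        sub := "module.exports=" + inner() + ";\n"
        if err := os.WriteFile("sub/"+k+".js", []byte(sub), 0o644); err != nil {
            panic(err)
        }
    }
    write := func(name, body string) {
        if err := os.WriteFile(name, []byte("module.exports={"+body+"};\n"), 0o644); err != nil {
            panic(err)
        }
    }
    write("single.js", strings.Join(single, ","))
    write("split.js", strings.Join(split, ","))
}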

On my M1 Mac, with all files in the page cache, going from many files to one was considerably faster:

name  old time/op         new time/op         delta
Exec          111ms ± 1%           59ms ± 1%  -46.53%  (p=0.000 n=9+8)

name  old user-time/op    new user-time/op    delta
Exec          113ms ± 1%           51ms ± 1%  -54.62%  (p=0.000 n=10+9)

name  old sys-time/op     new sys-time/op     delta
Exec         17.5ms ± 2%          5.8ms ± 3%  -66.67%  (p=0.000 n=9+8)
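
Those ± spreads and p-values come from benchstat. For the curious, a harness roughly along these lines, with names invented for this sketch, yields output benchstat can compare; save it as bench_test.go in its own directory, run go test -bench=Exec -count=10 once per layout, and diff the results:

package bench

import (
    "os/exec"
    "testing"
    "time"
)

func BenchmarkExec(b *testing.B) {
    var user, sys time.Duration
    for i := 0; i < b.N; i++ {
        // Point this at single.js or split.js, depending on the run.
        cmd := exec.Command("node", "single.js")
        if err := cmd.Run(); err != nil {
            b.Fatal(err)
        }
        // The child's rusage is where per-process CPU time comes from.
        user += cmd.ProcessState.UserTime()
        sys += cmd.ProcessState.SystemTime()
    }
    b.ReportMetric(float64(user.Nanoseconds())/float64(b.N), "user-ns/op")
    b.ReportMetric(float64(sys.Nanoseconds())/float64(b.N), "sys-ns/op")
}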

Since the data files I was looking at today are autogenerated anyway, it would make sense to generate one big file instead of lots of little ones, even though it’s a bit less nice that way.

Some facts don’t change: An interpreted language must still stat, open, and parse each file, and that has overhead.

And it’s not just opening the file! /usr/bin/time -l on a Mac now prints an “instructions retired” count, which is awesome. Spreading the data across many files causes over twice as many instructions to be executed. It also increases the peak memory footprint by 15%.
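
For example (the script name here is made up):

/usr/bin/time -l node split.js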

C++

It’s not just interpreted languages! Compiled languages are also sensitive to the apparently unimportant detail of how your code is organized into files.

At my first startup, card.io, I was doing computer vision and machine learning on underpowered mobile devices. The heavy computational work was in a bunch of C++ files. On a (somewhat educated) lark, I preprocessed all the files into a single giant input file. Lo! The code got a lot faster.

The compilation unit of C/C++ (what a single invocation of the compiler processes) is a file. Putting all of the code into one compilation unit (a trick now commonly called a unity build) let the compiler see far more of the program at once, and therefore optimize it much more. I didn’t even bother grumbling. I just added the preprocessor, a close cousin of the PHP one sketched above, to our build system and moved on.

Link-time optimization has improved the situation some since then, because it provides the opportunity to optimize across compilation units. But that comes at a steep cost. The results of compilation can be cached, but linking must be done every time, so LTO slows down builds.

Go

Go mostly gets this right.

The compilation unit of Go is a package, which can contain multiple files. Moving code around between those files has no impact on performance.

What about moving code around between packages, though? Perhaps surprisingly, there are very few cases in which this matters.

When compiling a package, the Go compiler emits information about optimizations (mostly inlining and escape analysis) that will be useful when compiling code that imports that package. So moving code around between packages typically has no impact at all.
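
Here’s a toy example (the package layout and names are mine, invented for illustration). The two functions live in different compilation units, but the importer can still inline the call, and building with go build -gcflags=all=-m prints the inlining decisions:

// mathx/mathx.go
package mathx

// Square is small, so when mathx is compiled, the compiler records
// its body as inlinable in the package's export data.
func Square(x int) int { return x * x }

// main.go, in a package that imports mathx
package main

import "example.com/demo/mathx"

func main() {
    // Inlined across the package boundary; after inlining,
    // constant folding can reduce this to println(9).
    println(mathx.Square(3))
}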

As far as I know, this has never been written down as a formal design decision for the language, but it comes up occasionally in discussion.

One hard problem in compiler design is phase ordering. Here’s an example: should a compiler do inlining before or after dead code elimination?

Obviously, compilers should inline before dead code elimination, because inlining will often cause code to be provably dead. (Imagine code like if debug() { ... } and func debug() bool { return false }.)
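
Spelled out (with invented names) as a tiny but complete program:

package main

func debug() bool { return false }

func trace() { println("tracing") }

func handle() {
    if debug() { // after inlining: if false {
        trace() // provably dead once debug() is inlined
    }
}

func main() { handle() }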

Obviously, compilers should do dead code elimination before inlining, because dead code elimination will provide a clearer picture of the true complexity and size of a function, which is important for making good inlining decisions.

Hmmm. Obviously, compilers should apply optimizations in a loop until they reach a fixed point. What could go wrong?

The Go compiler does inlining early, but it emits the optimization information used by subsequent compilations very late. We could have the best of both worlds: inline early within a package, but make better cross-package inlining decisions by using information that isn’t available until later.

However, this could incentivize programmers to warp the structure of their code to get different optimization results. And that is a showstopper, at least for Go.

Go’s definitely not perfect. For example, its lackluster inlining heuristic (and lack of a pressure release valve) causes programmers to mangle their code. But that’s considered a bug. And it’s slightly off-topic for this post.

Moral

The moral of this story is not that Go is superior, although I am obviously fond of it; there are complex trade-offs here.

People like to say things like “tools don’t matter” and “tools are just a way to get the job done”. I disagree.

Our tools generate incentives that end up warping how we do our jobs, in ways subtle (code organization, dependency culture) and not-so-subtle (formatting flame wars, hallway sword fights).

Programmers should pick their tools carefully. And tool authors should think hard about the incentives they create.