Has anyone set up PlatformIO to compile through the distcc
distributed build system? How difficult do you think it would be? I have to do a lot of builds every day, and it would be great to be able to cut the time in half.
Four years later, xkcd: Wisdom of the Ancients comes to mind.
I’m struggling with terrible PlatformIO build times and have another machine with eight idle M1 cores.
The project I’m working on, http://nightdriverled.com, currently builds 39 different combinations for ESP32. The .bin files are about 1.5-1.8MB each, with only one hitting 2MB. So while “our” source code is about 1.2MB (src/, include/), there are about a dozen Arduino-ish libraries in lib_deps. I know our build is kind of goofy in including too much - instead of all 39 targets pulling in exactly the combination of Adafruit this and U8g2 that they need, it sometimes builds libraries and throws them away - but slicing it in high resolution in platformio.ini is awful, too. So we wait for 39 copies of ArduinoJson (times a dozen for all the lib_deps) to be fetched from the network, then installed (which takes longer than the fetch), and then built. It takes my M1 about an hour and, of course, with 39 copies of everything, it’s not like the build cache is exactly helpful except when you’re rebuilding. There’s just not much you can do with 62GB that’s fast:
du -hs .
62G .
Because of PlatformIO/SCons slowness in the dependency checking, and the slow “retrieving from cache” taking almost as long as compiling some files, even a “do nothing” build is 20+ seconds.
$ time pio run -e mesmerizer
[ ... ]
Environment Status Duration
------------- -------- ------------
mesmerizer SUCCESS 00:00:20.851
========================= 1 succeeded in 00:00:20.851 =========================
pio run -e mesmerizer 17.34s user 2.46s system 92% cpu 21.443 total
I know that distcc won’t help with that awful incremental build time but for that hour it spends fetching and generating that 62GB, enlisting that other computer would be awesome.
I also know I can probably find easier ways to speed it up once it’s all been fetched (and why do I need 39 copies of the JSON code anyway?), like generating 39 compile_commands.json files and feeding them to something that’s actually fast like Ninja or CMake, but others have to be struggling with this awkward build system too. distcc would be some nice pain relief, as the individual builds should be extremely parallelizable.
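For what it’s worth, PlatformIO does ship a compiledb target that exports exactly that compilation database, so per-environment databases could be generated in a loop. This is a sketch, not something I’ve driven all the way to a Ninja build; the environment names are illustrative, and I’m assuming the default behavior of writing compile_commands.json into the project root:

```shell
# Export one compile_commands.json per environment (env names illustrative).
for e in mesmerizer demo spectrum; do
    pio run -e "$e" -t compiledb
    # Stash each database before the next run overwrites it.
    mv compile_commands.json "compile_commands.$e.json"
done
```

Whether a tool consuming those databases can be taught to place objects where PlatformIO’s link step expects them is the part I haven’t worked out.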
I know our platformio.ini could handle these multiple envs better, but my attempts to restructure the library dependencies just don’t scale out.
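One structural option PlatformIO does support is a bare [env] base section whose options every [env:NAME] section inherits, which at least keeps the shared lib_deps list in one place (it does not, by itself, stop the per-environment downloads). A sketch with illustrative names and libraries:

```ini
; Hypothetical layout: common options live once in [env],
; and each [env:NAME] only adds what differs.
[env]
platform = espressif32
framework = arduino
lib_deps =
    bblanchon/ArduinoJson
    ; ...the rest of the shared dozen, listed exactly once

[env:mesmerizer]
board = esp32dev
```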
I know that even if distcc could cut the time waiting for compiles by 80% (realistic once everything is fetched and cached, over the local network), our build would still be painful, but distcc would still be helpful chowing down on those 401 .o’s.
I could probably work it out in a Makefile, but between PlatformIO, SCons, ESP-IDF, and Arduino there’s just a lot going on before xtensa-blah-g++ gets called that I don’t have a great understanding of, and that compiler invocation is the part we could distcc up.
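For anyone who wants to experiment: PlatformIO lets you hook the build with extra_scripts, which run as Python inside SCons, so in principle a script could prepend distcc to the compiler commands. This is an untested sketch, not a known-working setup; the file name, and the assumption that a plain “distcc <compiler>” prefix survives SCons command interpolation, are mine:

```python
# distcc_wrap.py -- hypothetical PlatformIO extra_scripts hook (untested sketch).
# Under PlatformIO/SCons, Import("env") injects the build environment;
# standalone, it raises NameError and we do nothing.

def prefix_with_distcc(cmd: str) -> str:
    """Prepend distcc to a compiler command unless it is already wrapped."""
    if cmd.startswith("distcc "):
        return cmd
    return "distcc " + cmd

try:
    Import("env")  # only defined when SCons runs this script
except NameError:
    env = None  # running outside PlatformIO; skip the rewrite

if env is not None:
    for var in ("CC", "CXX"):
        # env.subst expands e.g. "$CC" to "xtensa-esp32-elf-gcc"
        env.Replace(**{var: prefix_with_distcc(env.subst("$" + var))})
```

The idea would be to reference it from platformio.ini as `extra_scripts = post:distcc_wrap.py` so it runs after the platform has set CC/CXX; whether distcc then copes with the xtensa cross-compiler’s flags is exactly the open question.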
So, has anyone successfully worked out distcc with PlatformIO? Barring that, is anyone able to pull that tree and help us make it play nicer with PlatformIO? Speeding up our “do almost nothing” builds would also be welcome.
rm ./.pio/build/mesmerizer/src/main.cpp.o
➜ nightdriverstrip git:(fixincs) ✗ time pio run -e mesmerizer
[ ... ]
Retrieved `.pio/build/mesmerizer/src/main.cpp.o' from cache
[ ... ]
Environment Status Duration
------------- -------- ------------
mesmerizer SUCCESS 00:00:22.142
========================= 1 succeeded in 00:00:22.142 =========================
pio run -e mesmerizer 18.40s user 2.50s system 91% cpu 22.854 total
That’s 22 seconds to find the dependencies, copy a file from a cache to the workspace (why not a symlink?), link, then run esptool to glom the ELF to a bin. So distcc couldn’t help that case at all, but that’s still about 20 seconds slower than it should be, IMO.
How are the rest of you getting on with small, but not trivial, programs like this? We’re only about 20KLOC plus some libraries so we shouldn’t exactly be a stressful combination.
And stuff like setting build_cache_dir, or setting one libdeps_dir for all environments, brings no improvement for parallel or consecutive builds of different environments?
We use build_cache_dir.
du -hs .pio/build_cache
32G .pio/build_cache
But since the cache is managed by Python, it’s about as slow as just recompiling many of our smaller sources. And since each of our 39 copies of, say, AsyncTCP has a different pathname, each copy of the sources preprocesses down to something different because of __FILE__ differences, so builds 2-39 never get a hit against the first.
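If the misses really do come from __FILE__ expanding to 39 different paths, GCC’s -ffile-prefix-map option (GCC 8+) can remap those paths to a fixed prefix so identical sources preprocess identically; whether PlatformIO’s cache then starts hitting is something I haven’t verified. A hypothetical tweak, assuming the ${platformio.workspace_dir} interpolation resolves as documented:

```ini
; Untested sketch: normalize the per-env absolute paths baked into
; __FILE__/__BASE_FILE__ so the 39 copies compile to identical output.
[env]
build_flags =
    -ffile-prefix-map=${platformio.workspace_dir}=/pio
```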
Even when compiling absolutely no objects, PlatformIO is slow by default, as demonstrated by our 20+ second “do nothing” builds.
$ time pio run -e mesmerizer && time pio run -e mesmerizer
It spends a good 5-7 seconds “retrieving from cache”. Why does a bunch of stat(2) calls take that long?
[ … ]
yulc-demo SUCCESS 00:00:18.757
So with ZERO compilation and everything cached, it still takes 18 seconds. Not fast.
build_cache_dir seems to just let you choose WHERE it downloads 39 copies of the same dozen libraries. It still downloads 39 copies.
I know there’s a mode that makes it give up on dependency checking entirely and claims to be faster, but faster incorrect builds isn’t exactly a tradeoff I’m anxious to make. Still, watching the LDF iterate over my UNCHANGED tree for several seconds every single time doesn’t give me the impression that anyone who cares about build times is using this.
So, yes, I’ve already experimented with both of those and didn’t exactly find joy in either one.
If git submodules weren’t such a terrible user experience for first-time builders, I’d probably just pull those dozen packages into a third_party directory, version them on our own, and then “only” have 39 slow builds instead of 39 slow builds AND 39 times a dozen fetches and installs. It’s not the time spent by g++ and ld that is killing us. It’s PlatformIO taking so long to decide what to build and so long to retrieve from the cache, but most importantly, it’s those fetches that put the bus in park on a fresh work tree.
If anyone has tips on avoiding 38 of those fetches - none of which parallelize in any meaningful way, it seems - you could knock almost an hour off a cold build for us.
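One possibly relevant detail: libraries placed in the project’s own lib/ directory (or in a directory named by lib_extra_dirs) are picked up by the LDF locally, with no fetch at all, so vendoring the dozen packages once could replace all 39 sets of downloads. A sketch of the idea; I haven’t measured whether it also avoids the per-env rebuilds:

```ini
; Hypothetical: point every environment at one vendored checkout of the
; shared libraries instead of letting each env fetch its own lib_deps copy.
[env]
lib_extra_dirs = third_party
; ...and drop the corresponding entries from lib_deps
```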
It’s “fast” in CI on GitHub only because Microsoft can afford to distribute the build to 39 different machines that can do them totally in parallel - exactly the way a normal developer on a single machine can’t.