dockerTools.buildImage layers are 2x too big #94636

fare · 2020-08-04T03:13:33Z

Describe the bug
When using dockerTools.buildImage, files specified as contents get added twice, resulting in an image twice as large as it should be: first by mkPureLayer (or mkRootLayer), which copies them directly into / without the /nix/store/* path prefix, then by the layerClosures and newFiles handling of buildImage, which pulls the very same packages into /nix/store as separate copies.

To Reproduce

Create an image with docker image name ${IMG} using buildImage, then docker load < result
docker run -t -i ${IMG} ls -lid /bin/bash /nix/store/*bash-interactive*/bin/bash

You'll see that the two files have different inode numbers. The same data is copied twice.
Or you can docker history ${IMG} and see that layers are twice the expected size.
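
A minimal image for trying the comparison above might look like the following sketch. The image name `bash-dup-test` and the choice of `pkgs.bashInteractive` are assumptions made for illustration (the `ls` command above only implies that some bash-interactive store path is present); `contents` is used as it existed when this issue was filed.

```nix
# Sketch only: the name and package choice are hypothetical, not from the report.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "bash-dup-test";              # hypothetical image name
  tag = "latest";
  # bashInteractive ends up under /bin and, per this report, again under /nix/store.
  contents = [ pkgs.bashInteractive ];
  config.Cmd = [ "/bin/bash" ];
}
```

Build it with nix-build, run docker load < result, then run the ls -lid command above with IMG=bash-dup-test:latest.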

Expected behavior
Somehow the files should be copied only once. The two copies should be either symlinks or hardlinks. In the latter case, the mkPureLayer step and the newFiles handling should happen in a single command, so that they can share the hardlinks.

This requires some major refactoring of buildImage.

For added points, separate layer computation from image computation, so when chaining multiple layers, we don't need to pack then unpack N images each of N layers, which consumes O(N^2) resources both in cpu time and disk space.

For yet more points, make it so that layers that are independent of each other can be built in parallel, instead of requiring a total order of layers.

Bonus: instead of running runAsRoot commands in a virtual machine, what about using the much lighter-weight fakeroot, just like Debian does? This might even remove the need for the two vastly separate cases, mkPureLayer vs mkRootLayer.

Notify maintainers
@roberth @utdemir @alexbiehl @nlewo @grahamc

yurrriq · 2021-01-06T00:46:22Z

Is someone actively working on this? I'm willing, if not able, to help.

yurrriq · 2021-01-06T00:51:22Z

I wonder if #108416 helps out here too..

utdemir · 2021-01-06T01:53:14Z

Just briefly looking at the issue, here is what I am currently thinking:

Most of the "added points" you mentioned are mostly solved by buildLayeredImage (and streamLayeredImage). However there are still reasons to use buildImage, the first thing that comes to my mind is that buildLayeredImage does not support runAsRoot-like functionality.

> I wonder if #108416 helps out here too..

streamLayeredImage and buildImage have almost completely separate codepaths, so it's unlikely to help here easily.

> Is someone actively working on this?

I am not actively working on this, but I do not know if someone else is.

> This requires some major refactoring of buildImage.

I am not sure, but I'll assume that it is true for the rest of the comment. If there is an easy fix for the current implementation we should try it first.

> I'm willing, if not able, to help.

I, for one, think it would be valuable to try to unify the codepaths of buildImage and buildLayeredImage. To me, ideally buildImage would just be a specialized version of buildLayeredImage where maxLayers = 1. However, that requires adding the missing functionality to streamLayeredImage and investigating where it diverges. If you decide to go down this route, I'd be willing to help as much as I can around how streamLayeredImage works.

I guess another approach would be to just refactor buildImage separately to first fix the issue, and possibly address the shortcomings you mentioned. But I am wary of maintaining two separate implementations of what is essentially the same thing.

@roberth What do you think?

fare · 2021-01-06T04:46:01Z

For the record, I have stopped using either buildImage (that builds stuff twice too large) or buildLayeredImage (that has totally useless layering whenever there are more than ~100 layers).

Instead, I ended up writing my own script (in Gerbil Scheme) that uses nix deterministically on top of docker build, and what's more, uses cachix so it will include my packages without rebuilding them inside docker: https://github.com/fare/gerbil-utils/blob/master/scripts/make-docker-image.ss

roberth · 2021-01-06T11:29:42Z

@utdemir Unifying would be great. Right now we have both duplicate maintenance and features that are missing in the one or the other. The two are quite different though, but I'm sure something can be done.

Just unifying doesn't address the confusion around image contents though, which behaves differently in either implementation. Users will usually want to add symlinks to the root, but in some cases they do need to copy files to the root.
From a technical perspective they should have multiple options, which are currently hidden behind vague parameters.

- copy files to the root
- copy files to the root in a new layer
- add to the paths of the single buildEnv / symlinkJoin
- add extra store paths (should be rare because we always include full closures and you don't generally use an image in ways that require store paths that weren't already in the image for another reason)

The contents parameter is hugely misleading and should be renamed to something like copyToRoot.
The symlinking can be done in many ways, so I don't think it's necessarily the responsibility of buildImage. We should just recommend copyToRoot = pkgs.symlinkJoin { }. Otherwise we'll have to create an unbounded number of such facades, one for every variation, which inhibits understanding. See Fairbairn threshold and teaching a man how to fish.
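
As a rough sketch of that recommendation: the copyToRoot parameter was added to buildImage later, by #179801 (referenced further down in this thread), so the following assumes a nixpkgs version that includes that change. The names `example` and `image-root` and the package list are hypothetical.

```nix
# Sketch only: assumes a nixpkgs that has copyToRoot (added by #179801).
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "example";                    # hypothetical image name
  tag = "latest";
  # The caller decides how the root is assembled; here a symlinkJoin
  # merges the listed packages into a single tree.
  copyToRoot = pkgs.symlinkJoin {
    name = "image-root";               # hypothetical name
    paths = [ pkgs.bashInteractive pkgs.coreutils ];
  };
  config.Cmd = [ "/bin/bash" ];
}
```

Swapping symlinkJoin for buildEnv, or for a derivation that really copies files, only changes the expression passed in; buildImage itself stays the same.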

> that has totally useless layering whenever there are more than ~100 layers

See #48462

nixos-discourse · 2021-03-16T03:03:49Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-run-chown-for-docker-image-built-with-streamlayeredimage-or-buildlayeredimage/11977/1

W1M0R · 2021-03-18T13:45:03Z

It looks like build dependencies are included in the resulting image, increasing the size by at least 30MB.

pkgs.dockerTools.buildLayeredImage {
  name = "hello-test";
  tag = "latest";
  contents = [ pkgs.hello ]; # pkgs.hello includes: libunistring, libidn2, glibc.
}

Most of the images that I am trying to build include the following transitive build dependencies:

1. libunistring
2. libidn2
3. glibc (30MB)
4. bash
5. gcc

This means that it seems currently impossible to build Docker images that are smaller than 30MB using nix.

I believe the build dependencies are included, because if I set contents = [ pkgs.p7zip ] and I use /nix/store/*glibc/bin/ldd to inspect the dependencies of 7z, 7za and 7zr, they have no dependencies. So it looks like they are statically built and don't need dependencies 1-5 to run.

Here are the layers according to dive:

Does anyone know how to take an existing derivation, e.g. p7zip, and remove dependencies 1 - 5 from its runtime dependencies, so that buildImage/buildLayeredImage will not include those layers?
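
One way to check what Nix itself records as the runtime closure of a package, independently of dockerTools, is pkgs.closureInfo. The sketch below assumes nixpkgs is reachable as <nixpkgs>; the build result contains a store-paths file listing every path in the closure of pkgs.p7zip.

```nix
# Sketch only: lists the runtime closure that Nix has registered for p7zip.
{ pkgs ? import <nixpkgs> { } }:

pkgs.closureInfo {
  # Any derivations or store paths can go here; p7zip is the example from above.
  rootPaths = [ pkgs.p7zip ];
}
```

After nix-build, cat result/store-paths shows the closure. If glibc appears there, Nix treats it as a runtime reference of p7zip, whatever ldd reports for the individual binaries.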

vroad · 2021-04-14T02:25:56Z

> Most of the "added points" you mentioned are mostly solved by buildLayeredImage (and streamLayeredImage). However there are still reasons to use buildImage, the first thing that comes to my mind is that buildLayeredImage does not support runAsRoot-like functionality.

Since #116749 is now merged, if you just want to change ownership of files in the image, you can switch to buildLayeredImage (streamLayeredImage), which doesn't have this problem.
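
A minimal sketch of that approach, assuming a nixpkgs version that includes the fakeRootCommands option from #116749. The image name, the directory and the uid/gid 1000 are hypothetical values chosen for illustration.

```nix
# Sketch only: changes ownership without runAsRoot or a VM.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildLayeredImage {
  name = "chown-example";              # hypothetical image name
  tag = "latest";
  contents = [ pkgs.hello ];
  # Runs under fakeroot; paths are written relative to the image root.
  fakeRootCommands = ''
    mkdir -p ./data
    chown 1000:1000 ./data
  '';
}
```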

stale · 2021-10-12T14:44:38Z

I marked this as stale due to inactivity.

roberth · 2022-07-01T13:23:46Z

This may help: #179801

alexvorobiev · 2023-06-09T18:20:22Z

@roberth copyToRoot doesn't seem to be replacing the functionality of contents fully. Unlike contents, it doesn't always copy the files but instead sometimes copies just symlinks to them. My use case is to have functioning sudo in the image which I cannot convert to copyToRoot: https://discourse.nixos.org/t/using-copytoroot-to-add-sudo-to-images-created-by-dockertools

roberth · 2023-06-09T22:44:48Z

Yeah I think that's because of buildEnv. That one produces symlinks, which are then copied when creating the customization layer.

You could probably make it work somehow, but for the dockertools project, I think we should leverage parts of NixOS, along these lines: [RFC22, RFC78] NixOS a la carte (proof of concept) #148456

alexvorobiev · 2023-06-10T19:11:24Z

Yes, I wish I could use bits and pieces of NixOS. For now, I ended up moving sudo to a separate layer and that seems to work (I posted the workaround to that Discourse thread). Is it worth trying to switch to streamLayeredImage? My images are fairly large (>5G) with hundreds of packages. Will each package go to a separate layer?

adrian-gierakowski · 2023-06-10T19:41:46Z

> Will each package go to a separate layer?

There are only 120-something layers available, so not exactly.

streamLayeredImage will put the “most popular” packages into their own layers and, once the layer limit is reached, lump everything that’s left into one final layer. This approach is suboptimal in most cases where the layer limit is reached. To optimise things for your use case you’d need something like this or alternatively this (haven’t used the latter)
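
For reference, the layer budget is set through the maxLayers argument. The following is a sketch under assumptions: the image name and package list are hypothetical, and 120 simply mirrors the rough figure mentioned above rather than a recommended value.

```nix
# Sketch only: raises the layer budget; anything beyond it shares the last layer.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.streamLayeredImage {
  name = "big-image";                  # hypothetical image name
  tag = "latest";
  contents = [ pkgs.bashInteractive pkgs.coreutils pkgs.hello ];
  maxLayers = 120;                     # assumption, see the note above
}
```

The build result is a script that writes the image tarball to stdout, so it is typically used as $(nix-build) | docker load.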

nixos-discourse · 2024-07-17T21:16:36Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/dockertools-image-sizes-are-absurd-how-to-improve/49225/2

fare added the 0.kind: bug label Aug 4, 2020

FRidh added this to the 20.09 milestone Aug 5, 2020

FRidh added the 6.topic: docker tools label Nov 6, 2020

FRidh modified the milestones: 20.09, 21.03 Nov 6, 2020

vroad mentioned this issue Mar 18, 2021

dockerTools.streamLayeredImage: add fakeRootCommands option #116749

Merged

10 tasks

W1M0R mentioned this issue Apr 16, 2021

pandoc: binary copied to /nix/store and to /bin #119597

Closed

roberth mentioned this issue Jul 2, 2021

dockerTools.buildLayeredImage: contents completely replaces /bin #129007

Open

stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 12, 2021

roberth mentioned this issue Oct 20, 2021

Packaged node2nix applications doesn't work in dockerTools.buildImage svanderburg/node2nix#201

Open

roberth mentioned this issue Dec 3, 2021

[RFC22, RFC78] NixOS a la carte (proof of concept) #148456

Closed

13 tasks

This was referenced Jul 1, 2022

Update docker example (nixpkgs#94636) NixOS/nixos-homepage#872

Closed

dockerTools.buildImage: Add copyToRoot to replace contents, explain usage #179801

Merged

roberth closed this as completed in #179801 Jul 7, 2022

W1M0R mentioned this issue Feb 17, 2023

dockerTools.streamLayeredImage produces wrong system paths #102962

Open
