[core] natively support .dockerignore #10921
Labels
@aws-cdk/core
Related to core CDK functionality
effort/medium
Medium work item – several days of effort
feature-request
A feature should be added or improved.
in-progress
This issue is being actively worked on.
p1
A not-uncommon pattern to use with Docker is to turn the
.dockerignore
file into a whitelist by making the first rule one that ignores everything and then having subsequent negated patterns explicitly exempting specific files and directories.For example:
This is useful to guard against accidentally passing files to the Docker context (or worse, to the image itself) due to forgetting to add a corresponding entry to the
.dockerignore
file.CDK currently not only prevents this pattern from being used, but actually makes the situation worse, due to its under-the-hood reading of
.dockerignore
patterns and processing them withminimatch
, which has different pattern matching behavior.I noticed this when I noticed that my CDK operations were incredibly slow (
cdk diff
taking 5 minutes) and upon inspection of the cloud assemblies I noticed that everything was being copied in, includingnode_modules
.Use Case
Ergonomics
Directories for generic assets are pretty straightforward to reason about what is being passed to the cloud assembly and what can be excluded via
exclude
. Directories for docker assets often contain more directories including large ones such asnode_modules
orbuild
directories.Users may mistakenly assume that having a
.dockerignore
means that they don't have to provide anexclude
option sinceDockerImageAsset
gives the impression of a deeper understanding of Docker assets compared to a genericAsset
, so I don't think it's unreasonable to expect that CDK may be able to Do The Right Thing™ when a.dockerignore
is present.I think this expectation is reasonable especially since CDK currently does read and process
.dockerignore
under the hood, albeit incorrectly applyingminimatch
semantics instead of.dockerignore
semantics (see below).Adding this feature will mean that users of
DockerImageAsset
get an out-of-the-box experience that Just Works™ without any furtherexclude
modifications, unless they need to for whatever reason, but then it would be obvious to them.Correctness
Even setting aside ergonomics, it's simply incorrect to read in
.dockerignore
and process its patterns withminimatch
semantics, since it differs from.dockerignore
semantics, yet CDK is doing just this, and it's also doing some workarounds for preventing exclusion of the Dockerfile and.dockerignore
from the cloud assembly, which has consequences on user-provided exclusions (see below).Perceived Slowness of CDK
Incorrectly configured exclusions can have a severe impact on the perceived slowness of CDK when every synthesis requires copying large directories like
node_modules
, and it is easy to misattribute that to CDK itself being slow (which in a way it is being), possibly driving users away.My motivation for writing this and implementing the proposed solution is because something like
cdk diff
was taking my project over 5 minutes to finish. I did believe it was just CDK being this slow.Whitelist Support
Users cannot use a whitelist style
.dockerignore
due to the way things work (explained in-depth below).Even if one recognizes this, someone who uses a whitelist approach with
.dockerignore
may see the benefits of such an approach and attempt to apply them toexclude
, but it is also not possible there (in particular with directory contents, see below).Proposed Solution
I propose to remedy this by adding native support for
.dockerignore
style patterns (rather than trying to process them withminimatch
). This will be exposed via an optionalCopyOptions.ignoreMode
setting which defaults toIgnoreMode.GLOB
(for backward compatibility) which would useminimatch
, whileDockerImageAsset
would useIgnoreMode.DOCKER
(and error for any other mode) with full, proper.dockerignore
behavior.This way, users of
DockerImageAsset
that have a.dockerignore
will have a everything just work out of the box, even if they are using a whitelist style.dockerignore
.Summary of Problems
Here is an in-depth summary of the current problems when attempting to use a whitelist
.dockerignore
:CDK drops all whitelist-style patterns
The guard against excluding
Dockerfile
or.dockerignore
from the cloud assembly does not consider that a pattern may be an 'ignore-all'*
pattern or a negated pattern!
, causing every pattern in a whitelist to be dropped.aws-cdk/packages/@aws-cdk/aws-ecr-assets/lib/image-asset.ts
Lines 108 to 113 in 60c782f
This guard checks that the given pattern doesn't end up excluding
Dockerfile
or.dockerignore
.If the pattern is
*
for example then it will definitely match againstDockerfile
or.dockerignore
, which this code interprets as a dangerous pattern which would filter out those necessary files, so that pattern is filtered out.Likewise all negated patterns are filtered out as well. For example if
!foo
and!bar
will match againstDockerfile
and.dockerignore
sincematch(Dockerfile, !foo)
is true becauseDockerfile
is notfoo
.This causes every single pattern to get dropped in a whitelist-style
.dockerignore
without the user having done anything at all, since.dockerignore
is automatically read and combined with possibly-user-suppliedexclude
patterns..dockerignore
patterns are notminimatch
patternsCDK automatically under-the-hood reads
.dockerignore
files and combines them withexclude
patterns which may be user-supplied. These patterns are then processed by the minimatch package which provides functionality somewhat likefnmatch(3)
.This is incorrect because minimatch interprets patterns differently. For example, per
.dockerignore
rules, a simple!subdirectory
pattern is enough to whitelistsubdirectory/
and any of its contents. However the same is not true forminimatch
.CDK passes the
matchBase
option which means that if the pattern contains no slashes, it'll match against the path basename. So the directory itself will correctly whitelist, but when CDK checks child paths, those will match against the basename so e.g.subdirectory/child
will be a check ofminimatch(child, subdirectory)
..dockerignore
is more like.gitignore
pattern behavior with some deviations which you can read about here.There is one package which seems to provide
.dockerignore
matching behavior called@balena/dockerignore
, which conforms to the API ofnode-ignore
which is a package providing.gitignore
matching behavior.Note that
@balena/dockerignore
also details the differences between.dockerignore
and.gitignore
behavior.CDK
.dockerignore
-sourced patterns override user patternsBeing aware of the above issue, I can simply, albeit redundantly, add my own explicit patterns to plug the leaks that CDK opens up.
Specifically, since
.dockerignore
directory exemptions (negated patterns) aren't interpreted correctly (the.dockerignore
way), I can remedy this by adding my own!subdirectory/*
for any!subdirectory
in the.dockerignore
.As already mentioned, CDK automatically reads a
.dockerignore
file if present and concatenates it after user supplied rules.The exclude pattern logic operates in a 'last pattern wins' fashion.
All of this together means that my
!subdirectory/*
pattern will be clobbered by the*
at the beginning of the.dockerignore
, meaning my pattern won't take effect at all.This can be remedied by making user-provided patterns come after
.dockerignore
patterns, so that user-provided ones can have the 'final say'. TheDockerfile
/.dockerignore
guard will still filter out the*
pattern so CDK won't be using a whitelist as intended, however, and it would be up to the user to recreate the whitelist present in.dockerignore
inexclude
minimatch
patterns.Minimatch matchBase behavior
Even if we allow user-provided
exclude
patterns to 'win', CDK usesminimatch
'smatchBase
option which:CDK passes absolute paths to
minimatch()
, which is probably why passing this option is considered useful.The problem is that my above workaround of explicit providing a
!subdirectory/*
pattern won't work because it contains a slash, so it will be matched against the full absolute path, and it won't match, and there's no way for me as a user to work around this.The solution here is to pass to
minimatch()
paths relative to the root directory (i.e. root of the asset).Other
This is a 🚀 Feature Request
The text was updated successfully, but these errors were encountered: