Provenance v1 feedback: add new slsaLevel field distinct from builder.id? #716
Thank you for starting this discussion! This is exactly the type of feedback we are looking for! (I changed the issue title to make it easier to identify.)
That is indeed the current proposal. Our reasoning is that this makes verification much simpler: just check a single field.
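To make "just check a single field" concrete, here is a minimal sketch of what a verifier looks like under the builder.id-only design. It assumes the provenance has already been authenticated and parsed into a dict using the v1 layout predicate.runDetails.builder.id. The builder IDs and granted levels in TRUSTED_BUILDERS are hypothetical.

```python
# Hypothetical verifier policy: each trusted builder.id maps to the SLSA Build
# level this verifier grants. The IDs and levels are made up for illustration.
TRUSTED_BUILDERS = {
    "https://builder.example/isolated@v1": 3,
    "https://builder.example/shared@v1": 2,
}


def granted_level(provenance: dict) -> int:
    """Return the level granted to an already-authenticated provenance.

    With builder.id as the only signal, the whole decision is a single
    dictionary lookup; unknown builders get level 0.
    """
    builder_id = provenance["predicate"]["runDetails"]["builder"]["id"]
    return TRUSTED_BUILDERS.get(builder_id, 0)
```

A builder that offers two modes with different guarantees would need two entries (two IDs) in this table, which is exactly the point debated in the comments below.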
Yes, in fact that was proposed by some folks on Tekton in a separate conversation. /cc @bobcatfish @chitrangpatel @khalkie @patricklawsongoogle. They were also confused by this and preferred having a distinct field for the SLSA level. I could be convinced. Having two independent implementers now suggest the same change is a strong signal. Still, I have two small reservations:
Any thoughts from others?
Thanks for the quick response 🎉 Couple thoughts on your points:
Regarding (2), here's how I could imagine someone doing it wrong: Let's pretend you run a BuildKit instance that is only L2. It is not sufficiently Isolated, because the only thing separating a user's workload from the provenance signing key is a container, and a container is not a sufficient security boundary (according to some). The risk is that a malicious workload breaks out of the container with a 0day and steals the provenance signing key. Now you upgrade to a new version of BuildKit that satisfies L3 by running the containers under gVisor. Since it's the same instance, you keep the same Oops! The key has already been compromised. You can't claim to be L3 if that key has been exposed to attackers. A similar scenario might be that you have two pools that use the same signing key. By forcing the user to pick a new
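One way a verifier operator could catch the "upgrade in place" and "two pools, one key" scenarios is to lint policy changes. The following is a minimal sketch, assuming a hypothetical policy table that maps each builder.id to a signing-key identifier and a granted level. It flags any builder.id granted a higher level under a key that was previously trusted at a lower level.

```python
from typing import NamedTuple


class BuilderPolicy(NamedTuple):
    key_id: str  # hypothetical identifier of the provenance signing key
    level: int   # SLSA Build level the verifier grants to this builder.id


def unsafe_key_reuse(old: dict[str, BuilderPolicy], new: dict[str, BuilderPolicy]) -> list[str]:
    """Return builder IDs in `new` granted a higher level under a key that
    `old` already trusted at a lower level.

    Such a key may have leaked while the builder (or another pool sharing the
    key) ran with weaker isolation, so it cannot vouch for the higher level.
    """
    lowest_old_level: dict[str, int] = {}
    for policy in old.values():
        prev = lowest_old_level.get(policy.key_id, policy.level)
        lowest_old_level[policy.key_id] = min(prev, policy.level)
    return sorted(
        builder_id
        for builder_id, policy in new.items()
        if policy.key_id in lowest_old_level and policy.level > lowest_old_level[policy.key_id]
    )
```

Raising a builder from level 2 to 3 while keeping "key-1" is flagged. Moving to a new builder.id with a fresh "key-2" is not.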
Would it make sense to encode the requirement that "the same builder but with different capabilities should use different keys and be distinguishable" explicitly into the builder id schema? e.g. a strawman:
...with the out-of-band constraint that the same key should (or must) never be reused for different values of It feels like we're trying to layer some semantic meaning into the otherwise opaque string field of builder.id, but that the semantics we care about (claimed builder capability and associated keys) are narrow enough that it makes sense to just give them their own field. It also makes it really easy to write "sanity check" validators that reject provenance where the build SLSA level is higher than the claimed capability level, while also keeping it simple to write an authentication policy that just maps Note that this doesn't prevent the builders from encoding their own opaque versioning into builder.id if they like, and I suspect it makes it less likely that a verifier might accidentally start depending on the nominally opaque structure of builder.id to try to parse out the claimed capability level instead of maintaining that relationship out-of-band in the authentication policy.
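A "sanity check" validator of the kind described above might look like the following minimal sketch. Several things are assumptions: the authentication policy maps a signing-key identifier to the expected builder.id and that builder's maximum capability, and the proposed per-build level lives in a hypothetical runDetails.slsaLevel field. None of these is defined by the v1.0 spec.

```python
# Hypothetical authentication policy: signing key -> (builder.id, max capability).
AUTH_POLICY = {
    "key-isolated": ("https://builder.example/isolated@v1", 3),
    "key-shared": ("https://builder.example/shared@v1", 2),
}


def check_claimed_level(key_id: str, provenance: dict) -> int:
    """Return the build's claimed level if it is consistent with the policy.

    Raises ValueError if the key is unknown, if the builder.id does not match
    the key, or if the claimed level exceeds the builder's capability.
    """
    if key_id not in AUTH_POLICY:
        raise ValueError(f"untrusted signing key: {key_id}")
    expected_id, capability = AUTH_POLICY[key_id]
    run = provenance["predicate"]["runDetails"]
    if run["builder"]["id"] != expected_id:
        raise ValueError("builder.id does not match the signing key")
    claimed = run["slsaLevel"]  # hypothetical field proposed in this issue
    if claimed > capability:
        raise ValueError(f"claimed level {claimed} exceeds capability {capability}")
    return claimed
```

The key-to-capability binding stays in the out-of-band policy, so the verifier never has to parse meaning out of the builder.id string.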
Another BuildKit maintainer here. For those who are unaware, we have integrated SLSA v0.2 into the latest release of Docker Build. The data we produce (including some non-spec extension fields) is documented in https://github.com/moby/buildkit/blob/v0.11.5/docs/attestations/slsa-definitions.md . We have visibility into all the build steps at a very granular level, and we can track all the criteria for SLSA levels individually for any individual build. I think that is powerful, and it is our advantage when you compare the detailed provenance info we can produce with that of some other tools. When comparing https://slsa.dev/spec/v0.1/requirements to https://slsa.dev/spec/v1.0-rc1/requirements, I'm somewhat confused: the first lists a lot of useful properties that I can use to quantify the quality of the build procedure, while the second concentrates on the properties of the signature and is very generic about the requirements for the payload, i.e. what is actually being signed. Not only does this mean that the SLSA level doesn't give much info about what the build actually did, it also means that very different styles of builds have equal levels. And because many individual fields are now missing, there is no way for tools to compare provenance properties, and no incentive for authors to improve their builds.
We don't really have something that would be "non-isolated" atm. If we did, I think it would be such an exception, and we would not really care about provenance for these cases. But we do have:
You can see that this already creates lots of combinations. "Completeness" is also split up across different properties in the current SLSA spec. Our default is to have complete materials but not complete config parameters. I guess some combinations would end up at the same SLSA level, but we still definitely want to show them all individually to our users, as they are important for understanding the build procedure. We would expect some tools that take artifacts as inputs to do the same (e.g. tools that trigger a replay of a build based on provenance, tools that perform independent reproducibility validation, or tools that check for stale materials). What would be your recommendations if the current draft stays? Would we need to move all the individual (previous and current) SLSA requirements into extension fields so our users still get useful information? Because the
Sorry, by "Docker Build" do you mean the As per our build model, we distinguish between the "trusted control plane" (claims made by the build service) and the untrusted "tenants" of the build service. At SLSA Build L2+, the provenance MUST be generated by the trusted control plane. Therefore, it cannot be generated by a tool run by a tenant within the untrusted build process, such as the
That sounds more like SBOM than provenance? SLSA really needs a page explaining the difference between the two, but my basic bar is that SBOM is granular and generated by the tool, while provenance is high-level and generated by the service.
My recommendation is to document these properties per Alternatively, you could represent this with extension fields, with documentation in Does this all help?
Following up on my earlier comment, I realized after chatting offline with Mark that my strawman doesn't fix anything. In order for the constraint that builds by the same builder but at different SLSA levels (particularly with different isolation guarantees) not be able to share keys, it would never make sense for However, this did get me thinking about how to ensure or at least encourage that constraint in terms of designing the authentication policy. Basically, if your authentication policy takes the shape But this brings me back to one theme of this conversation: after authenticating the payload, the verifier knows which trust anchor authenticated the signature, and therefore which In an x509 world, I believe these two parameters would appear in the cert (i.e. part of the authentication phase), and the cert issuer would be responsible for ensuring that distinct certs were always issued with unique signing keys (and enforcing policy about which entities can be issued which certs, etc). But we're not necessarily in an x509 world here, so it feels like what we're doing is effectively laying down design constraints on the authentication policy, since the spec is talking about aspects of authentication (signer identity and properties tied specifically to the identity). Mark, maybe it would help to document a concrete, reference authentication policy schema and pseudo-code implementation, along with some examples of how we would expect provenance producers to publish and distribute their
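As a starting point for the "reference authentication policy" suggested above, here is a minimal two-phase sketch. Phase 1 authenticates the payload and identifies which trust anchor signed it. Phase 2 checks that the claimed builder.id is one that anchor may vouch for. To keep it stdlib-only, HMAC from Python's hmac module stands in for real public-key signatures. The anchor names, secrets, and builder IDs are hypothetical, and this is not a schema defined by SLSA.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


@dataclass
class TrustAnchor:
    name: str
    # Stand-in for a verification key: HMAC keeps the sketch stdlib-only,
    # whereas real provenance would use public-key signatures.
    secret: bytes
    # builder.id values this anchor is allowed to vouch for.
    allowed_builder_ids: set[str] = field(default_factory=set)


def authenticate(payload: bytes, tag: bytes, anchors: list[TrustAnchor]) -> TrustAnchor:
    """Phase 1: find the trust anchor whose key authenticates the payload."""
    for anchor in anchors:
        expected = hmac.new(anchor.secret, payload, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            return anchor
    raise ValueError("payload not authenticated by any trust anchor")


def verify(payload: bytes, tag: bytes, anchors: list[TrustAnchor]) -> str:
    """Phase 2: the claimed builder.id must be one the anchor may sign for."""
    anchor = authenticate(payload, tag, anchors)
    builder_id = json.loads(payload)["predicate"]["runDetails"]["builder"]["id"]
    if builder_id not in anchor.allowed_builder_ids:
        raise ValueError(f"{anchor.name} is not trusted for {builder_id}")
    return builder_id
```

The key design choice is the binding in phase 2. A payload claiming the isolated builder.id is rejected if it was signed with the shared pool's key, even though the signature itself is valid.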
I've been focusing on the build's SLSA level in my examples, which I think is a superset of the properties that are actually being distinguished here. Would it make sense to define precisely which properties merit having isolated signing keys, and then formally encode only those "isolation level" values into either the It feels like part of the problem here is that
If those extension fields aren't part of the authentication policy, then aren't we back to this being vulnerable to a non-isolated builder being compromised and generating claims in the extension fields that are indistinguishable from what an isolated builder would generate?
No, we produce SBOMs as well; they are completely different. Straight from the SLSA webpage: "information about software artifacts describing where, when and how something was produced". That's exactly what we provide. Note, of course, that
I think this seems more realistic for our case. In a previous version, we started introducing some of this data (inputs + configuration) using our own custom data structures. Then we found the SLSA definition to be a better approach with potential for industry-wide adoption, deprecated our custom fields, and moved to SLSA. With the recent changes removing some of the fields, which we now need to turn into extension fields, the SLSA definition is less useful for us (and possibly for other third-party adopters).
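For reference, carrying a removed property as an extension field could look like the following minimal sketch. It operates on the predicate dict. Two things are assumptions: the key COMPLETENESS_EXT is a hypothetical URI, not BuildKit's actual field name, and placing it under runDetails.metadata is a choice made here, not a location prescribed by the spec. The example values mirror the default mentioned earlier: complete materials, incomplete config parameters.

```python
from typing import Optional

# Hypothetical extension-field key; a real producer would use a URI it controls
# and document its semantics.
COMPLETENESS_EXT = "https://example.com/buildkit/v1#completeness"


def add_completeness(predicate: dict, parameters: bool, materials: bool) -> dict:
    """Record per-build completeness flags as an extension under runDetails.metadata."""
    metadata = predicate.setdefault("runDetails", {}).setdefault("metadata", {})
    metadata[COMPLETENESS_EXT] = {"parameters": parameters, "materials": materials}
    return predicate


def read_completeness(predicate: dict) -> Optional[dict]:
    """Return the flags, or None when the producer did not emit the extension."""
    return predicate.get("runDetails", {}).get("metadata", {}).get(COMPLETENESS_EXT)
```

As raised earlier in the thread, values like these are only as trustworthy as the builder.id and key that signed them. A verifier should treat them as claims of that builder, not as independent evidence.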
To try and clarify, as I see things: In BuildKit, we want to use the provenance spec to record information gathered during the build - we have insight into the content used, since we have access to things like artifacts from At the moment, we're not building provenance to be checked against a verifier - but we do want to work towards letting users set up build environments using BuildKit that they can verify to be SLSA 3+. BuildKit already runs in a client-server model, with the server controlling the provenance generation and the client(s) unable to interfere with that process - even if the most common configuration today has both the client and server within the same trust boundary. This is kinda what is described in "if the service had tight coupling to the command, such that the docker tool ran within the trusted control plane with sufficient sandboxing and isolation guarantees". Some more thoughts 🎉 (sorry they're a bit scattered) - I think there are a couple of different things to talk about here:
On second reflection, I'm not sure if adding a new For the sake of example, if we had a The problem is, I have no idea what a complete list of these fields would look like - I can't find a tidy list on slsa.dev of what the reproducibility requirements, completeness requirements, etc. all look like. Maybe custom fields would be the right call here; I'm not sure. I'm still curious about the statement of purely using a A concrete example:
I think the issue is that some of the requirements to implement SLSA levels apply to the build service (like digital signatures), some of them apply to the user of the build service (which can be verified or checked by the builder, like a request to not include specific parameters, invalidating the completeness requirement), and some apply to both, like potential future reproducibility + hermetic requirements (the build service is capable of reproducible builds, but only with a set of specific source files, e.g. docker builds that don't make use of With the current state, we can only express SLSA levels about the builder, not about anything relating to the actual run.
I think to have a slsaLevel notated for the run, we do need to work out how to solve this. I think the way to do this is to talk about what the compromise of a builder with builder.id X implies. If X is compromised, we have to treat all builds of X as compromised - however, Y does not need to be treated as compromised. If we have an isolated and non-isolated mode, we have to have different IDs - not because of the different SLSA levels, but because a compromise of the non-isolated mode implies a compromise of the isolated mode. I think this requirement makes more sense broken apart from the notion of SLSA levels, instead of being directly tied to them.
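The "blast radius" argument above can be sketched in a few lines. The sketch assumes a hypothetical mapping from each builder.id (or mode) to the signing key a verifier accepts for it. Compromising one builder exposes every builder.id accepted under the same key. With a shared key, compromising the non-isolated mode therefore also takes down the isolated mode. With separate keys, it does not.

```python
def exposed_builder_ids(compromised: str, key_of: dict[str, str]) -> set[str]:
    """Return every builder.id whose provenance can no longer be trusted.

    key_of maps each builder.id (or mode) to its signing key. An attacker who
    compromises one builder holds its key and can forge provenance for every
    builder.id the verifier accepts under that same key.
    """
    stolen_key = key_of[compromised]
    return {builder_id for builder_id, key in key_of.items() if key == stolen_key}
```

For example, with {"bk/non-isolated": "k1", "bk/isolated": "k1"} a compromise of "bk/non-isolated" exposes both IDs. With "bk/isolated" mapped to "k2" instead, only "bk/non-isolated" is exposed.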
Thank you, that is extremely helpful! I appreciate everyone taking the time to explain in detail. Is the following an accurate summary?
Ultimately this is a trade-off between simplicity and expressiveness. Our general strategy has been to prefer simplicity for the verifier (and to reduce the chance of mistakes), even at the cost of complexity to the builder. What about the following compromise?
Would this be acceptable for BuildKit? It would require a separate builder ID for the two modes, but it would allow you to retain the fields you want.
Previous SLSA versions had these fields (equivalent to what is proposed in d9491ec):
In BuildKit, we added an extra extension field, "hermetic" https://github.com/moby/buildkit/blob/v0.11.5/docs/attestations/slsa-definitions.md#metadatahttpsmobyprojectorgbuildkitv1hermetic . It was also a SLSA v0.2 criterion. I think it gives quite a lot of information about the quality of the pipeline, and I don't think it is already covered by existing fields. Not all hermetic builds are reproducible, because reproducibility is bit-by-bit. And not all complete builds are hermetic (at least for us). For us, the completeness of config and verified materials means that we have captured all the data needed to replay the build from identical config and immutable inputs (e.g. a snapshot of the container image or git branch from the time of the build, even if it has changed later). There is also another property that I'm struggling to convey. I would best describe it as "immutable". For example, I'm exploring what properties the Go builder in https://github.com/slsa-framework/slsa-github-generator has. IIUC, I would describe it as:
(Sorry if there are any mistakes; the code is new to me.) As you can see, the aspects that are false are false for different reasons. Squashing these properties into a single field would lose valuable information. If I were comparing the trustworthiness of two provenances, I would already trust more the one that manages to turn even a single false from that list into true.
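The information-loss argument can be shown with a partial order. The sketch below uses a hypothetical set of per-build flags that loosely follow the properties named in this thread. The names and semantics are assumptions, since, as noted, they are not standardized. Two builds can each satisfy a property the other lacks. Neither then dominates the other, yet any single level number must either rank one above the other or treat them as equal.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BuildProperties:
    # Hypothetical per-build flags loosely following this thread; their exact
    # semantics are not standardized.
    hermetic: bool
    complete_parameters: bool
    complete_materials: bool
    reproducible: bool
    immutable_inputs: bool


def satisfied(props: BuildProperties) -> set[str]:
    """Names of the properties this build satisfies."""
    return {name for name, value in asdict(props).items() if value}


def at_least_as_strong(a: BuildProperties, b: BuildProperties) -> bool:
    """True if `a` satisfies every property `b` does (a partial, not total, order)."""
    return satisfied(b) <= satisfied(a)
```

A hermetic build with incomplete parameters and a non-hermetic build with complete parameters are incomparable here: at_least_as_strong is False in both directions.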
I think this would be fine. The verifier that wants to assign the level number already needs to know about the specific
Hmm. I had been considering "hermetic" to mean the same as "resolvedDependencies is complete", which is clearly not how you interpreted it. That highlights my concern: without further specification, different people will interpret these fields differently. It's pretty difficult to come to an agreement on this concept, which is why we temporarily undefined SLSA L4. I suspect we would diverge even more if we tried to agree on "systemParameters is complete"! Is everyone OK with keeping the status quo for v1.0 but re-adding these fields in a point release, when we can spend considerably more time agreeing on clear semantics and wording? It sounds like this is not blocking anyone from using v1.0 (since they can just use extensions), and adding them later will be backwards compatible.
What's the reason that the release date is fixed? I get not wanting to stretch the release process out, but ideally the release of v1.0 shouldn't be missing desired features that were present in earlier versions. The lack of the previous fields isn't a hard blocker for us adopting the 1.0 format over 0.2 in BuildKit, since we can use extension fields. I agree that working out the exact interaction between producers + consumers + verifiers for new fields is likely to be a little complex, so I don't think it's worth rushing through, as long as we can keep the discussion going for a point release.
This is a bit specific because not all builders allow the execution of user-defined code, which is where it becomes critical to understand whether the sandbox allows network access. On the other hand, I’ve yet to see an implementation where these fields would always have the same value when using the definition I described. For an independent verifier that proves provenance for a subject by issuing a rebuild in their own controlled infrastructure, I agree that these properties should be clearly defined so everyone agrees on what they mean. I don’t think, though, that we should discard some properties just because there could be an implementation where having one property at a specific value would also mean the same value for another property. I think the more quantitative properties of the build execution we can give to the user, the better.
Coming from a background in Tekton, I had similar views, which I expressed in #849. For me, however, I see it as being reasonable to have the Regarding one of the points above (reproducibility), there have been discussions about how this might make more sense as a verification process instead of a build property (#1011). In this situation, a reproducible attestation could be generated by another system after verifying that the same artifact (semantic or bitwise) can be produced from the inputs.
Heya! I'm one of the maintainers of BuildKit (the backend for docker build) working on provenance generation there.
I have a query about the changes to builder.id and the removal of metadata.completeness and metadata.reproducible in the v1.0-rc1 release. Specifically, these fields could be used to determine the SLSA level of the build (e.g. SLSA level 3 requires the external parameters to be complete). However, now this information has been incorporated into the builder.id:

In BuildKit, long-term, we're aiming to have BuildKit act as the build service (maybe with some integration with our GitHub Actions?), so the builder ID would be some part of that. However, this builder has the issue that we can't always achieve the same SLSA level that the builder is capable of: the SLSA level of an individual build is tied to both the builder and the user-specified inputs of the build.
For example:

- … mode parameter, which can be min or max. Users can select this, which changes the amount of information that the provenance attestation contains, so that we can safely enable min mode by default, since badly written Dockerfiles may accidentally leak secrets with the higher setting. Builds in min mode cannot reach higher SLSA levels, even if the builder is capable.
- … RUN step in them can be made to be reproducible, but if RUN steps are present, the builder may never be able to completely guarantee that the build is reproducible (e.g. the build may produce different timing results each run, or may access hardware random numbers, etc.).

Essentially, the SLSA level of an individual build isn't conceptually tied just to the builder, but also to the actual user inputs, and this isn't reflected in the builder.id parameter. We could just have multiple IDs for multiple modes of operation, but this doesn't feel conceptually correct: the builder is the same in all the scenarios, and it may even be the same daemon, build service, or GitHub Action powering all of them.

I can understand the reason for removing the original properties; they're quite fine-grained and specific. But I think the spec should continue to contain some way of specifying the level of the build separately from what the builder is capable of.
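On the producer side, "the level of the build, separate from what the builder is capable of" could be computed roughly as follows. This is a minimal sketch. The per-mode caps in MODE_CAP are made-up numbers: the post only says that min-mode builds cannot reach the higher levels, not which level they do reach.

```python
# Hypothetical caps: the highest level a build can reach in each provenance mode.
# The numbers are illustrative; the post does not assign levels to modes.
MODE_CAP = {"min": 1, "max": 3}


def effective_build_level(builder_capability: int, mode: str) -> int:
    """Level of a single build, bounded by both the builder and the user's inputs."""
    return min(builder_capability, MODE_CAP[mode])
```

On a builder capable of level 3, a min-mode build would report 1 and a max-mode build would report 3, while the builder.id stays the same for both.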
One possible idea could be to allow a range of SLSA levels for the builder id in the verifier (which represents the range of capabilities of the builder), while an additional build level identifier or SLSA level string could be encoded directly in runDetails. During verification, the verifier would need to check that the specified level for the build was in the valid range for that builder.
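The verifier side of this proposal might look like the following minimal sketch. It assumes a hypothetical verifier configuration that stores a level range per builder.id, and a hypothetical per-build slsaLevel field in runDetails. Neither exists in v1.0-rc1.

```python
from typing import NamedTuple


class LevelRange(NamedTuple):
    lowest: int
    highest: int


# Hypothetical verifier configuration: the range of levels each builder.id can produce.
BUILDER_RANGES = {"https://buildkit.example/service@v1": LevelRange(1, 3)}


def verify_build_level(provenance: dict) -> int:
    """Accept the per-build level only if it lies in the builder's configured range."""
    run = provenance["predicate"]["runDetails"]
    allowed = BUILDER_RANGES.get(run["builder"]["id"])
    if allowed is None:
        raise ValueError("untrusted builder.id")
    level = run["slsaLevel"]  # hypothetical per-build field proposed in this issue
    if not allowed.lowest <= level <= allowed.highest:
        raise ValueError(f"level {level} outside {allowed.lowest}..{allowed.highest}")
    return level
```

Note that this check only bounds what the builder may claim. Whether keys are shared across modes of different isolation, as discussed in the comments, is a separate question that the range alone does not answer.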
Curious about whether this is a change that could be considered? Or even if this has been discussed before (though I can't find anything).