Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config parser (and loader) #4744

Open
wants to merge 50 commits into
base: master
Choose a base branch
from
Open

Config parser (and loader) #4744

wants to merge 50 commits into from

Conversation

bentsherman
Copy link
Member

@bentsherman bentsherman commented Feb 14, 2024

Close #2723

Implements the strict config syntax in Nextflow, using the shared "compiler" module from the language server.

To use the strict config syntax, simply set NXF_ENABLE_STRICT_SYNTAX=true in your environment when running nextflow.

This config loader is much simpler and easier to read. There is no more dependence on metaclasses, instead there is a simple builder DSL for executing a config file and producing a map.

NOTE: While this PR uses Jitpack to load the shared module, this dependency should be inverted before the release of 25.04. That is, eventually the compiler module should reside in Nextflow and the language server should consume it.

Signed-off-by: Ben Sherman <[email protected]>
Copy link

netlify bot commented Feb 14, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit ca26e26
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67bcd8da1400ad00096fff07
😎 Deploy Preview https://deploy-preview-4744--nextflow-docs-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@bentsherman bentsherman changed the title Add custom config parser Config parser Feb 15, 2024
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member Author

Research updates:

  • Groovy compiler provides an interface to use a custom parser (in order to ease the transition from Antlr2 to Antlr4), this can be used to inject our custom parser without much hassle
  • Config schema effort can be pursued independently of the parser, see Config schema #4201
  • Might be able to use the Java security manager to prevent execution of unsafe code in the config file such as file I/O, subprocesses

@bentsherman
Copy link
Member Author

Here's an example to illustrate how the config parser "executes" the config file. Given this config:

foo.bar = 'hello'

foo {
  bar = 'hello'
}

includeConfig 'foobar.config'

The parser transforms this code into the following:

assign(['foo', 'bar'], 'hello')

block('foo', {
  assign(['bar'], 'hello')
}

includeConfig('foobar.config')

Basically each statement is translated to a DSL method (see ConfigDsl). This "hidden DSL" allows us to control how each statement is executed. In other words, the config parser + hidden DSL implement a simple interpreter for the Nextflow config language, separating the config language from the config execution.

I think this is a good model for how we might separate the Nextflow language from the runtime. Instead of hooking directly into Groovy methods and relying on Groovy MOP, metaclass, etc, we can introduce a hidden DSL which allows us to better control how a given statement (e.g. invoking a process in a workflow) is executed by the runtime.

@bentsherman
Copy link
Member Author

Java security manager is being removed. I will keep an eye out for other sandboxing techniques, but in the meantime, a simple solution is to provide a "safe" execution mode where assignment expressions are wrapped in string literals.

@bentsherman bentsherman changed the title Config parser Config parser (and loader) Mar 23, 2024
@ewels
Copy link
Member

ewels commented Apr 7, 2024

Really like the way that this is going, great work 👏🏻

Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
@bentsherman

This comment was marked as resolved.

Signed-off-by: Ben Sherman <[email protected]>
@@ -739,6 +683,26 @@ class ConfigParserTest extends Specification {

}

def 'should apply profiles in the order they were defined' () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pditommaso this test is related to #1792 . We determined that config profiles are applied this way with the legacy parser. We could patch it, but I figure there's no point now, so just added a test to verify it's behavior.

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member Author

@pditommaso tests are passing, let's review and merge

@@ -14,14 +14,15 @@
* limitations under the License.
*/

package nextflow.config
package nextflow.config.parser.legacy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be preferable to not shuffle stuff around and use Legacy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable change to keep related classes organized. You have done a similar thing many times, e.g. nextflow.executor.local or your proposed operator cleanup for the provenance tracking

Copy link
Member

@pditommaso pditommaso Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, what I think it should be avoid the use of the term "legacy" in class or package names, as it's a bad practice to use the term "new" when introducing new class. It would be preferable just to use v1 for current ones and v2 for new ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also allow the use of something like NXF_USE_SYNTAX_PARSER=v2 that's more extensible over NXF_ENABLE_STRICT_SYNTAX (I still believe it may be convenient to decouple for the "strict" semantics that's the need in the short term).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider keep current naming and use ConfigParserV2 for the new one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigParser now refers to the shared interface which both parsers implement. That's why I had to rename them. I could call them ConfigParserV1 and ConfigParserV2.

Copy link
Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor suggestion but otherwise the docs look great.

Co-authored-by: Chris Hakkaart <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member Author

Another flaky CI test:

~ Test 'publish-s3-kms.nf' run failed
   + /home/runner/work/nextflow/nextflow/nextflow -q run ../../publish-s3-kms.nf -c .config
   ++ check_kms_key
   ++ aws s3api head-object --bucket nf-kms-xyz --key work/ci-test/publish-s3/HELLO.tsv
   ++ grep -c e5109f93-b42d-4c26-89ee-8b251029a41d
   
   An error occurred (404) when calling the HeadObject operation: Not Found

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants