HTTPS link-based files, interpreted by a file() function can falsely make internal objects #5792

Open
jfy133 opened this issue Feb 14, 2025 · 2 comments

Comments

@jfy133
Contributor

jfy133 commented Feb 14, 2025

Bug report

The following error was originally found by @ilight1542 and then debugged with @TCLamnidis @maxulysse @nvnieuwk

Expected behavior and actual behavior

If you accidentally access a file-path object as if it were a hash map (e.g. an nf-core meta map), and the object was created from an HTTPS link, the access does not fail. Instead, you can read 'fake' sub-objects as if the path were a meta map, and each one is just a string of the name of the sub-object that was requested.

Steps to reproduce the problem

First, create a file on the local filesystem:

touch hello.txt

Then, in the Nextflow console, run the following. It fails, as expected, because it tries to access a nonexistent map-like attribute:

foo = file("hello.txt")
print(foo.single_end)

Note this also fails if we use an s3 path (as expected).

However, if we replace the local file with an HTTPS URL, this does not fail and instead outputs the name of the attribute that was called:

foo = file("https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz")
print(foo.single_end)

@nvnieuwk suspects it may be related to this object: https://github.com/openjdk/jdk/blob/master/src/java.base/unix/classes/sun/nio/fs/UnixPath.java

Program output

Expected behaviour (as works correctly with local paths):

groovy> foo = file("hello.txt") 
groovy> print(foo.single_end) 
 
Exception thrown

groovy.lang.MissingPropertyException: No such property: single_end for class: sun.nio.fs.UnixPath
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:67)

Incorrect behaviour with an HTTPS URL:

groovy> foo = file("https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz") 
groovy> print(foo.single_end) 
 
single_end

Environment

  • Nextflow version: 24.10.4
  • Java version:
Picked up JAVA_TOOL_OPTIONS:  -XX:+UseContainerSupport -XX:ActiveProcessorCount=1
openjdk version "17.0.14-internal" 2025-01-21
OpenJDK Runtime Environment (build 17.0.14-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.14-internal+0-adhoc..src, mixed mode, sharing)
  • Operating system: GitPod / Ubuntu 22.04.5 LTS
  • Bash version: (use the command $SHELL --version): GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)


@maxulysse
Contributor

I did encounter a:

ERROR ~ assert path
       |
       null

with a file from an HTTPS URL, using "format": "path" in my schema.
I'll see if I can put together an MRE.

@bentsherman
Member

HTTP paths are backed by the XPath class. My guess is that foo.single_end is falling back to foo.get('single_end'):

static XPath get(String str) {
    if( str == null )
        return null
    def uri = new URI(null,null,str,null,null)
    if( uri.scheme && !ALL_SCHEMES.contains(uri.scheme))
        throw new ProviderMismatchException()
    uri.authority ? (XPath)Paths.get(uri) : new XPath(null, str)
}

Which will return an XPath for the path "single_end".
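
The suspected fallback can be sketched in plain Groovy, outside Nextflow. This is only an illustration under assumptions, not the actual Nextflow code: FakePath is a hypothetical stand-in for XPath, and the sketch relies on Groovy's dynamic property lookup treating a get(String) method as a generic getter when no matching property exists.

```groovy
// Hypothetical stand-in for XPath; not part of Nextflow.
class FakePath {
    final String value

    FakePath(String value) { this.value = value }

    // Same shape as the static XPath.get(String) quoted above.
    // Assumption: Groovy's dynamic property lookup uses a get(String)
    // method as a generic getter for names it cannot otherwise resolve.
    static FakePath get(String str) {
        str == null ? null : new FakePath(str)
    }

    @Override
    String toString() { value }
}

def foo = new FakePath('https://example.com/reads.fastq.gz')

// 'value' is a real property, so it resolves normally.
println foo.value

// 'single_end' is not a property of FakePath. Rather than throwing
// MissingPropertyException, the lookup is expected to call
// FakePath.get('single_end') and return a new FakePath.
println foo.single_end
```

If the assumption holds, the second println shows the property name itself, which matches the output reported for the HTTPS case. A class without a get(String) method, such as sun.nio.fs.UnixPath, has no such fallback, which would explain why the local path throws instead.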

I think this can only be addressed by static type checking, which would only allow a path to use the methods defined in the Path interface.
