[BUG] Not all _delta_log file operations go through the LogStore interface #4175

Open
2 of 8 tasks
gustavoatt opened this issue Feb 19, 2025 · 0 comments
Labels
bug Something isn't working

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

While trying to implement a custom LogStore I realized that not all operations accessing files in the _delta_log directory go through the store.

Specifically, reads of the delta log entries (<version>.json) don't use the LogStore at all; they are read directly through the Hadoop FileSystem (see an example here in the Snapshot code). I assume this is so that the delta log entries can be read in parallel.

Question: is this the intended behavior of LogStore, or is it a bug? I need to extend how we access the delta log, and I am wondering whether I should do this at the FileSystem layer instead.

Steps to reproduce

I reproduced this by doing the following:

  1. Create a pass-through LogStore implementation
  2. Set spark.delta.logStore.class to the fully qualified class name of my custom LogStore
  3. Set breakpoints on all methods of the LogStore to find out when each file is accessed
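
Steps 1 and 3 above can be approximated by having the pass-through wrapper record each call it receives. The following is a minimal sketch of that idea only. `SimpleLogStore`, `InMemoryLogStore`, and `RecordingLogStore` are hypothetical names, and the interface is a simplified stand-in that uses plain strings for paths; it is not Delta's actual LogStore API. In a real setup the wrapper would extend Delta's LogStore class and delegate to an existing implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a log store interface.
// This is NOT Delta's real LogStore API; paths are plain strings here.
interface SimpleLogStore {
    Iterator<String> read(String path);
    void write(String path, Iterator<String> lines, boolean overwrite);
    Iterator<String> listFrom(String path);
}

// Toy in-memory backing store so the sketch runs without Hadoop or Spark.
final class InMemoryLogStore implements SimpleLogStore {
    private final TreeMap<String, List<String>> files = new TreeMap<>();

    @Override
    public Iterator<String> read(String path) {
        List<String> lines = files.get(path);
        if (lines == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return new ArrayList<>(lines).iterator();
    }

    @Override
    public void write(String path, Iterator<String> lines, boolean overwrite) {
        if (!overwrite && files.containsKey(path)) {
            throw new IllegalStateException("File already exists: " + path);
        }
        List<String> copy = new ArrayList<>();
        lines.forEachRemaining(copy::add);
        files.put(path, copy);
    }

    @Override
    public Iterator<String> listFrom(String path) {
        // Simplification: returns every stored path that sorts at or after `path`.
        return new ArrayList<>(files.tailMap(path, true).keySet()).iterator();
    }
}

// Pass-through wrapper: forwards every call unchanged and records which
// operation touched which path, so accesses that bypass it become visible.
final class RecordingLogStore implements SimpleLogStore {
    private final SimpleLogStore delegate;
    private final List<String> calls = new ArrayList<>();

    RecordingLogStore(SimpleLogStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<String> read(String path) {
        calls.add("read " + path);
        return delegate.read(path);
    }

    @Override
    public void write(String path, Iterator<String> lines, boolean overwrite) {
        calls.add("write " + path);
        delegate.write(path, lines, overwrite);
    }

    @Override
    public Iterator<String> listFrom(String path) {
        calls.add("listFrom " + path);
        return delegate.listFrom(path);
    }

    // Snapshot of the calls recorded so far, in order.
    List<String> calls() {
        return new ArrayList<>(calls);
    }
}
```

Any file that the engine reads but that never appears in `calls()` was accessed through some other path, which is the situation described in this issue for the <version>.json files.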

Observed results

Some files are read using the LogStore, like the checkpoint <version>.crc files, but the <version>.json files are not.

Expected results

I expected all files in _delta_log to be read through the LogStore.

Environment information

  • Delta Lake version: 3.2
  • Spark version: 3.5
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@gustavoatt gustavoatt added the bug Something isn't working label Feb 19, 2025