When reading files via XRootD with Spark (https://github.com/spark-root/laurelin), profiling the code shows that significant RTTs are being burned because the HadoopFile interface doesn't support vectorized reads/writes, so each TTree basket incurs its own RTT penalty, compared with (e.g.) CMSSW, which issues reads for multiple baskets with a single preadv() call. On top of that, the backing filesystem on the other end typically supports vectorized I/O as well, so it would be a win on that side too.
If hadoop-xrootd were to implement a readv()/writev() interface, I could use it to vastly reduce the number of I/O round-trips for Spark. XrdCl itself supports this via the synchronous XrdCl::File::VectorRead call, so if that C++ function could be exported up to XRootDClFile and then to XRootDInputStream, I could issue a single vectored read instead of potentially hundreds or thousands of individual reads.
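For reference, a minimal sketch of what the synchronous XrdCl::File::VectorRead call looks like on the C++ side. The URL, offsets, and buffer sizes below are made-up placeholders, not anything from the connector:

```cpp
// Sketch: fetch several scattered byte ranges in one kXR_readv round-trip.
// Compile/link against libXrdCl. URL and offsets are placeholders.
#include <XrdCl/XrdClFile.hh>
#include <iostream>
#include <vector>

int main()
{
  XrdCl::File file;
  XrdCl::XRootDStatus st =
      file.Open( "root://example.cern.ch//store/example/file.root",
                 XrdCl::OpenFlags::Read );
  if( !st.IsOK() ) { std::cerr << st.ToString() << std::endl; return 1; }

  // One buffer per TTree basket; in the connector these offsets/lengths
  // would come from the caller (e.g. Laurelin) rather than being hard-coded.
  std::vector<char> buf1( 4096 ), buf2( 8192 );
  XrdCl::ChunkList chunks;
  chunks.push_back( XrdCl::ChunkInfo( 0,      buf1.size(), buf1.data() ) );
  chunks.push_back( XrdCl::ChunkInfo( 131072, buf2.size(), buf2.data() ) );

  // All chunks are fetched with a single vector read.
  XrdCl::VectorReadInfo *info = 0;
  st = file.VectorRead( chunks, 0, info );
  if( !st.IsOK() ) { std::cerr << st.ToString() << std::endl; return 1; }

  std::cout << "read " << info->GetSize() << " bytes in "
            << info->GetChunks().size() << " chunks" << std::endl;
  delete info;
  file.Close();
  return 0;
}
```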
@PerilousApricot let me understand: some library will call readv()/writev() at the Hadoop interface level (Hadoop itself does not have it, however), and this will be translated into xrootd-client calls, am I correct?
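Something along these lines could be the translation step in the native layer. This is purely an illustrative sketch: the JNI symbol name (Java_ch_cern_eos_XRootDClFile_readv), the Java-side readv(long[], int[], byte[][]) signature, and the file-handle parameter are assumptions, not the connector's actual API:

```cpp
// Hypothetical JNI bridge: a Java-level batch of (offset, length) pairs is
// turned into a single XrdCl::File::VectorRead, so N basket reads cost one
// network round-trip instead of N. All names are illustrative.
#include <jni.h>
#include <XrdCl/XrdClFile.hh>
#include <vector>

extern "C" JNIEXPORT jlong JNICALL
Java_ch_cern_eos_XRootDClFile_readv( JNIEnv *env, jobject /*self*/,
                                     jlong filePtr,       // native XrdCl::File*
                                     jlongArray offsets,
                                     jintArray  lengths,
                                     jobjectArray buffers )
{
  XrdCl::File *file = reinterpret_cast<XrdCl::File*>( filePtr );
  jsize n = env->GetArrayLength( offsets );

  jlong *off = env->GetLongArrayElements( offsets, 0 );
  jint  *len = env->GetIntArrayElements( lengths, 0 );

  // Stage native buffers and describe every chunk of the scattered read.
  std::vector< std::vector<char> > bufs( n );
  XrdCl::ChunkList chunks;
  for( jsize i = 0; i < n; ++i )
  {
    bufs[i].resize( len[i] );
    chunks.push_back( XrdCl::ChunkInfo( off[i], len[i], bufs[i].data() ) );
  }

  // One vector read instead of n individual reads.
  XrdCl::VectorReadInfo *info = 0;
  XrdCl::XRootDStatus st = file->VectorRead( chunks, 0, info );

  jlong total = -1;
  if( st.IsOK() )
  {
    total = info->GetSize();
    // Copy each chunk back into the corresponding Java byte[].
    for( jsize i = 0; i < n; ++i )
    {
      jbyteArray dst = (jbyteArray) env->GetObjectArrayElement( buffers, i );
      env->SetByteArrayRegion( dst, 0, len[i],
                               reinterpret_cast<jbyte*>( bufs[i].data() ) );
      env->DeleteLocalRef( dst );
    }
  }

  delete info;
  env->ReleaseLongArrayElements( offsets, off, JNI_ABORT );
  env->ReleaseIntArrayElements( lengths, len, JNI_ABORT );
  return total;
}
```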