When reading files via XRootD with Spark (https://github.com/spark-root/laurelin), profiling the code shows that significant RTTs are being burned because the HadoopFile interface doesn't support vectorized reads/writes, so each TTree basket incurs its own RTT penalty, compared with (e.g.) CMSSW, which issues reads for multiple baskets with a single preadv() call. On top of that, the backing filesystem on the other end typically supports vectorized I/O as well, so it would be a win on that side too.
If hadoop-xrootd were to implement a readv()/writev() interface, I could use it to vastly reduce the number of I/O round-trips for Spark. XrdCl itself supports this via the synchronous XrdCl::File::VectorRead call, so if that C++ function could be exported up to XRootDClFile and then to XRootDInputStream, I could issue a single vectored read instead of potentially hundreds or thousands of individual reads.
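For reference, a minimal sketch of what the synchronous XrdCl::File::VectorRead call looks like on the C++ side. The URL, offsets, and buffer sizes below are made-up placeholders, not anything from the connector:

```cpp
// Sketch: fetch several scattered byte ranges in one kXR_readv round-trip.
// Compile/link against libXrdCl. URL and offsets are placeholders.
#include <XrdCl/XrdClFile.hh>
#include <iostream>
#include <vector>

int main()
{
  XrdCl::File file;
  XrdCl::XRootDStatus st =
      file.Open( "root://example.cern.ch//store/example/file.root",
                 XrdCl::OpenFlags::Read );
  if( !st.IsOK() ) { std::cerr << st.ToString() << std::endl; return 1; }

  // One buffer per TTree basket; in the connector these offsets/lengths
  // would come from the caller (e.g. Laurelin) rather than being hard-coded.
  std::vector<char> buf1( 4096 ), buf2( 8192 );
  XrdCl::ChunkList chunks;
  chunks.push_back( XrdCl::ChunkInfo( 0,      buf1.size(), buf1.data() ) );
  chunks.push_back( XrdCl::ChunkInfo( 131072, buf2.size(), buf2.data() ) );

  // All chunks are fetched with a single vector read.
  XrdCl::VectorReadInfo *info = 0;
  st = file.VectorRead( chunks, 0, info );
  if( !st.IsOK() ) { std::cerr << st.ToString() << std::endl; return 1; }

  std::cout << "read " << info->GetSize() << " bytes in "
            << info->GetChunks().size() << " chunks" << std::endl;
  delete info;
  file.Close();
  return 0;
}
```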
@PerilousApricot let me understand: some library will call readv()/writev() at the Hadoop interface level (Hadoop itself does not have it, however), and this will be translated into xrootd-client calls, am I correct?
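Something along these lines could be the translation step in the native layer. This is purely an illustrative sketch: the JNI symbol name (Java_ch_cern_eos_XRootDClFile_readv), the Java-side readv(long[], int[], byte[][]) signature, and the file-handle parameter are assumptions, not the connector's actual API:

```cpp
// Hypothetical JNI bridge: a Java-level batch of (offset, length) pairs is
// turned into a single XrdCl::File::VectorRead, so N basket reads cost one
// network round-trip instead of N. All names are illustrative.
#include <jni.h>
#include <XrdCl/XrdClFile.hh>
#include <vector>

extern "C" JNIEXPORT jlong JNICALL
Java_ch_cern_eos_XRootDClFile_readv( JNIEnv *env, jobject /*self*/,
                                     jlong filePtr,       // native XrdCl::File*
                                     jlongArray offsets,
                                     jintArray  lengths,
                                     jobjectArray buffers )
{
  XrdCl::File *file = reinterpret_cast<XrdCl::File*>( filePtr );
  jsize n = env->GetArrayLength( offsets );

  jlong *off = env->GetLongArrayElements( offsets, 0 );
  jint  *len = env->GetIntArrayElements( lengths, 0 );

  // Stage native buffers and describe every chunk of the scattered read.
  std::vector< std::vector<char> > bufs( n );
  XrdCl::ChunkList chunks;
  for( jsize i = 0; i < n; ++i )
  {
    bufs[i].resize( len[i] );
    chunks.push_back( XrdCl::ChunkInfo( off[i], len[i], bufs[i].data() ) );
  }

  // One vector read instead of n individual reads.
  XrdCl::VectorReadInfo *info = 0;
  XrdCl::XRootDStatus st = file->VectorRead( chunks, 0, info );

  jlong total = -1;
  if( st.IsOK() )
  {
    total = info->GetSize();
    // Copy each chunk back into the corresponding Java byte[].
    for( jsize i = 0; i < n; ++i )
    {
      jbyteArray dst = (jbyteArray) env->GetObjectArrayElement( buffers, i );
      env->SetByteArrayRegion( dst, 0, len[i],
                               reinterpret_cast<jbyte*>( bufs[i].data() ) );
      env->DeleteLocalRef( dst );
    }
  }

  delete info;
  env->ReleaseLongArrayElements( offsets, off, JNI_ABORT );
  env->ReleaseIntArrayElements( lengths, len, JNI_ABORT );
  return total;
}
```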