Blocks whose largest nonzero value is subnormal can cause floating-point overflow during conversion to zfp's block-floating-point format. Although the overflow corrupts the mantissas, the bug is otherwise benign: such blocks are still reconstructed as a collection of "random" subnormals, albeit with complete loss of precision.
Two potential solutions have been identified:
1. Perform the float-to-int normalization in two steps, using two separate multipliers so that neither intermediate scale factor overflows (see the sketch after this list). This would, however, incur a performance penalty for all blocks.
2. Cap the smallest supported block exponent, which effectively flushes subnormals to zero (a strategy already employed by many processors). Given that the tolerances used with zfp often vastly exceed FLT_MIN, this should have a negligible effect in practice. Although this approach changes the compressed representation of all-subnormal blocks, current versions of zfp would still correctly decompress such blocks.
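For concreteness, here is a minimal sketch of the overflow and of solution 1. It assumes a normalization of the form y = x · 2^(k − emax) applied with a single power-of-two scale factor; the constant k = 30 is a placeholder for the number of integer coefficient bits and is not necessarily zfp's actual constant. The two-step variant splits the scale exponent so that each factor stays below FLT_MAX:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
  /* hypothetical block whose largest (and only nonzero) value is subnormal */
  float x = 0x1p-140f;
  int emax;
  frexpf(x, &emax);                       /* emax = -139 */
  int k = 30;                             /* placeholder integer precision - 2 */
  int e = k - emax;                       /* required scale exponent: 169 */

  /* one-step normalization: the single scale factor exceeds FLT_MAX */
  float s = ldexpf(1.0f, e);
  printf("one-step scale factor: %g\n", s);      /* prints inf */

  /* two-step normalization: split the exponent so both factors stay finite */
  float s1 = ldexpf(1.0f, e / 2);                /* 2^84 */
  float s2 = ldexpf(1.0f, e - e / 2);            /* 2^85 */
  int32_t y = (int32_t)(s2 * (s1 * x));          /* exact: 2^29 */
  printf("two-step result: %d\n", y);            /* prints 536870912 */
  return 0;
}
```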
Overflow is prevented when the magnitude, x, of the largest value in a block is at least 2^-127 = FLT_MIN / 2 for floats and at least 2^-1023 = DBL_MIN / 2 for doubles, i.e., when x is at least half of the smallest positive normal number.

Rather than capping the exponent and encoding zero-valued coefficients, it is more efficient to simply treat all-subnormal blocks as all-zero blocks, which cost only a single bit each to encode. Such behavior is analogous to Intel's DAZ (denormals-are-zero) floating-point flag, which treats all subnormal inputs as zero. While we could in theory support values as small as {FLT,DBL}_MIN / 2 without causing overflow, for compatibility with DAZ it seems preferable to require that the largest value in a block have magnitude at least {FLT,DBL}_MIN. Note that some subnormals could still be reconstructed when the largest value in a block is normal.
This proposed behavior is invoked when the compile-time option ZFP_WITH_DAZ is enabled.
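As a rough illustration of the proposed check (a sketch only, not zfp's actual code; the function and parameter names are made up), a block whose largest magnitude is below FLT_MIN would be classified as all-zero before the block-floating-point transform runs:

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Sketch of a DAZ-style guard for a block of n floats: if the largest
 * magnitude is below FLT_MIN (i.e., every value is zero or subnormal),
 * report the block as all-zero so the encoder can emit a single bit. */
static int
block_is_all_zero_daz(const float* block, size_t n)
{
  float max = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    float f = fabsf(block[i]);
    if (max < f)
      max = f;
  }
  return max < FLT_MIN; /* also true for all-subnormal blocks */
}
```

With such a guard in place, the largest value reaching the block-floating-point transform always has magnitude at least FLT_MIN, so the overflow described above cannot occur, at the cost of flushing all-subnormal blocks to zero.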