Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treatment of NaN in flex #478

Open
graeme-winter opened this issue May 18, 2020 · 8 comments
Open

Treatment of NaN in flex #478

graeme-winter opened this issue May 18, 2020 · 8 comments

Comments

@graeme-winter
Copy link
Contributor

It seems that NaN's confuse flex histogram...

    ratio = flex.log10(t_i.select(t_nn) / r_i.select(r_nn)).as_double()

    print("Min: %f" % flex.min(ratio))
    print("Number < -5: %d" % (ratio < -5).count(True))
    
    r_hist = flex.histogram(ratio, data_min=-10, data_max=10, n_slots=10)

    for c, v in zip(r_hist.slot_centers(), r_hist.slots()):
        print(c, v)

It turns out there are 6,338 NaN's in ratio, which all end up in the smallest bin:

Min: -5.173720
Number < -5: 2
-9.0 6338
-7.0 0
-5.0 9
-3.0 59
-1.0 481347
1.0 343104
3.0 61
5.0 1
7.0 0
9.0 0

I believe flex.histogram should not be treating NaN in this way...

@graeme-winter
Copy link
Contributor Author

      std::size_t
      get_i_slot(ValueType const& d_)
      {
        std::size_t i_slot = 0;
        ValueType d = d_ - data_min_;
        if (d != 0 && d >= slot_width_) {
              i_slot = static_cast<std::size_t>(d / slot_width_);
          if (i_slot >= slots_.size()) i_slot = slots_.size() - 1;
        }
        return i_slot;
      }

    protected:
      void
      assign_to_slot(ValueType const& d)
      {
        slots_[get_i_slot(d)]++;
      }

in scitbx/histogram.h - guess

if (not isnan(d)) {
        slots_[get_i_slot(d)]++;
}

would fix this, opinions?

@graeme-winter graeme-winter changed the title Treatment of NaN in flex histogram Treatment of NaN in flex May 18, 2020
@graeme-winter
Copy link
Contributor Author

Update: this is not limited to flex.histogram - flex.min gives some weird results:

from scitbx.array_family import flex
import math

x = flex.double(range(-10, 100)) + 0.5
l = flex.log10(x)

for _ in l:
    if math.isnan(_):
        continue
    print(_)
    break

print(flex.min(l), flex.max(l))

gives

-0.3010299956639812
1.9800033715837464 1.9978230807457256

i.e. flex.min() gives the wrong answer if there are NaN's in the input list

@graeme-winter
Copy link
Contributor Author

To be clear following the last comment: either NaN or min(!NaN) values should be returned, not some other random element. Returning NaN would be consistent with std::min and the true min value with fmin from math.h

graeme-winter added a commit that referenced this issue May 18, 2020
Includes test to assert this behaviour. Means histogram behaves as expected
for #478 however issues with min() and max() remain
@phyy-nx
Copy link
Contributor

phyy-nx commented May 26, 2020

Hi, so first I note that your test behaves differently on my centos7 build and my macos build. Since #463, macos builds have removed -ffast-math, so that's likely the reason.

Test:

from scitbx.array_family import flex
import math

x = flex.double(range(-10, 100)) + 0.5
l = flex.log10(x)
print(flex.min(l), flex.max(l))

Min result on centos7: -0.3010299956639812
Min result on macos: NaN

In neither case do I get your min result of 1.9800033715837464. Not sure what is going on, but it's enough to say that how the software is compiled greatly impacts the results.

That said, I do have some context that could help inform this discussion (likely familiar to all parties). To my knowledge, cctbx doesn't claim IEEE compatibility. Indeed, until #324, we explicitly crashed developer environments when many operations that produce NaNs were executed (such as 1/0). The reason mainly is because we wanted --fast-math. So it was up to the developer to not allow their code to reach a situation where NaNs were possible.

Since #324, it could be argued that cctbx should support NaNs. This is a big statement to make. Your commit in #479 adds a check for NaN for every entry in a flex array while creating a histogram. The same check would need to be added to every flex operation, not just min and max, but mean, median, etc. That seems to me like a big overhead, but maybe modern compilers with branch prediction are good enough that this doesn't have an impact. Regardless, it's a big set of changes. Is that really what we want?

If we want to be IEEE compliant then we should drop --fast-math, but it's a big decision, likely a community one. Otherwise my vote is to keep the current policy: buyer beware if you generate NaNs. cctbx is specific for a scientific field (e.g. crystallography, cryo-EM, and modeling macromolecules), which generally means that the numbers we work with are finite. We're not like numpy, which is for general math. Usually, any algorithm that regularly generates NaN or infinity values has some issue that should be fixed at the algorithm level.

Thoughts or comments? Seems like a useful discussion to have.

@Anthchirp
Copy link
Member

I just verified this using an installation of DIALS 1.1 we still keep around (March 2016, and thus well before any NaN discussions from #324). cctbx did evidently not make a guarantee that NaNs were always avoided. It just happened to crash some of the time.

$ dials.python
Python 2.7.8 (default_cci, Apr  8 2016, 15:33:21) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from scitbx.array_family import flex
>>> import math
>>> x = flex.double(range(-10, 100)) + 0.5
>>> l = flex.log10(x)
>>> list(l)
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -0.3010299956639812, 0.17609125905568124, 0.3979400086720376, 0.5440680443502757, 0.6532125137753437, 0.7403626894942439, 0.8129133566428556, 0.8750612633917001, 0.9294189257142927, 0.9777236052888477, 1.021189299069938, 1.0606978403536118, 1.0969100130080565, 1.130333768495006, 1.1613680022349748, 1.1903316981702914, 1.2174839442139063, 1.2430380486862944, 1.2671717284030137, 1.290034611362518, 1.3117538610557542, 1.3324384599156054, 1.3521825181113625, 1.3710678622717363, 1.3891660843645326, 1.4065401804339552, 1.423245873936808, 1.4393326938302626, 1.4548448600085102, 1.469822015978163, 1.4842998393467859, 1.4983105537896004, 1.5118833609788744, 1.5250448070368452, 1.5378190950732742, 1.550228353055094, 1.5622928644564746, 1.5740312677277188, 1.5854607295085006, 1.5965970956264601, 1.6074550232146685, 1.6180480967120927, 1.6283889300503116, 1.6384892569546374, 1.6483600109809315, 1.6580113966571124, 1.667452952889954, 1.6766936096248666, 1.6857417386022637, 1.6946051989335686, 1.7032913781186614, 1.711807229041191, 1.7201593034059568, 1.7283537820212285, 1.7363965022766423, 1.7442929831226763, 1.7520484478194385, 1.7596678446896306, 1.7671558660821804, 1.7745169657285496, 1.7817553746524688, 1.7888751157754168, 1.7958800173440752, 1.8027737252919755, 1.8095597146352678, 1.816241299991783, 1.8228216453031045, 1.8293037728310249, 1.8356905714924256, 1.841984804590114, 1.8481891169913987, 1.8543060418010806, 1.8603380065709938, 1.866287339084195, 1.8721562727482928, 1.8779469516291882, 1.8836614351536176, 1.8893017025063104, 1.8948696567452525, 1.9003671286564703, 1.9057958803678685, 1.9111576087399766, 1.916453948549925, 1.921686475483602, 1.9268567089496924, 1.9319661147281726, 1.9370161074648142, 1.9420080530223132, 1.9469432706978254, 1.951823035315912, 1.9566485792052033, 1.9614210940664483, 1.9661417327390325, 1.9708116108725178, 1.975431808509263, 1.9800033715837464, 1.9845273133437926, 1.9890046156985368, 1.9934362304976116, 1.9978230807457253]
>>> print(flex.min(l), flex.max(l))
(-0.3010299956639812, 1.9978230807457253)
>>> import dials.util.version
>>> dials.util.version.dials_version()
'DIALS 1.1.3-ga7ac5d8-release'

I do note however that in 2016 on RHEL7 it returned the correct values

@Anthchirp
Copy link
Member

same results for all versions since (cs03r-sc-serv-16, RHEL 7.8 server)

$ for x in 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.14 2.0 2.1 2.2; do module load dials/$x; dials.python test.py; done
DIALS 1.2.6-gc8a142d-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.3.4-g9f5e5cd-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.4.5-g37e56cb3-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.5.1-gf72becc3-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.6.4-g47a579eb-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.7.4-gbe933d59-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.8.6-gc63d8697-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.9.3-gb491019a-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.10.4-gc14d9720b-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.11.4-g9c616a499-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.12.6-gdcf0b845a-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 1.14.13-g10ecfbb15-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 2.0.4-g0252e41c3-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 2.1.5-g118c27f8c-release
(-0.3010299956639812, 1.9978230807457253)
DIALS 2.2.4-g04de204b4-release
-0.3010299956639812 1.9978230807457253

@phyy-nx
Copy link
Contributor

phyy-nx commented May 26, 2020

@Anthchirp Regarding crashes, the #324 traps aren't present in the release builds so they won't trigger. Regardless, good to see the consistent results from the linux builds. I'd be curious if the mac builds suddenly change from -0.3 to NaN when #463 was merged.

@graeme-winter
Copy link
Contributor Author

Thoughts or comments? Seems like a useful discussion to have.

Agree with this, certainly. Personally I am very much in favour of having code which behaves like IEEE says it should - robust and working trumps --ffast-math in my book.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants