Same AwkwardArrayDataSource is (attempted to be) compiled twice #1501

Closed
jpivarski opened this issue Jun 13, 2022 · 2 comments · Fixed by #1474
@jpivarski
Member

Version of Awkward Array

HEAD

Description and code to reproduce

When I ran the tests with localbuild.py, tests/v2/test_1477-generator-entry-type-as-rvec.py::test_BitMaskedArray_NumpyArray attempted to compile AwkwardArrayDataSource_8748206908428 a second time. Cling rejected the redefinition, so compile returned False and the test failed with an AssertionError at awkward/_v2/_connect/rdataframe/to_rdataframe.py:288.

Full test output
________________________ test_BitMaskedArray_NumpyArray ________________________

    def test_BitMaskedArray_NumpyArray():
        ak_array_one = ak._v2.contents.bitmaskedarray.BitMaskedArray(
            ak._v2.index.Index(
                np.packbits(
                    np.array(
                        [
                            1,
                            1,
                            1,
                            1,
                            0,
                            0,
                            0,
                            0,
                            1,
                            0,
                            1,
                            0,
                            1,
                        ],
                        np.uint8,
                    )
                )
            ),
            ak._v2.contents.numpyarray.NumpyArray(
                np.array(
                    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6]
                )
            ),
            valid_when=True,
            length=13,
            lsb_order=False,
        )
>       data_frame_one = ak._v2.to_rdataframe({"one": ak_array_one})

ak_array_one = <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>

tests/v2/test_1477-generator-entry-type-as-rvec.py:171: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
awkward/_v2/operations/ak_to_rdataframe.py:44: in to_rdataframe
    return _impl(
        arrays     = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}
        flatlist_as_rvec = True
awkward/_v2/operations/ak_to_rdataframe.py:82: in _impl
    return ak._v2._connect.rdataframe.to_rdataframe.to_rdataframe(
        array      = <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>
        arrays     = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}
        awkward    = <module 'awkward' from '/home/jpivarski/irishep/awkward/awkward/__init__.py'>
        flatlist_as_rvec = True
        layouts    = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}
        length     = 13
        name       = 'one'
awkward/_v2/_connect/rdataframe/to_rdataframe.py:26: in to_rdataframe
    return DataSourceGenerator(length, flatlist_as_rvec=flatlist_as_rvec).data_frame(
        flatlist_as_rvec = True
        layouts    = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}
        length     = 13
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <awkward._v2._connect.rdataframe.to_rdataframe.DataSourceGenerator object at 0x7f4d9cde6ca0>
layouts = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]...='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}

        def data_frame(self, layouts):
            cpp_code_declare_slots = ""
            cpp_code_define_readers = ""
            cpp_code_column_names = ""
            cpp_code_column_type_names = ""
            cpp_code_column_types_map = ""
            cpp_code_init_slots = ""
            cpp_code_resize_slots = ""
            cpp_code_entries = ""
    
            k = 0
    
            for key in layouts:
                self.generators[key] = ak._v2._connect.cling.togenerator(
                    layouts[key].form, flatlist_as_rvec=self.flatlist_as_rvec
                )
                self.lookups[key] = ak._v2._lookup.Lookup(layouts[key])
                self.generators[key].generate(ROOT.gInterpreter.Declare)
    
                self.entry_types[key] = self.generators[key].entry_type()
                if isinstance(
                    self.generators[key], ak._v2._connect.cling.NumpyArrayGenerator
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.ListArrayGenerator
                ) and (
                    self.generators[key].is_string
                    or (
                        self.generators[key].flatlist_as_rvec
                        and self.generators[key].is_flatlist
                    )
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.RegularArrayGenerator
                ) and (
                    self.generators[key].flatlist_as_rvec
                    and self.generators[key].is_flatlist
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.IndexedOptionArrayGenerator
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.IndexedArrayGenerator
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.ByteMaskedArrayGenerator
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.BitMaskedArrayGenerator
                ):
                    pass
                elif isinstance(
                    self.generators[key], ak._v2._connect.cling.UnmaskedArrayGenerator
                ):
                    pass
                else:
                    self.entry_types[key] = "awkward::" + self.entry_types[key]
    
                self.data_ptrs_list.append(self.lookups[key].arrayptrs.ctypes.data)
    
                cpp_code_declare_slots = (
                    cpp_code_declare_slots
                    + f"""
            ULong64_t fPtrs_{key} = 0;
            std::vector<{self.entry_types[key]}>  slots_{key};
            std::vector<{self.entry_types[key]}*> addrs_{key};
        """
                )
    
                cpp_code_define_readers = (
                    cpp_code_define_readers
                    + f"""
            if (name == "{key}") {{
                for (auto i : ROOT::TSeqU(fNSlots)) {{
                    addrs_{key}[i] = &slots_{key}[i];
                    reader.emplace_back((void *)(&addrs_{key}[i]));
                }}
            }}
        """
                )
                cpp_code_define_readers = cpp_code_define_readers + "else "
    
                cpp_code_column_names = (
                    cpp_code_column_names
                    + f"""
            "{key}"
        """.strip()
                )
    
                cpp_code_column_type_names = (
                    cpp_code_column_type_names
                    + f"""
            "{self.entry_types[key]}"
        """.strip()
                )
    
                cpp_code_column_types_map = (
                    cpp_code_column_types_map
                    + f"""
            {{ "{key}", "{self.entry_types[key]}" }}
        """.strip()
                )
    
                cpp_code_init_slots = (
                    cpp_code_init_slots
                    + f"""
            fPtrs_{key} = fPtrs[{k}];
        """
                )
    
                cpp_code_resize_slots = (
                    cpp_code_resize_slots
                    + f"""
            slots_{key}.resize(fNSlots);
            addrs_{key}.resize(fNSlots);
        """
                )
    
                cpp_code_entries = (
                    cpp_code_entries
                    + f"""
            slots_{key}[slot] = awkward::{self.generators[key].class_type()}(0, fSize, 0, reinterpret_cast<ssize_t*>(fPtrs_{key}))[entry];
        """
                )
    
                k = k + 1
                if k < len(layouts):
                    cpp_code_column_names = cpp_code_column_names + ", "
                    cpp_code_column_type_names = cpp_code_column_type_names + ", "
                    cpp_code_column_types_map = cpp_code_column_types_map + ", "
    
            array_data_source = self.class_type()
    
            if not hasattr(ROOT, array_data_source):
                cpp_code = f"""
    namespace awkward {{
    
        class {array_data_source} final
          : public ROOT::RDF::RDataSource {{
        private:
            ULong64_t fSize = 0ULL;
            std::vector<ULong64_t> fPtrs;
            unsigned int fNSlots{{0U}};
            const std::vector<std::string> fColNames;
            const std::vector<std::string> fColTypeNames;
            const std::map<std::string, std::string> fColTypesMap;
            std::vector<std::pair<ULong64_t, ULong64_t>> fEntryRanges{{ }};
    
            {cpp_code_declare_slots}
    
            PyObject* fPyLookup;
    
            Record_t
            GetColumnReadersImpl(std::string_view name, const std::type_info &id) {{
                Record_t reader;
    
                {cpp_code_define_readers}
                {{
                    for (auto i : ROOT::TSeqU(fNSlots)) {{
                        reader.emplace_back(nullptr);
                    }}
                }}
                return reader;
            }}
    
        public:
            {array_data_source}(PyObject* lookup, ULong64_t size, std::initializer_list<ULong64_t> ptrs_list)
              : fSize(size),
                fPtrs({{ptrs_list}}),
                fColNames({{{cpp_code_column_names}}}),
                fColTypeNames({{{cpp_code_column_type_names}}}),
                fColTypesMap({{{cpp_code_column_types_map}}}),
                fPyLookup(lookup)
                {{
                    Py_INCREF(fPyLookup);
                    {cpp_code_init_slots}
                }}
    
                ~{array_data_source}() {{
                    Py_DECREF(fPyLookup);
                }}
    
                void SetNSlots(unsigned int nSlots) {{
                    fNSlots = nSlots; // FIXME: always 1 slot for now
    
                    {cpp_code_resize_slots}
                }}
    
            void Initialise() {{
                // initialize fEntryRanges
                const auto chunkSize = fSize / fNSlots;
                auto start = 0UL;
                auto end = 0UL;
                for (auto i : ROOT::TSeqUL(fNSlots)) {{
                    start = end;
                    end += chunkSize;
                    fEntryRanges.emplace_back(start, end);
                    (void)i;
                 }}
                 fEntryRanges.back().second += fSize % fNSlots;
            }}
    
            const std::vector<std::string> &GetColumnNames() const {{
                return fColNames;
            }}
    
            bool
            HasColumn(std::string_view colName) const {{
                const auto key = std::string(colName);
                const auto endIt = fColTypesMap.end();
                return endIt != fColTypesMap.find(key);
            }}
    
            std::string
            GetTypeName(std::string_view colName) const {{
                const auto key = std::string(colName);
                return fColTypesMap.at(key);
            }}
    
            std::vector<std::pair<ULong64_t, ULong64_t>> GetEntryRanges() {{
                auto entryRanges(std::move(fEntryRanges)); // empty fEntryRanges
                return entryRanges;
            }}
    
            bool SetEntry(unsigned int slot, ULong64_t entry) {{
                {cpp_code_entries}
                return true;
            }}
        }};
    
        ROOT::RDataFrame* MakeAwkwardArrayDS_{array_data_source}(PyObject* lookup, ULong64_t size, std::initializer_list<ULong64_t> ptrs_list) {{
            return new ROOT::RDataFrame(std::make_unique<{array_data_source}>(std::forward<PyObject*>(lookup), size, ptrs_list));
        }}
    
    }}
                """
    
                done = compile(cpp_code)
>               assert done is True
E               AssertionError

array_data_source = 'AwkwardArrayDataSource_8748206908428'
cpp_code   = ('\n'
 'namespace awkward {\n'
 '\n'
 '    class AwkwardArrayDataSource_8748206908428 final\n'
 '      : public ROOT::RDF::RDataSource {\n'
 '    private:\n'
 '        ULong64_t fSize = 0ULL;\n'
 '        std::vector<ULong64_t> fPtrs;\n'
 '        unsigned int fNSlots{0U};\n'
 '        const std::vector<std::string> fColNames;\n'
 '        const std::vector<std::string> fColTypeNames;\n'
 '        const std::map<std::string, std::string> fColTypesMap;\n'
 '        std::vector<std::pair<ULong64_t, ULong64_t>> fEntryRanges{ };\n'
 '\n'
 '        \n'
 '        ULong64_t fPtrs_one = 0;\n'
 '        std::vector<std::optional<double>>  slots_one;\n'
 '        std::vector<std::optional<double>*> addrs_one;\n'
 '    \n'
 '\n'
 '        PyObject* fPyLookup;\n'
 '\n'
 '        Record_t\n'
 '        GetColumnReadersImpl(std::string_view name, const std::type_info '
 '&id) {\n'
 '            Record_t reader;\n'
 '\n'
 '            \n'
 '        if (name == "one") {\n'
 '            for (auto i : ROOT::TSeqU(fNSlots)) {\n'
 '                addrs_one[i] = &slots_one[i];\n'
 '                reader.emplace_back((void *)(&addrs_one[i]));\n'
 '            }\n'
 '        }\n'
 '    else \n'
 '            {\n'
 '                for (auto i : ROOT::TSeqU(fNSlots)) {\n'
 '                    reader.emplace_back(nullptr);\n'
 '                }\n'
 '            }\n'
 '            return reader;\n'
 '        }\n'
 '\n'
 '    public:\n'
 '        AwkwardArrayDataSource_8748206908428(PyObject* lookup, ULong64_t '
 'size, std::initializer_list<ULong64_t> ptrs_list)\n'
 '          : fSize(size),\n'
 '            fPtrs({ptrs_list}),\n'
 '            fColNames({"one"}),\n'
 '            fColTypeNames({"std::optional<double>"}),\n'
 '            fColTypesMap({{ "one", "std::optional<double>" }}),\n'
 '            fPyLookup(lookup)\n'
 '            {\n'
 '                Py_INCREF(fPyLookup);\n'
 '                \n'
 '        fPtrs_one = fPtrs[0];\n'
 '    \n'
 '            }\n'
 '\n'
 '            ~AwkwardArrayDataSource_8748206908428() {\n'
 '                Py_DECREF(fPyLookup);\n'
 '            }\n'
 '\n'
 '            void SetNSlots(unsigned int nSlots) {\n'
 '                fNSlots = nSlots; // FIXME: always 1 slot for now\n'
 '\n'
 '                \n'
 '        slots_one.resize(fNSlots);\n'
 '        addrs_one.resize(fNSlots);\n'
 '    \n'
 '            }\n'
 '\n'
 '        void Initialise() {\n'
 '            // initialize fEntryRanges\n'
 '            const auto chunkSize = fSize / fNSlots;\n'
 '            auto start = 0UL;\n'
 '            auto end = 0UL;\n'
 '            for (auto i : ROOT::TSeqUL(fNSlots)) {\n'
 '                start = end;\n'
 '                end += chunkSize;\n'
 '                fEntryRanges.emplace_back(start, end);\n'
 '                (void)i;\n'
 '             }\n'
 '             fEntryRanges.back().second += fSize % fNSlots;\n'
 '        }\n'
 '\n'
 '        const std::vector<std::string> &GetColumnNames() const {\n'
 '            return fColNames;\n'
 '        }\n'
 '\n'
 '        bool\n'
 '        HasColumn(std::string_view colName) const {\n'
 '            const auto key = std::string(colName);\n'
 '            const auto endIt = fColTypesMap.end();\n'
 '            return endIt != fColTypesMap.find(key);\n'
 '        }\n'
 '\n'
 '        std::string\n'
 '        GetTypeName(std::string_view colName) const {\n'
 '            const auto key = std::string(colName);\n'
 '            return fColTypesMap.at(key);\n'
 '        }\n'
 '\n'
 '        std::vector<std::pair<ULong64_t, ULong64_t>> GetEntryRanges() {\n'
 '            auto entryRanges(std::move(fEntryRanges)); // empty '
 'fEntryRanges\n'
 '            return entryRanges;\n'
 '        }\n'
 '\n'
 '        bool SetEntry(unsigned int slot, ULong64_t entry) {\n'
 '            \n'
 '        slots_one[slot] = awkward::BitMaskedArray_RdvW8Ep8HwU(0, fSize, 0, '
 'reinterpret_cast<ssize_t*>(fPtrs_one))[entry];\n'
 '    \n'
 '            return true;\n'
 '        }\n'
 '    };\n'
 '\n'
 '    ROOT::RDataFrame* '
 'MakeAwkwardArrayDS_AwkwardArrayDataSource_8748206908428(PyObject* lookup, '
 'ULong64_t size, std::initializer_list<ULong64_t> ptrs_list) {\n'
 '        return new '
 'ROOT::RDataFrame(std::make_unique<AwkwardArrayDataSource_8748206908428>(std::forward<PyObject*>(lookup), '
 'size, ptrs_list));\n'
 '    }\n'
 '\n'
 '}\n'
 '            ')
cpp_code_column_names = '"one"'
cpp_code_column_type_names = '"std::optional<double>"'
cpp_code_column_types_map = '{ "one", "std::optional<double>" }'
cpp_code_declare_slots = ('\n'
 '        ULong64_t fPtrs_one = 0;\n'
 '        std::vector<std::optional<double>>  slots_one;\n'
 '        std::vector<std::optional<double>*> addrs_one;\n'
 '    ')
cpp_code_define_readers = ('\n'
 '        if (name == "one") {\n'
 '            for (auto i : ROOT::TSeqU(fNSlots)) {\n'
 '                addrs_one[i] = &slots_one[i];\n'
 '                reader.emplace_back((void *)(&addrs_one[i]));\n'
 '            }\n'
 '        }\n'
 '    else ')
cpp_code_entries = ('\n'
 '        slots_one[slot] = awkward::BitMaskedArray_RdvW8Ep8HwU(0, fSize, 0, '
 'reinterpret_cast<ssize_t*>(fPtrs_one))[entry];\n'
 '    ')
cpp_code_init_slots = '\n        fPtrs_one = fPtrs[0];\n    '
cpp_code_resize_slots = '\n        slots_one.resize(fNSlots);\n        addrs_one.resize(fNSlots);\n    '
done       = False
k          = 1
key        = 'one'
layouts    = {'one': <BitMaskedArray valid_when='true' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>[240 168]</Index></mask>
    <content><NumpyArray dtype='float64' len='14'>
        [0.  1.  2.  3.  4.  5.  6.  7.  1.1 2.2 3.3 4.4 5.5 6.6]
    </NumpyArray></content>
</BitMaskedArray>}
self       = <awkward._v2._connect.rdataframe.to_rdataframe.DataSourceGenerator object at 0x7f4d9cde6ca0>

awkward/_v2/_connect/rdataframe/to_rdataframe.py:288: AssertionError
----------------------------- Captured stderr call -----------------------------
input_line_310:4:11: error: redefinition of 'AwkwardArrayDataSource_8748206908428'
    class AwkwardArrayDataSource_8748206908428 final
          ^
input_line_298:4:11: note: previous definition is here
    class AwkwardArrayDataSource_8748206908428 final
          ^
input_line_310:115:23: error: redefinition of 'MakeAwkwardArrayDS_AwkwardArrayDataSource_8748206908428'
    ROOT::RDataFrame* MakeAwkwardArrayDS_AwkwardArrayDataSource_8748206908428(PyObject* lookup, ULong64_t size, std::initializer_list<ULong64_t> ptrs_list) {
                      ^
input_line_298:115:23: note: previous definition is here
    ROOT::RDataFrame* MakeAwkwardArrayDS_AwkwardArrayDataSource_8748206908428(PyObject* lookup, ULong64_t size, std::initializer_list<ULong64_t> ptrs_list) {
                      ^

@ianna, do you have any idea how a class might accidentally get redefined? I'm a little stumped as to why it happened on my machine and not on yours (macOS vs. Linux?) and not in the tests (pip install vs. localbuild.py?). I ran the tests manually, just to check.

Could it be that the hash is missing some value that it should depend on? That is, does the hash calculation for AwkwardArrayDataSource go recursively through everything it depends on, or does it only check the top level? If it only checks the top level, then just putting a check before compile to avoid recompiling a previously existing AwkwardArrayDataSource would be the wrong thing to do: it would prevent different data sources from having different code, and the result would be confusing (probably segfaults, as they try to cast the data to the wrong types). If, on the other hand, the second class declaration really is the same code, then it should be prevented from being recompiled.
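To make the two cases concrete, here is a minimal sketch of a guard that tells them apart. It is not the code in to_rdataframe.py: `DeclarationCache`, `RedefinitionError`, and `compile_fn` are hypothetical names, and the Cling call is replaced by a stand-in so the example runs without ROOT. Identical source under an existing name is skipped; different source under an existing name is reported instead of being silently reused.

```python
class RedefinitionError(RuntimeError):
    """Raised when one class name is requested with two different sources."""


class DeclarationCache:
    """Hypothetical guard placed in front of a compile function."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # e.g. a wrapper around the Cling call
        self._declared = {}         # class name -> source already compiled

    def declare(self, name, cpp_code):
        """Return True if compiled now, False if identical code was reused."""
        previous = self._declared.get(name)
        if previous is None:
            if not self._compile(cpp_code):
                raise RuntimeError(f"compilation of {name} failed")
            self._declared[name] = cpp_code
            return True
        if previous != cpp_code:
            # Same name, different code: the name does not cover every input.
            raise RedefinitionError(f"{name} was generated with different code")
        return False


# Stand-in for the real compiler: records the source and reports success.
compiled = []
cache = DeclarationCache(lambda code: compiled.append(code) or True)

source = "class AwkwardArrayDataSource_1 {};"
first = cache.declare("AwkwardArrayDataSource_1", source)
second = cache.declare("AwkwardArrayDataSource_1", source)
print(first, second, len(compiled))  # True False 1
```

The point of comparing the source text, rather than only the name, is that a name collision between different data sources surfaces as a Python exception instead of as wrong casts at run time.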

Actually, I'm no longer stumped: it's an intermittent problem. Running the tests several more times, I haven't reproduced it. I do not believe that this was a hash collision (which I would have hit with probability 5.421010862427522e-20). Also, it's a rather low number for a 64-bit hash: 8748206908428 << 18446744073709551615. Are the hashes being calculated correctly?
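The arithmetic behind those two numbers can be checked in a few lines. This is only a sanity check of the figures quoted above; it assumes a uniformly distributed 64-bit hash and says nothing about how the suffix is actually computed.

```python
observed = 8748206908428        # suffix of the generated class name
max_u64 = 18446744073709551615  # largest unsigned 64-bit value

# The quoted upper bound is 2**64 - 1.
assert max_u64 == 2**64 - 1

# Chance that one uniformly random 64-bit hash equals one given value.
collision_probability = 1 / 2**64
print(collision_probability)

# The observed suffix needs far fewer than 64 bits.
bits_needed = observed.bit_length()
print(bits_needed)              # 43

# Fraction of the 64-bit range that lies at or below the observed value.
print(observed / max_u64)
```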

Since this is not a reproducible bug, the only thing you can do is scan the code carefully, looking for any state that the AwkwardArrayDataSource depends on that might be missing from the hash calculation. If you don't see any potential issues, then you can close this bug report, at least until we see it again.
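As an illustration of what "depends on everything" could look like, here is a sketch that derives the class-name suffix from every input that ends up in the generated code. It is an assumption about one possible design, not how class_type() is implemented: `data_source_name` is a hypothetical helper, and the column values are copied from the traceback above.

```python
import hashlib
import json


def data_source_name(columns, flatlist_as_rvec):
    """Build a class name from all inputs of the generated C++ code.

    columns: list of (key, entry_type, generator_class_type) tuples, in the
    order in which they are written into the generated class.
    """
    payload = json.dumps(
        {
            "columns": [list(column) for column in columns],
            "flatlist_as_rvec": flatlist_as_rvec,
        },
        sort_keys=True,
    )
    # A content digest depends only on the inputs, not on object identity.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"AwkwardArrayDataSource_{digest}"


columns = [("one", "std::optional<double>", "BitMaskedArray_RdvW8Ep8HwU")]
name_a = data_source_name(columns, flatlist_as_rvec=True)
name_b = data_source_name(columns, flatlist_as_rvec=True)
name_c = data_source_name(
    [("one", "double", "NumpyArray_hypothetical")], flatlist_as_rvec=True
)
print(name_a == name_b, name_a == name_c)  # True False
```

With a name built this way, two requests with the same name carry the same generated code, so skipping the second compilation would be safe.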

@jpivarski added the bug label ("The problem described is something that must be fixed") on Jun 13, 2022
@ianna
Collaborator

ianna commented Jun 15, 2022

@jpivarski - I think it is fixed in #1474

@jpivarski linked a pull request on Jun 15, 2022 that will close this issue
@jpivarski
Member Author

I linked it so that when that is merged, this will be closed. Thanks!
