
[PyOV] Tensor constructor rework and data casting, adjust bf16 behavior #23263

Closed
wants to merge 11 commits

Conversation


@jiwaszki jiwaszki commented Mar 4, 2024

Details:

  • Added a new constructor for the Tensor class that resolves issues with up/down-casting of data, including the custom bf16 type.
  • Added a cast_data function that can up/down-cast data between available numpy formats. For example, Tensor.data of a bf16 tensor returns an f16 array whose bytes need to be reinterpreted; cast_data can cast it to f32, so the conversion does not lose any information and provides an easy interface to get the data in numpy format.
  • Adjusted values returned from inference calls so that bf16 outputs can optionally be upcast to f32. The cast_bf16 flag determines which result is produced. To keep backward compatibility, the default value is False and the returned array is of np.float16 type, to be reinterpreted. Setting cast_bf16=True converts the data to np.float32, but it loses sharing capabilities, i.e. when the user sets share_outputs=True, bf16 data will always be a copy. (A usage sketch follows this list.)
  • Adjusted the data dispatcher to utilize the new Tensor capabilities.
  • TODO: tests for the new constructor Tensor(data, *, dtype, share) and cast_data.
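
A minimal usage sketch of the proposed API, assuming the signatures described above (the Tensor(data, *, dtype, share) constructor, a cast_data helper, and the cast_bf16 inference flag); exact names and locations may differ from the final implementation:

import numpy as np
import openvino as ov

data = np.arange(6, dtype=np.float32).reshape(2, 3)

# Proposed keyword-only constructor: creating a bf16 tensor always copies.
t = ov.Tensor(data, dtype=ov.Type.bf16, share=False)

# Tensor.data of a bf16 tensor is exposed via a 16-bit buffer; the proposed
# cast_data helper (exact module path assumed) upcasts it losslessly to f32:
# f32_array = cast_data(t, ov.Type.f32)

# Proposed inference flag: upcast bf16 outputs to np.float32 (always a copy).
# results = request.infer(inputs, share_outputs=True, cast_bf16=True)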

Tickets:

  • 129930, part of 125433, 127292

@jiwaszki jiwaszki added category: Python API OpenVINO Python bindings WIP work in progress labels Mar 4, 2024
@jiwaszki jiwaszki added this to the 2024.1 milestone Mar 4, 2024
@jiwaszki jiwaszki self-assigned this Mar 4, 2024
@mlukasze
Contributor

@slyalin could you please take a look, especially at the bf16->fp32 upcast part?

@KodiaqQ is it possible to test it functionally on your side from the branch?

@slyalin
Contributor

slyalin commented Mar 11, 2024

Adjusted returning values from inference calls to be automatically upcasted to f32 in cases of bf16 outputs.

Do we still have the ability to take the original bf16 tensor without any upcasts?

Am I right that we are still lacking numpy/numpy#19808 in numpy, and that this is the main reason for adding this special bf16 behavior (and not because the user simply doesn't expect bf16 as a model input/output, preferring it to be an internal implementation detail)?

@jiwaszki
Author

jiwaszki commented Mar 11, 2024

Adjusted returning values from inference calls to be automatically upcasted to f32 in cases of bf16 outputs.

Do we still have the ability to take the original bf16 tensor without any upcasts?

Am I right that we are still lacking numpy/numpy#19808 in numpy, and that this is the main reason for adding this special bf16 behavior (and not because the user simply doesn't expect bf16 as a model input/output, preferring it to be an internal implementation detail)?

@slyalin yes, bfloat16 is not supported and I do not see it being supported in the near future.

For example, Torch addresses this problem by making the Tensor class the primary way to handle data.
For OV it's not that simple, as a commitment has already been made to numpy arrays being the default way of communicating with Python (ov.Tensor is more of a wrapper around the memory than a standalone class with various functionalities). Also, inference functions return numpy arrays instead of ov.Tensors.

When data is returned as float32, the resulting numpy array can be operated on safely, without the need to write specialized code for views on the data. This is the main reason for adding this behavior. Similar policies may apply to packed types in the future (i.e. automatic unpacking of u1 to uint8).
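
For reference, a minimal sketch of the manual reinterpretation a user otherwise has to write when bf16 data arrives as a raw 16-bit buffer (standard bf16 bit layout; the values are illustrative):

import numpy as np

raw = np.array([16256, 16384, 16448], dtype=np.uint16)  # bf16 bit patterns for 1.0, 2.0, 3.0

# bf16 is the upper 16 bits of an IEEE float32, so shifting left by 16 and
# reinterpreting the bits yields an exact float32 representation.
f32 = (raw.astype(np.uint32) << 16).view(np.float32)
print(f32)  # [1. 2. 3.]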

Do we still have the ability to take the original bf16 tensor without any upcasts?

For synchronous inference there is no easy way of doing that. Again, this raises the topic of adding a ReturnPolicy (or yet another casting=True/False flag) to inference calls to address such cases (EDIT: added cast_bf16 to ensure backward compatibility).
It is possible to get an ov.Tensor by calling get_tensor() on the InferRequest and reinterpreting its data. For asynchronous inference this way is already preferred, so the concern is only about the sync version.
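
A minimal sketch of that workaround, assuming an InferRequest (here called request) whose first output is bf16:

import numpy as np

# request.infer(inputs)
out_tensor = request.get_output_tensor(0)   # original ov.Tensor, no upcast

# Tensor.data exposes the bf16 payload as a 16-bit buffer; recover the f32
# values with the same bit-shift reinterpretation shown in the earlier sketch.
bits = out_tensor.data.view(np.uint16)
f32 = (bits.astype(np.uint32) << 16).view(np.float32)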

@jiwaszki jiwaszki marked this pull request as ready for review March 14, 2024 14:53
@jiwaszki jiwaszki requested a review from a team as a code owner March 14, 2024 14:53
@jiwaszki
Author

@slyalin @akuporos Please review, thanks!

@@ -133,7 +144,7 @@ def infer(
self,
inputs,
is_shared=share_inputs,
), share_outputs=share_outputs, decode_strings=decode_strings))
), share_outputs=share_outputs, decode_strings=decode_strings, cast_bf16=cast_bf16))
Contributor

Why should infer be responsible for data conversion?
I think plugins need to support bf16 and accept it as input. The current flag looks like a WA (workaround).

Author

Please read the supporting tickets and this comment: #23263 (comment)

It's not a WA, it's a solution to a data type that does not exist in numpy. This data conversion is also optional.
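
For context, a minimal illustration of the missing dtype (numpy has no native bfloat16, which is what numpy/numpy#19808 tracks):

import numpy as np

try:
    np.dtype("bfloat16")
except TypeError as err:
    print(err)  # data type 'bfloat16' not understood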

if (dst_dtype == ov::element::boolean) {
auto tmp = ov::op::v0::Constant(dst_dtype,
array_helpers::get_shape(array),
array_helpers::array_as_vector_bool<T>(array));
Contributor

Here we perform a double copy:

  • from array to vector
  • from vector to constant

Why do we copy and not share the data?

Contributor

If sharing is not possible, the double copy can be avoided by:

auto src_tensor = ov::Tensor(dst_type, array_helpers::get_shape(array), array.data());

This makes a view, not a copy; then just memcpy into the destination tensor.
Check whether this can work for the bool type.

},
R"(
Gets all outputs tensors of this InferRequest.

Note: All string-based data is decoded by default.
Note: All bf16-based data is upcasted to fp32 by default.
Contributor

What if I set the output tensor as bf16? Will it be ignored and a new one created?
If yes, it's against the API rules.

Author

This function returns a dict whose values are copied numpy arrays -- it can simply be switched off. However, it was decided that strings will be decoded here -- let's try to have some uniformity between decisions.

Nothing changes in Model/CompiledModel/InferRequest while calling this.

:param dtype: Desired type of Tensor. Leave undefined to inherit the dtype
from numpy array. Casting will always result in copy!
i.e. when creating Type.bf16 tensors or creating Type.f32 tensor
from numpy.float64 array, data sharing will be disabled by default.
Contributor

I'm against the tensor converting data.
Data conversion is part of the graph execution process.

Python bindings are only allowed to bind the C++ and Python APIs, not to perform extra computations outside of the main infer.

Author

I think that extending a user-friendly/Pythonic interface (or following well-known examples) and allowing proper data manipulation is more critical.

Let's take an example: https://pytorch.org/docs/stable/generated/torch.from_numpy.html
As you can see, there is no way to create a bfloat16 Tensor directly; the approach is torch.from_numpy(array).to(torch.bfloat16).
Here is the definition of to: https://pytorch.org/docs/stable/generated/torch.Tensor.to.html#torch.Tensor.to

"Otherwise, the returned tensor is a copy of self with the desired torch.dtype and torch.device."

@KodiaqQ can we please have your insight here? The Python (and thus NNCF) perspective can be fundamentally different from the C++ one.

Contributor

I think that extending a user-friendly/Pythonic interface (or following well-known examples) and allowing proper data manipulation is more critical.

Here you extend the computation graph, i.e. introduce pre/post-processing capabilities outside of the main device graph.

In the case of CUDA / PyTorch, you can trace your model with from_numpy / to, etc. and capture a CUDA graph, which will contain such conversions as part of the model (and hence they are optimized), but in OpenVINO this is not true, because you perform data conversions via slow ov::Constant implementations.

Contributor

From NNCF's perspective, it might be very useful to have methods for data conversion somewhere in the Python API.

Author

@jiwaszki jiwaszki Mar 18, 2024

@ilya-lavrenov so what is your proposal to address it in OpenVINO? How should bf16 data be correctly interpreted (from any possible array, i.e. int16/float16/float32, with regard to values and not its binary representation) and used to populate Tensors?

Contributor

I don't see the full problem from the tickets. They just provide some code which doesn't work, but I don't understand how it's related to the fact that we store bf16 as fp16, or why we need to change this approach.

} else {
tmp = ov::op::v0::Constant(t).cast_vector<T>();
}
return py::array(dtype, t.get_shape(), tmp.data());
Contributor

Does this perform a copy of the tmp array? If yes, is it possible to fill the data directly into the py::array?

Contributor

Maybe it is possible to implement it like this:

Suggested change
return py::array(dtype, t.get_shape(), tmp.data());
return py::array(dtype, t.get_shape(), t.data());

If py::array accepts void*, it looks like T and dtype are the "same" type, so the tensor data should be interpreted correctly from void*.

tmp.emplace_back(static_cast<T>(ov::bfloat16::from_bits(*(reinterpret_cast<uint16_t*>(t.data()) + i))));
}
} else {
tmp = ov::op::v0::Constant(t).cast_vector<T>();
Contributor

Here we create 2 temporary objects, which is very suboptimal.

Should we extend Constant to handle the bf16 case as well?

Contributor

Constant::cast_vector supports bf16.

auto tmp = std::make_shared<std::vector<T>>(ov::op::v0::Constant(t).cast_vector<T>());

Then, is it possible for py::array to extend the lifetime of tmp and just wrap it in a py::array to provide the correct interface?

return py::array(dtype, t.get_shape(), tmp);

@@ -716,12 +722,13 @@ void regclass_InferRequest(py::module m) {
cls.def_property_readonly(
"results",
[](InferRequestWrapper& self) {
return Common::outputs_to_dict(self, false, true);
return Common::outputs_to_dict(self, false, true, true);
Contributor

Why are results converted to fp32 by default?

Author

#23263 (comment)

However, it was decided that strings will be decoded here -- let's try to have some uniformity between decisions.

:param dtype: Desired type of Tensor. Leave undefined to inherit the dtype
from numpy array. Casting will always result in copy!
i.e. when creating Type.bf16 tensors or creating Type.f32 tensor
from numpy.float64 array, data sharing will be disabled by default.
Contributor

I think that extending a user-friendly/Pythonic interface (or following well-known examples) and allowing proper data manipulation is more critical.

Here you extend the computation graph, i.e. introduce pre/post-processing capabilities outside of the main device graph.

In the case of CUDA / PyTorch, you can trace your model with from_numpy / to, etc. and capture a CUDA graph, which will contain such conversions as part of the model (and hence they are optimized), but in OpenVINO this is not true, because you perform data conversions via slow ov::Constant implementations.

@jiwaszki jiwaszki removed the WIP work in progress label Mar 18, 2024
} else if (dst_dtype == ov::element::f32) {
return array_from_tensor_t<float>(std::move(t), py::dtype("float32"));
}
if (dst_dtype == ov::element::f16) {
Contributor

Suggested change
if (dst_dtype == ov::element::f16) {
else if (dst_dtype == ov::element::f16) {

?


py::array array_from_constant(ov::op::v0::Constant&& c, bool is_shared) {
// Get actual dtype from OpenVINO type:
auto ov_type = c.get_element_type();
Contributor

Suggested change
auto ov_type = c.get_element_type();
const auto& ov_type = c.get_element_type();

} else if (element_type == ov::element::u8 || element_type == ov::element::u1 || element_type == ov::element::u4 ||
element_type == ov::element::nf4) {
// WA for u1, u4, nf4, all returned as packed uint8 arrays
return _get_byte_strides<uint8_t>(shape);
Contributor

The fundamental type for these types, except u8, is int8_t, so maybe use the same type as in the C++ implementation?

Comment on lines +497 to +501
if (dst_dtype == ov::element::bf16 || type_helpers::get_ov_type(array) != dst_dtype) {
tensor_helpers::fill_tensor(tensor, array, dst_dtype);
return tensor;
}
tensor_helpers::fill_tensor(tensor, array);
Contributor

Suggested change
if (dst_dtype == ov::element::bf16 || type_helpers::get_ov_type(array) != dst_dtype) {
tensor_helpers::fill_tensor(tensor, array, dst_dtype);
return tensor;
}
tensor_helpers::fill_tensor(tensor, array);
if (dst_dtype == ov::element::bf16 || type_helpers::get_ov_type(array) != dst_dtype) {
tensor_helpers::fill_tensor(tensor, array, dst_dtype);
} else {
tensor_helpers::fill_tensor(tensor, array);
}

Comment on lines +120 to +125
std::vector<size_t> byte_strides;
std::vector<size_t> element_strides = ov::row_major_strides(s);
for (auto v : element_strides) {
byte_strides.push_back(static_cast<size_t>(v) * sizeof(T));
}
return byte_strides;
Contributor

Suggested change
std::vector<size_t> byte_strides;
std::vector<size_t> element_strides = ov::row_major_strides(s);
for (auto v : element_strides) {
byte_strides.push_back(static_cast<size_t>(v) * sizeof(T));
}
return byte_strides;
auto byte_strides = ov::row_major_strides(s);
for (auto&& stride : byte_strides){
stride *= sizeof(T);
}
return byte_strides;

static_cast<size_t>(c.get_element_type().size()), /* Size of one scalar */
std::string(1, 'H'), /* Python struct-style format descriptor */
static_cast<size_t>(shape.size()), /* Number of dimensions */
std::vector<size_t>{shape.begin(), shape.end()}, /* Buffer dimensions */
Contributor

Why make a copy? ov::Shape can be cast to this kind of vector, since this vector type is the base class of Shape.

result.reserve(array.size());

for(long int i = 0; i < array.size(); i++) {
result.emplace_back(*(static_cast<T*>(const_cast<void*>(array.data())) + i) != 0 ? 1 : 0);
Contributor

Why is const_cast required?
Is array.data() pointing to a bool type?
Is it possible to change this loop to something like:

std::transform(array.begin(), array.end(), std::back_inserter(result), [](const bool v){ return static_cast<char>(v);});

Casting a bool value will return 0 or 1.


template <typename T>
std::vector<T> array_as_vector(py::array& array){
T *ptr = static_cast<T*>(const_cast<void*>(array.data()));
Contributor

Why is const_cast required?
The data is only read, so it can still be const.

py::array array_from_tensor_t(ov::Tensor&& t, py::dtype&& dtype) {
std::vector<T> tmp;
if (t.get_element_type() == ov::element::bf16) {
for (size_t i = 0; i < t.get_size(); i++) {
Contributor

Can this be converted to std::transform?
Get a pointer to the tensor data before the loop and use it:
auto t_ptr = reinterpret_cast<const uint16_t*>(t.data());
instead of taking it every iteration.

Comment on lines +380 to +381
auto element_type = self.get_element_type();
auto shape = self.get_shape();
Contributor

Suggested change
auto element_type = self.get_element_type();
auto shape = self.get_shape();
const auto& element_type = self.get_element_type();
const auto& shape = self.get_shape();

The strides could be calculated like:

auto strides = ov::row_major_strides(s)
// then multiply each stride in strides by element_type.size()

Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Apr 15, 2024
@jiwaszki jiwaszki closed this Apr 18, 2024
@jiwaszki
Author

Closed in favor of #23771
