Low Latency transformation (openvinotoolkit#2869)
* initial draft of adding sinks to ngraph::Function

* style fixes

* code style fixes

* code style fixes

* code style fix

* review fix+build fix

* code style fix

* fix build

* API changed according to latest discussion

* review fixes

* review fixes + tests

* added 1 more ctor

* style fixes

* used new api in ir parser

* fixed build

* update low latency transformation, fix unroll transformation, add unit tests, modify subgraph tests

* fix low latency transformation

* Update low latency transformation, unit and sub-graph tests

* update LowLatency transformation and tests

* ngraph codestyle

* fix build, update description

* resolve review remarks

Co-authored-by: Svetlana Dolinina <[email protected]>
2 people authored and mryzhov committed Dec 15, 2020
1 parent aa5dca0 commit 8432449
Showing 24 changed files with 1,023 additions and 93 deletions.
56 changes: 56 additions & 0 deletions inference-engine/include/ie_transformations.hpp
@@ -0,0 +1,56 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
* @brief This header file defines the list of public transformations.
*
* @file ie_transformations.hpp
*/

#pragma once

#include <ie_api.h>
#include <cpp/ie_cnn_network.h>

namespace InferenceEngine {

/**
* @brief The transformation finds all TensorIterator layers in the network, processes all back
* edges that describe a connection between the Result and Parameter of the TensorIterator body,
* inserts a ReadValue layer between the Parameter and the layers that follow it,
* and inserts an Assign layer between the layers that precede the Result and the Result itself.
* Supported platforms: CPU, GNA.
*
* The example below describes the changes to the inner part (body, back edges) of the TensorIterator layer.
* [] - TensorIterator body
* () - new layer
*
* before applying the transformation:
* back_edge_1 -> [Parameter -> some layers ... -> Result ] -> back_edge_1
*
* after applying the transformation:
* back_edge_1 -> [Parameter -> (ReadValue layer) -> some layers ... -> (Assign layer) ]
*                                                                       \
*                                                                        -> Result ] -> back_edge_1
*
* It is recommended to use this transformation in conjunction with the Reshape feature, to set the
* sequence dimension to 1, and with the UnrollTensorIterator transformation.
* For convenience, the UnrollTensorIterator transformation is executed unconditionally when the
* LowLatency transformation is used with the CPU and GNA plugins, so no extra action is required there.
* After applying both of these transformations, the resulting network can be inferred step by
* step, and the states are stored between inferences.
*
* An illustrative example (not a real API):
*
* network->reshape(...) // Set the sequence dimension to 1, recalculating shapes. Optional, depends on the network.
* LowLatency(network)   // Apply the LowLatency and UnrollTensorIterator transformations.
* network->infer(...)   // Calculate new values for the states.
*                       // All states are stored between inferences via Assign and ReadValue layers.
* network->infer(...)   // Using the stored states, calculate new values for the states.
*
* @param network A network to which the LowLatency transformation is applied
*/
INFERENCE_ENGINE_API_CPP(void) LowLatency(InferenceEngine::CNNNetwork& network);
} // namespace InferenceEngine
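
A minimal usage sketch of the new public API (the model path "model.xml", the input name "data", and the position of the sequence dimension are placeholders for illustration; the reshape step is optional and depends on the network):

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // "model.xml" is a placeholder

    // Optional: set the sequence dimension to 1 before applying the transformation.
    auto shapes = network.getInputShapes();
    shapes["data"][1] = 1;  // assumes input "data" with the sequence dimension on axis 1
    network.reshape(shapes);

    // Apply LowLatency; for CPU and GNA this also triggers UnrollTensorIterator.
    InferenceEngine::LowLatency(network);

    auto executable = core.LoadNetwork(network, "CPU");
    auto request = executable.CreateInferRequest();
    request.Infer();  // states are kept between Infer() calls via Assign/ReadValue
    request.Infer();  // this call reuses the stored states
    return 0;
}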
1 change: 1 addition & 0 deletions inference-engine/include/inference_engine.hpp
@@ -8,6 +8,7 @@
*/
#pragma once

#include "ie_transformations.hpp"
#include "ie_plugin_config.hpp"
#include "ie_compound_blob.h"
#include "ie_core.hpp"
5 changes: 3 additions & 2 deletions inference-engine/src/gna_plugin/CMakeLists.txt
@@ -31,7 +31,8 @@ ie_add_plugin(NAME ${TARGET_NAME}
# saving rpath to GNA shared library to be used by CI
log_rpath_from_dir(GNA ${libGNA_LIBRARIES_BASE_PATH})

target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_legacy Threads::Threads libGNA)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_legacy inference_engine_transformations
Threads::Threads libGNA)
target_include_directories(${TARGET_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

target_compile_definitions(${TARGET_NAME}
@@ -57,7 +58,7 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)

target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_transformations libGNA::API)
target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}
$<TARGET_PROPERTY:inference_engine_legacy,INTERFACE_INCLUDE_DIRECTORIES>)
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)
2 changes: 1 addition & 1 deletion inference-engine/src/gna_plugin/gna_graph_compiler.cpp
@@ -2137,7 +2137,7 @@ GNAPluginNS::ConnectionDetails GNAGraphCompiler::connectInput(CNNLayerPtr layer,

auto prevMemoryLayer =
std::find_if(begin(memory_connection), end(memory_connection), [&](MemoryConnection::value_type &comp) {
return comp.second.getInput()->name == prevLayer->name;
return comp.second.getInput()->params.at("id") == prevLayer->params.at("id");
});
if (prevMemoryLayer != memory_connection.end()) {
// dnnLayer that is input for memory output layer
36 changes: 35 additions & 1 deletion inference-engine/src/gna_plugin/gna_plugin.cpp
@@ -38,6 +38,18 @@
#include "gna_model_serial.hpp"
#include "runtime/gna_float_runtime.hpp"

#include <generic_ie.hpp>
#include <ngraph/pass/manager.hpp>
#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/transformations/convert_opset1_to_legacy/convert_opset1_to_legacy.hpp>
#include <legacy/transformations/convert_opset1_to_legacy/convert_prior_to_ie_prior.hpp>

#include <transformations/common_optimizations/common_optimizations.hpp>
#include <transformations/control_flow/unroll_tensor_iterator.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>

#if GNA_LIB_VER == 2
#include <gna2-model-api.h>

@@ -342,7 +354,29 @@ void GNAPlugin::InitGNADevice() {
void GNAPlugin::LoadNetwork(ICNNNetwork & _network) {
std::shared_ptr<InferenceEngine::details::CNNNetworkImpl> convertedNetwork;
if (_network.getFunction()) {
convertedNetwork = std::make_shared<InferenceEngine::details::CNNNetworkImpl>(_network);
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(_network);
const auto& graph = clonedNetwork->getFunction();
// Disable shape inference (WA for generic operations)
ngraph::op::GenericIE::DisableReshape noReshape(graph);
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
// UnrollTI should be the last transformation in the transformation pipeline
manager.register_pass<ngraph::pass::UnrollTensorIterator>();

const auto& pass_config = manager.get_pass_config();
pass_config->set_callback<ngraph::pass::UnrollTensorIterator>(
[](const std::shared_ptr<const ngraph::Node> &node) -> bool {
// UnrollTI transformation is disabled by default and is turned on by the LowLatency transformation
return node->get_rt_info().count("UNROLL_TI") == 0;
});
manager.run_passes(graph);
convertedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(graph, *clonedNetwork);
}
InferenceEngine::ICNNNetwork &network = convertedNetwork ? *convertedNetwork : _network;

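For reference, the UnrollTensorIterator callback registered above skips every node that has no "UNROLL_TI" entry in its runtime info. A hedged sketch of how a transformation such as LowLatency could mark a TensorIterator so that it gets unrolled (the exact variant type stored under the key is an assumption):

#include <memory>
#include <ngraph/node.hpp>
#include <ngraph/variant.hpp>

// Hypothetical helper: after this call, the callback above returns false
// for `ti`, so the UnrollTensorIterator pass processes the node.
void mark_for_unrolling(const std::shared_ptr<ngraph::Node>& ti) {
    ti->get_rt_info()["UNROLL_TI"] = std::make_shared<ngraph::VariantWrapper<int64_t>>(1);
}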
16 changes: 16 additions & 0 deletions inference-engine/src/inference_engine/ie_transformations.cpp
@@ -0,0 +1,16 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "ie_transformations.hpp"
#include <ngraph/pass/low_latency.hpp>
#include <ngraph/pass/manager.hpp>

using namespace InferenceEngine;

void InferenceEngine::LowLatency(InferenceEngine::CNNNetwork &network) {
auto function = network.getFunction();
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::LowLatency>();
manager.run_passes(function);
}
7 changes: 7 additions & 0 deletions inference-engine/src/mkldnn_plugin/mkldnn_plugin.cpp
@@ -32,6 +32,7 @@

#include <transformations/common_optimizations/common_optimizations.hpp>
#include <transformations/common_optimizations/depth_to_space_fusion.hpp>
#include <transformations/control_flow/unroll_tensor_iterator.hpp>
#include <transformations/op_conversions/convert_depth_to_space.hpp>
#include <transformations/op_conversions/convert_space_to_depth.hpp>
#include <transformations/op_conversions/convert_gelu.hpp>
@@ -177,6 +178,8 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf)
ngraph::pass::Manager legacyManager;
legacyManager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
legacyManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
// Not actually a legacy pass, but it should be the last transformation in the pipeline
legacyManager.register_pass<ngraph::pass::UnrollTensorIterator>();

auto legacyPassConfig = manager.get_pass_config();
legacyPassConfig->set_callback<ngraph::pass::AddMultiplyFusion>([](const_node_ptr &node) -> bool {
@@ -193,6 +196,10 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf)
return false;
});

legacyManager.get_pass_config()->set_callback<ngraph::pass::UnrollTensorIterator>([](const_node_ptr &node) -> bool {
// UnrollTI transformation is disabled by default and is turned on by the LowLatency transformation
return node->get_rt_info().count("UNROLL_TI") == 0;
});
legacyManager.run_passes(nGraphFunc);

clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
@@ -0,0 +1,5 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <ngraph/pass/low_latency.hpp>
@@ -27,8 +27,8 @@ class TRANSFORMATIONS_API UnrollTensorIterator;
* are added to the network.
*/

class ngraph::pass::UnrollTensorIterator: public ngraph::pass::MatcherPass {
class ngraph::pass::UnrollTensorIterator: public ngraph::pass::FunctionPass {
public:
NGRAPH_RTTI_DECLARATION;
UnrollTensorIterator();
bool run_on_function(std::shared_ptr<Function>) override;
};
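
For context, the change above turns UnrollTensorIterator from a MatcherPass into a FunctionPass, so the pass now receives the whole function instead of a single matched node. A minimal sketch of the FunctionPass pattern (hypothetical pass name, simplified):

#include <memory>
#include <ngraph/function.hpp>
#include <ngraph/pass/pass.hpp>

// Hypothetical minimal FunctionPass: visits every node of the function
// and reports whether anything was changed.
class ExampleFunctionPass : public ngraph::pass::FunctionPass {
public:
    bool run_on_function(std::shared_ptr<ngraph::Function> f) override {
        bool changed = false;
        for (const auto& node : f->get_ops()) {
            // Inspect or rewrite `node` here and set `changed` accordingly.
        }
        return changed;
    }
};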
@@ -15,20 +15,19 @@

NGRAPH_RTTI_DEFINITION(ngraph::pass::UnrollTensorIterator, "UnrollTensorIterator", 0);

ngraph::pass::UnrollTensorIterator::UnrollTensorIterator() : MatcherPass() {
auto tensor_iterator = ngraph::pattern::wrap_type<ngraph::opset4::TensorIterator>();
ngraph::matcher_pass_callback callback = [this](pattern::Matcher& m) {
auto ti = std::dynamic_pointer_cast<ngraph::opset4::TensorIterator>(m.get_match_root());
if (!ti) {
return false;
bool ngraph::pass::UnrollTensorIterator::run_on_function(std::shared_ptr<ngraph::Function> f) {
for (const auto& op : f->get_ops()) {
auto ti = std::dynamic_pointer_cast<ngraph::opset4::TensorIterator>(op);
if (!ti || m_transformation_callback(ti)) {
continue;
}

const auto function = ti->get_body();
auto num_iter = ti->get_num_iterations();

// negative value means inconsistent TI
if (num_iter <= -1) {
return false;
continue;
}

// Create copies of the TensorIterator body, the number of copies is equal to the number of iterations.
@@ -183,9 +182,9 @@ ngraph::pass::UnrollTensorIterator::UnrollTensorIterator() : MatcherPass() {
}
}

return true;
};

auto m = std::make_shared<ngraph::pattern::Matcher>(tensor_iterator, "UnrollTensorIterator");
register_matcher(m, callback);
for (const auto& body_func : body_functions) {
f->add_sinks(body_func->get_sinks());
}
}
return true;
}