Low Latency transformation (openvinotoolkit#2869)
* initial draft of adding sinks to ngraph::Function

* style fixes

* code style fixes

* code style fixes

* code style fix

* review fix+build fix

* code style fix

* fix build

* API changed according to latest discussion

* review fixes

* review fixes + tests

* added 1 more ctor

* style fixes

* used new api in ir parser

* fixed build

* update low latency transformation, fix unroll transformation, add unit tests, modify subgraph tests

* fix low latency transformation

* Update low latency transformation, unit and sub-graph tests

* update LowLatency transformation and tests

* ngraph codestyle

* fix build, update description

* resolve review remarks

Co-authored-by: Svetlana Dolinina <[email protected]>
2 people authored and mryzhov committed Dec 15, 2020
1 parent aa5dca0 commit 8432449
Showing 24 changed files with 1,023 additions and 93 deletions.
56 changes: 56 additions & 0 deletions inference-engine/include/ie_transformations.hpp
@@ -0,0 +1,56 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
* @brief This header file defines the list of public transformations.
*
* @file ie_transformations.hpp
*/

#pragma once

#include <ie_api.h>
#include <cpp/ie_cnn_network.h>

namespace InferenceEngine {

/**
* @brief The transformation finds all TensorIterator layers in the network, processes all back
* edges that describe a connection between the Result and Parameter of the TensorIterator body,
* inserts a ReadValue layer between the Parameter and the layers that follow it,
* and inserts an Assign layer between the layers that precede the Result and the Result itself.
* Supported platforms: CPU, GNA.
*
* The example below describes the changes to the inner part (body, back edges) of the TensorIterator layer.
* [] - TensorIterator body
* () - new layer
*
* before applying the transformation:
* back_edge_1 -> [Parameter -> some layers ... -> Result ] -> back_edge_1
*
* after applying the transformation:
* back_edge_1 -> [Parameter -> (ReadValue layer) -> some layers ... -> (Assign layer) ]
*                                                                       \
*                                                                        -> Result ] -> back_edge_1
*
* It is recommended to use this transformation in conjunction with the Reshape feature, to set the
* sequence dimension to 1, and with the UnrollTensorIterator transformation.
* For convenience, the UnrollTensorIterator transformation is executed unconditionally when the
* LowLatency transformation is used with the CPU and GNA plugins, so no extra action is required there.
* After applying both of these transformations, the resulting network can be inferred step by
* step, and the states are stored between inferences.
*
* An illustrative example (not a real API):
*
* network->reshape(...) // Set the sequence dimension to 1, recalculating shapes. Optional, depends on the network.
* LowLatency(network)   // Apply the LowLatency and UnrollTensorIterator transformations.
* network->infer(...)   // Calculate new values for the states.
*                       // All states are stored between inferences via Assign and ReadValue layers.
* network->infer(...)   // Using the stored states, calculate new values for the states.
*
* @param network A network to which the LowLatency transformation is applied
*/
INFERENCE_ENGINE_API_CPP(void) LowLatency(InferenceEngine::CNNNetwork& network);
} // namespace InferenceEngine
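
A minimal usage sketch of the new public API (the model path "model.xml", the input name "data", and the position of the sequence dimension are placeholders for illustration; the reshape step is optional and depends on the network):

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // "model.xml" is a placeholder

    // Optional: set the sequence dimension to 1 before applying the transformation.
    auto shapes = network.getInputShapes();
    shapes["data"][1] = 1;  // assumes input "data" with the sequence dimension on axis 1
    network.reshape(shapes);

    // Apply LowLatency; for CPU and GNA this also triggers UnrollTensorIterator.
    InferenceEngine::LowLatency(network);

    auto executable = core.LoadNetwork(network, "CPU");
    auto request = executable.CreateInferRequest();
    request.Infer();  // states are kept between Infer() calls via Assign/ReadValue
    request.Infer();  // this call reuses the stored states
    return 0;
}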
1 change: 1 addition & 0 deletions inference-engine/include/inference_engine.hpp
@@ -8,6 +8,7 @@
*/
#pragma once

#include "ie_transformations.hpp"
#include "ie_plugin_config.hpp"
#include "ie_compound_blob.h"
#include "ie_core.hpp"
5 changes: 3 additions & 2 deletions inference-engine/src/gna_plugin/CMakeLists.txt
@@ -31,7 +31,8 @@ ie_add_plugin(NAME ${TARGET_NAME}
# saving rpath to GNA shared library to be used by CI
log_rpath_from_dir(GNA ${libGNA_LIBRARIES_BASE_PATH})

target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_legacy Threads::Threads libGNA)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_legacy inference_engine_transformations
Threads::Threads libGNA)
target_include_directories(${TARGET_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

target_compile_definitions(${TARGET_NAME}
@@ -57,7 +58,7 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)

target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_transformations libGNA::API)
target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}
$<TARGET_PROPERTY:inference_engine_legacy,INTERFACE_INCLUDE_DIRECTORIES>)
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)
2 changes: 1 addition & 1 deletion inference-engine/src/gna_plugin/gna_graph_compiler.cpp
@@ -2137,7 +2137,7 @@ GNAPluginNS::ConnectionDetails GNAGraphCompiler::connectInput(CNNLayerPtr layer,

auto prevMemoryLayer =
std::find_if(begin(memory_connection), end(memory_connection), [&](MemoryConnection::value_type &comp) {
return comp.second.getInput()->name == prevLayer->name;
return comp.second.getInput()->params.at("id") == prevLayer->params.at("id");
});
if (prevMemoryLayer != memory_connection.end()) {
// dnnLayer that is input for memory output layer
36 changes: 35 additions & 1 deletion inference-engine/src/gna_plugin/gna_plugin.cpp
@@ -38,6 +38,18 @@
#include "gna_model_serial.hpp"
#include "runtime/gna_float_runtime.hpp"

#include <generic_ie.hpp>
#include <ngraph/pass/manager.hpp>
#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/transformations/convert_opset1_to_legacy/convert_opset1_to_legacy.hpp>
#include <legacy/transformations/convert_opset1_to_legacy/convert_prior_to_ie_prior.hpp>

#include <transformations/common_optimizations/common_optimizations.hpp>
#include <transformations/control_flow/unroll_tensor_iterator.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>

#if GNA_LIB_VER == 2
#include <gna2-model-api.h>

@@ -342,7 +354,29 @@ void GNAPlugin::InitGNADevice() {
void GNAPlugin::LoadNetwork(ICNNNetwork & _network) {
std::shared_ptr<InferenceEngine::details::CNNNetworkImpl> convertedNetwork;
if (_network.getFunction()) {
convertedNetwork = std::make_shared<InferenceEngine::details::CNNNetworkImpl>(_network);
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(_network);
const auto& graph = clonedNetwork->getFunction();
// Disable shape inference (WA for generic operations)
ngraph::op::GenericIE::DisableReshape noReshape(graph);
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
// UnrollTI should be the last transformation in the transformation pipeline
manager.register_pass<ngraph::pass::UnrollTensorIterator>();

const auto& pass_config = manager.get_pass_config();
pass_config->set_callback<ngraph::pass::UnrollTensorIterator>(
[](const std::shared_ptr<const ngraph::Node> &node) -> bool {
// UnrollTI transformation is disabled by default and is turned on by the LowLatency transformation
return node->get_rt_info().count("UNROLL_TI") == 0;
});
manager.run_passes(graph);
convertedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(graph, *clonedNetwork);
}
InferenceEngine::ICNNNetwork &network = convertedNetwork ? *convertedNetwork : _network;

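For reference, the UnrollTensorIterator callback registered above skips every node that has no "UNROLL_TI" entry in its runtime info. A hedged sketch of how a transformation such as LowLatency could mark a TensorIterator so that it gets unrolled (the exact variant type stored under the key is an assumption):

#include <memory>
#include <ngraph/node.hpp>
#include <ngraph/variant.hpp>

// Hypothetical helper: after this call, the callback above returns false
// for `ti`, so the UnrollTensorIterator pass processes the node.
void mark_for_unrolling(const std::shared_ptr<ngraph::Node>& ti) {
    ti->get_rt_info()["UNROLL_TI"] = std::make_shared<ngraph::VariantWrapper<int64_t>>(1);
}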
16 changes: 16 additions & 0 deletions inference-engine/src/inference_engine/ie_transformations.cpp
@@ -0,0 +1,16 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "ie_transformations.hpp"
#include <ngraph/pass/low_latency.hpp>
#include <ngraph/pass/manager.hpp>

using namespace InferenceEngine;

void InferenceEngine::LowLatency(InferenceEngine::CNNNetwork &network) {
auto function = network.getFunction();
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::LowLatency>();
manager.run_passes(function);
}
7 changes: 7 additions & 0 deletions inference-engine/src/mkldnn_plugin/mkldnn_plugin.cpp
@@ -32,6 +32,7 @@

#include <transformations/common_optimizations/common_optimizations.hpp>
#include <transformations/common_optimizations/depth_to_space_fusion.hpp>
#include <transformations/control_flow/unroll_tensor_iterator.hpp>
#include <transformations/op_conversions/convert_depth_to_space.hpp>
#include <transformations/op_conversions/convert_space_to_depth.hpp>
#include <transformations/op_conversions/convert_gelu.hpp>
@@ -177,6 +178,8 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf)
ngraph::pass::Manager legacyManager;
legacyManager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
legacyManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
// Not actually a legacy pass, but it should be the last transformation in the pipeline
legacyManager.register_pass<ngraph::pass::UnrollTensorIterator>();

auto legacyPassConfig = manager.get_pass_config();
legacyPassConfig->set_callback<ngraph::pass::AddMultiplyFusion>([](const_node_ptr &node) -> bool {
@@ -193,6 +196,10 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf)
return false;
});

legacyManager.get_pass_config()->set_callback<ngraph::pass::UnrollTensorIterator>([](const_node_ptr &node) -> bool {
// UnrollTI transformation is disabled by default and is turned on by the LowLatency transformation
return node->get_rt_info().count("UNROLL_TI") == 0;
});
legacyManager.run_passes(nGraphFunc);

clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
@@ -0,0 +1,5 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <ngraph/pass/low_latency.hpp>
@@ -27,8 +27,8 @@ class TRANSFORMATIONS_API UnrollTensorIterator;
* are added to the network.
*/

class ngraph::pass::UnrollTensorIterator: public ngraph::pass::MatcherPass {
class ngraph::pass::UnrollTensorIterator: public ngraph::pass::FunctionPass {
public:
NGRAPH_RTTI_DECLARATION;
UnrollTensorIterator();
bool run_on_function(std::shared_ptr<Function>) override;
};
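
For context, the change above turns UnrollTensorIterator from a MatcherPass into a FunctionPass, so the pass now receives the whole function instead of a single matched node. A minimal sketch of the FunctionPass pattern (hypothetical pass name, simplified):

#include <memory>
#include <ngraph/function.hpp>
#include <ngraph/pass/pass.hpp>

// Hypothetical minimal FunctionPass: visits every node of the function
// and reports whether anything was changed.
class ExampleFunctionPass : public ngraph::pass::FunctionPass {
public:
    bool run_on_function(std::shared_ptr<ngraph::Function> f) override {
        bool changed = false;
        for (const auto& node : f->get_ops()) {
            // Inspect or rewrite `node` here and set `changed` accordingly.
        }
        return changed;
    }
};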
@@ -15,20 +15,19 @@

NGRAPH_RTTI_DEFINITION(ngraph::pass::UnrollTensorIterator, "UnrollTensorIterator", 0);

ngraph::pass::UnrollTensorIterator::UnrollTensorIterator() : MatcherPass() {
auto tensor_iterator = ngraph::pattern::wrap_type<ngraph::opset4::TensorIterator>();
ngraph::matcher_pass_callback callback = [this](pattern::Matcher& m) {
auto ti = std::dynamic_pointer_cast<ngraph::opset4::TensorIterator>(m.get_match_root());
if (!ti) {
return false;
bool ngraph::pass::UnrollTensorIterator::run_on_function(std::shared_ptr<ngraph::Function> f) {
for (const auto& op : f->get_ops()) {
auto ti = std::dynamic_pointer_cast<ngraph::opset4::TensorIterator>(op);
if (!ti || m_transformation_callback(ti)) {
continue;
}

const auto function = ti->get_body();
auto num_iter = ti->get_num_iterations();

// negative value means inconsistent TI
if (num_iter <= -1) {
return false;
continue;
}

// Create copies of the TensorIterator body, the number of copies is equal to the number of iterations.
@@ -183,9 +182,9 @@ ngraph::pass::UnrollTensorIterator::UnrollTensorIterator() : MatcherPass() {
}
}

return true;
};

auto m = std::make_shared<ngraph::pattern::Matcher>(tensor_iterator, "UnrollTensorIterator");
register_matcher(m, callback);
for (const auto& body_func : body_functions) {
f->add_sinks(body_func->get_sinks());
}
}
return true;
}