forked from apache/tvm
Track PLDI Training fork #8
Closed
Conversation
* Fix VM invoke with set_params
* add test
* tweak
The -mfloat-abi hard option does not work for LLVM < 6.0 as it is ignored. This adds a fatal error when using unsupported LLVM versions so that the failure is not silent.
Issue: after cloning the latest TVM/VTA and running VTA on a Xilinx FPGA board, the application crashed with a "call stack overflow" caused by an infinite recursive function call. This issue happened before and was addressed by PR 3843. Analysis: it appears the de10-nano driver PR was based on an older code base, so the logic change from PR 3843 was lost. Solution: add the logic back.
Use fdiv in the tests for deduce_bound
…ls. (apache#4076) The current bounds checking infrastructure inserts checks like:
```
for (i, 0, bounds[n]) {
  if (likely(i < bounds[n])) {
    ...
  }
}
```
into the TVM IR, and the simplification infrastructure currently does not remove them. This is a little unclean, as these checks are trivially true: for a loop var `i` with a given min and extent, we are guaranteed that `i >= min` and `i < min + extent`. Thus, we can insert these range facts into the IR and use them to eliminate trivial bounds checks early on.
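The elimination rule can be sketched in plain Python; this is only an illustration of the reasoning, not the actual TVM simplification pass, and the helper name below is made up.

```python
def bounds_check_is_redundant(loop_min, loop_extent, upper_bound):
    """A check `i < upper_bound` is trivially true when the loop already
    guarantees loop_min <= i < loop_min + loop_extent and that whole
    range stays below upper_bound."""
    return loop_min + loop_extent <= upper_bound

# The check emitted for `for (i, 0, bounds[n])` compares i against the
# loop's own extent, so it can be dropped:
assert bounds_check_is_redundant(loop_min=0, loop_extent=128, upper_bound=128)

# A check that actually narrows the range must be kept:
assert not bounds_check_is_redundant(loop_min=0, loop_extent=128, upper_bound=100)
```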
* add valid autotune
* fix pylint
* [Relay][VM] Fix constant folding issue in VM compiler
  1. allow passing params when compiling a module (sketched below)
  2. enhance profiler robustness
* remove dead code
* fix lint
* add get_params
* fix test
* don't pass params back
* remove get_params
* docs
* move compile function to api
* compile clashes with builtin name
* fix compilation error
* remove dead code
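A minimal sketch of the compile-time params usage, assuming a TVM build where `relay.vm.compile` accepts a `params` dict; the toy module and shapes here are made up for illustration.

```python
import numpy as np
import tvm
from tvm import relay

# Toy Relay module with a single weight we want bound at compile time.
x = relay.var("x", shape=(1, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))

params = {"w": np.random.rand(8, 16).astype("float32")}

# Binding params at compile time lets the VM compiler constant-fold them,
# so callers only supply the runtime input "x".
exe = relay.vm.compile(mod, target="llvm", params=params)
```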
The torch package depends on the future package, but the torch wheel does not expose that dependency, resulting in an inconsistent install. Ideally the wheel should declare all of its dependencies; I'm not sure why the packagers have chosen not to do this. For now the simple workaround is to explicitly install the future package. Change-Id: Ic9f0f4bb4c78ab65706fc1b20c1b4fd287856a9e
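A small sketch of that workaround in Python; installing at import time is only for illustration, and in practice one would typically just run `pip install future` directly.

```python
import importlib.util
import subprocess
import sys

# torch uses the "future" package at runtime but its wheel does not declare
# the dependency, so install it explicitly if it is missing.
if importlib.util.find_spec("future") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "future"])

import torch  # now imports without the missing-dependency inconsistency
```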
…che#4074)
* move the number of nodes constraint in op fusion up to the dom tree level
* add test case of limiting the max number of ops to be fused
* uncomment other test cases
…nd using MKLDNN. Users can choose between the two at runtime. (apache#3945) - Added tests for the new methods.
* app init push
* fix on readme
* change name, add bit serial explanation
* rm serialLoadMM, change doc
* syntax change for readme
* add parallel test functionality
* fix readme
* add python doc
* syntax
…onvolutions (apache#4039)
* Add FIFO buffer op to enable explicit computation re-use in convolution (sketched below)
* Add a test
* Add end-to-end test with 1D convolution
* Add a stub in MXNet frontend
* Address reviewer comments
* Add back stub for MXNet frontend
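The FIFO-buffer idea can be sketched with NumPy (illustration only, not the Relay operator itself): each step appends the newest samples and drops the oldest, so overlapping 1D-convolution windows over a stream are not recomputed from scratch.

```python
import numpy as np

def fifo_push(buffer, new_data):
    """Append new_data at the end and drop the oldest entries so the
    buffer keeps a fixed length (first in, first out)."""
    combined = np.concatenate([buffer, new_data])
    return combined[-len(buffer):]

buf = np.zeros(4, dtype="float32")
for chunk in ([1.0], [2.0], [3.0]):
    buf = fifo_push(buf, np.asarray(chunk, dtype="float32"))
print(buf)  # [0. 1. 2. 3.] -- a sliding window over the streamed input
```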
…che#4060)
* update build_gcn.py tutorial
* support bias in GCN layer
* download pretrained gcn model
* verify model accuracy
* use time_evaluator to measure runtime (see the sketch below)
* fix adding bias in gcn layer
* remove printing output
* fix small bug
* add DGL-PyTorch comparison into the build_gcn tutorial
* add accuracy testing
* adjust import order
* handle different dgl versions
* update number for dgl version checking
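A sketch of the time_evaluator measurement pattern, assuming a recent TVM with the graph_executor API; the tiny placeholder model below stands in for the tutorial's actual GCN.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Placeholder model; the tutorial builds its GCN instead.
x = relay.var("x", shape=(1, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

dev = tvm.cpu(0)
lib = relay.build(mod, target="llvm")
gmod = graph_executor.GraphModule(lib["default"](dev))
gmod.set_input("x", np.random.rand(1, 16).astype("float32"))

# time_evaluator runs "run" repeatedly and reports per-repeat averages,
# which is more reliable than timing a single call.
ftimer = gmod.module.time_evaluator("run", dev, number=10, repeat=3)
res_ms = np.array(ftimer().results) * 1000.0
print("mean %.3f ms, std %.3f ms" % (res_ms.mean(), res_ms.std()))
```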
* [tvm][any] broadcast with values other than 1
* Add test for incompatible runtime values
* Remove hybrid script compact buffer binding
* retrigger ci
* Fix layer norm for fp16
* [Fix] Fix arange for fp16
* [Fix] Fix mxnet frontend for fp16
* [Fix] Fix arange for fp16
* remove comments
* x
* fix nnvm
…compilation (apache#4056)
* overload half operators for cuda codegen
* add float16 te test_op_level1
* fix test_op_level1.py
* fix lint
* disable fp16 test if gpu does not support
* disable fp16 test if gpu does not support
* bypass float16 test if gpu does not support float16 (guard sketched below)
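A sketch of the "skip when unsupported" guard, assuming a recent TVM where `tvm.cuda` and `tvm.contrib.nvcc.have_fp16` exist (older code spelled the device `tvm.gpu`).

```python
import tvm
from tvm.contrib import nvcc

def gpu_supports_fp16():
    """True only when a CUDA device is present and its compute capability
    supports float16 arithmetic."""
    dev = tvm.cuda(0)
    return dev.exist and nvcc.have_fp16(dev.compute_version)

if not gpu_supports_fp16():
    print("skipping float16 test: no fp16-capable GPU available")
```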
This implementation provides cast to a limited number of dtypes that tflite currently supports for the placeholder op. Add INT64 to the possible dtypes, as it appears to be supported according to the tflite schema.
MarisaKirisame pushed a commit that referenced this pull request on Feb 25, 2020:
* relay op strategy fix lint bitpack strategy bitserial_dense (#6)
* update strategy
* address comments fix a few topi test Dense strategy (#5)
* dense
* add bifrost; remove comments
* address comment Refactor x86 conv2d_NCHWc (#4)
* Refactor x86 conv2d
* Add x86 depthwise_conv2d_NCHWc
* Add back topi x86 conv2d_nchw
* Merge x86 conv2d_nchw and conv2d_NCHWc
* Minor fix for x86 conv2d fix more strategy Add x86 conv2d_NCHWc_int8 strategy (#8)
* Add x86 conv2d_NCHWc_int8 strategy
* Remove contrib_conv2d_nchwc_int8
* Fix generic conv2d_NCHWc for int8
* Fix topi arm_cpu conv2d_NCHWc_int8
* update x86 conv2d
* enable specify relay ops to be tuned for autotvm
* add cuda conv2d strategy
* add conv2d strategy for rocm
* add conv2d strategy for hls
* add conv2d strategy for arm cpu
* add conv2d strategy for mali
* add conv2d strategy for bifrost
* add conv2d strategy for intel graphics
* clean up and fix lint
* remove template keys from autotvm
* remove 2 in the func name
* address comments
* fix
* fix bugs
* lint
* address comments
* add name to op implement
* Modify topi tests (#9)
* Add pooling, reorg, softmax and vision
* Add lrn
* fix topi test
* fix more topi test
* lint
* address comments
* x
* fix more tests & bugs
* Modify more tests (#10)
* Modify tests for bitserial_conv2d, bitserial_dense, bitserial_conv2d_rasp and bnn
* Minor fix
* More minor fix
* fix more test
* try to update vta using strategy
* fix cpptest
* x
* fix rebase err
* Fix two tests (#11)
* change autotvm log format
* lint
* minor fix
* try fix vta test
* fix rebase err
* tweak
* tmp hack for vta pass
* fix tutorial
* fix
* fix more tutorials
* fix vta tutorial
* minor
* address comments
* fix
* address comments
* fix cpptest
* fix docs
* change data structure name and api
* address comments
* lint
* fix rebase err
* updates
* fix winograd test
* fix doc
* rebase
* upgrade tophub version number
* fix bug
* re-enable vta tsim test after tophub is upgraded
* fix vta test to use the correct args so the config can be found in tophub

Co-authored-by: Yao Wang <[email protected]>
MarisaKirisame pushed a commit that referenced this pull request on Jul 21, 2020:
…generating (apache#5962)
* Code migration Start (#1)
* Init commit: Code migration Start
* Add loop_state.cc/h
* Add ComputeDAG basic test
* Split transform_step out & Update more UTs (#3)
* Split transform_step out
* Update GetProducers & GetConsumers
* Update UTs
* Add UT for CacheReadWrite & Some bug fix
* Add search_task, measure and serialization (#4)
* Add FollowSplit & FollowFusedSplit tests
* Update dag.InferBound & its UT
* Add search_task, measure and serialization
* Update Serialization UT
* Add MetaTileRewritePolicy (#5)
* Add feature
* Add cost_model, meta_tile_rewrite_policy
* Add MetaTileRewritePolicy basic UT
* Basic Python API for State (#6)
* Add Basic Python API for State
* Add UTs for State
* Add Python API: Measure & Task (#7)
* Update the return value of state operation
* Add task
* Copy measure.py & utils.py
* Fix LocalBuilder
* Fix LocalRunner
* Add ansor.auto_schedule() API; First AutoSchedule working version (#8)
* Add basic Python support for ansor.auto_schedule
* Update AutoSchedule API
* Bug fix for getting the attach point of a fused iter
* Update UT after infer bug fix
* Bug fix & Add python serialization API (#10)
* Delete C++ UT hack since Python is ready
* Add ndarray.non_empty
* Update Serialization python API
* Improve code style, python wrapper and test cases (#11)
* Update c++ code style and unit test
* Update python State wrapper and test cases
* fix unit tests
* Add RPCRunner & OpenCL/CUDA test (#12)
* Add RPCRunner & OpenCL search test
* Add CUDA search test
* Add RPCRunner test
* rebase to upstream/master
* Add Ansor basic tutorial (apache#13)
* Add basic tutorial
* migrate feature extraction (apache#14)
* Add XGBModel & RPCRunnerWarpper (apache#15)
* Add XGBModel & RPCRunnerWarpper
* Revert "Add Parallel Granularity Mutation"
* Migrate workload_registry.py (apache#16)
* add workload registry
* update
* update
* add task scheduler (apache#17)
* Add conv2d cuda tutorial with workload registry (apache#18)
* add tune_test.py (the old tune_wkl.py) (apache#19)
* add tune_test.py (the old tune_wkl.py)
* update
* fix measure
* fix for gpu
* Code refine for tune_test.py & Add a pre load callback (apache#20)
* Bug fix for tutorials
* Add PreLoadMeasuredStates
* Add search_callback support for task tuner
* Code refine for tune_test.py
* Update
* Update
* Update
* Update
* Bug fix
* Add python custom sketch rule (apache#21)
* Add custom sketch rule
* Bug fix
* Ansor Relay Integration (without layout rewrite) (apache#22)
* relay integration
* Add tune_op_subgraph.py & Some code clean for tune_network.py (apache#23)
* Add single op tune scripts
* Add tune subgraph support
* Merge all op & all subgraph to one file
* Rename file
* add explicit_unroll_max_extent (apache#25)
* Add Index simplification & API update (apache#26)
* Add vectorized cooperative_fetching test
* Update math simplify for vectorized CF
* File rename
* Update tune_network
* API update
* Update PreLoadMeasuredStates & Some bug fix (apache#27)
* Add a threading wrapper to fix the test bug
* Set default TVM_USE_AUTO_SCHEDULER to false
* Update PreLoadMeasuredStates callback
* Add tensorize step for loop_state (apache#31)
* Add tensorize step
* State python api update (apache#33)
* Start to update api
* Add compute_dag to state
* API update
* kernel layout rewrite (apache#28)
* kernel layout rewrite
* remove some hacks
* add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass
* set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite
* [cache flush] port cache flush to ansor (apache#32)
* Improve relay integration (apache#34)
* tmp checkpoint
* Improve relay integration
* Improve relay integration
* Fix xgb error & Simplify dispatcher (apache#35)
* Rename "MetaTileRewritePolicy" to "SketchPolicy". (apache#36)
* Rename "MetaTileRewritePolicy" to "SketchPolicy".
* Add a new class for auto_unroll_max_step, storage_offset in StageNode
* fix tune_op_subgraph.py
* rebase
* Migrate all node::make to noderef's construct function (apache#37)
* Start to move xxxnode::make to noderef()
* Update
* Update
* Finish transform_step
* Finish compute dag & auto schedule
* Update
* Update
* Update
* Update
* Update
* Code refine
* Code refine
* Code refine
* Update
* Update
* Some lint fix & Recover the double constructor of tvm::PrimExpr (apache#39)
* lint fix
* clang-format-fix
* pylint fix
* Update
* Recover the double constructor of tvm::PrimExpr
* Fix pylint
* pylint fix
* pylint fix
* Add MutateComputeLocation and MutateParallel in evolutionary search (apache#40)
* Add MutateComputeLocation and MutateParallel in evolutionary search
* fix lint
* Improve loop state python API (stage_tensors -> stage_ops) (apache#41)
* improve loop state python API (stage_tensors -> stage_ops)
* fix
* ComputeDAG bug fix & Add Custom TensorCore Matmul Example (apache#42)
* Bug Fix
* Sample example of Custom TensorCore Matmul
* Revert Commits, Start to build minimum Ansor system
* Code clean for minimum Ansor system
* Bug fix & Delete AccessAnalyzer
* Delete attachmap & Code clean
* Doc update: Update statenode::stages from vector to Array
* Headfile update & Python doc update
* clang-format fix
* pylint fix
* Update
* Doc update
* Update
* Bug fix after code merge to the new master
* clang-format fix
* Update
* Update
* Update std::vector to Array; Update verbosity setting; Some comments addressed
* std::vector->Array & std::string->String
* Add init_state to ComputeDAG
* Update
* Update some unordered_map to Map
* clang-format fix
* Comments addressed: Delete ReplayAndInferBound, Delete ReplaySteps & InferBoundCommon
* Lint fix
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Rename ansor namespace to auto_schedule
* Update
* Rename ThreadPool to ParallelFor
* Add parallel_for
* Remove ThreadPool
* Update python/tvm/auto_schedule/auto_schedule.py
* trigger CI

Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Minmin Sun (孙敏敏) <[email protected]>
Co-authored-by: Zhao Wu <[email protected]>