Skip to content

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Notifications You must be signed in to change notification settings

JamesSand/MultiLayer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 

Repository files navigation

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

1 Environment Setup

Please make sure you have installed Pytorch

2 Run the Code

python low_rank_approx.py

3 Get the Result

You will have results like this in the terminal

dim:4   seq_len:160     relative_error:4.673    actual:1.010    our:0.710       speedup:1.423
dim:5   seq_len:320     relative_error:2.505    actual:1.090    our:0.630       speedup:1.730
dim:6   seq_len:640     relative_error:2.846    actual:2.190    our:0.850       speedup:2.576
dim:7   seq_len:1280    relative_error:1.838    actual:3.780    our:0.970       speedup:3.897
dim:8   seq_len:2560    relative_error:2.005    actual:12.160   our:1.290       speedup:9.426
dim:9   seq_len:5120    relative_error:1.620    actual:47.300   our:1.790       speedup:26.425
dim:10  seq_len:10240   relative_error:1.286    actual:192.330  our:7.200       speedup:26.713
dim:11  seq_len:20480   relative_error:1.558    actual:807.740  our:20.030      speedup:40.327
dim:12  seq_len:40960   relative_error:0.773    actual:3297.410 our:41.750      speedup:78.980

About

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages