Kaggle project. The goal is to detect malware based on features extracted from API calls. More information is available on the Kaggle website.


  • Python 3.7
  • PyTorch 10.1
  • requirements.txt
  • GPU with at least 1GB of memory available (recommended)
  • Train and test data downloaded from Kaggle

Project Structure

├───test                       # Downloaded from Kaggle
│    ├───0.npy
│    ├───...
│    └───6050.npy
├───train                      # Downloaded from Kaggle
│    ├───0.npy
│    ├───...
│    └───18661.npy
├───train_kaggle.csv           # Downloaded from Kaggle
├───                   # Epoch training
├───                    # Generates solution.csv which can be submitted
├───                   # Model
├───                     # Starts training
└───                 # Used to provide data in batch


In this project, we are using the same model as described in the paper: Dynamic Malware Analysis with Feature Engineering and Feature Learning. The model structure is shown below:

  • Input: N×C×L tensor, where N is the batch size, C is the feature size (102), and L is the maximum sequence length (1000).
    • batchSize: 50
  • Batch Normalization: Speeds up convergence.
  • Gated CNN: Extracts usable features from the raw input.
    • gated_cnn_outputs: 128
    • gated_cnn_stride1: 1
    • gated_cnn_stride2: 1
    • gated_cnn_kernel1: 2
    • gated_cnn_kernel2: 3
  • BiLSTM: The input features have sequential patterns, so we use a bi-directional LSTM to capture both past and future context.
    • lstm_layers: 1
    • lstm_neurons: 100
  • MaxPool1D: Extracts the most important features from the hidden states generated by BiLSTM.
  • Dense: Reduces the dimension of feature space.
    • fc_outputs: 64
  • Dropout: Reduces overfitting.
    • dropout: 0.5
  • Sigmoid: Generates probabilities for binary classification.
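
The layer list above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's actual model file: the class names `GatedConv1d` and `MalwareNet` are hypothetical, and the README does not say how the two gated CNN branches are combined, how the pooling is configured, or which layer feeds the sigmoid. The sketch assumes the two branches are concatenated along the channel axis, the pooling is a global max over time, and a final one-unit linear layer precedes the sigmoid.

```python
import torch
import torch.nn as nn


class GatedConv1d(nn.Module):
    """Gated convolution: a linear conv branch scaled by a sigmoid gate."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride=stride)
        self.gate = nn.Conv1d(in_channels, out_channels, kernel_size, stride=stride)

    def forward(self, x):
        return self.conv(x) * torch.sigmoid(self.gate(x))


class MalwareNet(nn.Module):
    """Hypothetical sketch using the hyperparameters listed in the README."""

    def __init__(self, features=102, cnn_out=128, lstm_neurons=100,
                 lstm_layers=1, fc_out=64, dropout=0.5):
        super().__init__()
        self.bn = nn.BatchNorm1d(features)
        self.gcnn1 = GatedConv1d(features, cnn_out, kernel_size=2, stride=1)
        self.gcnn2 = GatedConv1d(features, cnn_out, kernel_size=3, stride=1)
        self.lstm = nn.LSTM(2 * cnn_out, lstm_neurons, num_layers=lstm_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_neurons, fc_out)
        self.dropout = nn.Dropout(dropout)
        # Assumption: a single-unit layer produces the logit for the sigmoid.
        self.out = nn.Linear(fc_out, 1)

    def forward(self, x):                      # x: (N, C, L)
        x = self.bn(x)
        a = self.gcnn1(x)                      # (N, cnn_out, L - 1)
        b = self.gcnn2(x)                      # (N, cnn_out, L - 2)
        # Assumption: crop the longer branch so both lengths match.
        a = a[:, :, : b.size(2)]
        x = torch.cat([a, b], dim=1)           # (N, 2 * cnn_out, L - 2)
        x = x.transpose(1, 2)                  # (N, L - 2, 2 * cnn_out)
        h, _ = self.lstm(x)                    # (N, L - 2, 2 * lstm_neurons)
        x = torch.max(h, dim=1)[0]             # global max over time
        x = torch.relu(self.fc(x))
        x = self.dropout(x)
        return torch.sigmoid(self.out(x)).squeeze(1)   # (N,) probabilities
```

Calling `MalwareNet()(torch.randn(50, 102, 1000))` would return one probability per sample for a batch of 50, which pairs naturally with `nn.BCELoss` for binary classification.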

Exp logs

Exp         Description
1573179669  seed:28 90% train, 10% validation, pc
1573200428  seed:29 95% train, 5% validation, pc
1573204629  seed:29 95% train, 5% validation, server
1573983562  pc, batch 50
1574035600  server, batch 25
1574035703  server, batch 100
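
The seeded train/validation splits recorded in the logs can be reproduced along these lines. This is a sketch only: the helper `split_indices` is a hypothetical name, the repository may split differently (for example with PyTorch or scikit-learn utilities), and the sample count of 18662 assumes the training files are numbered contiguously from 0.npy to 18661.npy.

```python
import random


def split_indices(n_samples, train_fraction, seed):
    """Shuffle sample indices with a fixed seed and split them in two."""
    rng = random.Random(seed)          # local RNG, so global state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# Assumed setup mirroring the first log entry: seed 28, 90% train, 10% validation.
train_idx, val_idx = split_indices(18662, 0.90, seed=28)
```

Fixing the seed makes the split repeatable across runs, which is what allows experiments on different machines to be compared on the same validation set.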


Python # all the hyperparameters can be set inside

