Malware-Detection

Kaggle project. The task is to detect malware based on features extracted from API calls. More information is available on the Kaggle website.

Requirement

Python 3.7
PyTorch 10.1
requirements.txt
GPU with at least 1 GB of memory available (recommended)
Train and test data downloaded from Kaggle

Project Structure

│
│
├───test                       # Downloaded from Kaggle
│    ├───0.npy
│    ├───...
│    └───6050.npy
├───train                      # Downloaded from Kaggle
│    ├───0.npy
│    ├───...
│    └───18661.npy
├───train_kaggle.csv           # Downloaded from Kaggle
│
│
├───train.py                   # Epoch training
├───test.py                    # Generates solution.csv for submission
├───model.py                   # Model
├───run.py                     # Starts training
└───dataset.py                 # Provides data in batches
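
The batching done by dataset.py is not shown here, so the following is only a minimal sketch of one way per-sample arrays could be brought to a common length before stacking them into a batch. It assumes each .npy file (readable with `np.load`) holds a 2-D array laid out as (features, time steps); that layout, the zero padding, and the helper names `pad_or_truncate` and `make_batch` are illustrative assumptions, not the repository's code.

```python
import numpy as np

MAX_LEN = 1000  # maximum sequence length L given in the Model section


def pad_or_truncate(sample, max_len=MAX_LEN):
    """Return a (C, max_len) float32 array.

    Assumes `sample` has shape (C, T). Shorter sequences are zero-padded
    on the right; longer ones are cut off after max_len steps.
    """
    num_features, length = sample.shape
    out = np.zeros((num_features, max_len), dtype=np.float32)
    keep = min(length, max_len)
    out[:, :keep] = sample[:, :keep]
    return out


def make_batch(samples, max_len=MAX_LEN):
    """Stack a list of (C, T_i) arrays into one (N, C, max_len) batch."""
    return np.stack([pad_or_truncate(s, max_len) for s in samples])
```

A call such as `make_batch([np.load(path) for path in paths])` would then yield an array that can be converted to a tensor with `torch.from_numpy`, provided the files really use the assumed layout.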

Model

This project uses the model described in the paper Dynamic Malware Analysis with Feature Engineering and Feature Learning. The model structure is shown below:

Input: N×C×L tensor, where N is the batch size, C is the feature size (102), and L is the maximum sequence length (1000).
- batchSize: 50
Batch Normalization: Speeds up convergence.
Gated CNN: Extracts usable features from the raw input.
- gated_cnn_outputs: 128
- gated_cnn_stride1: 1
- gated_cnn_stride2: 1
- gated_cnn_kernel1: 2
- gated_cnn_kernel2: 3
BiLSTM: The input features have sequential patterns, so a bidirectional LSTM is used to capture both past and future context.
- lstm_layers: 1
- lstm_neurons: 100
MaxPool1D: Extracts the most important features from the hidden states generated by BiLSTM.
Dense: Reduces the dimension of feature space.
- fc_outputs: 64
Dropout: Reduces overfitting.
- dropout: 0.5
Sigmoid: Generates probabilities for binary classification.
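
The layer list above can be sketched in PyTorch roughly as follows. This is not the contents of model.py: the class names `GatedConv1d` and `MalwareNet` are hypothetical, and several details are assumptions made to get a runnable example, namely that the two gated convolutions run in parallel and are concatenated along the channel axis, that the dense layer is followed by a ReLU, and that a final one-unit linear layer feeds the sigmoid. Only the hyperparameter values come from the list.

```python
import torch
import torch.nn as nn


class GatedConv1d(nn.Module):
    """Gated convolution: conv(x) multiplied by sigmoid(gate_conv(x))."""

    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride)
        self.gate = nn.Conv1d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        return self.conv(x) * torch.sigmoid(self.gate(x))


class MalwareNet(nn.Module):
    def __init__(self, features=102, cnn_out=128, lstm_hidden=100,
                 lstm_layers=1, fc_out=64, dropout=0.5):
        super().__init__()
        self.input_norm = nn.BatchNorm1d(features)
        # Two gated CNN branches (kernel sizes 2 and 3, stride 1 each).
        self.gated1 = GatedConv1d(features, cnn_out, kernel_size=2, stride=1)
        self.gated2 = GatedConv1d(features, cnn_out, kernel_size=3, stride=1)
        self.lstm = nn.LSTM(2 * cnn_out, lstm_hidden, num_layers=lstm_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, fc_out)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(fc_out, 1)  # assumption: one logit per sample

    def forward(self, x):  # x: (N, C, L)
        x = self.input_norm(x)
        a = self.gated1(x)  # (N, cnn_out, L - 1)
        b = self.gated2(x)  # (N, cnn_out, L - 2)
        # Assumption: trim both branches to the shorter length, then concatenate.
        length = min(a.size(2), b.size(2))
        x = torch.cat([a[:, :, :length], b[:, :, :length]], dim=1)
        x, _ = self.lstm(x.transpose(1, 2))  # (N, length, 2 * lstm_hidden)
        x = torch.max(x, dim=1)[0]  # max over time, i.e. global max pooling
        x = self.dropout(torch.relu(self.fc(x)))
        return torch.sigmoid(self.out(x)).squeeze(1)  # (N,) probabilities
```

Calling `MalwareNet()(torch.randn(50, 102, 1000))` returns one probability per sample for a batch of 50. For training, a binary cross-entropy loss such as `nn.BCELoss` fits this output, although the loss actually used in train.py is not stated here.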

Exp logs

Exp	Description
1573179669	seed:28 90% train, 10% validation, pc
1573200428	seed:29 95% train, 5% validation, pc
1573204629	seed:29 95% train, 5% validation, server
1573983562	pc, batch 50
1574035600	server, batch 25
1574035703	server, batch 100
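
The first three runs differ in random seed and train/validation ratio. How run.py performs the split is not shown, so the snippet below is only a sketch of one reproducible way to do it with the standard library. The function name `split_indices` is hypothetical, and the sample count of 18662 assumes the train folder contains every file from 0.npy to 18661.npy.

```python
import random


def split_indices(num_samples, val_fraction, seed):
    """Shuffle sample indices with a fixed seed and split off a validation set."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    num_val = int(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]


# Settings mirroring the first logged run: seed 28, 90% train, 10% validation.
train_idx, val_idx = split_indices(18662, 0.10, seed=28)
```

Because the shuffle uses its own seeded `random.Random` instance, repeating the call with the same seed gives the same split, which keeps runs on the PC and the server comparable.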

Train

python run.py # all hyperparameters can be set inside run.py
