Sequence Labelling Model for Informal Text

This project was done as part of SUTD's 50.007 Machine Learning course.

Description

Many start-up companies are interested in developing automated systems for analysing sentiment information associated with social media data. Such sentiment information can inform important decisions, such as making product recommendations, predicting social stance and forecasting financial market trends.

The idea behind sentiment analysis is to analyse the natural language texts typed, shared and read by users through services such as Twitter and Weibo, and to infer the users' sentiment towards certain targets. Such social texts can differ from standard texts that appear, for example, in news articles: they are often very informal and can be very noisy. It is therefore essential to build machine learning systems that can automatically analyse and comprehend the sentiment information underlying such informal texts.

As such, we designed a sequence labelling model for informal texts using the hidden Markov model (HMM).
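For readers new to HMM-based sequence labelling, the sketch below shows Viterbi decoding on a toy model. It is a minimal illustration under stated assumptions, not the code in models/hmm.py: the function name `viterbi`, the two toy tags `O` and `B`, and every probability are made up for the example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for a non-empty list of words."""
    def log(p):
        # Floor zero probabilities so that log() is always defined.
        return math.log(max(p, floor))

    # best[t] = best log-score of any path ending in tag t at the current word
    best = {t: log(start_p.get(t, 0)) + log(emit_p[t].get(words[0], 0)) for t in tags}
    back = []  # back[i][t] = best previous tag for tag t at word i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log(trans_p[p].get(t, 0)))
            scores[t] = best[prev] + log(trans_p[prev].get(t, 0)) + log(emit_p[t].get(word, 0))
            pointers[t] = prev
        best = scores
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy parameters (hypothetical, for illustration only).
toy_tags = ["O", "B"]
toy_start = {"O": 0.8, "B": 0.2}
toy_trans = {"O": {"O": 0.7, "B": 0.3}, "B": {"O": 0.9, "B": 0.1}}
toy_emit = {
    "O": {"i": 0.5, "love": 0.4, "pizza": 0.1},
    "B": {"i": 0.05, "love": 0.05, "pizza": 0.9},
}
print(viterbi(["i", "love", "pizza"], toy_tags, toy_start, toy_trans, toy_emit))
```

Working in log space avoids numerical underflow on long sentences, which is a common choice for this kind of decoder.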

Setup

Before starting, make sure that your project directory has the following structure:

data/
  EN/
    train
    dev.in
    dev.out
    test.in
  SG/
    train
    dev.in
    dev.out
  CN/
    train
    dev.in
    dev.out
models/
  hmm.py
  structured_perceptron.py
preprocess.py
main.py
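One quick way to confirm this layout before running anything is a small check script. The sketch below is not part of the repository; the helper name `missing_paths` is hypothetical, and the path list is simply copied from the tree above.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
EXPECTED = [
    "data/EN/train", "data/EN/dev.in", "data/EN/dev.out", "data/EN/test.in",
    "data/SG/train", "data/SG/dev.in", "data/SG/dev.out",
    "data/CN/train", "data/CN/dev.in", "data/CN/dev.out",
    "models/hmm.py", "models/structured_perceptron.py",
    "preprocess.py", "main.py",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Project directory looks complete.")
```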

With this structure in place, you should be able to run the commands below.

The code is run through main.py as follows:

usage: main.py [-h] --part PART --datasets DATASETS [--epochs EPOCHS]

optional arguments:
  -h, --help           show this help message and exit
  --part PART          Possible parts: 2, 3, 4, 5, 5-laplace, 5-good_turing,
                       5-structured_perceptron
  --datasets DATASETS  Input datasets to be used, separated by commas.
                       Datasets should be stored in data/
  --epochs EPOCHS      Needed only when running 5-structured_perceptron or 5.
                       Defaults to 8.
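The help text above has the shape that Python's argparse generates. As a rough guide to how the flags are parsed, a parser along the following lines would accept the same arguments. This is a reconstruction from the usage text only, not a copy of main.py; `build_parser` and `PARTS` are hypothetical names.

```python
import argparse

# Part names as listed in the help text above.
PARTS = ["2", "3", "4", "5", "5-laplace", "5-good_turing", "5-structured_perceptron"]

def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--part", type=str, required=True,
                        help="Possible parts: " + ", ".join(PARTS))
    parser.add_argument("--datasets", type=str, required=True,
                        help="Input datasets to be used, separated by commas. "
                             "Datasets should be stored in data/")
    parser.add_argument("--epochs", type=int, default=8,
                        help="Needed only when running 5-structured_perceptron or 5. "
                             "Defaults to 8.")
    return parser

# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--part=5-laplace", "--datasets=EN,SG"])
datasets = args.datasets.split(",")  # comma-separated names become a list
print(args.part, datasets, args.epochs)
```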

Below are the instructions to run the specific parts of the project.

Part 2

python main.py --part=2 --datasets=EN,SG,CN

Part 3

python main.py --part=3 --datasets=EN,SG,CN

Part 4

python main.py --part=4 --datasets=EN

Part 5

This part includes several models: an HMM with Laplace smoothing, an HMM with Good-Turing estimate smoothing, and a Structured Perceptron.

HMM with Laplace Smoothing

python main.py --part=5-laplace --datasets=EN
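As background, Laplace (add-k) smoothing adds a constant to every count so that unseen events keep a non-zero probability. The sketch below applies it to emission probabilities on toy data. Whether the repository smooths emissions, transitions or both is not stated here, so treat this as an assumption; `laplace_emission` and the `pos`/`neg` tags are hypothetical.

```python
from collections import Counter

def laplace_emission(pairs, k=1.0):
    """Build an add-k smoothed estimate of P(word | tag) from (word, tag) pairs."""
    pair_counts = Counter(pairs)
    tag_counts = Counter()
    for (_, tag), count in pair_counts.items():
        tag_counts[tag] += count
    # Vocabulary size, plus one slot shared by all unseen words.
    v = len({word for word, _ in pair_counts}) + 1

    def prob(word, tag):
        return (pair_counts[(word, tag)] + k) / (tag_counts[tag] + k * v)

    return prob

# Toy training pairs (hypothetical).
prob = laplace_emission([("good", "pos"), ("good", "pos"), ("ok", "pos"), ("bad", "neg")])
print(prob("good", "pos"))  # seen pair: (2 + 1) / (3 + 4)
print(prob("meh", "pos"))   # unseen word: (0 + 1) / (3 + 4)
```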

HMM with Good-Turing Estimate Smoothing

python main.py --part=5-good_turing --datasets=EN
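As background, Good-Turing estimation replaces each observed count c with c* = (c + 1) N_{c+1} / N_c, where N_c is the number of distinct items seen exactly c times, and reserves a mass of N_1 / N for unseen items. The sketch below is a basic version on toy counts. It is not taken from the repository, the function names are hypothetical, and the fallback to the raw count when N_{c+1} is zero is a simplification chosen for the example.

```python
from collections import Counter

def good_turing_counts(counts):
    """Map each observed count c to c* = (c + 1) * N_{c+1} / N_c."""
    freq_of_freq = Counter(counts.values())  # N_c for every observed c
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        # Simplification: keep the raw count when N_{c+1} is zero.
        adjusted[c] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted

def unseen_mass(counts):
    """Probability mass reserved for unseen items: N_1 / N."""
    total = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / total if total else 0.0

# Toy word counts (hypothetical): a:3, b:2, c:1, d:1, e:1
toy_counts = Counter("a a a b b c d e".split())
print(good_turing_counts(toy_counts))
print(unseen_mass(toy_counts))
```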

Structured Perceptron

python main.py --part=5-structured_perceptron --datasets=EN [--epochs=n]

--epochs controls the number of training epochs run before predictions are made on the dev set. The default is 8, chosen based on our results.
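To show what an epoch means for this kind of model, the sketch below trains a small structured perceptron with emission and transition features on toy data. It is an illustration under stated assumptions, not the code in models/structured_perceptron.py: the feature set, the names `features`, `decode` and `train`, and the toy tags `O` and `B` are all hypothetical.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence marker

def features(words, tags):
    """Count emission and transition features for one tagged sentence."""
    feats = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        feats[("emit", tag, word)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats

def decode(words, tagset, w):
    """Viterbi over additive weights: return the highest-scoring tag sequence."""
    best = {t: (w[("trans", START, t)] + w[("emit", t, words[0])], [t]) for t in tagset}
    for word in words[1:]:
        new = {}
        for t in tagset:
            prev = max(tagset, key=lambda p: best[p][0] + w[("trans", p, t)])
            score = best[prev][0] + w[("trans", prev, t)] + w[("emit", t, word)]
            new[t] = (score, best[prev][1] + [t])
        best = new
    return max(best.values(), key=lambda item: item[0])[1]

def train(data, tagset, epochs=8):
    """One epoch is one full pass over the training sentences."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(words, tagset, w)
            if pred != gold:
                # Reward features of the gold sequence, penalise the prediction.
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w

# Toy training data (hypothetical).
toy_tagset = ["O", "B"]
toy_data = [
    (["i", "love", "pizza"], ["O", "O", "B"]),
    (["pizza", "is", "good"], ["B", "O", "O"]),
]
weights = train(toy_data, toy_tagset, epochs=8)
print(decode(["i", "love", "pizza"], toy_tagset, weights))
```

More epochs give the weights more passes to correct mistakes, but too many can overfit the training set, which is why the number of epochs is worth tuning on the dev set.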

Test Set

We have also included an option that makes predictions on the test set, as follows.

python main.py --part=5 --datasets=EN [--epochs=n]

This runs the Structured Perceptron model and defaults to 8 epochs.

LinHuiqing/informal-text-sentiments
