Sentiment Analysis on Yelp Reviews
Python
NLP
Deep Learning
Classification
Comparative analysis of CBOW and LSTM models for sentiment classification on Yelp reviews
Project Overview
This project performs sentiment analysis on the Yelp Review dataset, classifying each review into one of five star ratings (1 to 5). The analysis explores several modeling approaches, ranging from a traditional machine learning baseline to deep learning architectures.
By implementing and comparing TF-IDF with Logistic Regression, Continuous Bag of Words (CBOW), and Bidirectional LSTM models, the study evaluates how well each technique handles this five-class text classification task.
Key Concepts Applied
- Exploratory Data Analysis (EDA): Analyzed the distribution of ratings and review lengths using Altair. The dataset was found to be balanced across the five rating classes.
- Baseline Modeling: Established a baseline using TF-IDF vectorization and Logistic Regression, achieving a macro F1 score of 0.5936 and setting a strong benchmark for subsequent models.
- Deep Learning Architectures:
  - CBOW (Continuous Bag of Words): Implemented a custom CBOW model using spaCy word embeddings.
  - Bi-LSTM (Bidirectional Long Short-Term Memory): Constructed a Bi-LSTM model to capture sequential dependencies in the text. The model outperformed the CBOW approach, achieving a validation F1 score of approximately 0.557.
- Hyperparameter Tuning: Conducted experiments with different architectures (e.g., varying hidden sizes and numbers of layers) to optimize the Bi-LSTM model's performance.
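
The baseline described above can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn; the toy reviews, labels, and variable names are hypothetical stand-ins, and the project's actual preprocessing and settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the real project uses Yelp reviews
# labeled with 1 to 5 stars.
train_texts = [
    "terrible food and rude staff",
    "awful service, never again",
    "it was okay, nothing special",
    "average meal, fair price",
    "great food and friendly staff",
    "amazing experience, loved it",
]
train_labels = [1, 1, 3, 3, 5, 5]

# TF-IDF features feed directly into a multi-class logistic regression.
baseline = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

val_texts = ["rude staff and awful food", "friendly staff, great meal"]
val_labels = [1, 5]
val_pred = baseline.predict(val_texts)

# Macro F1 averages the per-class F1 scores with equal weight per class.
macro_f1 = f1_score(val_labels, val_pred, average="macro")
print(f"Macro F1: {macro_f1:.4f}")
```

Macro averaging gives every star rating the same weight regardless of how many reviews it has, which suits a dataset that is balanced across classes.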
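
To make the Bi-LSTM and the tuned hyperparameters more concrete, here is a minimal sketch assuming PyTorch. The source does not state which framework, embedding layer, or sizes were used, so the class name `BiLSTMClassifier`, the default dimensions, and the small grid below are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    """Bidirectional LSTM over token embeddings, followed by a linear layer."""

    def __init__(self, vocab_size, embed_dim=100, hidden_size=64,
                 num_layers=1, num_classes=5, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated: 2 * hidden_size.
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: (num_layers * 2, batch, hidden)
        # h_n[-2] is the last layer's forward state, h_n[-1] its backward state.
        final = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.classifier(final)          # (batch, num_classes) logits


# Hypothetical grid over hidden size and number of layers.
batch = torch.randint(1, 1000, (4, 20))  # 4 fake reviews, 20 token ids each
for hidden_size in (32, 64):
    for num_layers in (1, 2):
        model = BiLSTMClassifier(vocab_size=1000, hidden_size=hidden_size,
                                 num_layers=num_layers)
        logits = model(batch)
        print(hidden_size, num_layers, tuple(logits.shape))
```

For training with `nn.CrossEntropyLoss`, the 1 to 5 star labels would need to be shifted to class indices 0 to 4. Real batches with padded reviews would also typically use `torch.nn.utils.rnn.pack_padded_sequence` so the final states do not include padding steps.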