CineSLEUTH is a movie recommendation system that combines TF-IDF vectorization, cosine similarity, fuzzy title matching, and Apriori-based collaborative filtering to produce personalized movie recommendations.
Content-Based Filtering:
- Represents each movie as a TF-IDF vector built from its genres and tags.
- Uses cosine similarity between these vectors to find the closest matches.
- Applies fuzzy matching to the entered title so that typos are handled gracefully.
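The content-based step can be sketched without dependencies as follows. This is a minimal illustration, not CineSLEUTH's code: the project uses scikit-learn for TF-IDF and cosine similarity, while this sketch reimplements the same idea with the standard library, using the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default. It assumes each movie is described by one string of pipe-separated genres followed by free-text tags; the helper names (`tfidf_vectors`, `cosine`, `most_similar`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each movie's text."""
    # Assumed input format: "Genre1|Genre2 tag1 tag2"; pipes become spaces.
    tokenized = {title: text.lower().replace("|", " ").split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many movies each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        # Raw term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        vectors[title] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank all other movies by cosine similarity to `query` and return the top k."""
    vectors = tfidf_vectors(docs)
    scores = [
        (title, cosine(vectors[query], vec))
        for title, vec in vectors.items()
        if title != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


movies = {
    "Toy Story": "Adventure|Animation|Children|Comedy pixar toys",
    "Toy Story 2": "Adventure|Animation|Children|Comedy pixar sequel toys",
    "Heat": "Action|Crime|Thriller heist",
    "Jumanji": "Adventure|Children|Fantasy board game",
}
print(most_similar("Toy Story", movies, k=2))
```

Because cosine similarity normalizes by vector length, the sketch skips the explicit L2 normalization step that `TfidfVectorizer` performs; the ranking is the same either way.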
Collaborative Filtering:
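- Applies the Apriori algorithm to user ratings to find frequent itemsets: groups of movies that often appear together in the same users' ratings.
- Derives association rules from these itemsets (for example, users who rated Movie A also tended to rate Movie B) to improve recommendations.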
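A minimal standard-library sketch of Apriori and rule generation follows. It assumes each user's ratings have already been reduced to a "basket" (the set of movies that user rated, possibly above some rating threshold; both the threshold and the toy baskets are hypothetical). The project uses the apyori package for this step; `apriori` and `association_rules` below are illustrative stand-ins, not apyori's API. Support, confidence, and lift follow their standard definitions.

```python
from itertools import combinations

Itemset = frozenset[str]


def apriori(baskets: list[Itemset], min_support: float) -> dict[Itemset, float]:
    """Return every itemset whose support (fraction of baskets containing it) is >= min_support."""
    n = len(baskets)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= basket for basket in baskets) / n

    # Level-1 candidates: every individual movie.
    level = {frozenset([item]) for basket in baskets for item in basket}
    frequent: dict[Itemset, float] = {}
    k = 1
    while level:
        # Keep only candidates that meet the support threshold.
        supports = {s: support(s) for s in level}
        kept = {s: sup for s, sup in supports.items() if sup >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(
    frequent: dict[Itemset, float], min_confidence: float
) -> list[tuple[Itemset, Itemset, float, float, float]]:
    """Rules X -> {y} as (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent, consequent = itemset - {item}, frozenset([item])
            # Subsets of frequent itemsets are frequent, so both lookups succeed.
            confidence = sup / frequent[antecedent]
            lift = confidence / frequent[consequent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, sup, confidence, lift))
    return rules


baskets = [
    frozenset({"Movie A", "Movie B", "Movie C"}),
    frozenset({"Movie A", "Movie B"}),
    frozenset({"Movie A", "Movie B", "Movie D"}),
    frozenset({"Movie C", "Movie D"}),
    frozenset({"Movie B", "Movie C"}),
]
frequent = apriori(baskets, min_support=0.4)
for ante, cons, sup, conf, lift in association_rules(frequent, min_confidence=0.7):
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step is what makes Apriori efficient: a candidate with any infrequent subset cannot be frequent, so its support is never counted.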
Interactive User Experience:
- Auto-complete suggestions for movie titles, presented as clickable options.
- An interactive, responsive web interface built with Streamlit.
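Auto-complete with typo tolerance can be sketched with Python's standard `difflib` module. This is a swapped-in technique: the project uses fuzzywuzzy, which reports scores on a 0 to 100 scale and offers several scorers, so the 0.6 cutoff here is an assumption rather than CineSLEUTH's setting. `suggest_titles` is a hypothetical helper that ranks prefix matches first and then appends fuzzy matches.

```python
import difflib


def suggest_titles(query: str, titles: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` titles for a partial or misspelled query."""
    q = query.strip().lower()
    by_lower = {title.lower(): title for title in titles}
    # Prefix matches first: classic auto-complete behaviour.
    prefix_hits = [title for title in titles if title.lower().startswith(q)]
    # Then fuzzy matches to tolerate typos (difflib similarity ratio >= 0.6).
    close = difflib.get_close_matches(q, list(by_lower), n=limit, cutoff=0.6)
    fuzzy_hits = [by_lower[match] for match in close]
    # Merge the two lists, dropping duplicates but keeping order.
    return list(dict.fromkeys(prefix_hits + fuzzy_hits))[:limit]


titles = ["Toy Story", "Toy Story 2", "Heat", "Jumanji", "Titanic"]
print(suggest_titles("toy", titles))
print(suggest_titles("Jumanjii", titles))
```

In a Streamlit app, a list like this could be rendered as buttons so that clicking a suggestion triggers the recommendation step.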
Real-Time Data Processing:
- Reads large CSV files in chunks, keeping memory usage bounded as the dataset grows.
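Chunked reading can be sketched with the standard `csv` module, aggregating statistics chunk by chunk instead of loading the whole file. The column names `movieId` and `rating` are assumptions about `ratings.csv`, and `read_in_chunks` and `mean_rating_per_movie` are hypothetical helpers. With pandas, the analogous approach is `pd.read_csv(path, chunksize=...)`, which returns an iterator of DataFrames; whether CineSLEUTH uses exactly that is an assumption.

```python
import csv
import io
from collections import defaultdict
from itertools import islice
from typing import Iterator, TextIO


def read_in_chunks(handle: TextIO, chunk_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most chunk_size parsed rows; the full file is never held in memory."""
    reader = csv.DictReader(handle)
    while chunk := list(islice(reader, chunk_size)):
        yield chunk


def mean_rating_per_movie(handle: TextIO, chunk_size: int = 100_000) -> dict[str, float]:
    """Stream a ratings CSV and return the mean rating per movieId."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for chunk in read_in_chunks(handle, chunk_size):
        for row in chunk:
            totals[row["movieId"]] += float(row["rating"])
            counts[row["movieId"]] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


# In-memory stand-in for ratings.csv; a real file would be opened with
# open("ratings.csv", newline="").
sample = io.StringIO("userId,movieId,rating\n1,10,4.0\n2,10,5.0\n1,20,3.0\n")
print(mean_rating_per_movie(sample, chunk_size=2))
```

Only the running totals and counts grow with the number of distinct movies; the number of rows held at once is capped by `chunk_size`.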
Clone the repository:
git clone https://github.com/AaryanGole26/CineSLEUTH.git
cd CineSLEUTH
Install the required Python libraries:
pip install -r requirements.txt
Add the required CSV files to the repository:
- movies.csv
- ratings.csv
- tags.csv
- genome_tags.csv
Run the application locally using Streamlit:
streamlit run CineSLEUTH.py
Usage:
- Enter a movie title in the input box.
- Click one of the auto-complete suggestions to see recommendations instantly.
- Review the recommendations in a table of similarity scores.
- View the association rules generated by the Apriori-based collaborative filtering.
Here is a sample video showcasing the output of CineSLEUTH:
[Video: the movie recommendation process and visualizations]
The flowchart below represents the working of CineSLEUTH:
[Flowchart: the movie recommendation process, from user input to final recommendations]
Project Structure:
- CineSLEUTH.py: Main Python script for the Streamlit application.
- movies.csv: Movie metadata.
- ratings.csv: User rating data.
- tags.csv: Tags associated with movies.
- genome_tags.csv: Genome tags dataset.
- requirements.txt: Required Python packages.
Python Libraries:
- pandas for data manipulation.
- apyori for association rule mining.
- sklearn (scikit-learn) for TF-IDF vectorization and cosine similarity.
- fuzzywuzzy for fuzzy matching of movie titles.
- matplotlib and seaborn for data visualization.
- streamlit for the interactive UI.
Algorithms:
- Content-based filtering (TF-IDF + Cosine Similarity).
- Collaborative filtering (Apriori Algorithm).
Example Apriori Rule:
- Rule: Movie A → Movie B
- Support: 0.02
- Confidence: 0.85
- Lift: 3.5
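Assuming the example rule's values follow the standard definitions (the ones apyori reports), with each transaction being one user's set of rated movies (an assumption), the three numbers are linked, and the single-movie supports they imply can be worked out:

```latex
\begin{aligned}
\text{support}(A \Rightarrow B) &= \text{support}(A \cup B)
  = \frac{\#\{\text{transactions containing } A \text{ and } B\}}{\#\{\text{transactions}\}} = 0.02 \\
\text{confidence}(A \Rightarrow B) &= \frac{\text{support}(A \cup B)}{\text{support}(A)} = 0.85
  \;\Rightarrow\; \text{support}(A) = \frac{0.02}{0.85} \approx 0.0235 \\
\text{lift}(A \Rightarrow B) &= \frac{\text{confidence}(A \Rightarrow B)}{\text{support}(B)} = 3.5
  \;\Rightarrow\; \text{support}(B) = \frac{0.85}{3.5} \approx 0.243
\end{aligned}
```

So Movie B would appear in roughly 24% of all transactions but in 85% of those that contain Movie A; the lift of 3.5 is the ratio of those two rates.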
Feel free to fork the repository, raise issues, or contribute to the project. Contributions are always welcome!
This project is licensed under the MIT License.
