Document Similarity

A high-performance C++ program that computes similarity scores between text documents. It is designed for tasks such as plagiarism detection, duplicate-content analysis, and document clustering, and aims to be both accurate and efficient on small and large datasets.

🌟 Features

Multiple Similarity Metrics: Supports several algorithms, including cosine similarity and Jaccard similarity.
Text Preprocessing: Handles tokenization, case normalization, stopword removal, and stemming.
Scalable: Optimized for large datasets and many pairwise comparisons.
Configurable: Easy to extend or modify to suit specific text analysis needs.
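
To make the metrics and preprocessing steps listed above concrete, here is a minimal C++17 sketch. It is an illustration only, not the repository's actual code: the function names (`tokenize`, `jaccardSimilarity`, `cosineSimilarity`) and the tiny stopword list are hypothetical, stemming is left out, and cosine similarity is computed over raw term counts rather than any weighted scheme.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: lowercase the text, split it on non-alphanumeric
// characters, and drop words from a small illustrative stopword list.
std::vector<std::string> tokenize(const std::string& text) {
    static const std::unordered_set<std::string> stopwords = {
        "a", "an", "the", "is", "of", "and"};
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&]() {
        if (!current.empty() && stopwords.count(current) == 0) {
            tokens.push_back(current);
        }
        current.clear();
    };
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else {
            flush();
        }
    }
    flush();  // emit the final token, if any
    return tokens;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
double jaccardSimilarity(const std::vector<std::string>& a,
                         const std::vector<std::string>& b) {
    std::unordered_set<std::string> setA(a.begin(), a.end());
    std::unordered_set<std::string> setB(b.begin(), b.end());
    if (setA.empty() && setB.empty()) return 1.0;  // two empty documents
    std::size_t common = 0;
    for (const auto& token : setA) common += setB.count(token);
    return static_cast<double>(common) /
           static_cast<double>(setA.size() + setB.size() - common);
}

// Cosine similarity between the raw term-frequency vectors of two documents.
double cosineSimilarity(const std::vector<std::string>& a,
                        const std::vector<std::string>& b) {
    std::unordered_map<std::string, int> freqA, freqB;
    for (const auto& token : a) ++freqA[token];
    for (const auto& token : b) ++freqB[token];

    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& [term, count] : freqA) {
        normA += static_cast<double>(count) * count;
        auto it = freqB.find(term);
        if (it != freqB.end()) dot += static_cast<double>(count) * it->second;
    }
    for (const auto& [term, count] : freqB) {
        normB += static_cast<double>(count) * count;
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;  // an empty document
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

With these helpers, comparing two documents means tokenizing both and passing the token vectors to either metric. For example, "The quick brown fox" and "A quick brown dog" share two of four distinct non-stopword tokens, so their Jaccard similarity is 0.5.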

📂 Project Structure

Document_Similarity/
├── src/                # Source code files
├── include/            # Header files
├── samples/            # Example input documents
├── build/              # Directory for compiled files (generated)
├── Makefile            # Build system configuration
├── README.md           # Project documentation
└── LICENSE             # License information

🚀 Getting Started

Prerequisites

Before using the program, ensure you have the following installed:

C++ compiler with C++17 support or newer (for example, GCC or Clang): Required for compilation.
Make: For building the project.
CMake (optional): For advanced build configuration.

Installation

Clone the repository:

```
git clone https://github.com/Mohammed-3tef/Document_Similarity.git
cd Document_Similarity
```

Compile the program:

Using Make:
```
make
```
Using CMake:
```
mkdir build && cd build
cmake ..
make
```

The executable file will be created in the build/ or project root directory.

👨‍💻 Contributing

We welcome contributions from the community! To contribute:

Fork the repository.
Create a feature branch: git checkout -b feature-name
Commit your changes: git commit -m "Add feature or fix a bug"
Push to your fork and open a pull request.

✍️ Authors

Name: Mohammed Atef Abd El-Kader
ID: 20231143
Version: 1.0
Date: 15 Nov. 2024

📝 License

This project is licensed under the MIT License. You are free to use, modify, and distribute this software under the terms of the license.
