This is a project made by:
- Prajneya Kumar
- Shivansh S.
- Tejasvi Chebrolu
- Clone the repository
- Install all dependencies mentioned in requirements.txt
- Choose which method you would like to use, and go to the corresponding section below
Method I generates a summary using a document-term matrix and word-frequency counts. To use this method:
- Go to the method_1 folder
- Place your article in the valid folder, named article.txt
- Run the extractive.py file using python3
- You will end up with a summary named summary.txt inside the valid folder
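To illustrate the idea behind Method I, here is a minimal, self-contained sketch of frequency-based extractive summarization over a document-term matrix. It is not the code in extractive.py: the sentence splitter, the small stopword list, the length-normalised scoring and the default 0.3 ratio (borrowed from the gold-standard rule in the evaluation section) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the project may use a different one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
             "was", "were", "it", "that", "this", "for", "with", "as", "by", "at", "be"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def document_term_matrix(sentences: list[str]) -> tuple[list[str], list[list[int]]]:
    # Rows are sentences, columns are vocabulary terms, cells are raw counts.
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def summarize(text: str, ratio: float = 0.3) -> str:
    sentences = split_sentences(text)
    if not sentences:
        return ""
    _, matrix = document_term_matrix(sentences)
    # Column sums = how often each term occurs in the whole article.
    term_freq = [sum(col) for col in zip(*matrix)]
    scores = []
    for row in matrix:
        length = sum(row)
        # Sum of article-level frequencies of the sentence's words, normalised by its length.
        scores.append(sum(c * f for c, f in zip(row, term_freq)) / length if length else 0.0)
    k = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))
```

Dividing by sentence length keeps long sentences from winning purely by size; the selected sentences are re-sorted so the summary reads in the article's original order.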
Method II generates a summary using a modified TF-IDF with attached weights, computed over the document dataset. To use this method:
- Go to the method_2 folder
- Place your article in the valid folder
- Run the code in Jupyter Notebook
- Enter the name of your file (the file must be inside that directory)
- You will end up with a summary and a word cloud in the output folder :)
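The README does not spell out how Method II modifies TF-IDF or which weights it attaches, so the sketch below only shows plain TF-IDF sentence scoring, treating each sentence as a document, with a hypothetical term_weights dictionary standing in for the extra weights. It uses the smoothed IDF, log((1 + N) / (1 + df)) + 1, so that words appearing in every sentence still get a non-zero weight. This is an assumption-laden illustration, not the notebook's implementation.

```python
import math
import re
from collections import Counter


def tfidf_sentence_scores(sentences, term_weights=None):
    """Return one score per sentence: the sum of its terms' (weighted) TF-IDF values.

    term_weights is a hypothetical {term: multiplier} dict; missing terms get 1.0.
    """
    term_weights = term_weights or {}
    docs = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: the number of sentences each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        score = 0.0
        for term, count in tf.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF
            score += (count / len(doc)) * idf * term_weights.get(term, 1.0)
        scores.append(score)
    return scores


def summarize(sentences, ratio=0.3, term_weights=None):
    scores = tfidf_sentence_scores(sentences, term_weights)
    k = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order
```

Raising a term's weight raises the score of every sentence that contains it, which is one plausible way "weights attached" could steer the selection.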
- Add the gold-standard summary as n.txt in the Gold folder of the Summaries directory, where n is the next number in the sequence of files already in the Gold folder.
- For example, if there are 7 files in the Gold folder, they must be labelled 1.txt, 2.txt, ..., 7.txt, so the next one is 8.txt.
- Repeat this process for the summaries generated by the rule-based method and the extractive method, storing them (with the same number n) in the RuleBased and Extractive directories, respectively.
- You can do this from the terminal via simple redirection.
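If you prefer to script the numbering rather than count files by hand, a small helper along these lines picks the next free n and copies a summary into place. The function names and the example paths in the usage note are hypothetical; only the n.txt naming scheme comes from the steps above.

```python
import shutil
from pathlib import Path


def next_index(folder: Path) -> int:
    """Return the next number n for a folder holding 1.txt, 2.txt, ..., k.txt."""
    numbers = [int(p.stem) for p in folder.glob("*.txt") if p.stem.isdigit()]
    return max(numbers, default=0) + 1


def add_summary(src: Path, folder: Path) -> Path:
    """Copy src into folder as n.txt, where n is the next number in the sequence."""
    folder.mkdir(parents=True, exist_ok=True)
    dest = folder / f"{next_index(folder)}.txt"
    shutil.copyfile(src, dest)
    return dest
```

For example, add_summary(Path("valid/summary.txt"), Path("Summaries/Extractive")) would copy a Method I summary into the Extractive folder, assuming you run it from a directory where those (hypothetical) relative paths exist.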
- Now, in the accuracy.py file, change the code on line 15 to for i in range(1, n+1): where n is the same number as above.
- For example, if your file was saved as 9.txt, you would change the code to for i in range(1, 10):
- Run the code as python accuracy.py
- If you want the individual accuracy for each article, uncomment line 62 in the Rouge_1.py file.
- In that case, it is advisable to redirect the output to a new file, e.g. python accuracy.py > output.txt, for better formatting.
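As an alternative to editing line 15 by hand, the loop's upper bound could be derived from the files on disk. The sketch below is hypothetical (it is not the code in accuracy.py) and assumes the Summaries/Gold, Summaries/RuleBased and Summaries/Extractive layout described above, with consecutively numbered files.

```python
from pathlib import Path


def count_gold(gold_dir: Path) -> int:
    """Count consecutively numbered gold files 1.txt, 2.txt, ... (stops at the first gap)."""
    n = 0
    while (gold_dir / f"{n + 1}.txt").exists():
        n += 1
    return n


def paired_texts(summaries_dir: Path, system: str):
    """Yield (gold, system) text pairs for i = 1..n, like for i in range(1, n+1)."""
    n = count_gold(summaries_dir / "Gold")
    for i in range(1, n + 1):
        gold = (summaries_dir / "Gold" / f"{i}.txt").read_text()
        generated = (summaries_dir / system / f"{i}.txt").read_text()
        yield gold, generated
```

Passing "RuleBased" or "Extractive" as system selects which set of generated summaries is compared against the gold standard.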
Method I achieved an accuracy of 74.1%, and Method II achieved an accuracy of 83.4%.
The evaluation was done using the ROUGE metric proposed by Chin-Yew Lin. Since both summarization methods are extractive, only ROUGE-1 (unigram overlap) was used. The gold-standard summaries were annotated manually: for each article, the annotators were asked to pick the most important sentences, with the only rule being that they had to choose 0.3N sentences, where N is the number of sentences in the original article.
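For reference, ROUGE-1 counts the unigrams shared by a generated summary and the gold summary. Below is a minimal sketch, assuming simple lowercase alphanumeric tokenization with no stemming or stopword removal (the project's Rouge_1.py may differ). The README does not say whether the reported accuracy is recall, precision or F1, so the function returns all three.

```python
import re
from collections import Counter


def unigrams(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    cand, ref = unigrams(candidate), unigrams(reference)
    # Clipped overlap: each word counts at most min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


r = rouge_1("the cat sat", "the cat sat on the mat")
# recall = 3/6 = 0.5, precision = 3/3 = 1.0
```

Recall (matched unigrams divided by unigrams in the reference) is the quantity Lin's original ROUGE-N definition is built around; precision and F1 are shown because many tools report them too.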
We thank the following for creating the gold standard summaries:
- Abhinav Menon
- Trisha Kaore
- Yash Agrawal
- Eshika Khandelwal
- Vidushi Bhartari
- Shashwat Singh
- Shubhankar Kamthankar
- Fork this repository
- Clone the forked repository to your local system
- git remote add upstream https://github.com/AurumnPegasus/Text-Summariser.git
- Install all required dependencies (mentioned in requirements.txt)
- Commit and send PRs :)