This repository provides the source code for "Finding NLP Papers by Asking a Multi-hop Question", published at the Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 2022.
- tqdm
- transformers
- nltk
- dateparser
- scikit-learn
- fuzzywuzzy
- sentencepiece
- stanza
import json
import os
import random
from tqdm import tqdm
import numpy as np
import argparse
from pretrained_models import T5_QG
import stanza
import nltk
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
from config import *
from pretrained_models import *
from utils import *
# Masked-language-model pipeline: predicts the token hidden behind <mask>
# in a template sentence (used together with getXlmRobertaTop1 further below).
xlm_roberta_base_unmasker = pipeline('fill-mask', model='xlm-roberta-base')
nltk.download('punkt')
- enwiki-20190320-words-frequency.txt (26.2MB)
- glove.6B.300d.txt (989.9MB)
- paper_data.npy (799.5MB)
- paper_qa_deduplication_dict_train_index.json (56.6MB)
- paper_qa_deduplication_dict_train.json (84.7MB)
- papers_embedding_inuse.json (85.1MB)
"ml6team/keyphrase-extraction-kbir-inspec"
# Load the keyphrase-extraction pipeline (ml6team/keyphrase-extraction-kbir-inspec).
print("Load pipeline for ml6team/keyphrase-extraction-kbir-inspec")
model_name = "ml6team/keyphrase-extraction-kbir-inspec"
# KeyphraseExtractionPipeline comes from the project's pretrained_models /
# utils imports above -- TODO confirm which module defines it.
extractor = KeyphraseExtractionPipeline(model=model_name)
# Sample paper abstract used as input for the extraction demo below.
abstract = 'Sememes are defined as the atomic units to describe the semantic meaning of concepts. Due to the difficulty of manually annotating sememes and the inconsistency of annotations between experts, the lexical sememe prediction task has been proposed. However, previous methods heavily rely on word or character embeddings, and ignore the fine-grained information. In this paper, we propose a novel pre-training method which is designed to better incorporate the internal information of Chinese character. The Glyph enhanced Chinese Character representation (GCC) is used to assist sememe prediction. We experiment and evaluate our model on HowNet, which is a famous sememe knowledge base. The experimental results show that our method outperforms existing non-external information models.'
# Print the keyphrases extracted from the abstract (expected output shown below).
print(extractor(abstract))
output:
>> ['Glyph enhanced Chinese Character representation' 'HowNet'
>> 'character embeddings' 'lexical sememe prediction task'
>> 'sememe knowledge base']
# Generate answer/question pairs from the abstract without pre-supplied answers.
# NOTE(review): qg_nlp is not instantiated in this snippet -- presumably a
# T5_QG question-generation model from pretrained_models; confirm its setup.
print(qg_nlp.qg_without_answer(abstract))
output:
>> [{'answer': 'Sememes', 'question': 'What are defined as the atomic units to describe the semantic meaning of concepts?'},
>> {'answer': 'manually', 'question': 'How is annotating sememes difficult?'},
>> {'answer': 'word or character embeddings', 'question': 'What do previous methods heavily rely on?'},
>> {'answer': 'internal information of Chinese character', 'question': 'What is the Glyph enhanced Chinese Character representation (GCC) designed to better incorporate?'},
>> {'answer': 'Glyph enhanced Chinese Character representation', 'question': 'What is the GCC?'},
>> {'answer': 'HowNet', 'question': 'What is a famous sememe knowledge base?'},
>> {'answer': 'external information models', 'question': 'What does the Glyph enhanced Chinese Character representation outperform?'}]
# Load xlm-roberta-base-squad2
# Load the multilingual extractive question-answering model.
model_name = "deepset/xlm-roberta-base-squad2"
# Ready-to-use QA pipeline plus the raw model/tokenizer objects.
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Fill the <mask> slot in the template sentence and keep only the top-1
# prediction via the project helper getXlmRobertaTop1 (from utils, presumably
# -- confirm its definition); expected output shown below.
getXlmRobertaTop1(xlm_roberta_base_unmasker("The <mask> that are defined as the atomic units to describe the semantic meaning of concepts ."))
output:
>> <terms>
- install using https://github.com/thunlp/OpenKE
- install using https://github.com/tangjianpku/LINE
@inproceedings{li2022finding,
  title={Finding NLP Papers by Asking a Multi-hop Question},
  author={Li, Xiaoran and Takano, Toshiaki},
  booktitle={Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)},
  year={2022}
}


