In src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py, semantic similarity scores are computed for (column, question_hint) pairs, and the scores are used for sorting:
similar_column_names.sort(key=lambda x: x[2], reverse=True)
However, the sort is not followed by any shortlisting (such as a top-k cut or a score threshold), and the scores are then discarded:
table_column_pairs = list(set([(table, column) for table, column, _ in similar_column_names]))
The pairs are then regrouped by table, so the sort order has no effect either:
similar_columns = self._get_similar_column_names(keywords=keywords, question=question, hint=hint)
for table_name, column_name in similar_columns:
    if table_name not in selected_columns:
        selected_columns[table_name] = []
    if column_name not in selected_columns[table_name]:
        selected_columns[table_name].append(column_name)
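
To illustrate the concern, here is a minimal sketch with made-up data. The triples and the `top_k_by_score` helper are hypothetical and not taken from the repository; the helper only shows what a score-based shortlist might look like if the scores were meant to matter.

```python
# Hypothetical data: (table, column, semantic_similarity_score) triples.
similar_column_names = [
    ("orders", "order_date", 0.41),
    ("users", "name", 0.93),
    ("orders", "total", 0.12),
    ("users", "name", 0.88),  # same pair again with a different score
]


def current_behaviour(triples):
    """Mirrors the quoted lines: sort by score, then dedupe through a set."""
    triples = sorted(triples, key=lambda x: x[2], reverse=True)
    return list(set((table, column) for table, column, _ in triples))


def without_sorting(triples):
    """Same as above, minus the sort."""
    return list(set((table, column) for table, column, _ in triples))


def top_k_by_score(triples, k):
    """Hypothetical alternative: keep the best score per pair, then shortlist."""
    best = {}
    for table, column, score in triples:
        key = (table, column)
        best[key] = max(score, best.get(key, float("-inf")))
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]


# The sort does not change which pairs end up selected.
assert set(current_behaviour(similar_column_names)) == set(
    without_sorting(similar_column_names)
)

print(top_k_by_score(similar_column_names, k=2))
```

With or without the sort, every (table, column) pair survives, so the scores only influence the result if a cut such as the one in `top_k_by_score` is applied.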
Essentially, keyword-based column retrieval depends only on difflib.SequenceMatcher(column_name, potential_column_name). The embeddings are computed but never used for selection. Please correct me if I am wrong. Thanks a lot.