Embeddings are not used in _get_similar_columns

In `src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py`,

semantic similarity scores are computed for (column, question_hint) pairs, and the scores are used for sorting:

`similar_column_names.sort(key=lambda x: x[2], reverse=True)`

but the sorting is not followed by any shortlisting, and the scores are then discarded:

`table_column_pairs = list(set([(table, column) for table, column, _ in similar_column_names]))`
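To illustrate the point, here is a minimal sketch with made-up data (the table names, column names, and scores are hypothetical, not taken from the repository). It applies the same expression as the line above to a sorted and an unsorted list and shows that the resulting pairs are the same either way:

```python
# Hypothetical (table, column, score) triples; names and scores are invented
# purely for illustration.
similar_column_names = [
    ("orders", "order_date", 0.41),
    ("customers", "name", 0.93),
    ("orders", "customer_id", 0.77),
]


def to_pairs(triples):
    # Mirrors the quoted line: the score is dropped and a set is built,
    # so neither the score nor the input order survives.
    return list(set([(table, column) for table, column, _ in triples]))


# Same sort as in the quoted code, applied to a copy.
sorted_triples = sorted(similar_column_names, key=lambda x: x[2], reverse=True)

pairs_from_sorted = to_pairs(sorted_triples)
pairs_from_unsorted = to_pairs(similar_column_names)

# The membership is identical whether or not the input was sorted.
same_members = set(pairs_from_sorted) == set(pairs_from_unsorted)
print(same_members)
```

Since a `set` carries no ordering guarantee, the order of the final list does not depend on the sort either.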

The pairs are then regrouped into a per-table structure, so the sort order is useless as well:

```Python
similar_columns = self._get_similar_column_names(keywords=keywords, question=question, hint=hint)
for table_name, column_name in similar_columns:
    if table_name not in selected_columns:
        selected_columns[table_name] = []
    if column_name not in selected_columns[table_name]:
        selected_columns[table_name].append(column_name)
```
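For comparison, this is roughly what I would have expected if the scores were meant to influence the result. It is only a sketch under my own assumptions: `shortlist_columns`, `top_k`, and `min_score` are hypothetical names that do not exist in the repository, and the data is invented.

```python
from typing import Dict, List, Tuple


def shortlist_columns(
    scored: List[Tuple[str, str, float]],
    top_k: int = 2,          # hypothetical cutoff on the number of columns kept
    min_score: float = 0.5,  # hypothetical similarity threshold
) -> Dict[str, List[str]]:
    """Keep at most top_k distinct columns scoring at least min_score."""
    ranked = sorted(scored, key=lambda x: x[2], reverse=True)
    selected: Dict[str, List[str]] = {}
    kept = 0
    for table, column, score in ranked:
        # ranked is in descending score order, so everything after the first
        # failing entry can be skipped too.
        if kept >= top_k or score < min_score:
            break
        columns = selected.setdefault(table, [])
        if column not in columns:
            columns.append(column)
            kept += 1
    return selected


# Invented example data.
scored = [
    ("orders", "order_date", 0.41),
    ("customers", "name", 0.93),
    ("orders", "customer_id", 0.77),
]
result = shortlist_columns(scored, top_k=2, min_score=0.5)
print(result)  # {'customers': ['name'], 'orders': ['customer_id']}
```

Here the sort actually matters, because it decides which columns survive the cutoff.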

Essentially, keyword-based column retrieval relies only on `difflib.SequenceMatcher(column_name, potential_column_name)`. The embeddings are computed but never used. Please correct me if I am wrong. Thanks a lot.

