Pyastran

Pyastran generates node descriptions and embeddings for graphs created by pykagcee.

Requirements

Neo4j DBMS (local or remote). We recommend using the Neo4j Desktop application due to its better performance.
The uv tool to manage the Python virtual environment and dependencies.
A chat model API to generate the descriptions.
An embedding model API to generate the embeddings.

Installation

Create your .env. You can use the .env.example file as a template.

cp .env.example .env

Set your Neo4j connection details in the .env file. Note that you must first create a knowledge graph with pykagcee.

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
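Before running any command, it can help to confirm that the three Neo4j variables are actually present in .env. The following is a minimal sketch using only the standard library. The helper names parse_env and missing_neo4j_vars are hypothetical and not part of pyastran, and the parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path

REQUIRED_NEO4J_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_neo4j_vars(env: dict[str, str]) -> list[str]:
    """Return the required Neo4j variables that are absent or empty."""
    return [name for name in REQUIRED_NEO4J_VARS if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_neo4j_vars(env)
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty list from missing_neo4j_vars only means the variables are set; it does not prove that the database is reachable.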

For the chat and embedding models, the project comes with the langchain-openai integration. If you use a provider other than OpenAI, add its integration package with the uv add langchain-{provider} command. See the list of available providers.

CHAT_PROVIDER=openai
CHAT_MODEL=gpt-4.1-nano
CHAT_API_KEY=sk-proj-fakekey123

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_API_KEY=sk-proj-fakekey123

If you are serving an OpenAI-compatible API (e.g., with vLLM) you can set the CHAT_BASE_URL and EMBEDDING_BASE_URL variables, keeping the provider as openai.

CHAT_PROVIDER=openai
CHAT_BASE_URL=https://example-vllm-openai-compatible-serve.test/v1

EMBEDDING_PROVIDER=openai
EMBEDDING_BASE_URL=https://example-vllm-openai-compatible-serve.test/v1
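To illustrate how these variables relate to an OpenAI-compatible client, here is a rough sketch that collects them into constructor arguments. The function openai_client_kwargs is a hypothetical helper, and this is not necessarily how pyastran wires its models internally. The commented lines assume the model, api_key, and base_url parameters of langchain-openai.

```python
def openai_client_kwargs(env: dict[str, str], prefix: str) -> dict[str, str]:
    """Collect arguments from {prefix}_MODEL, {prefix}_API_KEY, {prefix}_BASE_URL."""
    kwargs = {
        "model": env[f"{prefix}_MODEL"],
        "api_key": env[f"{prefix}_API_KEY"],
    }
    base_url = env.get(f"{prefix}_BASE_URL")
    if base_url:
        # Only override the endpoint when a custom server is configured.
        kwargs["base_url"] = base_url
    return kwargs


example_env = {
    "CHAT_MODEL": "gpt-4.1-nano",
    "CHAT_API_KEY": "sk-proj-fakekey123",
    "CHAT_BASE_URL": "https://example-vllm-openai-compatible-serve.test/v1",
    "EMBEDDING_MODEL": "text-embedding-3-small",
    "EMBEDDING_API_KEY": "sk-proj-fakekey123",
}

chat_kwargs = openai_client_kwargs(example_env, "CHAT")
embedding_kwargs = openai_client_kwargs(example_env, "EMBEDDING")

# With langchain-openai installed, these could be passed on as:
#   from langchain_openai import ChatOpenAI, OpenAIEmbeddings
#   chat = ChatOpenAI(**chat_kwargs)
#   embeddings = OpenAIEmbeddings(**embedding_kwargs)
```

In this example only the chat model points at the custom server; the embedding model has no base URL set and would use the provider's default endpoint.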

Set CHAT_MAX_CONTEXT to avoid exceeding the model's context length when generating the descriptions.

CHAT_MAX_CONTEXT=3000
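As an illustration of why such a limit matters, the sketch below trims input text to a budget before it would be sent to a model. It assumes the budget is expressed in tokens and approximates tokens with a fixed characters-per-token ratio; both are assumptions made for this example, and truncate_to_budget is a hypothetical helper rather than pyastran's actual logic.

```python
def truncate_to_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text so its estimated token count stays within max_tokens.

    The characters-per-token ratio is a rough heuristic (an assumption),
    not an exact tokenizer.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    return text[:max_chars]


# A long, repetitive source file standing in for a large symbol body.
source_code = "def add(a, b):\n    return a + b\n" * 1000
prompt_body = truncate_to_budget(source_code, max_tokens=3000)
```

A real tokenizer would give a tighter estimate; the heuristic here only shows the shape of the check.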

By default, when generating the description for a symbol, we include n random related symbols to provide more context to the model. You can configure how many related symbols to include by setting the MAX_RELATION_CONTEXT variable in the .env file. The default value is 3.

MAX_RELATION_CONTEXT=3
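The idea of picking up to n random related symbols can be sketched as follows. This is only an illustration of the sampling step; pick_related is a hypothetical name, and how pyastran actually selects and formats related symbols is not shown here.

```python
import random


def pick_related(related: list[str], max_relation_context: int = 3) -> list[str]:
    """Pick up to max_relation_context related symbols at random."""
    # Never ask for more items than exist, or random.sample raises ValueError.
    count = min(max_relation_context, len(related))
    return random.sample(related, count)


related_symbols = ["parse_args", "Config", "load_env", "main", "Logger"]
context = pick_related(related_symbols, max_relation_context=3)
```

A larger value gives the model more context per symbol, at the cost of a longer prompt.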

Create the environment and install the dependencies:

uv sync

Usage

Generate descriptions

To generate descriptions for a single project, use the describe command:

uv run pyastran describe /path/to/single/project --max-concurrent-queries 50

The optional --max-concurrent-queries parameter determines how many nodes are described in parallel. The default is 100.
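A concurrency cap of this kind is commonly implemented with a semaphore. The sketch below shows the general pattern with asyncio; describe_all and the simulated request are hypothetical stand-ins, not pyastran's implementation.

```python
import asyncio


async def describe_all(nodes: list[str], max_concurrent_queries: int = 100) -> list[str]:
    """Describe every node while keeping at most N requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_queries)

    async def describe(node: str) -> str:
        async with semaphore:
            await asyncio.sleep(0.01)  # placeholder for the chat model request
            return f"description of {node}"

    # gather preserves the input order of the nodes in its result list.
    return await asyncio.gather(*(describe(node) for node in nodes))


results = asyncio.run(
    describe_all([f"node_{i}" for i in range(10)], max_concurrent_queries=3)
)
```

Lowering the limit can help when the model API enforces rate limits; raising it helps only while the API keeps up.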

To generate descriptions for multiple projects under a directory, use the describe --all command:

uv run pyastran describe --all /path/to/multiple/projects --max-concurrent-queries 50 --max-concurrent-tasks 2

The optional --max-concurrent-tasks parameter determines how many projects are processed in parallel. The default is 1.

Embed descriptions

To generate embeddings for a single project, use the embed command:

uv run pyastran embed /path/to/single/project --max-concurrent-queries 50

This command will create the embeddings for all nodes and a vector index named description_embedding_index.
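Once the index exists, it can be queried for the nodes whose descriptions are closest to a query embedding. The sketch below uses Neo4j's db.index.vector.queryNodes procedure and the execute_query method of the neo4j Python driver. The function nearest_nodes is a hypothetical helper, and the query embedding is assumed to come from the same embedding model, so that its dimension matches the index. Depending on how your databases are laid out, you may also need to pass database_ to execute_query.

```python
INDEX_NAME = "description_embedding_index"

# db.index.vector.queryNodes is Neo4j's built-in procedure for querying a vector index.
QUERY = """
CALL db.index.vector.queryNodes($index, $k, $embedding)
YIELD node, score
RETURN node, score
ORDER BY score DESC
"""


def nearest_nodes(driver, embedding: list[float], k: int = 5):
    """Return (node, score) pairs for the k nodes closest to the embedding."""
    records, _, _ = driver.execute_query(
        QUERY, index=INDEX_NAME, k=k, embedding=embedding
    )
    return [(record["node"], record["score"]) for record in records]


# Usage, assuming the neo4j Python driver is installed and the database is reachable:
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#       hits = nearest_nodes(driver, query_embedding, k=5)
```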

The optional --max-concurrent-queries parameter determines how many nodes are embedded in parallel. The default is 100.

To generate embeddings for multiple projects under a directory, use the embed --all command:

uv run pyastran embed --all /path/to/multiple/projects --max-concurrent-queries 50 --max-concurrent-tasks 2

This command will create the embeddings for all nodes and a vector index named description_embedding_index per project.

The optional --max-concurrent-tasks parameter determines how many projects are processed in parallel. The default is 1.

Fix path issues

Due to a bug in pykagcee, the file_path property of some nodes is occasionally empty. To fix this, run the fix-paths --all command at any time (even if you have not generated descriptions or embeddings yet):

uv run pyastran fix-paths --all /path/to/multiple/projects

Wipe all descriptions and embeddings

Remove all descriptions, embeddings, and indexes from all databases.

uv run pyastran wipe
