Report

This report consists of the design decisions and instructions to execute the tasks

Method

Task 1: Conversational Core (Streaming & Cost Telemetry) This is a core task which involves

Using the given model, I used openAI LLM which takes input from the user and responds with a response.
The response is streamed token by token instead of user waiting till all tokens predicted and returned.
The conversation history is preserved for the last 10 messages and the conversation is sent to the LLM which has context of the previous chat before responding.
Once the response is received from the model, I calculate the KPIs like total input tokens, output tokens, cost and latency which is presented to the user on the screen.

Task 2: High-Performance Retrieval-Augmented QA This task is built on top of task 1 where I reused the task 1 components. This task involves

Extracting text from a large pdf.
Creating chunking mechanism where the extracted text is further divided into sentences as cutting the sentence in middle will impact the accuracy of the RAG
The sentences are then divided into tokens, where I fixed the maximum number of tokens with an fixed overlap to preserve the sematic meaning between the sentences.
The chunks are then further embedded into vectors using sentence transformers.
These embeddeding along with the chunks and metadata are loaded into a chromaDB vector store.
Now, the large PDF is divided into smaller chunks and loaded into vector store and the RAG is ready to take input from the user.
The user input is then embedded and compared with the other vectors in the vector store using a similarity search and gets the top 5 relavent chunks.
These relavent chunks along with the prompt is sent to the task 1 workflow which responds with the answer for the user query using the retrived chunks and inline citation.

Task 3: Autonomous Planning Agent with Tool Calling This task reuses the components from the previous tasks. This task involves

This task uses available tools to complete the user query
I have created 2 different tools, a weather tool which gives information regarding the weather details of a particular city and number of days required. The other tool was to get the flight details like price and timing from once city to the other city.
The weather tool uses an external live API and the flights tool uses the mocked data
When user asks a query regarding planning an itenary, the model reasons and looks for the available tools to complete the task.
I have limited the agent reasoning steps to 10 which acts as a gaurdrail avoiding infinite loop which saves the cost and time.
If the user query requires using multiple tools, the model continue reasoning until a final response is generated or the reasoning steps limit is reached.
The modal displays its reasoning as it continous to generate the response.
The final response is outputted as a JSON object where the schema is predefined in the prompt.

Task 4: Self-Healing Code Assistant This is the final task which reuse all the previous tasks components. This task involves

I have created 2 new tools, one for executing the commands in the shell environment and the other tool to save the file in the current working directory to the disk.
Before generating the code, it check the current OS and if python, pytest and rust were already installed in the machine. If they were not installed, the models stops immediatly and notifies the user.
Once all the necessary frameworks were installed, the model then proceed to generate the code and writes to a file on the disk
The model then executes the code and checks for the errors. If there are errors, those errors will be sent to the model as a conversation history and themodel will retry to resolve the issue.
Once the issues are resolved, the model then executes the test cases and display the passes/failed tests to the user. If there are any failed testcases, the model then retries to resolve issue and execute the testcases again.
The number of retries are restricted to 3 and after that the final result will be shown to the user.