Official repository for *Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation* (ICSE 2025).
We quantized the following models using the AutoAWQ library:
- Create the conda environment we used by running the following command:

  ```shell
  conda env create --file ollm.yml
  ```

- Set up the vLLM inference endpoint by referring to the vLLM documentation.
- Create a `.env` file in the root directory with the following content:

  ```
  INFERENCE_PORT=YOUR_INFERENCE_PORT
  INFERENCE_URL=http://<YOUR_INFERENCE_ENDPOINT_IP_ADDRESS>:${INFERENCE_PORT}/v1
  USE_OPEN_SOURCE=1 # Set to 0 if you want to try GPT-4
  OLLM_SERVER_TYPE=vllm
  MODEL_TEMPERATURE=0
  MODEL_NAME=YOUR_MODEL_NAME # e.g. TechxGenus/Meta-Llama-3-70B-Instruct-AWQ
  UNDERSTAND_PATH=THE_PATH_TO_UNDERSTAND_EXECUTABLE # e.g. /path/to/und
  GITHUB_API_TOKEN=YOUR_GITHUB_API_TOKEN
  OPENAI_API_KEY=YOUR_OPENAI_API_KEY # Only required if you want to use GPT-4
  OPENAI_ORGANIZATION=YOUR_OPENAI_ORGANIZATION # Only required if you want to use GPT-4
  ```

- Download the Java projects locally by running the following command:
  ```shell
  cd CMG
  python download_projects.py
  ```

- Download and unzip `datasets.zip` to the root directory of this project. (A folder named `data` should be available in the root directory after unzipping.)
- Download `training_data_semantic_embedding.pt` and copy it to the `CMG` folder.
- Download `java-jars.zip` and unzip it to the `CMG` folder. It should create the JavaParser jar files in the `program_contexts` folder of the `CMG` folder.
- Download `MMS.zip` and unzip it to the `CMG` folder. It should create the required files for MMS and CMMS in the `program_contexts` folder of the `CMG` folder.
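The repository's scripts read the `.env` settings at runtime (the actual loader lives in the project code and may use a dotenv-style library). As a sketch of how the `${INFERENCE_PORT}` reference inside `INFERENCE_URL` expands, with all sample values below hypothetical:

```python
import re

def load_env(text):
    """Parse KEY=VALUE lines with '#' comments and naive ${VAR}
    interpolation, mirroring the .env file shown above."""
    env = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Expand ${NAME} using values defined earlier in the file.
        value = re.sub(r"\$\{(\w+)\}",
                       lambda m: env.get(m.group(1), ""), value.strip())
        env[key.strip()] = value
    return env

# Hypothetical endpoint values; substitute your own.
sample = """\
INFERENCE_PORT=8000
INFERENCE_URL=http://10.0.0.5:${INFERENCE_PORT}/v1
USE_OPEN_SOURCE=1 # Set to 0 for GPT-4
"""
print(load_env(sample)["INFERENCE_URL"])  # http://10.0.0.5:8000/v1
```
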
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 10.86 | 31.67 | 28.28 | 0.58 | 13.96 | 7.16 |
| Diff Narrative | 11.65 | 32.65 | 29.65 | 0.83 | 14.94 | 7.74 |
| FIDEX-generated Diff Summary | 12.29 | 33.85 | 29.76 | 0.67 | 14.79 | 7.37 |
The results for this model were not consistent due to the model's tendency to repeat the same sentence in its output. See the CSV files in the `cmg_results/CodeQwen1.5-7B-Chat-AWQ` folder for the commit messages generated by this model.
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 4.82 | 28.40 | 21.75 | 0.41 | 12.17 | 5.80 |
| Diff Narrative | 5.08 | 28.34 | 22.36 | 0.53 | 13.23 | 6.54 |
| FIDEX-generated Diff Summary | 4.85 | 29.26 | 21.63 | 0.47 | 12.37 | 5.82 |
Reported in the paper.
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 14.19 | 36.44 | 32.06 | 0.95 | 16.38 | 8.16 |
| FIDEX-generated Diff Summary | 15.78 | 38.02 | 33.80 | 0.88 | 17.12 | 8.45 |
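For orientation on the metric columns: BLEU is built from clipped n-gram precision against a reference message. The sketch below is illustrative only and is not the repository's evaluation code (that lives in the `evaluation` folder):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision, the building block of BLEU."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    # Each candidate n-gram counts at most as often as it appears in the reference.
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

p = ngram_precision("fix null pointer in parser",
                    "fix null pointer dereference in parser")
print(p)  # 0.75
```
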
- `CMG`: Contains the scripts to download the Java projects and generate the commit messages.
- `cmg_results`: Contains the commit messages generated by each SLM/OLLM.
- `common`: Contains the common scripts used by the different models.
- `evaluation`: Contains the scripts for calculating our automatic evaluation metrics.
- `survey-1` to `survey-3`: Contain the survey data and the analysis scripts for each survey.
- `quantization`: Contains the scripts used to quantize two of the candidate OLLMs.
- `data`: Contains the datasets used in the paper.
- Run the `Meta-Llama-3-70B-Instruct-AWQ` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Original - Used For Survey` comment and comment out any other variable with the same name.
- Run the following command:

  ```shell
  cd CMG
  REMOVE_COMMENTS=FALSE METHOD_SUMMARIES=OLD python omega.py ../data/omg_data_preprocessed.csv all
  ```
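One plausible reading of the `REMOVE_COMMENTS` / `METHOD_SUMMARIES` environment overrides used in the command above; the defaults here are assumptions, and the real `omega.py` may parse them differently:

```python
def read_flags(environ):
    """Interpret the environment overrides from the replication command.
    The default values below are assumptions for illustration."""
    remove_comments = environ.get("REMOVE_COMMENTS", "TRUE").upper() != "FALSE"
    method_summaries = environ.get("METHOD_SUMMARIES", "NEW")
    return remove_comments, method_summaries

print(read_flags({"REMOVE_COMMENTS": "FALSE", "METHOD_SUMMARIES": "OLD"}))  # (False, 'OLD')
```
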
- Run the `Meta-Llama-3-70B-Instruct-AWQ` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Modified - Used After Survey to make it more comprehensive` comment and comment out any other variable with the same name.
- Run the following command:

  ```shell
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all
  ```
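The replication commands in this section pass a dataset CSV, a commit selection, and optionally the `--dn` / `--fidex` diff-augmentation switches. A minimal argparse sketch of that interface; the real `omega.py` may define it differently:

```python
import argparse

def build_parser():
    """CLI implied by the replication commands (names taken from them;
    everything else here is an assumption)."""
    p = argparse.ArgumentParser(prog="omega.py")
    p.add_argument("dataset")    # e.g. ../data/omg_data_preprocessed.csv
    p.add_argument("selection")  # e.g. "all"
    p.add_argument("--dn", action="store_true", help="use Diff Narrative")
    p.add_argument("--fidex", action="store_true",
                   help="use FIDEX-generated Diff Summary")
    return p

args = build_parser().parse_args(
    ["../data/omg_data_preprocessed.csv", "all", "--fidex"])
print(args.fidex)  # True
```
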
- Run the `casperhansen/llama-3-8b-instruct-awq` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Modified - Used After Survey to make it more comprehensive` comment and comment out any other variable with the same name.
- Run the following command to use Diff Narrative:

  ```shell
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all --dn
  ```

- Run the following command to use FIDEX-generated Diff Summary:

  ```shell
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all --fidex
  ```

If you found this work helpful, please consider citing it using the following:
```bibtex
@inproceedings{imani2025context,
  title={Context Conquers Parameters: Outperforming Proprietary {LLM} in Commit Message Generation},
  author={Imani, Aaron and Ahmed, Iftekhar and Moshirpour, Mohammad},
  booktitle={2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  pages={1844--1856},
  year={2025},
  organization={IEEE}
}
```