Reducing Computational Requirements for Large Language Models

Overview

This repository contains my master's dissertation, "Methods of Reducing Computational Requirements for Large Language Models". The research explores techniques for compressing large language models (LLMs) to reduce their computational requirements, making them more accessible to consumers and to small organizations with strict security or privacy requirements. The study focuses on post-training quantization methods (GGUF, AWQ, and VPTQ) and on pruning techniques, and evaluates their effectiveness on selected LLMs: Gemma 2 9B, Llama 3.1 8B, and Qwen2.5 7B.
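For readers new to the topic, the basic idea behind post-training quantization can be sketched in a few lines. The snippet below is only an illustration and is not taken from the dissertation: it shows plain symmetric round-to-nearest quantization, which is far simpler than GGUF, AWQ, or VPTQ. The function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integer codes with one shared scale (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                       # largest code, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0   # avoid dividing by zero
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]         # made-up example values
    for bits in (8, 4):
        codes, scale = quantize_symmetric(weights, bits)
        restored = dequantize(codes, scale)
        max_err = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{bits}-bit codes: {codes}, max abs error: {max_err:.4f}")
```

Storing the integer codes plus a single scale takes less memory than storing the original floats, at the cost of a rounding error of at most half the scale per weight. Fewer bits give a coarser scale and a larger error.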

Key Contributions

Investigates the effectiveness of popular quantization methods (GGUF, AWQ, VPTQ) and pruning techniques.
Evaluates the impact of these methods on the performance of LLMs across various benchmark tests.

Dissertation PDF

You can access the full dissertation PDF directly from this repository. Click the link below to download or view the PDF:

Reducing Computational Requirements for Large Language Models (PDF)
