Reducing Computational Requirements for Large Language Models

Overview

This repository contains my master's dissertation, "Methods of Reducing Computational Requirements for Large Language Models". The research explores techniques for compressing large language models (LLMs) to lower their computational requirements, making them more accessible to consumers and to small organizations with strict security or privacy requirements. The study focuses on post-training quantization methods (GGUF, AWQ, and VPTQ) and on pruning techniques, and evaluates their effectiveness on selected LLMs such as Gemma2 9B, LLaMa 3.1 8B, and Qwen2.5 7B.

Key Contributions

  • Investigates the effectiveness of popular quantization methods (GGUF, AWQ, VPTQ) and pruning techniques.
  • Evaluates the impact of these methods on LLM performance across a range of benchmarks.
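
For readers new to the topic, the two families of techniques named above can be illustrated with a toy sketch. This is not code from the dissertation, and it is not how GGUF, AWQ, or VPTQ work internally; it assumes the simplest possible variants (symmetric round-to-nearest quantization and global magnitude pruning) applied to a plain list of weights. The function names are hypothetical and chosen only for this example.

```python
# Toy illustration only: simplest-case quantization and pruning on a
# flat list of weights. Function names are hypothetical, and real methods
# (GGUF, AWQ, VPTQ) are considerably more sophisticated.

def quantize_symmetric(weights, bits=4):
    """Map floats to signed integer codes with a single shared scale."""
    qmax = 2 ** (bits - 1) - 1  # largest code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [0.12, -0.70, 0.03, 0.33, -0.01, 0.56]
    codes, scale = quantize_symmetric(weights, bits=4)
    print("codes:   ", codes)
    print("restored:", dequantize(codes, scale))
    print("pruned:  ", prune_by_magnitude(weights, sparsity=0.5))
```

Storing the integer codes plus one scale takes far less memory than storing the original floats, at the cost of a rounding error of at most half a quantization step per weight. Pruning instead removes weights outright, which only saves compute when the runtime can exploit the resulting zeros.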

Dissertation PDF

The full dissertation is available as a PDF in this repository. Use the link below to view or download it:

Reducing Computational Requirements for Large Language Models (PDF)