Parallel BFGS Optimization in C++ with OpenMP and SIMD

Overview

This project implements a parallelized version of the BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimization algorithm using C++, OpenMP, and SIMD (Single Instruction Multiple Data) vectorization.

The goal is to accelerate gradient-based optimization in high-dimensional spaces (e.g., 5,000+ dimensions) through a hybrid approach that combines shared-memory parallelism (OpenMP) with data-level parallelism (SIMD intrinsics).

This work was developed as part of the "Parallel computing systems" Master’s course at the Faculty of Electrical Engineering, University of Sarajevo.
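
For reference, the textbook BFGS iteration can be written as below. This is the standard inverse-Hessian form, not a transcription of this project's code: the symbols (H_k for the inverse-Hessian approximation, alpha_k for the line-search step length, s_k, y_k, rho_k) are notation chosen here, and the implementation may store or update the approximation differently.

```latex
\begin{aligned}
p_k &= -H_k \nabla f(x_k), \\
x_{k+1} &= x_k + \alpha_k p_k, \\
s_k &= x_{k+1} - x_k, \qquad
y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad
\rho_k = \frac{1}{y_k^{\top} s_k}, \\
H_{k+1} &= \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top}.
\end{aligned}
```

In this form each iteration costs O(n^2) operations for the matrix-vector product and the rank-two update, which is why dot products and matrix-vector multiplications are the natural targets for threading and vectorization.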

Key Features

Full BFGS optimization implementation for multivariate functions
Parallel gradient and Hessian evaluations using OpenMP
SIMD acceleration for vector operations (dot products, matrix-vector multiplications)
Flexible dimensionality — tested at 5,000+ dimensions
Benchmarking suite comparing serial vs. parallel execution times on multiple benchmark functions
Continuous (differentiable) TSP relaxation for testing BFGS

Technical Details

1. Parallelization Strategy

OpenMP parallel regions divide the gradient and Hessian-approximation computations across threads.
Reduction clauses handle summation-heavy operations (e.g., dot products).
Updates to shared memory (the Hessian approximation) are kept thread-safe through synchronized regions.
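
The two ideas above can be sketched as follows. This is a minimal illustration, not the project's code: the function names `dot` and `numerical_gradient` are hypothetical, and the central-difference gradient is an assumption (the project may use analytic gradients instead). The objective passed in must be safe to call from several threads at once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Dot product with an OpenMP reduction: each thread accumulates a private
// partial sum, and the partial sums are combined when the loop ends.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Central-difference gradient (assumed here for illustration).
// Every thread perturbs its own copy of x and writes to distinct entries
// of grad, so no synchronization is required inside the loop.
std::vector<double> numerical_gradient(
    const std::function<double(const std::vector<double>&)>& f,
    const std::vector<double>& x, double h = 1e-6) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    std::vector<double> grad(x.size());
    #pragma omp parallel
    {
        std::vector<double> xp = x;  // thread-private working copy
        #pragma omp for
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const double xi = xp[i];
            xp[i] = xi + h;
            const double f_plus = f(xp);
            xp[i] = xi - h;
            const double f_minus = f(xp);
            xp[i] = xi;  // restore before the next coordinate
            grad[i] = (f_plus - f_minus) / (2.0 * h);
        }
    }
    return grad;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same results, which makes a convenient baseline for comparing timings.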

2. SIMD Optimization

Uses compiler auto-vectorization and manual SIMD intrinsics (e.g., SSE/AVX) for linear algebra routines.
Aligns vectors and matrices in memory for maximum SIMD efficiency (alignas(32) or std::aligned_alloc); 32 bytes matches the width of a 256-bit AVX register.
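
As an illustration of the manual-intrinsics path, here is a minimal sketch of an AVX dot product over 32-byte-aligned buffers. The `AlignedVec` helper and the `simd_dot` name are hypothetical and not taken from the project. The sketch uses the C++17 aligned `operator new[]` rather than `std::aligned_alloc`, and it falls back to a scalar loop when the compiler is not targeting AVX.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Minimal owner of a 32-byte-aligned array of doubles (hypothetical helper).
struct AlignedVec {
    std::size_t n;
    double* data;
    explicit AlignedVec(std::size_t count)
        : n(count),
          data(static_cast<double*>(
              ::operator new[](count * sizeof(double), std::align_val_t{32}))) {}
    ~AlignedVec() { ::operator delete[](data, std::align_val_t{32}); }
    AlignedVec(const AlignedVec&) = delete;
    AlignedVec& operator=(const AlignedVec&) = delete;
};

// Dot product that processes four doubles per step when AVX is available.
double simd_dot(const AlignedVec& a, const AlignedVec& b) {
    assert(a.n == b.n);
    const std::size_t n = a.n;
    std::size_t i = 0;
    double sum = 0.0;
#if defined(__AVX__)
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4) {
        // Aligned loads are valid: the base is 32-byte aligned and i is a
        // multiple of 4 doubles (32 bytes).
        const __m256d va = _mm256_load_pd(a.data + i);
        const __m256d vb = _mm256_load_pd(b.data + i);
        acc = _mm256_add_pd(acc, _mm256_mul_pd(va, vb));
    }
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail; handles every element when AVX is not enabled.
    for (; i < n; ++i) {
        sum += a.data[i] * b.data[i];
    }
    return sum;
}
```

With GCC or Clang the AVX branch is enabled by flags such as `-mavx` or `-march=native`. Because the vector path sums in a different order than a plain loop, results can differ from the serial version in the last few bits for general floating-point input.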

Performance Results

The following table shows relative speedups (normalized to the sequential version = 1.0) for different compilation modes and problem sizes:

Problem Dimension	Sequential	OpenMP + Autovectorization	OpenMP + SIMD	OpenMP + SIMD + Pointer Array
1,000	1.00	1.95	1.88	2.02
3,000	1.00	9.18	9.07	9.27
5,000	1.00	6.49	6.49	7.15

(Exact results depend on CPU architecture and compiler optimizations.)
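
For readers reproducing the table, a speedup figure of this kind can be computed as the sequential wall-clock time divided by the parallel wall-clock time. The sketch below assumes that definition; the helper names `time_seconds` and `speedup` are hypothetical, and the project's own benchmarking scripts may measure differently (for example, by averaging several runs).

```cpp
#include <cassert>
#include <chrono>

// Wall-clock duration, in seconds, of a single call to fn.
// steady_clock is monotonic, so the result is never negative.
template <typename Fn>
double time_seconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup relative to the sequential baseline: a value above 1.0 means
// the parallel run finished faster than the sequential one.
double speedup(double t_sequential, double t_parallel) {
    assert(t_sequential > 0.0 && t_parallel > 0.0);
    return t_sequential / t_parallel;
}
```

A typical use would time the same optimization run once with the sequential build and once with the parallel build, then pass both durations to `speedup`.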

Hardware configuration for testing:

Intel Core i5-14600K
- 14 cores (6 Performance + 8 Efficiency)
- 20 threads
- Maximum boost clock up to 5.3 GHz
- 1.2 MB L1 cache, 20 MB L2 cache, 24 MB L3 cache
- 32 GB DDR5 RAM, 6000 MT/s
Windows 11 Pro

Dependencies

C++23 or newer
OpenMP (usually included with GCC/Clang)
Optional: Intel or GCC SIMD intrinsics
