Skip to content

imuslic1/bfgs-parallelization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel BFGS Optimization in C++ with OpenMP and SIMD

Overview

This project implements a parallelized version of the BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimization algorithm using C++, OpenMP, and SIMD (Single Instruction Multiple Data) vectorization.

The goal of this project is to accelerate the computation of gradient-based optimization in high-dimensional spaces ( e.g., 5,000+ dimensions) through a hybrid approach that combines shared-memory parallelism (OpenMP) and * data-level parallelism* (SIMD intrinsics).

This work was developed as part of the "Parallel computing systems" Master’s course at the Faculty of Electrical Engineering, University of Sarajevo.


Key Features

  • Full BFGS optimization implementation for multivariate functions
  • Parallel gradient and Hessian evaluations using OpenMP
  • SIMD acceleration for vector operations (dot products, matrix-vector multiplications)
  • Flexible dimensionality — tested at 5,000+ dimensions
  • Benchmarking suite comparing serial vs. parallel execution times on multiple benchmark functions
  • Continuous (differentiable) TSP relaxation for testing BFGS

Technical Details

1. Parallelization Strategy

  • OpenMP parallel regions are used to divide gradient and Hessian computations across threads.
  • Reduction clauses are applied for summation-heavy operations (e.g., dot products).
  • The algorithm maintains thread-safe updates to shared memory (Hessian approximation) through synchronized regions.

2. SIMD Optimization

  • Utilizes compiler auto-vectorization and manual SIMD intrinsics (e.g., SSE/AVX) for linear algebra routines.
  • Ensures memory alignment of vectors and matrices for maximum SIMD efficiency (alignas(32) or std::aligned_alloc).

Performance Results

The following table shows relative speedups (normalized to the sequential version = 1.0) for different compilation modes and problem sizes:

Problem Dimension Sequential OpenMP + Autovectorization OpenMP + SIMD OpenMP + SIMD + Pointer Array
1,000 1.00 1.95 1.88 2.02
3,000 1.00 9.18 9.07 9.27
5,000 1.00 6.49 6.49 7.15

(Exact results depend on CPU architecture and compiler optimizations.)


Hardware configuration for testing:

  • Intel Core i5-14600K
    • 14 cores (6 Performance + 8 Efficiency)
    • 20 threads
    • Maximum boost clock up to 5.3 GHz
    • 1.2 MB L1 cache, 20 MB L2 cache, 24 MB L3 cache
    • 32 GB DDR5 RAM, 6000 MT/s
  • Windows 11 Pro

Dependencies

  • C++23 or newer
  • OpenMP (usually included with GCC/Clang)
  • Optional: Intel or GCC SIMD intrinsics

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.
© 2025 Ismar Muslić, Tarik Hastor, Ivona Jozić, Kanita Kadušić, Merjem Gutošić

About

OpenMP parallelization of BFGS optimization algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors