33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,39 @@ Versioning follows [Semantic Versioning](https://semver.org/).
## [Unreleased]

### Added
- **LoRA Engine integration** (v0.5, part 3 of 4). Engine-side
  glue tying the v0.5 LoRA Foundation (#36) and the PEFT → mlx
  Converter (#37) together so a downloaded HuggingFace adapter
  works end-to-end via the `InferenceEngine` protocol.
  - `InferenceEngine.applyAdapter(_:)`: new protocol method with
    a default no-op in a protocol `extension`, so test stubs and
    future CPU/Python engines compile unchanged. Engines that do
    support adapters wire it to their LoRA loader.
  - `MLXSwiftEngine.applyAdapter(_:)`: auto-routes PEFT-format
    adapters through `LoRAAdapterConverter` (cached at
    `~/.mac-mlx/adapters/.cache/<adapter-name>/` so repeat loads
    skip conversion), then calls
    `LoRAContainer.from(directory:)` and
    `LanguageModel.load(adapter:)`. Throws
    `EngineError.adapterApplyFailed(reason:)` if conversion or
    loading fails.
  - `EngineCoordinator.load(_:adapter:)`: optional adapter
    parameter; the default `nil` keeps every existing call site
    unchanged. When provided, the adapter is applied immediately
    after the base model loads.
  - `AdapterStore.scan(_:)` now detects the mlx-native format
    (`adapters.safetensors` + mlx-schema `adapter_config.json`)
    in addition to PEFT. mlx-native takes precedence when both
    weight files coexist, since in that case the converter output
    already sits next to its PEFT source.
  - `LocalAdapter.format: Format` (`peft` / `mlx`) drives the
    engine's auto-conversion branch. The backwards-compatible
    decoder defaults pre-v0.5 records to `.peft`.
  - `ModelParameters.adapterName: String?` persists the user's
    adapter pick per model. The custom decoder defaults it to
    `nil` so pre-v0.5 `~/.mac-mlx/model-params/*.json` files load
    unchanged.
  - 4 new tests (2 for LocalAdapter format / round-trip, 2 for
    AdapterStore mlx detection / dual-format precedence). All
    137 Core tests pass.
  - Parameters-inspector picker UI lands in v0.5 part 4.
- **LoRA PEFT → mlx Converter** (v0.5, part 2 of 3). Pure-Swift
in-process converter that turns a HuggingFace PEFT-format adapter
directory into the mlx-swift-lm native format that
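As a rough illustration of the `EngineCoordinator.load(_, adapter:)` entry above, the sketch below loads a base model and then applies an optional adapter. The real coordinator is not part of this diff, so every type here (`DemoModel`, `DemoAdapter`, `DemoEngine`, `RecordingEngine`, `DemoCoordinator`) is a hypothetical stand-in and only the call ordering is taken from the changelog text.

```swift
// Hypothetical stand-ins for LocalModel / LocalAdapter.
struct DemoModel: Sendable { let id: String }
struct DemoAdapter: Sendable { let name: String }

// Simplified engine protocol (the real one has more requirements).
protocol DemoEngine: Actor {
    func load(_ model: DemoModel) async throws
    func applyAdapter(_ adapter: DemoAdapter) async throws
}

// Records each call so the ordering can be inspected afterwards.
actor RecordingEngine: DemoEngine {
    private(set) var events: [String] = []

    func load(_ model: DemoModel) async throws {
        events.append("load \(model.id)")
    }

    func applyAdapter(_ adapter: DemoAdapter) async throws {
        events.append("adapter \(adapter.name)")
    }
}

struct DemoCoordinator {
    let engine: any DemoEngine

    // `adapter` defaults to nil, so existing call sites need no changes.
    func load(_ model: DemoModel, adapter: DemoAdapter? = nil) async throws {
        try await engine.load(model)
        if let adapter {
            // Applied only after the base model has finished loading.
            try await engine.applyAdapter(adapter)
        }
    }
}
```

Calling `load(_:)` without the second argument behaves exactly as before; passing an adapter adds one extra step after the base load.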
3 changes: 3 additions & 0 deletions MacMLXCore/Sources/MacMLXCore/Engine/EngineError.swift
@@ -7,6 +7,7 @@ public enum EngineError: LocalizedError, Equatable, Sendable {
case engineNotReady
case generationInProgress
case modelLoadFailed(reason: String)
case adapterApplyFailed(reason: String)
case unsupportedOperation(String)

public var errorDescription: String? {
@@ -21,6 +22,8 @@ public enum EngineError: LocalizedError, Equatable, Sendable {
return "A generation is already in progress on this engine."
case .modelLoadFailed(let reason):
return "Model failed to load: \(reason)"
case .adapterApplyFailed(let reason):
return "LoRA adapter failed to apply: \(reason)"
case .unsupportedOperation(let op):
return "Operation not supported by this engine: \(op)."
}
20 changes: 20 additions & 0 deletions MacMLXCore/Sources/MacMLXCore/Engine/InferenceEngine.swift
@@ -19,6 +19,17 @@ public protocol InferenceEngine: Actor {
/// Bring a model into memory. Replaces any previously loaded model.
func load(_ model: LocalModel) async throws

/// Apply a LoRA adapter to the currently-loaded model (v0.5+).
///
/// Called after `load(_:)` to layer adapter weights on top of the
/// base model. The protocol-extension default is a no-op so engines
/// that don't support adapters (test stubs, future CPU/Python
/// engines) compile unchanged. The MLX engine routes through
/// `LoRAContainer.from(directory:)` + `LanguageModel.load(adapter:)`,
/// auto-converting PEFT-format adapters via
/// `LoRAAdapterConverter` when needed.
func applyAdapter(_ adapter: LocalAdapter) async throws

/// Release the loaded model (and any caches) from memory.
func unload() async throws

@@ -36,3 +47,12 @@ public protocol InferenceEngine: Actor {
/// Synchronously confirm the engine is responsive.
func healthCheck() async -> Bool
}

extension InferenceEngine {
/// Default no-op for engines that don't yet support LoRA adapters
/// (test stubs, future CPU/Python engines, and so on). It never
/// throws and leaves the loaded model unchanged.
public func applyAdapter(_ adapter: LocalAdapter) async throws {
// intentional no-op
}
}
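The protocol-extension default above can be illustrated with a small sketch. It assumes simplified, hypothetical types (`SketchAdapter`, `SketchEngine`, `StubEngine`, `AdapterAwareEngine`) rather than the project's real ones. Because `applyAdapter(_:)` is declared as a protocol requirement and not only in the extension, a call through `any SketchEngine` reaches the conforming type's own implementation when one exists.

```swift
struct SketchAdapter: Sendable { let name: String }

protocol SketchEngine: Actor {
    // A requirement, so conforming types can override it and calls
    // through `any SketchEngine` dispatch to the override.
    func applyAdapter(_ adapter: SketchAdapter) async throws
    func appliedAdapters() async -> [String]
}

extension SketchEngine {
    // Default: do nothing and throw nothing.
    func applyAdapter(_ adapter: SketchAdapter) async throws {}
}

// A stub that never mentions applyAdapter still conforms.
actor StubEngine: SketchEngine {
    func appliedAdapters() async -> [String] { [] }
}

// An adapter-aware engine supplies its own implementation.
actor AdapterAwareEngine: SketchEngine {
    private var names: [String] = []

    func applyAdapter(_ adapter: SketchAdapter) async throws {
        names.append(adapter.name)
    }

    func appliedAdapters() async -> [String] { names }
}

// Works with either engine; only the adapter-aware one records anything.
func applyAndReport(_ engine: any SketchEngine) async throws -> [String] {
    try await engine.applyAdapter(SketchAdapter(name: "demo"))
    return await engine.appliedAdapters()
}
```

One consequence of a silent default is that a caller cannot tell from the call alone whether an adapter was actually applied; the sketch exposes `appliedAdapters()` purely to make that visible.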
67 changes: 67 additions & 0 deletions MacMLXCore/Sources/MacMLXCore/Engine/MLXSwiftEngine.swift
@@ -252,6 +252,73 @@ public actor MLXSwiftEngine: InferenceEngine {
status = .idle
}

/// Apply a LoRA adapter (v0.5+) to the currently-loaded model.
///
/// PEFT-format adapters are auto-converted to mlx-native format
/// via `LoRAAdapterConverter`, with the conversion output cached
/// at `~/.mac-mlx/adapters/.cache/<adapter-name>/` so repeat
/// loads reuse the converted bytes. mlx-native adapters skip the
/// converter and load directly.
public func applyAdapter(_ adapter: LocalAdapter) async throws {
guard let container = loadedSupport.container else {
throw EngineError.modelNotLoaded
}

// Resolve the directory the LoRAContainer should read from.
// PEFT → run the converter into the shared cache dir
// (`adapters/.cache/<adapter-name>/`); mlx-native → use the
// adapter's own directory.
let mlxDirectory: URL
switch adapter.format {
case .mlx:
mlxDirectory = adapter.directory
case .peft:
mlxDirectory = try await convertedDirectory(for: adapter)
}

// Hand the mlx-format directory to LoRAContainer.from then
// load it into the model. Both calls happen inside the
// ModelContainer's actor so we serialise correctly with any
// concurrent generation.
do {
try await container.perform { context in
let loraContainer = try LoRAContainer.from(directory: mlxDirectory)
try context.model.load(adapter: loraContainer)
}
} catch {
throw EngineError.adapterApplyFailed(reason: error.localizedDescription)
}

await LogManager.shared.info(
"LoRA adapter applied: \(adapter.name) (format=\(adapter.format.rawValue))",
category: .inference
)
}

/// Convert a PEFT-format adapter to the mlx-native cache layout.
/// Cached at `~/.mac-mlx/adapters/.cache/<adapter-name>/` so repeat
/// loads of the same adapter reuse the conversion result.
private func convertedDirectory(for adapter: LocalAdapter) async throws -> URL {
let cacheDir = DataRoot.macMLX("adapters/.cache")
.appending(path: adapter.name, directoryHint: .isDirectory)
let configURL = cacheDir.appending(path: "adapter_config.json", directoryHint: .notDirectory)
let weightsURL = cacheDir.appending(path: "adapters.safetensors", directoryHint: .notDirectory)

if FileManager.default.fileExists(atPath: configURL.path),
FileManager.default.fileExists(atPath: weightsURL.path) {
return cacheDir
}

do {
try LoRAAdapterConverter.convertPEFTAdapter(
source: adapter.directory,
destination: cacheDir
)
} catch {
throw EngineError.adapterApplyFailed(reason: "PEFT → mlx conversion failed: \(error)")
}
return cacheDir
}

/// Stream tokens for a generation request.
///
/// This method is `nonisolated` so the `AsyncThrowingStream` is returned
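The cache lookup in `convertedDirectory(for:)` can be sketched in isolation as follows. This is a minimal version under stated assumptions: `cachedConversion(in:fileManager:convert:)` and `runCacheDemo()` are hypothetical names, and the `convert` closure stands in for the real `LoRAAdapterConverter` call, writing two placeholder files into a temporary directory.

```swift
import Foundation

/// Returns `cacheDir` if it already holds both converted files;
/// otherwise runs `convert` to populate it first.
/// `convert` is a hypothetical stand-in for the real converter call.
func cachedConversion(
    in cacheDir: URL,
    fileManager: FileManager = .default,
    convert: (URL) throws -> Void
) throws -> URL {
    let config = cacheDir.appendingPathComponent("adapter_config.json")
    let weights = cacheDir.appendingPathComponent("adapters.safetensors")

    // Cache hit only when both files exist, so a half-written
    // directory triggers a fresh conversion.
    if fileManager.fileExists(atPath: config.path),
       fileManager.fileExists(atPath: weights.path) {
        return cacheDir
    }
    try convert(cacheDir)
    return cacheDir
}

/// Runs the lookup twice against a temp directory and reports how
/// many times the fake converter was invoked.
func runCacheDemo() throws -> Int {
    var conversions = 0
    let dir = FileManager.default.temporaryDirectory
        .appendingPathComponent("adapter-cache-demo-\(UUID().uuidString)")

    let fakeConvert: (URL) throws -> Void = { target in
        conversions += 1
        try FileManager.default.createDirectory(
            at: target, withIntermediateDirectories: true)
        try Data("{}".utf8).write(
            to: target.appendingPathComponent("adapter_config.json"))
        try Data().write(
            to: target.appendingPathComponent("adapters.safetensors"))
    }

    _ = try cachedConversion(in: dir, convert: fakeConvert)  // miss: converts
    _ = try cachedConversion(in: dir, convert: fakeConvert)  // hit: skips
    try? FileManager.default.removeItem(at: dir)
    return conversions
}
```

As in the diff, the cache is keyed by directory alone. If an adapter were replaced on disk under the same name, a lookup like this would keep returning the earlier conversion; whether invalidation is handled elsewhere is not visible in this change.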
85 changes: 70 additions & 15 deletions MacMLXCore/Sources/MacMLXCore/Managers/AdapterStore.swift
@@ -33,22 +33,77 @@ public actor AdapterStore {
guard try url.resourceValues(forKeys: [.isDirectoryKey]).isDirectory == true,
!url.lastPathComponent.hasPrefix(".") else { continue }

-let configURL = url.appendingPathComponent("adapter_config.json")
-let weightsURL = url.appendingPathComponent("adapter_model.safetensors")
-guard fileManager.fileExists(atPath: configURL.path),
-fileManager.fileExists(atPath: weightsURL.path),
-let data = try? Data(contentsOf: configURL),
-let cfg = try? JSONDecoder().decode(LocalAdapter.PEFTConfig.self, from: data)
-else { continue }
-
-results.append(LocalAdapter(
-name: url.lastPathComponent,
-directory: url,
-targetModel: cfg.baseModelNameOrPath,
-rank: cfg.r,
-targetModules: cfg.targetModules ?? []
-))
if let mlxAdapter = readMLXAdapter(at: url) {
results.append(mlxAdapter)
} else if let peftAdapter = readPEFTAdapter(at: url) {
results.append(peftAdapter)
}
}
return results.sorted { $0.name.localizedCompare($1.name) == .orderedAscending }
}

// MARK: - Format-specific decoders

/// Detect mlx-native format: `adapter_config.json` (mlx schema)
/// + `adapters.safetensors`. Returns `nil` if either file is
/// missing or the config doesn't decode cleanly.
private func readMLXAdapter(at url: URL) -> LocalAdapter? {
let configURL = url.appendingPathComponent("adapter_config.json")
let weightsURL = url.appendingPathComponent("adapters.safetensors")
guard fileManager.fileExists(atPath: configURL.path),
fileManager.fileExists(atPath: weightsURL.path),
let data = try? Data(contentsOf: configURL),
let cfg = try? JSONDecoder().decode(MLXAdapterConfig.self, from: data)
else { return nil }
return LocalAdapter(
name: url.lastPathComponent,
directory: url,
format: .mlx,
targetModel: nil, // mlx-native config doesn't carry base model id
rank: cfg.loraParameters.rank,
targetModules: cfg.loraParameters.keys ?? []
)
}

/// Detect PEFT format: `adapter_config.json` (PEFT schema)
/// + `adapter_model.safetensors`. Returns `nil` if either file is
/// missing or the config doesn't decode cleanly.
private func readPEFTAdapter(at url: URL) -> LocalAdapter? {
let configURL = url.appendingPathComponent("adapter_config.json")
let weightsURL = url.appendingPathComponent("adapter_model.safetensors")
guard fileManager.fileExists(atPath: configURL.path),
fileManager.fileExists(atPath: weightsURL.path),
let data = try? Data(contentsOf: configURL),
let cfg = try? JSONDecoder().decode(LocalAdapter.PEFTConfig.self, from: data)
else { return nil }
return LocalAdapter(
name: url.lastPathComponent,
directory: url,
format: .peft,
targetModel: cfg.baseModelNameOrPath,
rank: cfg.r,
targetModules: cfg.targetModules ?? []
)
}
}

/// Minimal mirror of `MLXLMCommon.LoRAConfiguration` shape used to
/// detect mlx-native adapter directories without depending on
/// MLXLMCommon at this layer (Manager file stays MLX-free).
private struct MLXAdapterConfig: Decodable {
let numLayers: Int
let fineTuneType: String
let loraParameters: LoRAParameters

struct LoRAParameters: Decodable {
let rank: Int
let scale: Float
let keys: [String]?
}

private enum CodingKeys: String, CodingKey {
case numLayers = "num_layers"
case fineTuneType = "fine_tune_type"
case loraParameters = "lora_parameters"
}
}
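Both formats share the file name `adapter_config.json`, so the schema decode is what tells them apart once the weight-file check has passed. The sketch below shows that step in isolation. `MLXConfigSketch` mirrors the private `MLXAdapterConfig` from the diff; the two JSON payloads are made-up samples (the PEFT one uses the usual `base_model_name_or_path` / `r` / `target_modules` keys), not files from the project.

```swift
import Foundation

// Mirrors the private MLXAdapterConfig shown in the diff.
struct MLXConfigSketch: Decodable {
    let numLayers: Int
    let fineTuneType: String
    let loraParameters: LoRAParameters

    struct LoRAParameters: Decodable {
        let rank: Int
        let scale: Float
        let keys: [String]?
    }

    private enum CodingKeys: String, CodingKey {
        case numLayers = "num_layers"
        case fineTuneType = "fine_tune_type"
        case loraParameters = "lora_parameters"
    }
}

// Sample payloads; the field values are invented for illustration.
let mlxJSON = Data("""
{"num_layers": 16, "fine_tune_type": "lora",
 "lora_parameters": {"rank": 8, "scale": 20.0, "keys": ["self_attn.q_proj"]}}
""".utf8)

let peftJSON = Data("""
{"base_model_name_or_path": "some/base-model", "r": 16,
 "target_modules": ["q_proj", "v_proj"]}
""".utf8)

let decoder = JSONDecoder()

// The mlx-schema config decodes cleanly.
let fromMLX = try? decoder.decode(MLXConfigSketch.self, from: mlxJSON)

// The PEFT config lacks the required mlx keys, so decoding fails and
// `try?` yields nil, which lets a scanner fall through to the PEFT reader.
let fromPEFT = try? decoder.decode(MLXConfigSketch.self, from: peftJSON)

print(fromMLX?.loraParameters.rank ?? -1)  // 8
print(fromPEFT == nil)                     // true
```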
@@ -27,17 +27,41 @@ public struct ModelParameters: Codable, Hashable, Sendable {
public var maxTokens: Int
/// System prompt prepended to every generation.
public var systemPrompt: String
/// Optional LoRA adapter name (folder under
/// `~/.mac-mlx/adapters/<name>/`) to apply on model load (v0.5+).
/// Empty string is treated identically to `nil` and means "no
/// adapter" — matches how the parameters inspector represents
/// the "None" pick.
public var adapterName: String?

public init(
temperature: Double = 0.7,
topP: Double = 0.95,
maxTokens: Int = 2048,
-systemPrompt: String = "You are a helpful assistant."
systemPrompt: String = "You are a helpful assistant.",
adapterName: String? = nil
) {
self.temperature = temperature
self.topP = topP
self.maxTokens = maxTokens
self.systemPrompt = systemPrompt
self.adapterName = adapterName
}

private enum CodingKeys: String, CodingKey {
case temperature, topP, maxTokens, systemPrompt, adapterName
}

/// Backwards-compatible decoder. Pre-v0.5 JSON has no
/// `adapterName` key, so it defaults to nil and existing per-model
/// override files load unchanged.
public init(from decoder: Decoder) throws {
let c = try decoder.container(keyedBy: CodingKeys.self)
self.temperature = try c.decode(Double.self, forKey: .temperature)
self.topP = try c.decode(Double.self, forKey: .topP)
self.maxTokens = try c.decode(Int.self, forKey: .maxTokens)
self.systemPrompt = try c.decode(String.self, forKey: .systemPrompt)
self.adapterName = try c.decodeIfPresent(String.self, forKey: .adapterName)
}

/// The factory-default parameters, handy for "Reset" buttons.
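The backwards-compatible decoding of `adapterName` can be tried out with a trimmed-down sketch. `ParamsSketch` is a hypothetical mirror that keeps only two fields. `effectiveAdapterName` is also hypothetical: it shows one way to honour the doc comment's "empty string means no adapter" rule, which the diff states but does not implement in the lines shown.

```swift
import Foundation

// Trimmed-down, hypothetical mirror of ModelParameters.
struct ParamsSketch: Codable {
    var temperature: Double
    var adapterName: String?

    init(temperature: Double = 0.7, adapterName: String? = nil) {
        self.temperature = temperature
        self.adapterName = adapterName
    }

    private enum CodingKeys: String, CodingKey {
        case temperature, adapterName
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        temperature = try c.decode(Double.self, forKey: .temperature)
        // A missing key (pre-v0.5 file) decodes as nil instead of throwing.
        adapterName = try c.decodeIfPresent(String.self, forKey: .adapterName)
    }

    /// Hypothetical helper: collapses "" and nil into "no adapter".
    var effectiveAdapterName: String? {
        guard let name = adapterName, !name.isEmpty else { return nil }
        return name
    }
}

func decodeParams(_ json: String) throws -> ParamsSketch {
    try JSONDecoder().decode(ParamsSketch.self, from: Data(json.utf8))
}

// Pre-v0.5 file: no adapterName key at all.
let legacyParams = try decodeParams(#"{"temperature": 0.7}"#)
// The inspector's "None" pick stored as an empty string.
let nonePick = try decodeParams(#"{"temperature": 0.7, "adapterName": ""}"#)
// A real pick.
let picked = try decodeParams(#"{"temperature": 0.7, "adapterName": "style"}"#)
```

Swift's synthesized `Decodable` conformance also decodes `Optional` properties with `decodeIfPresent`, so an explicit initializer like this one mostly documents the intent and guards against later changes to the property's type.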
41 changes: 36 additions & 5 deletions MacMLXCore/Sources/MacMLXCore/Models/LocalAdapter.swift
@@ -2,20 +2,24 @@ import Foundation

/// One LoRA adapter directory present on the local filesystem.
///
-/// Discovered by `AdapterStore.scan(_:)` via the presence of
-/// `adapter_config.json` + `adapter_model.safetensors` (PEFT format).
/// Discovered by `AdapterStore.scan(_:)` via the presence of one of:
/// - PEFT format: `adapter_config.json` + `adapter_model.safetensors`
/// - mlx-native: `adapter_config.json` (mlx schema) + `adapters.safetensors`
///
/// `targetModel` is advisory — the engine layer applies the adapter
/// regardless and surfaces a clear typed error if the dimensions
/// don't fit the loaded base model.
public struct LocalAdapter: Codable, Hashable, Identifiable, Sendable {
public var id: String { name }
public let name: String
public let directory: URL
/// On-disk format of the adapter weights / config.
public let format: Format
/// Base-model id from the adapter's config (e.g.
-/// `mlx-community/Qwen3-8B-4bit`). Optional — older adapters
-/// don't always carry it.
/// `mlx-community/Qwen3-8B-4bit`). Optional — only PEFT carries
/// it; mlx-native adapters don't include the base-model id.
public let targetModel: String?
-/// LoRA rank (`r` in PEFT config). Nil if absent / unparseable.
/// LoRA rank. Nil if absent / unparseable.
public let rank: Int?
/// Names of the linear layers the adapter touches (e.g.
/// `["q_proj", "v_proj"]`). Empty array if absent.
@@ -24,17 +28,44 @@ public struct LocalAdapter: Codable, Hashable, Identifiable, Sendable {
public init(
name: String,
directory: URL,
format: Format = .peft,
targetModel: String?,
rank: Int?,
targetModules: [String]
) {
self.name = name
self.directory = directory
self.format = format
self.targetModel = targetModel
self.rank = rank
self.targetModules = targetModules
}

/// On-disk format of the adapter directory. Drives engine-side
/// behaviour: PEFT adapters get auto-converted to mlx-native
/// before `LoRAContainer.from(directory:)` is called.
public enum Format: String, Codable, Hashable, Sendable {
case peft // adapter_config.json + adapter_model.safetensors (HuggingFace standard)
case mlx // adapter_config.json (mlx schema) + adapters.safetensors
}

private enum CodingKeys: String, CodingKey {
case name, directory, format, targetModel, rank, targetModules
}

/// Backwards-compatible decoder. Adapters persisted before format
/// tagging existed default to `.peft` (the only previously
/// recognised format).
public init(from decoder: Decoder) throws {
let c = try decoder.container(keyedBy: CodingKeys.self)
self.name = try c.decode(String.self, forKey: .name)
self.directory = try c.decode(URL.self, forKey: .directory)
self.format = try c.decodeIfPresent(Format.self, forKey: .format) ?? .peft
self.targetModel = try c.decodeIfPresent(String.self, forKey: .targetModel)
self.rank = try c.decodeIfPresent(Int.self, forKey: .rank)
self.targetModules = try c.decode([String].self, forKey: .targetModules)
}

/// On-disk PEFT `adapter_config.json` shape.
///
/// Exposed publicly so `AdapterStore` and tests can decode it
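The `.peft` fallback for records without a `format` key, plus an encode/decode round trip, can be checked with a reduced sketch. `AdapterRecordSketch` is a hypothetical two-field mirror of `LocalAdapter`; the JSON record is invented for illustration.

```swift
import Foundation

// Reduced, hypothetical mirror of LocalAdapter (name + format only).
struct AdapterRecordSketch: Codable, Equatable {
    enum Format: String, Codable {
        case peft
        case mlx
    }

    let name: String
    let format: Format

    init(name: String, format: Format = .peft) {
        self.name = name
        self.format = format
    }

    private enum CodingKeys: String, CodingKey {
        case name, format
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        name = try c.decode(String.self, forKey: .name)
        // Records written before format tagging have no "format" key;
        // treat them as PEFT, the only format recognised at the time.
        format = try c.decodeIfPresent(Format.self, forKey: .format) ?? .peft
    }
}

// A legacy record with no format key.
let legacyRecord = try JSONDecoder().decode(
    AdapterRecordSketch.self,
    from: Data(#"{"name": "old-adapter"}"#.utf8)
)

// A new record survives an encode/decode round trip unchanged.
let originalRecord = AdapterRecordSketch(name: "new-adapter", format: .mlx)
let roundTripped = try JSONDecoder().decode(
    AdapterRecordSketch.self,
    from: JSONEncoder().encode(originalRecord)
)
```

Unlike an optional property, a non-optional property with a fallback value does need the hand-written initializer, since synthesized decoding would throw on the missing key.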