AbacusKit

Soroban Recognition SDK for iOS


AbacusKit is an iOS / macOS SDK that recognizes a soroban (Japanese abacus) in real time through the camera and reads off the value shown. It uses a lightweight, guided-capture design: image processing with Core Image and inference with Apple Core ML (including the Neural Engine). It has no external binary dependencies.

Features • Installation • Usage • How it works


🚀 Features

  • 🎯 Guided capture: your app passes only a frame and the digit count N; the library rectifies the frame, divides it into N equal columns, and classifies each digit.
  • 🔢 Integer output: returns the digits (0–9) concatenated left to right (no unit point or decimal point).
  • ⚡ Real time: Core Image preprocessing plus parallel Core ML inference, stabilized with temporal aggregation over consecutive frames.
  • 🌑 Color-independent: converts to grayscale before classification, matching the training pipeline (color carries no bead signal).
  • 🧡 Swift 6 ready: AbacusRecognizer is an actor and safe to call from async/await.
  • 📦 Zero dependencies: no OpenCV or ExecuTorch binaries; the model is bundled with the SDK.

📦 Installation

Swift Package Manager

dependencies: [
    .package(url: "https://github.com/TaiyoYamada/AbacusKit.git", from: "1.0.0")
]
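
For context, a complete manifest that uses this dependency might look like the sketch below. The package and target name `MyApp` are hypothetical, and the product name `AbacusKit` is an assumption inferred from the `import AbacusKit` statements in this README; the platform versions follow the Requirements section.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp",                        // hypothetical package name
    platforms: [.iOS(.v17), .macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/TaiyoYamada/AbacusKit.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",                // hypothetical target name
            dependencies: [
                // Assumption: the library product is named "AbacusKit".
                .product(name: "AbacusKit", package: "AbacusKit")
            ]
        )
    ]
)
```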

The Core ML model (AbacusDigitClassifier.mlpackage) is bundled with the SDK and is compiled and loaded on first use.


🏃 Usage

Basics

import AbacusKit

let recognizer = AbacusRecognizer()

// Pass the frame your app showed on the preview, in the source image's
// (CVPixelBuffer) pixel coordinates.
let region = AbacusRegion.rect(guideRectInImageCoords)

let result = try await recognizer.recognize(
    pixelBuffer: cameraFrame,
    region: region,
    digitCount: 23          // The digit count N specified by the app.
)

print(result.text)          // "09452037697982622718818" (the digit string)
print(result.digits)        // [0, 9, 4, 5, 2, ...]
print(result.intValue ?? 0) // The value if it fits in an Int (nil for 23 digits, etc.)
print(result.confidence)    // The lowest column's confidence
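
To illustrate why `intValue` can be nil while `text` is always available, here is a minimal sketch of how the two could relate to `digits`. The helpers `joinedText` and `integerValue` are hypothetical names for illustration, not part of the AbacusKit API, and the SDK's actual implementation may differ.

```swift
/// Hypothetical helper: joins digits left to right, keeping leading zeros.
func joinedText(_ digits: [Int]) -> String {
    digits.map { String($0) }.joined()
}

/// Hypothetical helper: the numeric value, or nil when the digit string
/// is empty or does not fit in an Int.
func integerValue(_ digits: [Int]) -> Int? {
    Int(joinedText(digits))
}

let shortReading = [0, 9, 4, 5]
print(joinedText(shortReading))            // "0945"
print(integerValue(shortReading) as Any)   // Optional(945)
```

A 23-digit reading exceeds the range of a 64-bit Int, so in that case an app would presumably rely on `text` or `digits` instead.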

Camera integration (stabilized with temporal aggregation)

import AbacusKit
import AVFoundation

final class CameraController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let recognizer = AbacusRecognizer()
    private let region = AbacusRegion.rect(/* the preview frame in image coordinates */)
    private let digitCount = 23

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        Task {
            do {
                // Average probabilities over consecutive frames; return once stable.
                if let stable = try await recognizer.recognizeStabilized(
                    pixelBuffer: pixelBuffer, region: region, digitCount: digitCount
                ) {
                    await MainActor.run { self.commit(stable) }
                }
            } catch let error as AbacusError where error.isRetryable {
                // Wait for the next frame.
            } catch {
                print("error:", error)
            }
        }
    }
}
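
The idea of averaging probabilities over consecutive frames can be sketched as follows. This is only an illustration: `ProbabilityAverager` is a hypothetical type, and the stability rule used here (return a result once `minFrames` frames have been accumulated) is an assumption; the README does not document how `recognizeStabilized` decides that a reading is stable.

```swift
/// Hypothetical running sum of per-column class probabilities.
struct ProbabilityAverager {
    private var sums: [[Float]] = []   // sums[column][digit]
    private var frameCount = 0
    let minFrames: Int

    init(minFrames: Int = 3) { self.minFrames = minFrames }

    /// Adds one frame of per-column softmax outputs.
    /// Returns the averaged digits once `minFrames` frames are in, else nil.
    /// Assumes every column carries the same number of class probabilities.
    mutating func add(_ frame: [[Float]]) -> [Int]? {
        // Restart when the column count changes (e.g. a different digit count).
        if sums.count != frame.count {
            sums = frame.map { [Float](repeating: 0, count: $0.count) }
            frameCount = 0
        }
        for (column, probabilities) in frame.enumerated() {
            for (digit, p) in probabilities.enumerated() {
                sums[column][digit] += p
            }
        }
        frameCount += 1
        guard frameCount >= minFrames else { return nil }
        // The argmax of the sums equals the argmax of the averages.
        return sums.map { column in
            column.indices.max(by: { column[$0] < column[$1] }) ?? 0
        }
    }

    /// Clears the accumulated frames.
    mutating func reset() {
        sums = []
        frameCount = 0
    }
}
```

Averaging lets one noisy frame be outvoted by its neighbors, at the cost of a few frames of latency before the first result.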

When the shot is tilted and distortion remains, pass the frame as four corners (rectified with a perspective correction):

let region = AbacusRegion.quad([topLeft, topRight, bottomRight, bottomLeft])

When the soroban is rotated 90° and placed vertically, pass orientation: .vertical (the columns are rotated back upright during extraction):

let result = try await recognizer.recognize(
    pixelBuffer: frame, region: region, digitCount: 23, orientation: .vertical
)

🧠 How it works

CVPixelBuffer + region + digitCount (N)
        │
        ▼  AbacusRecognizer (actor)
   ColumnExtractor (Core Image)
     ├─ rectify the frame (rect = crop / quad = perspective correction)
     ├─ divide the width into N
     └─ convert each column to 32×128 grayscale (R = G = B, Rec. 601)
        │
        ▼
   DigitInferenceEngine (Core ML, parallel)
     └─ AbacusDigitClassifier → digit 0–9 (softmax probabilities used directly)
        │
        ▼
   BoardAggregator (optional temporal aggregation: averaging over frames)
        │
        ▼
   SorobanResult { digits, text, columns, confidence, timing }
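
The "divide the width into N" step amounts to simple rectangle arithmetic. The sketch below assumes an already rectified, axis-aligned frame; `columnRects` is a hypothetical helper, not the SDK's ColumnExtractor, which does this work with Core Image.

```swift
import Foundation

/// Hypothetical helper: splits `rect` into `count` equal-width columns,
/// ordered left to right.
func columnRects(in rect: CGRect, count: Int) -> [CGRect] {
    guard count > 0 else { return [] }
    let columnWidth = rect.width / CGFloat(count)
    return (0..<count).map { index in
        CGRect(x: rect.minX + CGFloat(index) * columnWidth,
               y: rect.minY,
               width: columnWidth,
               height: rect.height)
    }
}

// A 230-pixel-wide frame split for a 23-digit soroban: 10 pixels per column.
let columns = columnRects(in: CGRect(x: 10, y: 20, width: 230, height: 60), count: 23)
print(columns.count)   // 23
```

Because the split is purely geometric, how well the guide frame is aligned with the rods determines how cleanly each column isolates one digit.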

The model (AbacusDigitClassifier) is the .mlpackage produced by the training and conversion repo abacus-recognition. Its input is an RGB 32×128 image (with the 1/255 scale baked in), and its output is classLabel and classLabel_probs (softmax already applied).
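
As a reference for the grayscale step, the standard Rec. 601 luma weights are 0.299, 0.587, and 0.114 for R, G, and B. The per-pixel sketch below is an illustration under assumptions: the function names are hypothetical, the rounding is a choice made here, and the SDK performs the conversion with Core Image rather than pixel by pixel.

```swift
/// Rec. 601 luma for one 8-bit RGB pixel, rounded to the nearest level.
func rec601Luma(r: UInt8, g: UInt8, b: UInt8) -> UInt8 {
    let y = 0.299 * Double(r) + 0.587 * Double(g) + 0.114 * Double(b)
    return UInt8(min(255, max(0, y.rounded())))
}

/// The gray value replicated into three channels (R = G = B), which keeps
/// the model's RGB input shape while discarding color.
func grayRGB(r: UInt8, g: UInt8, b: UInt8) -> (UInt8, UInt8, UInt8) {
    let y = rec601Luma(r: r, g: g, b: b)
    return (y, y, y)
}

print(rec601Luma(r: 255, g: 0, b: 0))   // 76
```

Since the 1/255 scale is baked into the model, the caller would pass 8-bit values like these unchanged rather than normalizing them first.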


📚 API Reference

public actor AbacusRecognizer {
    public init(configuration: AbacusConfiguration = .default)
    public func configure(_ config: AbacusConfiguration) async throws

    public func recognize(
        pixelBuffer: sending CVPixelBuffer,
        region: AbacusRegion,
        digitCount: Int,
        orientation: SorobanOrientation = .horizontal
    ) async throws -> SorobanResult

    public func recognizeStabilized(
        pixelBuffer: sending CVPixelBuffer,
        region: AbacusRegion,
        digitCount: Int,
        orientation: SorobanOrientation = .horizontal,
        minFrames: Int = 3
    ) async throws -> SorobanResult?

    public func resetAggregation()
}

public enum AbacusRegion: Sendable, Equatable {
    case rect(CGRect)       // An axis-aligned frame (the common case).
    case quad([CGPoint])    // Four corners [TL, TR, BR, BL] (perspective correction).
}

public enum SorobanOrientation: Sendable, Codable {
    case horizontal         // Standard (digits left to right, heaven beads on top).
    case vertical           // Rotated 90° (digits top to bottom); rotated back upright during extraction.
}

public struct SorobanResult: Sendable, Equatable {
    public let digits: [Int]            // Left to right, length N.
    public let columns: [SorobanColumn] // Per-digit detail.
    public let confidence: Float        // The lowest column's confidence.
    public let timing: TimingBreakdown
    public var text: String { get }     // The digits joined (leading zeros kept).
    public var intValue: Int? { get }   // Only when it fits in an Int.
}
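
To make "the lowest column's confidence" concrete, here is a minimal sketch of how a per-column digit and an overall confidence could be derived from softmax probabilities. Both functions are hypothetical helpers written for illustration; they are not AbacusKit API, and the fields of SorobanColumn are not documented here.

```swift
/// Hypothetical per-column result: the most likely digit and its probability.
func bestDigit(_ probabilities: [Float]) -> (digit: Int, confidence: Float)? {
    guard let best = probabilities.indices.max(by: {
        probabilities[$0] < probabilities[$1]
    }) else { return nil }
    return (best, probabilities[best])
}

/// Overall confidence taken as the minimum over all columns, so a single
/// uncertain digit lowers the confidence of the whole reading.
func overallConfidence(_ columnConfidences: [Float]) -> Float {
    columnConfidences.min() ?? 0
}

print(overallConfidence([0.99, 0.72, 0.95]))   // 0.72
```

Taking the minimum is a conservative choice: a reading is only as trustworthy as its weakest digit.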

Configure with AbacusConfiguration (default / fast / highAccuracy).


⚡ Performance

| Metric | iPhone 15 Pro (approx.) |
| --- | --- |
| Column extraction (Core Image) | 1–3 ms |
| Inference (Core ML, 23 digits in parallel) | 3–8 ms |
| Total | 5–12 ms |
| FPS | 40–60 |

🔧 Requirements

  • iOS 17.0+ / macOS 14.0+
  • Xcode 16.0+
  • Swift 6.0+

📄 License

MIT License. See LICENSE.


🛠 Development

swift build          # Build
swift test           # Test
swift package generate-documentation   # DocC

Code style is enforced with SwiftFormat (.swiftformat) and SwiftLint (.swiftlint.yml). Run them locally before opening a PR.


Made with ❤️ for iOS developers
