
Add a small files compactor for Minio #139

@MrCreosote

Description


The blobstore saves all object data in Minio, which saves each file individually, even if they're very small.

Add a single process, single thread, standalone file compactor that periodically

  • Scans for files under some size (say 50KB)
  • When it finds enough files to make a compacted file of some max size (say 1MB), or simply enough files (say 100), it will:
      • Make a checkpoint in a special mongo collection that records the files to be compacted, their order, their sizes, and the target filename
      • Compact the files into a single file in Minio
      • Update the checkpoint state with the new file information
      • Update the records in the blobstore s3 collection that point to the old non-compacted files so they point to the new compacted file, and add their offsets
      • Update the checkpoint state
      • Delete the old non-compacted files
      • Delete the checkpoint

If the compaction produces a file smaller than the 50KB threshold, that file should be first in the next compaction.

The blobstore will need to be updated to take file offsets into account before the compactor is ever run.

This makes file deletion more complicated, since multiple records now depend on the same Minio object. File deletion will most likely also need some sort of checkpointing system. Also, figure out what happens if a file is deleted while the compactor is reading it.

If the compactor starts and finds a checkpoint:

  • If the compacted file has not yet been completely written, delete the file and the checkpoint.
  • Otherwise, continue the compaction based on the checkpoint state.

Open question - how do we want to monitor the compactor?

Also see
kbase/workspace_deluxe#577
#136
