The M8P Microprocessor

[Diagram: M8P System Architecture]

M8P is a virtual machine designed to build and run sophisticated AI systems. It is not a glorified wrapper; it is an architectural shift. M8P treats AI operations, such as inference, vector search, matrix operations, and embedding, as native first-class instructions.

Built on a robust C++ codebase, M8P combines llama.cpp, an HNSW vector DB, and AVX2/AVX-512 optimizations into a single runtime environment. This allows for zero-copy, low-latency execution and atomic "thought loops" that other frameworks will find hard to match.

This implementation is based on llama.cpp and ships the whole llama runtime inside the M8 interpreter/VM. The VM codebase is in the m8p core; the server lives under tools/server.

Build

First, generate the build configuration (disable libcurl if you prefer):

cmake -B build -DLLAMA_CURL=OFF  && cd build

Or generate for an NVIDIA GPU (again, disable libcurl if you prefer):

cmake -B build -DLLAMA_CURL=OFF -DGGML_CUDA=ON && cd build
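Before building with CUDA, it may help to confirm that the driver and toolkit can see your GPU. This is just a quick sanity check; these tools ship with the NVIDIA driver and CUDA toolkit, not with M8P:

nvidia-smi
nvcc --version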

Now check your processor's AVX support:

lscpu

The output of lscpu lists the complete capabilities of your machine's processor. Set CXX_FLAGS according to whether your processor supports AVX2 or AVX-512 (the default is AVX2).
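For a quick check you can filter the CPU flags directly; this is a minimal sketch assuming a Linux /proc/cpuinfo, where avx2 and avx512f are the flag names corresponding to -mavx2 and -mavx512f:

grep -o -w -E 'avx2|avx512f' /proc/cpuinfo | sort -u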

Then set the AVX flags by writing a flags.make override:

cat > flags.make << 'EOF'
CXX_DEFINES = -DGGML_BACKEND_SHARED -DGGML_SHARED -DGGML_USE_CPU -DLLAMA_SHARED
CXX_INCLUDES = -I/opt/m8p/tools/server -I/opt/m8p/build/tools/server -I/opt/m8p/tools/server/../mtmd -I/workspace/m8p -I/opt/m8p/common/. -I/opt/m8p/common/../vendor -I/opt/m8p/src/../include -I/opt/m8p/ggml/src/../include -I/opt/m8p/tools/mtmd/.
CXX_FLAGS = -O3 -DNDEBUG -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -mavx2
# Use the line below instead of the one above if your processor supports AVX-512:
#CXX_FLAGS = -O3 -DNDEBUG -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -mavx512f
EOF

Build with AVX support (Advanced Vector Extensions):

# change 17 to your processor's core count minus 2 (ideally)
cp flags.make tools/server/CMakeFiles/llama-server.dir && make -j 17 llama-server
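If you'd rather not hard-code the job count, a small variation of the same step derives it from nproc (assuming GNU coreutils):

cp flags.make tools/server/CMakeFiles/llama-server.dir && make -j $(($(nproc) - 2)) llama-server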

Build without AVX (Advanced Vector Extensions). This is suitable for inference only; the mat instructions won't be available:

make -j 17 llama-server

Run

If the build succeeded, start the server:

./bin/llama-server -m ~/models/nomic-embed-text-v1.5.Q4_K_M.gguf  -t 4 --port 8090  --host 127.0.0.1  --jinja
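To verify the server came up, you can hit the health endpoint that upstream llama-server exposes, using the host and port from the command above:

curl http://127.0.0.1:8090/health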

Some Models

https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blob/main/tinyllama-1.1b-chat-v1.0.Q2_K.gguf
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/blob/main/nomic-embed-text-v1.5.f32.gguf
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/tree/main
https://huggingface.co/ggml-org/gemma-3-1b-it-GGUF/blob/main/gemma-3-1b-it-Q4_K_M.gguf
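To download one of these directly, the usual Hugging Face trick is to replace /blob/ with /resolve/ in the URL, for example:

wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.f32.gguf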

Bonus

If you're on Ubuntu and don't have rcp, here's the command to install it:

apt-get update && apt-get install rsh-redone-client 

The build output will look something like the Build Preview image (shown for the GPU build).

More details about the build are in the build README. We use the same build toolchain as llama.cpp; visit the website for more.

          |\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\|
          |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
          |\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\|
          |_________________________________________|
__________|                                         |__________
__________|                                         |__________
__________|                                         |__________
__________|       /$$      /$$  /$$$$$$             |__________
__________|      | $$$    /$$$ /$$__  $$            |__________
__________|      | $$$$  /$$$$| $$  \ $$            |__________
__________|      | $$ $$/$$ $$|  $$$$$$/            |__________
__________|      | $$  $$$| $$ >$$__  $$            |__________
__________|      | $$\  $ | $$| $$  \ $$            |__________
__________|      | $$ \/  | $$|  $$$$$$/            |__________
__________|      |__/     |__/ \______/             |__________
__________|                                         |__________
__________|           LLM MICROPROCESSOR            |__________
__________|                                         |__________
__________|_________________________________________|__________
          |_________________________________________|
          |\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\|
          |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
          |\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\|
