46 commits
a04ef3e
feat:add Qwen2.5omni text modal processing
KKkai0315 Jan 22, 2026
c9333ab
add qwen2.5omni vision, audio modal
KKkai0315 Jan 23, 2026
e959822
fix: Enhance quantization modules. Introduced FixedActivationQDQ for …
chenghuaWang Jan 17, 2026
0672432
fix: Suppress deprecated comma-subscript warnings in CMake and remove…
chenghuaWang Jan 17, 2026
927f7eb
feat(qualcomm): Add installation targets for flatbuffers and MllmQNNB…
chenghuaWang Jan 19, 2026
d2e6b36
feat(qualcomm): Refactor Qwen3 model to integrate ConcatObserver for …
chenghuaWang Jan 19, 2026
48c259a
feat(cpu): Implement fill operations for various data types including…
chenghuaWang Jan 20, 2026
e976d11
feat(qnn): Enhance QNNBackend initialization with improved logging an…
chenghuaWang Jan 21, 2026
224d68e
feat(qnn): Update quantization handling and embedding output data typ…
chenghuaWang Jan 23, 2026
d2d5c09
feat(qwen3): Integrate QEmbedding for quantized embeddings and refine…
chenghuaWang Jan 23, 2026
c4f2306
fix
KKkai0315 Jan 23, 2026
a235a13
fix
KKkai0315 Jan 23, 2026
eeac11f
Merge remote-tracking branch 'refs/remotes/origin/main'
KKkai0315 Jan 23, 2026
adc3b64
add ConvTranspose1dOp & TanhOp
KKkai0315 Jan 24, 2026
674f97c
fix: fix Tanh op and add test for Tanh Op and ConvTranspose1d Op
KKkai0315 Jan 25, 2026
e1ba448
add minicpmo45
KKkai0315 Feb 23, 2026
8c0cda7
merge
KKkai0315 Feb 23, 2026
af574ae
add
KKkai0315 Feb 24, 2026
06b754c
add qwen2.5o talker
KKkai0315 Mar 5, 2026
5676edc
add
KKkai0315 Mar 5, 2026
4baacd3
Merge branch 'main' into main
oreomaker Mar 12, 2026
3bdf6e0
fix
KKkai0315 Mar 12, 2026
571b93d
add minicpm-o4.5 system ref audio prompt path
KKkai0315 Mar 12, 2026
d7c1b30
fix
KKkai0315 Mar 25, 2026
f185440
feat(mllm_kernel): simplify JIT usage in README and update kernel exa…
chenghuaWang Feb 17, 2026
289b74b
feat: update dependencies and refactor mobile module structure
chenghuaWang Feb 18, 2026
45c2fb7
feat: enhance configuration management and update dependencies
chenghuaWang Feb 18, 2026
14ce9cd
feat: add main entry points and configuration for pymllm and mllm-kernel
chenghuaWang Feb 18, 2026
027b0df
feat: enhance layer implementations and add new components
chenghuaWang Feb 19, 2026
f6aee67
feat: add initial files for pymllm architecture and launch functionality
chenghuaWang Feb 19, 2026
4fd3d34
feat: update dependencies and enhance configuration structure
chenghuaWang Feb 21, 2026
57ef372
feat: implement store_cache functionality and related components
chenghuaWang Feb 21, 2026
7f78efa
refactor: improve socket initialization in TokenizerProcess
chenghuaWang Feb 21, 2026
7f5d7d9
feat(engine): support batch generation and enable shared memory queue…
chenghuaWang Feb 27, 2026
5d13411
feat(mllm-kernel): add high-performance create_kv_indices CUDA kernel…
chenghuaWang Mar 2, 2026
f10363c
feat(sampling): add sampling module with FlashInfer acceleration and …
chenghuaWang Mar 2, 2026
c366ffc
feat(cuda): add fused GDN decode and RMSNorm+SiLU gating kernels for …
chenghuaWang Mar 9, 2026
506d61a
fix(attention): refine FlashInfer backend logic and improve RadixCach…
chenghuaWang Mar 17, 2026
a420a05
refactor: improve code readability and structure across multiple modules
chenghuaWang Mar 17, 2026
9d33d0d
chore: update installation instructions and add new skills for pymllm
chenghuaWang Mar 17, 2026
fd16226
refactor: enhance installation instructions and improve cache management
chenghuaWang Mar 17, 2026
a6a993a
refactor: enhance configuration management and improve process health…
chenghuaWang Mar 18, 2026
a78e3a0
feat(mllm-kernel): introduce new Marlin kernel implementations for ef…
chenghuaWang Mar 18, 2026
9453134
feat(quantization): implement quantization configuration loading and …
chenghuaWang Mar 18, 2026
5560096
feat(docs): update README files with latest news and model integratio…
chenghuaWang Mar 18, 2026
e78ea11
fix
KKkai0315 Mar 25, 2026
486 changes: 486 additions & 0 deletions .claude/skills/impl-jit-kernel/SKILL.md

Large diffs are not rendered by default.

73 changes: 73 additions & 0 deletions .claude/skills/install-pymllm/SKILL.md
@@ -0,0 +1,73 @@
---
name: install-pymllm
description: Install the pymllm Python package. Asks the user whether to do a full build (with CMake C++ compilation) or a fast install (Python-only, skip CMake). Use when the user asks to install, set up, or reinstall pymllm.
---

# Install pymllm

## Goal

Help the user install the `pymllm` package with the right configuration for their use case.

## Workflow

### Step 1: Ask the user which install mode they want

Use `AskUserQuestion` to present two options:

**Full Install (with C++ build)**
- Compiles the C++ mllm runtime and FFI extension via CMake
- Required if the user needs mobile inference, model conversion with FFI, or CPU/QNN backends
- Slower (several minutes depending on the machine)
- Command: `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall`

**Fast Install (Python-only, skip CMake)**
- Skips the entire CMake build step
- Only installs the pure Python package
- Recommended for users who only use CUDA backends (FlashInfer, TileLang) and do not need the C++ mllm runtime
- Much faster (seconds)
- Command: `SKBUILD_WHEEL_CMAKE=false pip install -e .`

### Step 2: Ask editable or non-editable

Use `AskUserQuestion` to ask:

- **Editable (`pip install -e .`)**: For active development. Python imports point to the source tree. Changes to `.py` files take effect immediately without reinstalling.
- **Non-editable (wheel)**: For stable usage. Installs a wheel into site-packages.

### Step 3: Ask whether the user needs CUDA optional dependencies

Use `AskUserQuestion` to ask whether the user needs CUDA support (FlashInfer, TileLang, pyzmq, etc.).

This determines whether to append `[cuda]` to the install specifier (e.g. `pip install -e ".[cuda]"` instead of `pip install -e .`).

**This applies to ALL install modes.** For fast-install users this is especially important since the CUDA packages are the primary compute backend.

### Step 4: Execute the install

Based on user choices, compose and run the appropriate command. The install specifier is either `.` or `".[cuda]"` depending on Step 3.

| Mode | Editable | CUDA | Command |
|------|----------|------|---------|
| Full | Yes | No | `pip install -v -e .` |
| Full | Yes | Yes | `pip install -v -e ".[cuda]"` |
| Full | No | No | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Full | No | Yes | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
| Fast | Yes | No | `SKBUILD_WHEEL_CMAKE=false pip install -e .` |
| Fast | Yes | Yes | `SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"` |
| Fast | No | No | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Fast | No | Yes | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
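The decision table above can be sketched as a small helper script (a hypothetical illustration; the `MODE`/`EDITABLE`/`CUDA` variable names are not part of the project):

```shell
#!/usr/bin/env sh
# Compose the pymllm install command from the three user choices.
# Usage: MODE=fast EDITABLE=yes CUDA=yes sh compose_install.sh
MODE="${MODE:-full}"        # full | fast
EDITABLE="${EDITABLE:-yes}" # yes | no
CUDA="${CUDA:-no}"          # yes | no

# Install specifier: "." or ".[cuda]" (quoted for shells that glob brackets)
SPEC="."
[ "$CUDA" = "yes" ] && SPEC='".[cuda]"'

# Fast mode skips the CMake build via an env-var override
PREFIX=""
[ "$MODE" = "fast" ] && PREFIX="SKBUILD_WHEEL_CMAKE=false "

if [ "$EDITABLE" = "yes" ]; then
  CMD="${PREFIX}pip install -v -e ${SPEC}"
else
  CMD="${PREFIX}pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall"
  # Non-editable wheels do not carry extras; add them afterwards
  [ "$CUDA" = "yes" ] && CMD="${CMD} && pip install \"pymllm[cuda]\""
fi
echo "$CMD"
```

With the defaults (full, editable, no CUDA) this prints `pip install -v -e .`, matching the first table row.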

### Step 5: Post-install for editable + full build

If the user chose **editable + full build**, the compiled `.so` files live in a build directory (e.g. `build/bin/`), not in the source tree. The Python code at `pymllm/__init__.py` looks for libraries at `pymllm/lib/MllmFFIExtension.so`. A symlink is needed to bridge this gap.

**Invoke the `/link-pymllm-lib` skill** to help the user set up the symlink.

## Important Notes

- The project root must contain `pyproject.toml` with `scikit-build-core` as the build backend.
- The `wheel.cmake = true` flag in `pyproject.toml` controls whether CMake runs. The env var `SKBUILD_WHEEL_CMAKE=false` overrides it at install time without modifying the file.
- For non-editable full builds, the `.so` files are bundled inside the wheel automatically — no symlink needed.
- For fast installs, `pymllm.is_mobile_available()` will return `False` since no C++ libraries are present. This is expected.
- The `[cuda]` optional dependencies are defined in `pyproject.toml` under `[project.optional-dependencies]`.
83 changes: 83 additions & 0 deletions .claude/skills/link-pymllm-lib/SKILL.md
@@ -0,0 +1,83 @@
---
name: link-pymllm-lib
description: Create or update the pymllm/lib symlink to point to a C++ build directory's bin/ folder. Required after editable installs with C++ builds so that Python can find the compiled .so libraries. Use when the user asks to link, fix, or set up pymllm native libraries.
---

# Link pymllm lib

## Goal

Create a symlink at `pymllm/lib` pointing to the correct build output directory so that an editable-installed pymllm can load the compiled C++ shared libraries (`MllmFFIExtension.so`, `libMllmRT.so`, etc.).

## Background

When pymllm is installed in editable mode (`pip install -e .`), Python imports from the source tree directly. The C++ libraries are compiled into `<build-dir>/bin/` by CMake, but pymllm looks for them at `pymllm/lib/`. A symlink bridges this gap:

```
pymllm/lib -> <project-root>/<build-dir>/bin
```

## Workflow

### Step 1: Detect available build directories

Scan the project root for directories matching the pattern `build*/bin/` that contain `MllmFFIExtension.so` (or `.dylib` on macOS). List all valid candidates.

Common build directories and their corresponding platforms:

| Build directory | Platform / Config | Typical build command |
|----------------|-------------------|----------------------|
| `build/bin` | X86 CPU only | `python task.py tasks/build_x86.yaml` |
| `build-x86-cuda/bin` | X86 + CUDA | `python task.py tasks/build_x86_cuda.yaml` |
| `build-qnn-aot/bin` | X86 + QNN AOT | `python task.py tasks/build_x86_qnn_aot.yaml` |
| `build-android-arm64-v8a-qnn/bin` | Android ARM + QNN | `python task.py tasks/build_android_qnn.yaml` |
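The scan in Step 1 can be done with a short shell loop (a sketch; `find_builds` is an illustrative helper, and on macOS the extension is `.dylib` rather than `.so`):

```shell
# Print build directories under a project root that contain the compiled
# FFI extension, i.e. valid symlink targets for pymllm/lib.
find_builds() {
  for d in "$1"/build*/bin; do
    if [ -f "$d/MllmFFIExtension.so" ] || [ -f "$d/MllmFFIExtension.dylib" ]; then
      echo "$d"
    fi
  done
}

find_builds .   # scan the current project root
```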

### Step 2: Ask the user which build to link

Use `AskUserQuestion` to let the user pick from the detected build directories. Show each option with its path and the platform it corresponds to.

If no build directories with `.so` files are found, inform the user they need to build first:

```bash
pip install -r requirements.txt
python task.py tasks/build_x86.yaml # or another build task
```

### Step 3: Check existing symlink

Before creating a new symlink, check if `pymllm/lib` already exists:

- If it's a symlink, show where it currently points and confirm replacement.
- If it's a real directory, warn the user and ask before removing it.
- If it doesn't exist, proceed directly.
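The three cases above can be distinguished with standard file tests (a sketch; `check_lib` is an illustrative helper, not part of the repo):

```shell
# Report the state of a prospective symlink path before replacing it.
check_lib() {
  if [ -L "$1" ]; then
    # Existing symlink: show its target so the user can confirm replacement
    echo "symlink -> $(readlink "$1")"
  elif [ -d "$1" ]; then
    echo "real directory: ask before removing"
  else
    echo "absent: safe to create"
  fi
}

check_lib pymllm/lib
```

Note that `-L` must be tested before `-d`, since a symlink to a directory satisfies both.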

### Step 4: Create the symlink

```bash
ln -sfn <project-root>/<build-dir>/bin <project-root>/pymllm/lib
```

Use `ln -sfn` to atomically replace any existing symlink.

### Step 5: Verify

After creating the symlink, verify by checking that the target `.so` file is accessible:

```bash
ls -la pymllm/lib/MllmFFIExtension.so
```

Then run a quick Python check:

```bash
python -c "import pymllm; print('mobile available:', pymllm.is_mobile_available())"
```

If `is_mobile_available()` returns `True`, the link is correct.

## Important Notes

- The symlink target must be an **absolute path** for reliability.
- On macOS, the library extension is `.dylib` instead of `.so`.
- Android build directories (e.g., `build-android-arm64-v8a-qnn/bin`) contain ARM binaries that cannot run on x86 hosts. Warn the user if they select one of these on a non-ARM machine.
- If the user has multiple build directories, they can re-run this skill anytime to switch which build pymllm uses.
44 changes: 44 additions & 0 deletions .claude/skills/update-codeowners/SKILL.md
@@ -0,0 +1,44 @@
---
name: update-codeowners
description: Updates CODEOWNERS entries safely with consistent path and owner formatting. Use when the user asks to add, remove, or modify CODEOWNERS rules, ownership mappings, reviewers, or module maintainers.
---

# Update CODEOWNERS

## Goal
Maintain `CODEOWNERS` accurately while preserving the repository's existing section/comment style.

## Workflow
1. Read the current `CODEOWNERS` file before editing.
2. Identify requested changes as one of:
- Add new path rule
- Modify owners for existing path rule
- Remove obsolete path rule
- Reorganize section comments (only if requested)
3. Update rules in place instead of creating duplicates for the same path.
4. Keep existing section headers and comment style unless the user asks to refactor structure.
5. Return a concise changelog describing which paths were added, changed, or removed.

## Rule Format
- Use one rule per line: `<path-pattern> <owner1> <owner2> ...`
- Owners must be GitHub handles prefixed with `@`.
- Keep path style consistent with the file (in this repo, path patterns typically start with `/`).
- Do not leave rules with empty owner lists.

## Editing Guidelines
- Prefer minimal edits near related sections.
- If a path already exists, update that line instead of adding a second conflicting line.
- If a new rule logically belongs to an existing section, place it in that section.
- Preserve human-readable grouping and blank lines.
- Keep comments intact unless they are clearly outdated and the user asked for cleanup.

## Validation Checklist
- [ ] Every non-comment, non-empty line has at least one owner.
- [ ] Every owner token starts with `@`.
- [ ] No accidental duplicate rule for the exact same path pattern.
- [ ] Existing comments/sections were preserved unless explicitly changed.
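The first three checklist items can be automated with a small awk-based validator (a sketch; `check_codeowners` is an illustrative helper, not part of the repo):

```shell
# check_codeowners FILE — print one line per problem found in a CODEOWNERS file:
# rules with no owners, owner tokens missing '@', and duplicate path patterns.
check_codeowners() {
  awk '
    /^[[:space:]]*(#|$)/ { next }                 # skip comments and blank lines
    {
      if (NF < 2) print "line " NR ": no owners for " $1
      for (i = 2; i <= NF; i++)
        if ($i !~ /^@/) print "line " NR ": owner " $i " missing @"
      if (seen[$1]++) print "line " NR ": duplicate rule for " $1
    }
  ' "$1"
}

check_codeowners CODEOWNERS
```

An empty output means the mechanical checks pass; preserved comments and sectioning still need a human look.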

## Example Requests
- "Add `/mllm/models/new_model/ @alice @bob` under models."
- "Change `/core/Storage` owner to `@team-core`."
- "Remove ownership rule for deprecated path `/legacy/`."
2 changes: 1 addition & 1 deletion .codespellrc
@@ -1,3 +1,3 @@
[codespell]
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao, flashinfer
skip = *.json,*.jsonl,*.patch,*.txt
4 changes: 2 additions & 2 deletions .gitignore
@@ -4,7 +4,7 @@
.cache/
.tmp/
compile_commands.json
.claude/
settings.local.json

# MLLM Team Specific
tasks/mllmteam*
@@ -13,7 +13,7 @@ tasks/mllmteam*

# Building files and binary
build*/
install*/
/install*/
mllm-sdk-*/
mllm-install-*/

2 changes: 2 additions & 0 deletions README-ZH.md
@@ -17,6 +17,7 @@ mllm

## 最新动态

- [2026 年 3 月 18 日] 🔥🔥🔥 `pymllm` 已支持在 Jetson Orin 和 Jetson Thor 设备上使用 CUDA(实验特性,仍在持续开发中)。
- [2026 年 2 月 3 日] 🔥🔥🔥 MLLM Qnn AOT 已支持在 NPU 上全图执行![快速开始](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [技术报告](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
- [2025 年 11 月 27 日] Android Demo 更新:通过一种全新的 In-App Go 服务架构,在 Android 上实现了 Qwen3 和 DeepSeek-OCR 的稳定流式推理。
- [2025 年 11 月 23 日] MLLM v2 发布!
@@ -78,6 +79,7 @@ mllm 框架可以与主流社区框架的模型检查点无缝集成。通过 ml
|-----------------------------------------------------------------------------|------|-----------------------|
| [Qwen3-0.6B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai) | |
| [Qwen3-1.7B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-1.7B-w4a8-i8mm-kai) | [W4A16-SM8650](https://modelscope.cn/models/mllmTeam/Qwen3-1.7B-Qnn-AOT-SM8650/summary) |
| [Qwen3-4B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-4B-w4a8-i8mm-kai) | |
| [DeepSeek-OCR](https://github.com/deepseek-ai/DeepSeek-OCR) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/DeepSeek-OCR-w4a8-i8mm-kai) | |
| [SmolLM3](https://huggingface.co/blog/smollm3)| [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/SmolLM3-3B-w4a8-i8mm-kai) | |
| [Qwen2-VL-2B-Instruct](https://qwenlm.github.io/zh/blog/qwen2-vl/)|[✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen2-VL-2B-Instruct-w4a32kai) ||
11 changes: 11 additions & 0 deletions README.md
@@ -17,6 +17,7 @@ mllm

## Latest News

- [2026 Mar 18] 🔥🔥🔥 `pymllm` now supports CUDA on Jetson Orin and Jetson Thor devices (experimental; still under active development).
- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT Support for Full Graph Execution on NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
- [2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
- [2025 Nov 23] MLLM v2 released!
@@ -76,6 +77,7 @@ The mllm framework integrates seamlessly with popular community frameworks' chec
|-----------------------------------------------------------------------------|------|-----------------------|
| [Qwen3-0.6B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai) | |
| [Qwen3-1.7B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-1.7B-w4a8-i8mm-kai) | [W4A16-SM8650](https://modelscope.cn/models/mllmTeam/Qwen3-1.7B-Qnn-AOT-SM8650/) |
| [Qwen3-4B](https://github.com/QwenLM/Qwen3) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen3-4B-w4a8-i8mm-kai) | |
| [DeepSeek-OCR](https://github.com/deepseek-ai/DeepSeek-OCR) | [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/DeepSeek-OCR-w4a8-i8mm-kai) | |
| [SmolLM3](https://huggingface.co/blog/smollm3)| [✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/SmolLM3-3B-w4a8-i8mm-kai) | |
| [Qwen2-VL-2B-Instruct](https://qwenlm.github.io/zh/blog/qwen2-vl/)|[✔️ w4a8](https://www.modelscope.cn/models/mllmTeam/Qwen2-VL-2B-Instruct-w4a32kai) ||
@@ -308,6 +310,15 @@ mllm provides a set of model converters to convert models from other popular mod
bash ./scripts/install_pymllm.sh
```

> **Tip for CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:
>
> ```shell
> SKBUILD_WHEEL_CMAKE=false pip install -e .
> pip install "pymllm[cuda]"
> ```
>
> This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish it there. Afterwards, you can use the command below to install it in the future.
Binary file added assets/pymllm-arch.png
11 changes: 11 additions & 0 deletions docs/index.rst
@@ -246,6 +246,17 @@ mllm provides a set of model converters to convert models from other popular mod

bash ./scripts/install_pymllm.sh

.. tip::

**For CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:

.. code-block:: shell

SKBUILD_WHEEL_CMAKE=false pip install -e .
pip install "pymllm[cuda]"

This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish it there. Afterwards, you can use the command below to install it in the future.
5 changes: 5 additions & 0 deletions docs/qnn_backend/aot_execute.rst
@@ -60,6 +60,10 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
pip install -e .

# link lib to pymllm's dir, so that tvm ffi can find the lib
#
# NOTE:! build x86 qualcomm aot first !
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml
ln -s <absolute path to where you build mllm>/bin/ mllm/pymllm/lib


@@ -82,6 +86,7 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
.. code-block:: shell

# In the mllm-v2 project root directory
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml

# Run the compiler program
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -2,6 +2,7 @@ add_subdirectory(qwen2vl)
add_subdirectory(qwen2vl_tracer)
add_subdirectory(qwen2_5vl)
add_subdirectory(qwen2_5vl_tracer)
add_subdirectory(minicpm_o45)
add_subdirectory(llama)
add_subdirectory(minicpm_o)
add_subdirectory(minicpm4)
11 changes: 11 additions & 0 deletions examples/minicpm_o45/CMakeLists.txt
@@ -0,0 +1,11 @@
add_executable(mllm-minicpm-o45-runner main.cpp)
target_link_libraries(mllm-minicpm-o45-runner PRIVATE MllmRT MllmCPUBackend)
target_include_directories(mllm-minicpm-o45-runner PRIVATE ${MLLM_INCLUDE_DIR})

add_executable(mllm-minicpm-o45-runner-dbg main_dbg.cpp)
target_link_libraries(mllm-minicpm-o45-runner-dbg PRIVATE MllmRT MllmCPUBackend)
target_include_directories(mllm-minicpm-o45-runner-dbg PRIVATE ${MLLM_INCLUDE_DIR})

# add_executable(mllm-minicpm-o45-runner-python main_python.cpp)
# target_link_libraries(mllm-minicpm-o45-runner-python PRIVATE MllmRT MllmCPUBackend)
# target_include_directories(mllm-minicpm-o45-runner-python PRIVATE ${MLLM_INCLUDE_DIR})