> Miguel VF

Running Qwen 3.6 on NVIDIA DGX Spark

Published: at 08:40 PM

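To run ./run-recipe.sh qwen3.6-35b-a3b-fp8 -d --solo at boot on a DGX Spark (which runs DGX OS, an Ubuntu-based distribution), wrap it in a systemd service. The steps below also cover benchmarking the server and pointing OpenCode at it: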

  1. Install and build spark-vllm-docker:

    sudo git clone https://github.com/eugr/spark-vllm-docker.git /opt/spark-vllm-docker
    cd /opt/spark-vllm-docker
    sudo ./build-and-copy.sh
  2. Create a systemd service:

    [Unit]
    Description=vLLM Qwen3.6-35B-A3B-FP8
    After=network.target docker.service
    Requires=docker.service
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    WorkingDirectory=/opt/spark-vllm-docker
    ExecStart=/opt/spark-vllm-docker/run-recipe.sh qwen3.6-35b-a3b-fp8 -d --solo
    ExecStop=/usr/bin/docker stop vllm_node
    
    [Install]
    WantedBy=multi-user.target

     Save this unit file as /etc/systemd/system/vllm-qwen.service.
  3. Enable the service at boot and start it now:

    sudo systemctl daemon-reload
    sudo systemctl enable vllm-qwen.service
    sudo systemctl start vllm-qwen.service
  4. Benchmark with llama-benchy:

    uvx --from git+https://github.com/eugr/llama-benchy llama-benchy \
       --base-url http://localhost:8000/v1 \
       --model Qwen/Qwen3.6-35B-A3B-FP8 \
       --depth 0 4096 8192 16384 32768 65535 100000 \
       --pp 2048 \
       --tg 128 \
       --enable-prefix-caching \
       --concurrency 1 2 5 10 \
       --save-result results.csv
  5. Install OpenCode to build coding agents:

    curl -fsSL https://opencode.ai/install | bash
  6. Configure OpenCode to use the local vLLM instance:

    {
      "$schema": "https://opencode.ai/config.json",
      "provider": {
        "local": {
          "npm": "@ai-sdk/anthropic",
          "name": "local",
          "options": {
            "baseURL": "http://localhost:8000/v1",
            "apiKey": "dummy"
          },
          "models": {
            "Qwen/Qwen3.6-35B-A3B-FP8": {
              "name": "Qwen3.6-35B-A3B-FP8",
              "tool_call": true,
              "limit": {
                "context": 212992,
                "output": 32768
              }
            }
          }
        }
      },
      "compaction": {
        "auto": true,
        "prune": true,
        "reserved": 16384
      },
      "agent": {
        "build": {
          "temperature": 0.6,
          "top_p": 0.95,
          "max_tokens": 32768
        },
        "plan": {
          "temperature": 0.6,
          "top_p": 0.95,
          "max_tokens": 32768
        }
      },
      "model": "Qwen/Qwen3.6-35B-A3B-FP8",
      "permission": {
        "*": {
          "*": "allow"
        }
      }
    }

     Save this file as ~/.config/opencode/config.json.
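
Before running the benchmark or OpenCode, it can help to confirm that the server started by the service is actually answering and that it serves the model ID used in the steps above. The sketch below is one way to do that, assuming the server exposes the OpenAI-compatible model listing at http://localhost:8000/v1/models, as vLLM normally does. The function names wait_for_endpoint and model_is_served are hypothetical helpers written for this post, not part of spark-vllm-docker, and the timeout values are arbitrary.

```shell
#!/usr/bin/env bash
# Smoke-test helpers for the local vLLM endpoint.
# Both function names are illustrative (assumptions), not from any tool above.

# Poll URL until curl gets a successful HTTP response, or give up once
# TIMEOUT seconds have been spent waiting. INTERVAL is the pause between polls.
wait_for_endpoint() {
  local url="$1" timeout="${2:-600}" interval="${3:-5}" waited=0
  until curl -fs -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Succeed if the model listing at BASE_URL/models contains an entry whose
# "id" is exactly MODEL. Whitespace is stripped first so the match works
# for both compact and pretty-printed JSON.
model_is_served() {
  local base_url="$1" model="$2"
  curl -fs "$base_url/models" | tr -d '[:space:]' | grep -Fq "\"id\":\"$model\""
}
```

After sourcing the file, a check could look like this (loading the model can take a while after boot, hence the generous timeout):

    wait_for_endpoint http://localhost:8000/v1/models 900 \
      && model_is_served http://localhost:8000/v1 Qwen/Qwen3.6-35B-A3B-FP8 \
      && echo "vLLM is up and serving the expected model"

If the second check fails, the model ID in the llama-benchy command and in the OpenCode config probably does not match what the server reports.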