Qwen3-32B on AMD's 7900XTX

People: Me

Idea: I wanted to see how well Qwen3-32B runs on an AMD 7900XTX across different quantization formats and inference backends. Spoiler: AWQ is not the move on this generation of consumer AMD cards.

Details:

  • Tested on an Ubuntu 24.04 host with ROCm 7.1.1, using both Ollama and dockerized vLLM
  • AWQ quants technically work in vLLM but are painfully slow: around 5 tokens/sec on a ~700-token prompt with a ~70-token completion
  • Ollama with a Q4_K_M quant hit about 25 tokens/sec, which is respectable (measurement sketch after this list)
  • The winner was Qwen3-32B-autoround-4bit-gptq in vLLM at ~35 tok/sec for a single request (offline-API sketch below)
  • Running 3 concurrent requests, which vLLM batches together, pushed that to ~40 tok/sec (concurrency sketch below)
  • Getting vLLM running on AMD is still an adventure: I tried building from source with the ROCm Dockerfile, rocm-dev nightlies, AMD's TheRock images, and community Docker images from /r/LocalLLaMA
  • Some of those setups couldn't run AWQ at all
  • Also gave Vulkan a spin in Ollama for comparison
  • Bottom line: if you're on AMD and want decent Qwen3-32B performance, look for AutoRound GPTQ quants instead of AWQ
  • The ROCm ecosystem is getting better but still requires some patience and experimentation
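
For reference, here's a minimal sketch of how the Ollama number can be reproduced from the API's own counters. It assumes Ollama on its default port with the Q4_K_M build pulled under the stock qwen3:32b tag (the tag name is an assumption; adjust it to whatever you pulled):

```python
# Minimal sketch: compute decode tokens/sec from Ollama's own counters.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:32b",  # assumed tag for the Q4_K_M quant
        "prompt": "Explain KV-cache quantization in two sentences.",
        "stream": False,
    },
    timeout=600,
)
data = resp.json()

# eval_count is the number of generated tokens; eval_duration is in nanoseconds.
tok_per_sec = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"decode throughput: {tok_per_sec:.1f} tok/sec")
```

This measures pure decode speed and ignores prompt processing, which Ollama reports separately as prompt_eval_count/prompt_eval_duration.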
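
The single-request vLLM number can be sanity-checked with a sketch like the one below, using vLLM's offline Python API instead of the server. The model id, context length, and memory fraction are assumptions sized for a 24 GB card:

```python
# Minimal sketch: time a single generation with a GPTQ-format quant in vLLM.
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen3-32B-autoround-4bit-gptq",  # local path or HF repo id (assumption)
    quantization="gptq",          # AutoRound exports GPTQ-format checkpoints
    max_model_len=8192,           # keep the KV cache modest on 24 GB of VRAM
    gpu_memory_utilization=0.95,
)

params = SamplingParams(max_tokens=256, temperature=0.7)
start = time.time()
outputs = llm.generate(["Summarize the ROCm software stack."], params)
elapsed = time.time() - start

generated = len(outputs[0].outputs[0].token_ids)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/sec")
```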
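
And the concurrency sketch: fire N requests at vLLM's OpenAI-compatible server at once and sum completion tokens across streams, since vLLM batches concurrent requests. The endpoint and served model name are assumptions; point them at whatever `vllm serve` is actually hosting:

```python
# Minimal sketch: aggregate tok/sec across concurrent requests to vLLM.
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MODEL = "Qwen3-32B-autoround-4bit-gptq"  # served model name (assumption)
N = 3  # concurrent requests, matching the run above

def one_request(i: int) -> int:
    out = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Write a haiku about GPU {i}."}],
        max_tokens=128,
    )
    return out.usage.completion_tokens  # tokens generated for this request

start = time.time()
with ThreadPoolExecutor(max_workers=N) as pool:
    total_tokens = sum(pool.map(one_request, range(N)))
elapsed = time.time() - start

# Aggregate throughput across all concurrent streams.
print(f"{total_tokens} tokens in {elapsed:.1f}s -> {total_tokens / elapsed:.1f} tok/sec")
```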
