Edge-First LLM Semantic Routing on a 4GB Jetson Nano
People: David Pickett
Idea: Testing whether a 4GB NVIDIA Jetson Nano can act as an autonomous routing brain - classifying incoming queries with a local embedding model and deciding to answer locally or escalate to more powerful servers across four compute tiers.
Details:
- The Jetson Nano can run llama.cpp natively with two models - nomic-embed-text for embeddings and gemma-3-1b for chat - both fitting comfortably in 4GB RAM
- LiteLLM's semantic router runs on the Jetson using the local embedding model to classify queries, adding only ~300MB of overhead
- Simple questions like "What is the capital of France?" get answered entirely on-device with no network calls - critical for spotty connectivity
- Coding queries were tested routing to a local hub with a 24GB GPU running Ollama; complex reasoning goes to MiniMax-M2.5 served by LM Studio on a regional server; and deep analysis hits DeepSeek-R1 on university HPC
- All four tiers - edge, local hub, regional, datacenter - route correctly from the Jetson in a single LiteLLM config
- A key gotcha was using LiteLLM's llamafile/ provider instead of openai/ for llama.cpp embeddings - the OpenAI SDK sends a null encoding_format field that llama.cpp rejects
- The UIUC KNN router also works on the Jetson but only inside a Docker container - the Jetson's ancient glibc 2.27 blocks native install
- Longformer embeddings for the KNN classifier take ~6 seconds per query on the Jetson's Cortex-A57 CPU vs sub-second on an M3 Mac (more investigation is needed to determine whether GPU acceleration can be made to work on a Jetson of that generation with modern Python tools)
- GPU acceleration in Docker is blocked because PyTorch's aarch64 wheels on PyPI are CPU-only - no CUDA support (but there may be a way to build a custom Dockerfile)
- NVIDIA's Triton-based router is too heavy for 4GB and will likely stay on the farm hub (or require a newer/bigger Jetson model)
- https://github.com/pickettd/litellm-local-semantic-router-example
- https://github.com/pickettd/local-uiuc-llmrouter-example
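A single LiteLLM config covering all four tiers, as the bullets describe, might look roughly like the sketch below. The model names, hostnames, and ports are placeholders I've assumed for illustration; the llamafile/ prefix on the embedding model is the workaround from the gotcha bullet, since the openai/ provider's null encoding_format is rejected by llama.cpp:

```yaml
model_list:
  # Tier 1: on-device chat (llama.cpp on the Jetson itself)
  - model_name: edge-chat
    litellm_params:
      model: openai/gemma-3-1b
      api_base: http://localhost:8082/v1
      api_key: "none"
  # On-device embeddings: llamafile/ provider, not openai/, so no
  # null encoding_format is sent to llama.cpp
  - model_name: edge-embed
    litellm_params:
      model: llamafile/nomic-embed-text
      api_base: http://localhost:8081/v1
  # Tier 2: local hub - Ollama on the 24GB GPU box (model name assumed)
  - model_name: hub-coder
    litellm_params:
      model: ollama/qwen2.5-coder
      api_base: http://hub.local:11434
  # Tier 3: regional server - LM Studio
  - model_name: regional-reasoning
    litellm_params:
      model: lm_studio/minimax-m2.5
      api_base: http://regional.example:1234/v1
  # Tier 4: datacenter/HPC - DeepSeek-R1 behind an OpenAI-compatible
  # endpoint (assumption)
  - model_name: hpc-deep
    litellm_params:
      model: openai/deepseek-r1
      api_base: http://hpc.example:8000/v1
```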
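The two-model llama.cpp setup in the first bullet can be sketched as a pair of llama-server launches, one exposing the embedding endpoint for the router and one serving chat. The model filenames, quantizations, and ports here are illustrative assumptions, not taken from the notes:

```shell
# Embedding endpoint for the semantic router (--embedding enables
# llama-server's OpenAI-compatible /v1/embeddings route):
./llama-server -m nomic-embed-text-v1.5.Q8_0.gguf --embedding --port 8081 &

# Chat endpoint for on-device answers (model file and port are assumptions):
./llama-server -m gemma-3-1b-it-Q4_K_M.gguf --port 8082 &
```

Both processes together stay within the Jetson's 4GB as described above.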
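The encoding_format gotcha boils down to what the request body contains: llama.cpp's OpenAI-compatible embeddings endpoint rejects a null encoding_format, so a working client simply omits the key. A minimal sketch of building such a payload by hand (model name is an illustrative assumption):

```python
import json

def build_embedding_payload(text: str, model: str = "nomic-embed-text") -> str:
    """Build a JSON body llama.cpp accepts: no encoding_format key at all.

    Clients that serialize encoding_format as null (as the OpenAI SDK did
    in this setup) get their request rejected by llama.cpp.
    """
    payload = {"model": model, "input": text}
    return json.dumps(payload)

body = build_embedding_payload("What is the capital of France?")
assert "encoding_format" not in body  # the field llama.cpp would reject when null
```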
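The routing decision itself - classify the query with a local embedding, then pick a tier - can be sketched as a nearest-exemplar lookup with cosine similarity. In the real setup the embeddings come from nomic-embed-text on the Jetson; the 3-dimensional vectors and per-tier exemplars below are made-up stand-ins, not anything from the project:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# One hypothetical exemplar embedding per compute tier:
TIER_EXEMPLARS = {
    "edge":       [0.9, 0.1, 0.0],  # simple factual questions
    "local-hub":  [0.1, 0.9, 0.0],  # coding queries
    "regional":   [0.0, 0.5, 0.8],  # complex reasoning
    "datacenter": [0.0, 0.1, 1.0],  # deep analysis
}

def route(query_embedding):
    """Return the tier whose exemplar is most similar to the query."""
    return max(TIER_EXEMPLARS,
               key=lambda t: cosine(query_embedding, TIER_EXEMPLARS[t]))

# A query embedding near the "edge" exemplar stays on-device:
print(route([0.85, 0.15, 0.05]))  # prints "edge"
```

The KNN router mentioned above generalizes this from one exemplar per tier to k nearest labeled examples.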