Edge-First LLM Semantic Routing on a 4GB Jetson Nano

People: David Pickett

Idea: Testing whether a 4GB NVIDIA Jetson Nano can act as an autonomous routing brain - classifying incoming queries with a local embedding model and deciding whether to answer locally or escalate to more powerful servers across four compute tiers.
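The core routing decision can be sketched in a few lines of pure Python: embed the query, compare it to a centroid per route, and pick the most similar. This is a minimal illustration, not the project's code - the route names and toy 3-d vectors below are invented for the example; in the real setup the embeddings would come from nomic-embed-text served by llama.cpp on the Jetson.

```python
import math

# Toy stand-in embeddings: illustrative only. In practice these vectors
# would be produced by nomic-embed-text running under llama.cpp.
ROUTE_CENTROIDS = {
    "edge":   [0.9, 0.1, 0.0],  # simple factual questions -> answer on-device
    "hub":    [0.1, 0.9, 0.0],  # coding questions -> local GPU hub
    "remote": [0.0, 0.1, 0.9],  # deep reasoning -> regional/datacenter tiers
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_route(query_embedding):
    """Return the route whose centroid is most similar to the query."""
    return max(ROUTE_CENTROIDS,
               key=lambda r: cosine(query_embedding, ROUTE_CENTROIDS[r]))
```

For example, `pick_route([0.85, 0.2, 0.05])` returns "edge", so that query would be answered on-device with no network call.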

Details:

  • The Jetson Nano can run llama.cpp natively with two models - nomic-embed-text for embeddings and gemma-3-1b for chat - both fitting comfortably in 4GB RAM
  • LiteLLM's semantic router runs on the Jetson using the local embedding model to classify queries, adding only ~300MB of overhead
  • Simple questions like "What is the capital of France?" get answered entirely on-device with no network calls - critical for spotty connectivity
  • Coding queries route to a local hub with a 24GB GPU running Ollama, complex reasoning goes to MiniMax-M2.5 served by LM Studio on a regional server, and deep analysis hits DeepSeek-R1 on university HPC
  • All four tiers - edge, local hub, regional, datacenter - route correctly from the Jetson in a single LiteLLM config
  • A key gotcha was using LiteLLM's llamafile/ provider instead of openai/ for llama.cpp embeddings - the OpenAI SDK sends a null encoding_format field that llama.cpp rejects
  • The UIUC KNN router also works on the Jetson but only inside a Docker container - the Jetson's ancient glibc 2.27 blocks native install
  • Longformer embeddings for the KNN classifier take ~6 seconds per query on the Jetson's Cortex-A57 CPU vs sub-second on an M3 Mac (whether GPU acceleration can be made to work on a Jetson of that generation with modern Python tooling needs more investigation)
  • GPU acceleration in Docker is blocked because PyTorch's aarch64 wheels on PyPI are CPU-only - no CUDA support (but there may be a way to build a custom Dockerfile)
  • NVIDIA's Triton-based router is too heavy for 4GB and will likely stay on the farm hub (or require a newer/bigger Jetson model)
  • https://github.com/pickettd/litellm-local-semantic-router-example
  • https://github.com/pickettd/local-uiuc-llmrouter-example
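How the four tiers might sit in a single config: the sketch below mirrors LiteLLM's model_list convention (a list of entries with model_name and litellm_params), expressed as a Python structure. Every hostname, port, alias, and hub model tag here is a placeholder I've invented for illustration - the source only names the served models and the providers, not the actual endpoints.

```python
# Sketch of a single LiteLLM-style config covering all four tiers.
# Hostnames, ports, and aliases are placeholders, not the project's real values.
MODEL_LIST = [
    {   # Tier 1 (edge): on-device chat via llama.cpp's OpenAI-compatible server
        "model_name": "edge-chat",
        "litellm_params": {
            "model": "openai/gemma-3-1b",
            "api_base": "http://localhost:8080/v1",
            "api_key": "none",
        },
    },
    {   # On-device embeddings: note the llamafile/ provider rather than
        # openai/, to avoid the null encoding_format that llama.cpp rejects
        "model_name": "edge-embed",
        "litellm_params": {
            "model": "llamafile/nomic-embed-text",
            "api_base": "http://localhost:8081/v1",
        },
    },
    {   # Tier 2: 24GB GPU local hub running Ollama (coding queries)
        "model_name": "hub-coder",
        "litellm_params": {
            "model": "ollama/qwen2.5-coder",  # placeholder model tag
            "api_base": "http://hub.local:11434",
        },
    },
    {   # Tier 3: regional server, MiniMax-M2.5 served by LM Studio
        "model_name": "regional-reasoner",
        "litellm_params": {
            "model": "openai/minimax-m2.5",
            "api_base": "http://regional.example:1234/v1",
        },
    },
    {   # Tier 4: DeepSeek-R1 on university HPC
        "model_name": "datacenter-analyst",
        "litellm_params": {
            "model": "openai/deepseek-r1",
            "api_base": "http://hpc.example:8000/v1",
        },
    },
]
```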
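For contrast with the centroid approach, a KNN router classifies a query by majority vote over its nearest labeled examples in embedding space. This is a minimal pure-Python sketch of that idea with invented 3-d vectors and labels; the real UIUC router uses Longformer embeddings of full queries.

```python
import math
from collections import Counter

# Toy labeled examples: in the real setup these would be Longformer
# embeddings of past queries. Vectors and labels are illustrative only.
TRAIN = [
    ([0.9, 0.1, 0.0], "edge"),
    ([0.8, 0.2, 0.1], "edge"),
    ([0.1, 0.9, 0.1], "hub"),
    ([0.2, 0.8, 0.0], "hub"),
    ([0.0, 0.1, 0.9], "datacenter"),
    ([0.1, 0.0, 0.8], "datacenter"),
]

def knn_route(query_vec, k=3):
    """Majority vote over the k nearest training embeddings (Euclidean)."""
    nearest = sorted(TRAIN, key=lambda ex: math.dist(query_vec, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

On the Jetson, the expensive step is producing query_vec (the ~6-second Longformer pass on CPU); the vote itself is negligible.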
