Everinfer is a system that offloads inference of ONNX graphs to remote GPUs.

Core Features

  • Remote GPU resources: Run your ONNX-compatible models on remote GPUs managed by Everinfer.
  • Peer-to-peer communication: The inference client connects directly to highly optimized C++ ONNX runtimes running on remote GPUs. Overhead as low as 2 ms
    is possible, which enables near-real-time applications.
  • 📈 Scalability: Our architecture allows linear horizontal scaling, so you can scale to thousands of requests per second (RPS) with no extra effort on your side.
  • 📥 Model storage: Upload your models once and reuse them indefinitely.
  • 👏 Minimalistic SDK: The client-side SDK is open-source, provides simple Pythonic primitives, and is extremely easy to use. You are free to add complexity as needed.
  • 🏦 On-premise deployment: Want to use your own hardware for added security, or mix and match your hardware with external computing power? On-premise deployment is possible, contact us!
Feel free to contact us even if you are a solo developer. We are quick to respond and ready to give out API keys and provide demos: [email protected]
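
To put the overhead and scaling figures from the feature list into perspective, here is a rough back-of-envelope sketch. It is not part of the Everinfer SDK. The function names, the 8 ms GPU time, and the one-request-at-a-time worker model are assumptions made purely for illustration; only the 2 ms overhead and the linear-scaling premise come from the list above.

```python
import math


def end_to_end_latency_ms(gpu_time_ms: float, overhead_ms: float = 2.0) -> float:
    """Latency seen by the client: GPU compute time plus remote-call overhead.

    The 2 ms default is the best-case overhead quoted in the feature list.
    """
    return gpu_time_ms + overhead_ms


def workers_needed(target_rps: float, gpu_time_ms: float) -> int:
    """Remote GPU workers required for a target request rate.

    Assumes throughput scales linearly with worker count and that each
    worker serves one request at a time (no batching): a simplification.
    """
    per_worker_rps = 1000.0 / gpu_time_ms
    return math.ceil(target_rps / per_worker_rps)


# Hypothetical model that needs 8 ms of GPU time per request.
latency = end_to_end_latency_ms(8.0)
workers = workers_needed(1000, 8.0)
print(f"latency: {latency} ms, workers for 1000 RPS: {workers}")
```

Under these assumptions, a model with 8 ms of GPU time would respond in about 10 ms end to end, and roughly 8 workers would cover 1000 RPS. Real numbers depend on batching, network conditions, and model size.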

Superior tech

We use a blazing-fast, buzzword-worthy stack to ensure Everinfer's technical superiority. Honourable mentions: ONNX, Rust, Zenoh, FlatBuffers, ScyllaDB.