# Introduction

Everinfer is a system that offloads inference of ONNX graphs to remote GPUs.

## Core Features

* :cloud: **Remote GPU resources:** Run your ONNX-compatible models on remote GPUs that are managed by Everinfer.
* :zap: **Peer-to-peer communication:** The inference client connects directly to highly optimized C++ ONNX runtimes running on remote GPUs. Overhead as low as **2 ms** :zap: is possible, enabling near-real-time applications.
* :fire: **Instant cold starts:** No Docker or Git is involved; cold start time is limited only by model download time and is close to the theoretical limit (e.g. BERT cold starts in **<= 1 s**).
* :chart_with_upwards_trend: **Scalability:** Our architecture allows linear horizontal scaling, so you can reach 1000s of RPS with no extra effort on your side.
* :inbox_tray: **Model storage:** Upload your models once and reuse them indefinitely.
* :clap: **Minimalistic SDK:** The client-side SDK is open-source, provides simple Pythonic primitives, and is extremely easy to use; you are free to add complexity as needed.
* :bank: **On-premise deployment:** Want to use your own hardware for added security, or mix and match your hardware with external computing power? On-premise deployment is possible. Contact us!
* :hourglass_flowing_sand: **\[COMING SOON] Multiple runtimes supported:** TensorRT, TorchScript, and TFLite support through a unified interface.
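As a rough illustration of the workflow the features above describe (upload an ONNX model once, then run inference over a peer-to-peer connection to a remote GPU), a deployment might look like the sketch below. This is hypothetical pseudocode: the names `Everinfer`, `register_model`, `create_pipeline`, and `run` are illustrative assumptions, not the SDK's actual API.

```python
# Hypothetical pseudocode sketch -- the names below are illustrative
# assumptions, not the actual Everinfer SDK interface.
from everinfer import Everinfer                # hypothetical import

client = Everinfer(api_key="YOUR_API_KEY")     # authenticate against the service
model = client.register_model("model.onnx")    # upload the ONNX graph once; it is stored for reuse
pipeline = client.create_pipeline(model)       # connect peer-to-peer to a remote GPU runtime
outputs = pipeline.run({"input": input_data})  # run inference on the remote GPU
```

The point of the sketch is the shape of the workflow: one upload, then direct client-to-runtime calls with no container builds in between.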

> Feel free to contact us even if you are a sole developer. We are quick to respond and ready to give out API keys and provide demos — <hello@everinfer.ai>

## Superior tech

We use a :zap:blazing fast:zap: buzzword-worthy stack to ensure Everinfer's technical superiority. Honourable mentions: [ONNX](https://onnx.ai/), [Rust](https://www.rust-lang.org/), [Zenoh](https://zenoh.io/), [FlatBuffers](https://google.github.io/flatbuffers/), [ScyllaDB](https://www.scylladb.com/).

## Quick links

* See the [simplest example](https://docs.everinfer.ai/getting-started/basics) of Everinfer in action.
* Want to skip the boring parts and dive straight in? Take a look at how you could [deploy Faster-RCNN](https://docs.everinfer.ai/getting-started/faster-rcnn-example) while fusing pre- and post-processing into a single graph with the model.
* Doubt the latency and scalability claims? Take a look at [GPT-2 running at 900 RPS](https://docs.everinfer.ai/examples/gpt2-900+rps), still with four lines of code.
* [Stable Diffusion demo](https://docs.everinfer.ai/examples/stable-diffusion-decouple-gpu-ops-from-code): offload the U-Net to remote GPUs while running lightweight models locally.
