# Introduction

Everinfer is a system that offloads inference of ONNX graphs to remote GPUs.

## Core Features

* :cloud: **Remote GPU resources:** Run your ONNX-compatible models on remote GPUs managed by Everinfer.
* :zap: **Peer-to-peer communication:** The inference client connects directly to highly optimized C++ ONNX runtimes running on remote GPUs. Overhead as low as **2ms** :zap: is possible, enabling near-real-time applications.
* :fire: **Instant cold starts:** No Docker or Git is involved; cold start time is limited only by model download time and is close to the theoretical limit (e.g. BERT cold start **<= 1s**).
* :chart_with_upwards_trend: **Scalability:** Our architecture allows linear horizontal scaling — scale to 1000s of RPS with no extra effort on your side.
* :inbox_tray: **Model storage:** Upload your models once and reuse them infinitely.
* :clap: **Minimalistic SDK:** The client-side SDK is open-source, provides simple Pythonic primitives, and is extremely easy to use — you are free to add complexity as needed.
* :bank: **On-premise deployment:** Want to use your own hardware for added security, or mix and match your hardware with external computing power? On-premise deployment is possible. Contact us!
* :hourglass_flowing_sand: **\[COMING SOON] Multiple runtimes supported:** TensorRT, TorchScript, and TFLite support through a unified interface.
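As a rough sketch, the upload-once / run-anywhere flow above might look like this from Python. All names here (`Client`, `register_pipeline`, `create_runner`) are illustrative assumptions, not the SDK's documented API, and a tiny in-memory stub stands in for the real client so the sketch is self-contained:

```python
# Hypothetical sketch of the SDK flow. Class and method names are
# illustrative assumptions, not Everinfer's documented API; a minimal
# in-memory stub replaces the real client so the sketch runs offline.

class Client:
    """Stand-in for a hypothetical everinfer.Client."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def register_pipeline(self, name: str, model_paths: list) -> dict:
        # The real service would upload the ONNX files once for later reuse.
        return {"id": f"{name}-pipeline", "models": model_paths}

    def create_runner(self, pipeline_id: str):
        # The real service would connect peer-to-peer to a remote GPU runtime.
        def run(inputs: list) -> list:
            return [{"output": [0.0]} for _ in inputs]
        return run


# The intended usage is roughly four lines:
client = Client("YOUR_API_KEY")
pipeline = client.register_pipeline("bert", ["bert.onnx"])
run = client.create_runner(pipeline["id"])
outputs = run([{"input_ids": [101, 2023, 102]}])
```

The point of the shape: registration happens once per model, while `run` can be called repeatedly with no per-call deployment cost.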

> Feel free to contact us even if you are a sole developer. We are quick to respond and ready to give out API keys and provide demos — <hello@everinfer.ai>

## Superior tech

We use a :zap:blazing-fast:zap:, buzzword-worthy stack to ensure Everinfer's technical superiority. Honourable mentions: [ONNX](https://onnx.ai/), [Rust](https://www.rust-lang.org/), [Zenoh](https://zenoh.io/), [FlatBuffers](https://google.github.io/flatbuffers/), [ScyllaDB](https://www.scylladb.com/).

## Quick links

* See the [simplest example](https://docs.everinfer.ai/getting-started/basics) of Everinfer in action.
* Want to skip the boring parts and dive straight in? Take a look at how you could [deploy Faster-RCNN](https://docs.everinfer.ai/getting-started/faster-rcnn-example) while fusing pre- and post-processing into a single graph with the model.
* Doubt the latency and scalability claims? Take a look at [GPT-2 running at 900 RPS](https://docs.everinfer.ai/examples/gpt2-900+rps), still in four lines of code.
* [Stable Diffusion demo](https://docs.everinfer.ai/examples/stable-diffusion-decouple-gpu-ops-from-code): offload the U-Net to remote GPUs while running lightweight models locally.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.everinfer.ai/introduction.md?ask=<question>
```

The question should be specific, self-contained, written in natural language, and URL-encoded.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
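For example, a client could build such a query URL with the Python standard library (the question text below is made up; `urlencode` handles the URL-encoding of the `ask` parameter):

```python
# Build the documentation-query URL described above. The question string
# is a made-up example; urlencode percent-encodes it for the query string.
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the GET

base = "https://docs.everinfer.ai/introduction.md"
question = "What is the cold start time for BERT?"
url = f"{base}?{urlencode({'ask': question})}"
print(url)
# answer = urlopen(url).read().decode()  # performs the GET request
```

Sending the GET request on the resulting URL returns the answer with supporting excerpts, as described above.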
