SegFormer from HuggingFace

Deploy SegFormer to serverless machines with 4 lines of code

In this tutorial we demonstrate how tightly Everinfer integrates with the HuggingFace ecosystem by deploying the SegFormer computer vision model with minimal effort.

How to deploy SegFormer on Everinfer

Install Everinfer and HuggingFace transformers library.
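Both can be installed from PyPI. The Everinfer package name below is an assumption based on the import used later in this tutorial; the `[onnx]` extra pulls in the dependencies needed for the export step that follows:

```shell
pip install everinfer "transformers[onnx]"
```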

Convert the model to ONNX format:

!python3 -m transformers.onnx --model=nvidia/segformer-b0-finetuned-ade-512-512 onnx_segformer/

Authenticate on Everinfer using your API key, upload the model, and create an inference engine:

from everinfer import Client
client = Client('my_api_key') # hit us up to get your key
pipeline = client.register_pipeline('segformer', ['onnx_segformer/model.onnx'])
runner = client.create_engine(pipeline['uuid'])

You are ready to go: only 4 lines of code to deploy your model to remote GPUs!

Since HuggingFace image preprocessors are fully compatible with Everinfer's expected input format, you can feed feature-extractor outputs directly to the deployed model:

from transformers import SegformerFeatureExtractor
from PIL import Image
import requests

feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

url = ""  # set to the URL of an image you want to segment
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="np")

preds = runner.predict([inputs]) # runs on remote hardware!
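The returned predictions can then be post-processed like any local SegFormer output. A minimal sketch, with two stated assumptions: the output key name `"logits"` and the shape `(1, 150, 128, 128)` (150 ADE20K classes, logits downsampled 4x from a 512x512 input) mirror the HuggingFace model, but the exact structure depends on the exported ONNX graph. Here random data stands in for `preds` so the snippet is self-contained:

```python
import numpy as np

# Stand-in for preds[0]["logits"] from runner.predict above:
# SegFormer-b0 on ADE20K emits (batch, num_classes, H/4, W/4) logits.
logits = np.random.rand(1, 150, 128, 128).astype(np.float32)

# Per-pixel class labels: argmax over the class dimension.
seg_map = logits.argmax(axis=1)[0]
print(seg_map.shape)  # (128, 128)
```

Upsample `seg_map` back to the original image size (e.g. with `PIL.Image.resize` using nearest-neighbor interpolation) before overlaying it on the input.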

Everinfer is highly efficient as it is, even while transferring tensors over the network. Let's check how fast the deployed model is:
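A simple way to measure end-to-end latency is to wrap the call in `time.perf_counter`. The sketch below times a stand-in function, since the real `runner.predict` call requires a live Everinfer connection; swap the stand-in for your actual call:

```python
import time

def predict(batch):
    # Stand-in for runner.predict(batch); replace with the real call.
    time.sleep(0.01)
    return batch

start = time.perf_counter()
predict([{"pixel_values": None}])
elapsed = time.perf_counter() - start
print(f"round-trip took {elapsed * 1000:.1f} ms")
```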

It is possible to speed up this deployment even further by fusing the pre-processing step into the ONNX graph and transferring only raw .jpg images over the network.

Check out our Faster-RCNN tutorial for an example.
