Inference & fine-tuning engineer (models, performance & security)

Stockholm, Sweden

Help Build Europe’s Sovereign AI Stack

Berget AI builds and operates sovereign AI infrastructure on our own GPU hardware. We run large-scale inference in production and are expanding into fine-tuning and post-training for real customer workloads. We’re looking for an inference engineer who masters model bring-up, performance tuning, and secure operation from model release to GPU.

What you’ll work on

You will rapidly evaluate, integrate, and bring new state-of-the-art models into production inference environments. This includes understanding model architectures, dependencies, runtime constraints, and quickly getting models running reliably on our GPU stack.

You’ll optimize inference runtimes and serving stacks (vLLM, Triton, SGLang, CUDA/ROCm), tuning parameters such as batching, parallelism, memory layouts, quantization strategies, and scheduling to maximize throughput, minimize latency, and control cost per token.
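As a flavor of the kind of capacity reasoning this involves, here is a minimal sketch of KV cache sizing, which drives how large a batch a serving engine can admit. The model geometry used (32 layers, 8 grouped-query KV heads, head dim 128, fp16) and the 40 GiB VRAM budget are illustrative assumptions, not a prescribed configuration.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Per-token KV cache footprint: keys and values (factor of 2),
    per layer, per KV head, at the given element width."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_batch_tokens(free_vram_bytes: int, per_token_bytes: int) -> int:
    """How many concurrent tokens the KV cache budget can hold."""
    return free_vram_bytes // per_token_bytes

# Illustrative Llama-3-8B-style geometry with grouped-query attention.
per_tok = kv_cache_bytes_per_token(32, 8, 128)      # 131072 B = 128 KiB/token
budget = 40 * 1024**3                               # assumed free VRAM after weights
print(per_tok, max_batch_tokens(budget, per_tok))   # 131072 327680
```

Dividing the post-weights VRAM budget by the per-token footprint bounds total in-flight context, which is the quantity batching and scheduling knobs ultimately trade against latency.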

You will design and operate fine-tuning and post-training workflows: data preparation, training configuration, evaluation, model packaging, and safe rollout into production inference systems.
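The "evaluation before safe rollout" step above can be sketched as a promotion gate. This is a hypothetical illustration, not Berget's actual pipeline; the task names and threshold are assumptions.

```python
def should_promote(candidate_scores: dict, baseline_scores: dict,
                   min_delta: float = 0.0) -> bool:
    """Promote a fine-tuned checkpoint to production only if it matches
    or beats the baseline (plus an optional margin) on every eval task."""
    return all(
        candidate_scores.get(task, float("-inf")) >= score + min_delta
        for task, score in baseline_scores.items()
    )

# Illustrative eval scores: one regressing task blocks the rollout.
print(should_promote({"qa": 0.72, "math": 0.61}, {"qa": 0.70, "math": 0.61}))  # True
print(should_promote({"qa": 0.72, "math": 0.58}, {"qa": 0.70, "math": 0.61}))  # False
```

Gating on every task rather than an aggregate score prevents a fine-tune from silently trading one customer workload's quality for another's.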

You’ll work extensively with caching strategies at multiple levels (model weights, KV cache, and request/result caching) to improve performance, efficiency, and isolation in multi-tenant environments.

You’ll extend Kubernetes-native orchestration for large-scale model serving, profile bottlenecks, benchmark improvements, and continuously harden reliability, observability, and security. Incident response, secure model handling, and controlled rollout are core parts of the role.
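For a sense of the Kubernetes-native serving surface, a stripped-down Deployment pinning one GPU per vLLM replica might look like the following. All names, the image tag, and the model are illustrative assumptions, not Berget's actual manifests.

```yaml
# Hypothetical sketch: one vLLM replica per GPU.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-serving
spec:
  replicas: 2
  selector:
    matchLabels: {app: llm-serving}
  template:
    metadata:
      labels: {app: llm-serving}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Meta-Llama-3-8B-Instruct",
                 "--gpu-memory-utilization", "0.90"]
          resources:
            limits:
              nvidia.com/gpu: 1        # device-plugin GPU request
          readinessProbe:
            httpGet: {path: /health, port: 8000}
```

In practice this is the starting point that profiling, benchmarking, and hardening iterate on: readiness gating for slow model loads, rollout strategies for new model versions, and observability sidecars all layer onto manifests like this one.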

What you bring

You closely follow the latest open and commercial model releases and enjoy getting new models running fast in real systems.

You have hands-on experience with inference and ML runtimes such as vLLM, Triton, SGLang, CUDA or ROCm, and understand how model architecture, runtime configuration, and hardware interact.

You’re familiar with fine-tuning or post-training techniques and understand the operational and security implications of shipping trained models into production.

You’re comfortable working in Kubernetes-based environments, think practically about caching and isolation, and treat performance, reliability, and security as inseparable concerns.

Why Berget

You’ll have real ownership over how models are onboarded, optimized, and served in one of Europe’s most ambitious sovereign AI platforms. You’ll work close to hardware, platform, and product, with freedom to push performance and define best practices from day one.

Interested?

Drop us a short note about yourself and links to recent projects or contributions:

📬 jobs@berget.ai

Let’s build the future of sovereign AI in Europe—together.

Note: only EU citizens are eligible to apply.