MLOS E2E Validation Report

Axon v3.1.9 + MLOS Core 7.0.0

Success Rate

100.0%

Models Tested

32

Inferences

64/64

Total Time

2925.0s

Model Validation Results

NLP Models

  • GPT-2
  • BERT
  • RoBERTa
  • T5
Status: ✅ Passing

Vision Models

  • ResNet-50
  • ViT
  • ConvNeXt
  • MobileNet
  • DeiT
  • EfficientNet
Status: ✅ Passing

Multimodal Models

  • CLIP
  • Wav2Vec2
Status: ✅ Passing

Inference Performance

Kernel Module Performance

KERNEL ENABLED

Performance metrics from E2E tests run with the MLOS kernel module (mlos-ml.ko) loaded. The kernel module provides zero-copy tensor operations, ML-aware scheduling, and optimized memory management.
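
Before trusting a "KERNEL ENABLED" run, it is worth verifying that the module is actually loaded. A minimal sketch of that check, parsing /proc/modules text (the loaded name mlos_ml is an assumption based on the mlos-ml.ko filename; module tooling treats hyphens and underscores interchangeably):

```python
def kernel_module_loaded(name: str, proc_modules: str) -> bool:
    """Check whether `name` appears as a loaded module in /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in proc_modules.splitlines()
        if line.strip()
    )

# In practice, pass the contents of /proc/modules; shown here with sample text.
sample = "mlos_ml 249856 2 - Live 0x0000000000000000\nnvme 49152 4 - Live 0x0\n"
print(kernel_module_loaded("mlos_ml", sample))  # True
```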

Kernel Module

v7.0.0
Mode: Memory + ML Scheduler

Test Environment

Linux (x86_64)
Ubuntu 24.04.2 LTS

CPU

Intel Xeon E3-12xx v2 (Ivy Bridge)
Cores: 16 | Threads: 16

Memory

31 GB
GPU: None

Performance Comparison: Kernel vs Userspace

Small Inference

0.97x
Userspace faster

Large Inference

1.02x
Kernel faster

Overall

1.02x
Combined average
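
The speedup column in the tables below is the ratio of userspace latency to kernel latency, so values above 1.0x mean the kernel path was faster. A quick sketch of the calculation, using ALBERT's small-inference numbers from the table below:

```python
def speedup(kernel_ms: float, userspace_ms: float) -> float:
    """Userspace/kernel latency ratio; > 1.0 means the kernel path is faster."""
    return userspace_ms / kernel_ms

# ALBERT, small inference: 331 ms with the kernel module vs. 456 ms in userspace.
ratio = speedup(331.0, 456.0)
print(f"{ratio:.2f}x ({(ratio - 1.0) * 100:+.0f}%)")  # 1.38x (+38%)
```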

Small Inference (Short Inputs)

Kernel faster: 8 models | Userspace faster: 22 models

Model Kernel Userspace Speedup
NLP Models
ALBERT 331.0 ms 456.0 ms +38.0%
ROBERTA 382.0 ms 412.0 ms +8.0%
GTE-BASE 149.0 ms 156.0 ms +5.0%
GTE-SMALL 116.0 ms 119.0 ms +3.0%
E5-SMALL 124.0 ms 125.0 ms +1.0%
BGE-SMALL 122.0 ms 117.0 ms -4.0%
DISTILROBERTA 217.0 ms 208.0 ms -4.0%
SENTENCE-TRANSFORMERS 172.0 ms 165.0 ms -4.0%
BGE-BASE 166.0 ms 157.0 ms -5.0%
E5-BASE 150.0 ms 142.0 ms -5.0%
BART-BASE 206.0 ms 193.0 ms -6.0%
SQUEEZEBERT 140.0 ms 131.0 ms -6.0%
MINILM 187.0 ms 174.0 ms -7.0%
BERT 303.0 ms 273.0 ms -10.0%
DISTILBERT 156.0 ms 136.0 ms -13.0%
T5 168.0 ms 145.0 ms -14.0%
GPT2 199.0 ms 145.0 ms -27.0%
Vision Models
RESNET 1233.0 ms 1295.0 ms +5.0%
DEIT 1284.0 ms 1312.0 ms +2.0%
EFFICIENTNET 1186.0 ms 1202.0 ms +1.0%
MOBILENET 1172.0 ms 1168.0 ms 0.0%
REGNET 1243.0 ms 1241.0 ms 0.0%
VIT 1577.0 ms 1554.0 ms -1.0%
POOLFORMER 1272.0 ms 1239.0 ms -3.0%
BEIT 1540.0 ms 1485.0 ms -4.0%
CONVNEXT 1412.0 ms 1344.0 ms -5.0%
CONVNEXT-SMALL 1849.0 ms 1557.0 ms -16.0%
Multimodal Models
CLIP 1369.0 ms 1311.0 ms -4.0%
LLM Models
DEEPSEEK-CODER-1.3B 3289.0 ms 3247.0 ms -1.0%
LLAMA-3.2-1B 505.0 ms 482.0 ms -5.0%
TINYLLAMA 707.0 ms 648.0 ms -8.0%
QWEN2-0.5B 368.0 ms 328.0 ms -11.0%
Average: 0.97x

Large Inference (Long Inputs)

Kernel faster: 15 models | Userspace faster: 16 models

Model Kernel Userspace Speedup
NLP Models
DISTILBERT 268.0 ms 418.0 ms +56.0%
MINILM 156.0 ms 181.0 ms +16.0%
E5-SMALL 108.0 ms 118.0 ms +9.0%
ROBERTA 1705.0 ms 1862.0 ms +9.0%
E5-BASE 151.0 ms 163.0 ms +8.0%
BGE-BASE 134.0 ms 144.0 ms +7.0%
T5 208.0 ms 219.0 ms +5.0%
GTE-BASE 144.0 ms 149.0 ms +3.0%
SENTENCE-TRANSFORMERS 178.0 ms 183.0 ms +3.0%
GTE-SMALL 117.0 ms 119.0 ms +2.0%
SQUEEZEBERT 146.0 ms 141.0 ms -3.0%
BERT 1314.0 ms 1246.0 ms -5.0%
BGE-SMALL 120.0 ms 114.0 ms -5.0%
BART-BASE 189.0 ms 178.0 ms -6.0%
GPT2 380.0 ms 357.0 ms -6.0%
ALBERT 1546.0 ms 1439.0 ms -7.0%
DISTILROBERTA 201.0 ms 187.0 ms -7.0%
Vision Models
VIT 1591.0 ms 1725.0 ms +8.0%
REGNET 1228.0 ms 1289.0 ms +5.0%
DEIT 1301.0 ms 1359.0 ms +4.0%
CONVNEXT 1352.0 ms 1367.0 ms +1.0%
MOBILENET 1186.0 ms 1170.0 ms -1.0%
CONVNEXT-SMALL 1590.0 ms 1552.0 ms -2.0%
EFFICIENTNET 1205.0 ms 1181.0 ms -2.0%
POOLFORMER 1320.0 ms 1288.0 ms -2.0%
BEIT 1517.0 ms 1460.0 ms -4.0%
RESNET 1466.0 ms 1232.0 ms -16.0%
Multimodal Models
CLIP 1360.0 ms 1388.0 ms +2.0%
LLM Models
QWEN2-0.5B 6392.0 ms 6414.0 ms 0.0%
DEEPSEEK-CODER-1.3B 11937.0 ms 11691.0 ms -2.0%
TINYLLAMA 9022.0 ms 8838.0 ms -2.0%
LLAMA-3.2-1B 10667.0 ms 10322.0 ms -3.0%
Average: 1.02x

Kernel Module Benefits

Zero-Copy Tensors

Direct GPU memory access without CPU copies

ML-Aware Scheduling

Priority-based inference queue management

Memory Management

LRU eviction and memory pool optimization

Secure Isolation

Kernel-level inference isolation
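
The report does not include the kernel module's source; purely as an illustration of the LRU eviction policy named above, here is a userspace sketch of an LRU-evicting tensor pool (the in-kernel implementation would use kernel data structures, not Python):

```python
from collections import OrderedDict

class LRUTensorPool:
    """Illustrative LRU-evicting pool: least-recently-used entries are evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pool = OrderedDict()  # key -> buffer, ordered oldest-used first

    def get(self, key):
        if key not in self.pool:
            return None
        self.pool.move_to_end(key)  # mark as most recently used
        return self.pool[key]

    def put(self, key, buf):
        if key in self.pool:
            self.pool.move_to_end(key)
        self.pool[key] = buf
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)  # evict the least recently used entry

pool = LRUTensorPool(2)
pool.put("a", b"...")
pool.put("b", b"...")
pool.get("a")          # touching "a" makes it most recently used
pool.put("c", b"...")  # evicts "b", keeps "a" and "c"
```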

Kernel test timestamp: 2026-01-07T20:03:57.868110Z

Environment Details

Operating System

Linux Ubuntu 24.04.2 LTS
x86_64

CPU

Intel Xeon E3-12xx v2 (Ivy Bridge)
16 cores / 16 threads

Memory

31 GB

GPU

0 GPU(s)

Runtime Mode

Unknown (scheduler)
Kernel: Yes

Disk

242G
Available: 166G

Resource Usage

MLOS Core (Idle)

CPU: 0%
Memory: 0 MB

MLOS Core (Load)

CPU: 0% avg
Peak: 0% • Mem: 0 MB

Axon

CPU: 0%
Memory: 0 MB

GPU Status

Not used (CPU-only inference)

Installation & Setup

Axon Download

0 ms

Core Download

0 ms

Core Startup

0 ms

Model Install

46.6 min

Detailed Model Metrics

NLP Models

ALBERT

Small Inference
456 ms
Large Inference
1.4s

E5-SMALL

Small Inference
125 ms
Large Inference
118 ms

GPT2

Small Inference
145 ms
Large Inference
357 ms

SQUEEZEBERT

Small Inference
131 ms
Large Inference
141 ms

T5

Small Inference
145 ms
Large Inference
219 ms

MINILM

Small Inference
174 ms
Large Inference
181 ms

BERT

Small Inference
273 ms
Large Inference
1.2s

ROBERTA

Small Inference
412 ms
Large Inference
1.9s

SENTENCE-TRANSFORMERS

Small Inference
165 ms
Large Inference
183 ms

BART-BASE

Small Inference
193 ms
Large Inference
178 ms

GTE-BASE

Small Inference
156 ms
Large Inference
149 ms

BGE-BASE

Small Inference
157 ms
Large Inference
144 ms

DISTILROBERTA

Small Inference
208 ms
Large Inference
187 ms

DISTILBERT

Small Inference
136 ms
Large Inference
418 ms

BGE-SMALL

Small Inference
117 ms
Large Inference
114 ms

GTE-SMALL

Small Inference
119 ms
Large Inference
119 ms

E5-BASE

Small Inference
142 ms
Large Inference
163 ms

Vision Models

EFFICIENTNET

Small Inference
1.2s
Large Inference
1.2s

VIT

Small Inference
1.6s
Large Inference
1.7s

RESNET

Small Inference
1.3s
Large Inference
1.2s

MOBILENET

Small Inference
1.2s
Large Inference
1.2s

CONVNEXT

Small Inference
1.3s
Large Inference
1.4s

DEIT

Small Inference
1.3s
Large Inference
1.4s

POOLFORMER

Small Inference
1.2s
Large Inference
1.3s

BEIT

Small Inference
1.5s
Large Inference
1.5s

REGNET

Small Inference
1.2s
Large Inference
1.3s

CONVNEXT-SMALL

Small Inference
1.6s
Large Inference
1.6s

Multimodal Models

CLIP

Small Inference
1.3s
Large Inference
1.4s

LLM Models

LLAMA-3.2-1B

Small Inference
482 ms
Large Inference
10.3s

TINYLLAMA

Small Inference
648 ms
Large Inference
8.8s

QWEN2-0.5B

Small Inference
328 ms
Large Inference
6.4s

DEEPSEEK-CODER-1.3B

Small Inference
3.2s
Large Inference
11.7s

Installation & Registration Times

NLP Models

ALBERT

Install Time
1.2 min
Register Time
1.0s

E5-SMALL

Install Time
27.3s
Register Time
730 ms

GPT2

Install Time
2.0 min
Register Time
1.6s

SQUEEZEBERT

Install Time
1.5 min
Register Time
1.3s

T5

Install Time
1.6 min
Register Time
1.5s

MINILM

Install Time
1.7 min
Register Time
1.3s

BERT

Install Time
46.0s
Register Time
1.8s

ROBERTA

Install Time
3.5 min
Register Time
3.6s

SENTENCE-TRANSFORMERS

Install Time
48.7s
Register Time
611 ms

BART-BASE

Install Time
5.0 min
Register Time
3.4s

GTE-BASE

Install Time
1.3 min
Register Time
1.7s

BGE-BASE

Install Time
48.0s
Register Time
1.6s

DISTILROBERTA

Install Time
2.3 min
Register Time
3.1s

DISTILBERT

Install Time
1.7 min
Register Time
1.5s

BGE-SMALL

Install Time
20.0s
Register Time
709 ms

GTE-SMALL

Install Time
28.0s
Register Time
839 ms

E5-BASE

Install Time
1.2 min
Register Time
1.8s

Vision Models

EFFICIENTNET

Install Time
1.0 min
Register Time
414 ms

VIT

Install Time
2.2 min
Register Time
1.5s

RESNET

Install Time
1.2 min
Register Time
815 ms

MOBILENET

Install Time
58.8s
Register Time
337 ms

CONVNEXT

Install Time
1.3 min
Register Time
733 ms

DEIT

Install Time
1.3 min
Register Time
614 ms

POOLFORMER

Install Time
57.9s
Register Time
563 ms

BEIT

Install Time
2.1 min
Register Time
2.0s

REGNET

Install Time
1.3 min
Register Time
683 ms

CONVNEXT-SMALL

Install Time
1.6 min
Register Time
1.1s

Multimodal Models

CLIP

Install Time
2.6 min
Register Time
2.5s

LLM Models

LLAMA-3.2-1B

Install Time
1.1 min
Register Time
2.0s

TINYLLAMA

Install Time
53.1s
Register Time
1.6s

QWEN2-0.5B

Install Time
31.9s
Register Time
886 ms

DEEPSEEK-CODER-1.3B

Install Time
1.2 min
Register Time
2.4s

Historical Statistics

26 RUNS

Performance data aggregated across 26 test runs (from 2025-12-26 18:50 to 2026-01-07 20:03)

Total Runs

26
Historical data points

Kernel Speedup (Historical)

1.01x
Median: 1.01x | StdDev: 0.03

Inference Time Statistics: Mean / Median / StdDev

Model Mean Median StdDev Range Runs Consistency
NLP Models
DEBERTA 102.0 ms 102.0 ms 0.0 ms 102.0 - 102.0 ms 1 Stable
ELECTRA 156.0 ms 156.0 ms 0.0 ms 156.0 - 156.0 ms 1 Stable
MPNET 128.0 ms 128.0 ms 0.0 ms 128.0 - 128.0 ms 1 Stable
XLM-ROBERTA 87.0 ms 87.0 ms 0.0 ms 87.0 - 87.0 ms 1 Stable
GTE-SMALL 117.2 ms 119.0 ms 7.9 ms 100.0 - 129.0 ms 12 Stable
SQUEEZEBERT 137.1 ms 135.5 ms 9.9 ms 122.0 - 155.0 ms 14 Stable
BGE-SMALL 118.4 ms 121.0 ms 8.7 ms 98.0 - 128.0 ms 12 Stable
MINILM 173.9 ms 178.5 ms 13.4 ms 147.0 - 192.0 ms 14 Stable
BART-BASE 183.1 ms 185.5 ms 15.3 ms 153.0 - 219.0 ms 14 Stable
E5-SMALL 116.2 ms 117.5 ms 10.3 ms 92.0 - 130.0 ms 12 Stable
DISTILROBERTA 192.1 ms 195.5 ms 18.1 ms 163.0 - 217.0 ms 14 Stable
E5-BASE 139.2 ms 141.5 ms 13.8 ms 100.0 - 157.0 ms 12 Stable
ROBERTA 396.7 ms 379.5 ms 46.3 ms 322.0 - 510.0 ms 26 Moderate
BGE-BASE 146.4 ms 150.0 ms 17.2 ms 111.0 - 173.0 ms 12 Moderate
BERT 308.1 ms 295.5 ms 39.3 ms 273.0 - 447.0 ms 26 Moderate
GTE-BASE 151.6 ms 146.0 ms 24.6 ms 133.0 - 223.0 ms 12 Moderate
ALBERT 339.8 ms 307.5 ms 61.4 ms 271.0 - 502.0 ms 24 Moderate
T5 158.5 ms 148.5 ms 39.6 ms 118.0 - 295.0 ms 26 Variable
DISTILBERT 166.3 ms 149.0 ms 45.1 ms 127.0 - 308.0 ms 26 Variable
SENTENCE-TRANSFORMERS 187.5 ms 172.0 ms 53.9 ms 146.0 - 366.0 ms 26 Variable
GPT2 156.3 ms 141.5 ms 56.2 ms 90.0 - 383.0 ms 26 Variable
Vision Models
DINOV2 1172.0 ms 1172.0 ms 0.0 ms 1172.0 - 1172.0 ms 1 Stable
EFFICIENTNET-B1 1173.0 ms 1173.0 ms 0.0 ms 1173.0 - 1173.0 ms 1 Stable
EFFICIENTNET-B2 1175.0 ms 1175.0 ms 0.0 ms 1175.0 - 1175.0 ms 1 Stable
LEVIT 1283.0 ms 1283.0 ms 0.0 ms 1283.0 - 1283.0 ms 1 Stable
BEIT 1486.6 ms 1489.5 ms 40.5 ms 1407.0 - 1555.0 ms 14 Stable
CONVNEXT-SMALL 1536.1 ms 1543.5 ms 45.1 ms 1460.0 - 1601.0 ms 12 Stable
POOLFORMER 1276.9 ms 1263.5 ms 43.6 ms 1232.0 - 1380.0 ms 14 Stable
REGNET 1288.5 ms 1252.5 ms 81.0 ms 1202.0 - 1463.0 ms 14 Stable
MOBILENET 1188.4 ms 1178.0 ms 42.8 ms 1150.0 - 1338.0 ms 26 Stable
EFFICIENTNET 1202.5 ms 1188.5 ms 47.3 ms 1142.0 - 1343.0 ms 26 Stable
VIT 1517.8 ms 1498.5 ms 66.6 ms 1342.0 - 1643.0 ms 26 Stable
CONVNEXT 1364.5 ms 1337.0 ms 75.3 ms 1280.0 - 1551.0 ms 26 Stable
DEIT 1326.4 ms 1306.0 ms 80.3 ms 1221.0 - 1550.0 ms 26 Stable
RESNET 1269.8 ms 1232.0 ms 77.6 ms 1185.0 - 1464.0 ms 26 Stable
Multimodal Models
CLIP 1342.8 ms 1319.0 ms 84.2 ms 1258.0 - 1672.0 ms 26 Stable
LLM Models
DEEPSEEK-CODER-1.3B 3330.3 ms 3314.0 ms 305.1 ms 2919.0 - 4118.0 ms 26 Stable
TINYLLAMA 670.2 ms 650.5 ms 70.6 ms 607.0 - 864.0 ms 26 Moderate
QWEN2-0.5B 360.5 ms 351.0 ms 44.3 ms 297.0 - 501.0 ms 26 Moderate
LLAMA-3.2-1B 533.6 ms 511.5 ms 70.6 ms 477.0 - 777.0 ms 26 Moderate
Total Models Tracked: 44

Understanding Consistency

Consistency labels are based on the coefficient of variation (CV = StdDev / Mean) of each model's inference time across runs:

Stable (CV < 10%)

Highly consistent performance across runs

Moderate (CV 10-25%)

Some variance, generally acceptable

Variable (CV > 25%)

High variance, investigate causes
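
The thresholds above can be applied directly to the Mean and StdDev columns of the table. A minimal sketch, checked against three rows from the historical table (BEIT: 40.5 ms on a 1486.6 ms mean is CV ≈ 2.7%; GPT2: 56.2 ms on 156.3 ms is ≈ 36%):

```python
def consistency(mean_ms: float, stddev_ms: float) -> str:
    """Classify run-to-run variance by coefficient of variation (CV = stddev/mean)."""
    cv = stddev_ms / mean_ms
    if cv < 0.10:
        return "Stable"
    if cv <= 0.25:
        return "Moderate"
    return "Variable"

print(consistency(1486.6, 40.5))  # BEIT    -> Stable
print(consistency(396.7, 46.3))  # ROBERTA -> Moderate
print(consistency(156.3, 56.2))  # GPT2    -> Variable
```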

Time Distribution

Model Installation Time

46.6 min

Includes HuggingFace download and ONNX conversion (~99% of total run time)