MLOS Release E2E Validation Report

Model Validation Results

NLP Models

✅ GPT-2
✅ BERT
✅ RoBERTa
✅ T5

Status: ✅ Passing

Vision Models

Status: ✅ Passing

Multi-Modal

✅ CLIP
⏳ Wav2Vec2

Status: ✅ Passing

Inference Performance

Kernel Module Performance

KERNEL ENABLED

Performance metrics from E2E tests run with the MLOS kernel module (mlos-ml.ko) loaded. The kernel module provides zero-copy tensor operations, ML-aware scheduling, and optimized memory management.

Kernel Module

v7.0.0

Mode: Memory + ML Scheduler

Test Environment

Linux (x86_64)

Ubuntu 24.04.2 LTS

CPU

Intel Xeon E3-12xx v2 (Ivy Bridge)

Cores: 16 | Threads: 16

Memory

31 GB

GPU: None

Performance Comparison: Kernel vs Userspace

Small Inference

0.97x

Userspace faster

Large Inference

1.02x

Kernel faster

Overall

1.02x

Combined average

Small Inference (Short Inputs)

Kernel faster: 8 models | Userspace faster: 22 models

Model	Kernel	Userspace	Speedup
NLP Models
ALBERT	331.0 ms ▼	456.0 ms ▲	+38.0%
ROBERTA	382.0 ms ▼	412.0 ms ▲	+8.0%
GTE-BASE	149.0 ms ▼	156.0 ms ▲	+5.0%
GTE-SMALL	116.0 ms ▼	119.0 ms ▲	+3.0%
E5-SMALL	124.0 ms ▼	125.0 ms ▲	+1.0%
REGNET	1243.0 ms ▲	1241.0 ms ▼	0.0%
POOLFORMER	1272.0 ms ▲	1239.0 ms ▼	-3.0%
BEIT	1540.0 ms ▲	1485.0 ms ▼	-4.0%
BGE-SMALL	122.0 ms ▲	117.0 ms ▼	-4.0%
DISTILROBERTA	217.0 ms ▲	208.0 ms ▼	-4.0%
SENTENCE-TRANSFORMERS	172.0 ms ▲	165.0 ms ▼	-4.0%
BGE-BASE	166.0 ms ▲	157.0 ms ▼	-5.0%
E5-BASE	150.0 ms ▲	142.0 ms ▼	-5.0%
BART-BASE	206.0 ms ▲	193.0 ms ▼	-6.0%
SQUEEZEBERT	140.0 ms ▲	131.0 ms ▼	-6.0%
MINILM	187.0 ms ▲	174.0 ms ▼	-7.0%
BERT	303.0 ms ▲	273.0 ms ▼	-10.0%
DISTILBERT	156.0 ms ▲	136.0 ms ▼	-13.0%
T5	168.0 ms ▲	145.0 ms ▼	-14.0%
CONVNEXT-SMALL	1849.0 ms ▲	1557.0 ms ▼	-16.0%
GPT2	199.0 ms ▲	145.0 ms ▼	-27.0%
Vision Models
RESNET	1233.0 ms ▼	1295.0 ms ▲	+5.0%
DEIT	1284.0 ms ▼	1312.0 ms ▲	+2.0%
EFFICIENTNET	1186.0 ms ▼	1202.0 ms ▲	+1.0%
MOBILENET	1172.0 ms ▲	1168.0 ms ▼	0.0%
VIT	1577.0 ms ▲	1554.0 ms ▼	-1.0%
CONVNEXT	1412.0 ms ▲	1344.0 ms ▼	-5.0%
Multimodal Models
CLIP	1369.0 ms ▲	1311.0 ms ▼	-4.0%
LLM Models
DEEPSEEK-CODER-1.3B	3289.0 ms ▲	3247.0 ms ▼	-1.0%
LLAMA-3.2-1B	505.0 ms ▲	482.0 ms ▼	-5.0%
TINYLLAMA	707.0 ms ▲	648.0 ms ▼	-8.0%
QWEN2-0.5B	368.0 ms ▲	328.0 ms ▼	-11.0%
Average:			0.97x
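
The report does not state how the Speedup column is derived. The figures in the table above are consistent with userspace time divided by kernel time, minus one, rounded to a whole percent, and the Average row is consistent with the mean of those ratios. A minimal Python sketch under that assumption follows; the function names are illustrative and not part of MLOS.

```python
from statistics import mean


def speedup_ratio(kernel_ms: float, userspace_ms: float) -> float:
    """Assumed definition: a ratio above 1 means the kernel path was faster."""
    return userspace_ms / kernel_ms


def speedup_pct(kernel_ms: float, userspace_ms: float) -> int:
    """Percentage as it appears in the Speedup column, rounded to a whole percent."""
    return round((speedup_ratio(kernel_ms, userspace_ms) - 1.0) * 100.0)


# Three rows copied from the small-inference table: (kernel ms, userspace ms).
rows = {
    "ALBERT": (331.0, 456.0),
    "BERT": (303.0, 273.0),
    "GPT2": (199.0, 145.0),
}

for name, (kernel_ms, userspace_ms) in rows.items():
    print(f"{name}: {speedup_pct(kernel_ms, userspace_ms):+d}%")

# Mean ratio over this three-row subset only, formatted like the Average row.
subset_average = mean(speedup_ratio(k, u) for k, u in rows.values())
print(f"Average (subset): {subset_average:.2f}x")
```

With this definition, positive percentages favor the kernel module and negative percentages favor userspace, which matches the arrows in the table.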

Large Inference (Long Inputs)

Kernel faster: 15 models | Userspace faster: 16 models

Model	Kernel	Userspace	Speedup
NLP Models
DISTILBERT	268.0 ms ▼	418.0 ms ▲	+56.0%
MINILM	156.0 ms ▼	181.0 ms ▲	+16.0%
E5-SMALL	108.0 ms ▼	118.0 ms ▲	+9.0%
ROBERTA	1705.0 ms ▼	1862.0 ms ▲	+9.0%
E5-BASE	151.0 ms ▼	163.0 ms ▲	+8.0%
BGE-BASE	134.0 ms ▼	144.0 ms ▲	+7.0%
REGNET	1228.0 ms ▼	1289.0 ms ▲	+5.0%
T5	208.0 ms ▼	219.0 ms ▲	+5.0%
GTE-BASE	144.0 ms ▼	149.0 ms ▲	+3.0%
SENTENCE-TRANSFORMERS	178.0 ms ▼	183.0 ms ▲	+3.0%
GTE-SMALL	117.0 ms ▼	119.0 ms ▲	+2.0%
CONVNEXT-SMALL	1590.0 ms ▲	1552.0 ms ▼	-2.0%
POOLFORMER	1320.0 ms ▲	1288.0 ms ▼	-2.0%
SQUEEZEBERT	146.0 ms ▲	141.0 ms ▼	-3.0%
BEIT	1517.0 ms ▲	1460.0 ms ▼	-4.0%
BERT	1314.0 ms ▲	1246.0 ms ▼	-5.0%
BGE-SMALL	120.0 ms ▲	114.0 ms ▼	-5.0%
BART-BASE	189.0 ms ▲	178.0 ms ▼	-6.0%
GPT2	380.0 ms ▲	357.0 ms ▼	-6.0%
ALBERT	1546.0 ms ▲	1439.0 ms ▼	-7.0%
DISTILROBERTA	201.0 ms ▲	187.0 ms ▼	-7.0%
Vision Models
VIT	1591.0 ms ▼	1725.0 ms ▲	+8.0%
DEIT	1301.0 ms ▼	1359.0 ms ▲	+4.0%
CONVNEXT	1352.0 ms ▼	1367.0 ms ▲	+1.0%
MOBILENET	1186.0 ms ▲	1170.0 ms ▼	-1.0%
EFFICIENTNET	1205.0 ms ▲	1181.0 ms ▼	-2.0%
RESNET	1466.0 ms ▲	1232.0 ms ▼	-16.0%
Multimodal Models
CLIP	1360.0 ms ▼	1388.0 ms ▲	+2.0%
LLM Models
QWEN2-0.5B	6392.0 ms ▼	6414.0 ms ▲	0.0%
DEEPSEEK-CODER-1.3B	11937.0 ms ▲	11691.0 ms ▼	-2.0%
TINYLLAMA	9022.0 ms ▲	8838.0 ms ▼	-2.0%
LLAMA-3.2-1B	10667.0 ms ▲	10322.0 ms ▼	-3.0%
Average:			1.02x
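
The headline counts (15 kernel, 16 userspace) add up to 31, while the table has 32 rows. This is consistent with rows whose rounded speedup is 0% being counted on neither side, although the report does not say so. The sketch below tallies the Vision, Multimodal, and LLM rows of the large-inference table under that assumption; `tally` is a hypothetical helper, not an MLOS function.

```python
def tally(rows):
    """Count kernel wins, userspace wins, and ties by rounded speedup percent."""
    kernel_faster = userspace_faster = ties = 0
    for kernel_ms, userspace_ms in rows.values():
        # Assumed rule: compare on the rounded percentage shown in the table.
        pct = round((userspace_ms / kernel_ms - 1.0) * 100.0)
        if pct > 0:
            kernel_faster += 1
        elif pct < 0:
            userspace_faster += 1
        else:
            ties += 1
    return kernel_faster, userspace_faster, ties


# (kernel ms, userspace ms) copied from the large-inference table.
large_rows = {
    "VIT": (1591.0, 1725.0),
    "DEIT": (1301.0, 1359.0),
    "CONVNEXT": (1352.0, 1367.0),
    "MOBILENET": (1186.0, 1170.0),
    "EFFICIENTNET": (1205.0, 1181.0),
    "RESNET": (1466.0, 1232.0),
    "CLIP": (1360.0, 1388.0),
    "QWEN2-0.5B": (6392.0, 6414.0),
    "DEEPSEEK-CODER-1.3B": (11937.0, 11691.0),
    "TINYLLAMA": (9022.0, 8838.0),
    "LLAMA-3.2-1B": (10667.0, 10322.0),
}

print(tally(large_rows))  # (4, 6, 1): QWEN2-0.5B rounds to 0% and counts as a tie
```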

Kernel Module Benefits

Zero-Copy Tensors

Direct GPU memory access without CPU copies

ML-Aware Scheduling

Priority-based inference queue management

Memory Management

LRU eviction and memory pool optimization

Secure Isolation

Kernel-level inference isolation

Kernel test timestamp: 2026-01-07T20:03:57.868110Z

Environment Details

Operating System

Linux Ubuntu 24.04.2 LTS

x86_64

CPU

Intel Xeon E3-12xx v2 (Ivy Bridge)

16 cores / 16 threads

Memory

31 GB

GPU

0 GPU(s)

Runtime Mode

Unknown (scheduler)

Kernel: Yes

Disk

242G

Available: 166G

Resource Usage

MLOS Core (Idle)

CPU: 0%

Memory: 0 MB

MLOS Core (Load)

CPU: 0% avg

Peak: 0% • Mem: 0 MB

Axon

CPU: 0%

Memory: 0 MB

GPU Status

Not used (CPU-only inference)

Installation & Setup

Axon Download

0 ms

Core Download

0 ms

Core Startup

0 ms

Model Install

46.6 min

Detailed Model Metrics

Model	Status	Small Inference	Large Inference
NLP Models
ALBERT	✅	456 ms	1.4s
E5-SMALL	✅	125 ms	118 ms
GPT2	✅	145 ms	357 ms
SQUEEZEBERT	✅	131 ms	141 ms
T5	✅	145 ms	219 ms
MINILM	✅	174 ms	181 ms
POOLFORMER	✅	1.2s	1.3s
BERT	✅	273 ms	1.2s
BEIT	✅	1.5s	1.5s
ROBERTA	✅	412 ms	1.9s
SENTENCE-TRANSFORMERS	✅	165 ms	183 ms
BART-BASE	✅	193 ms	178 ms
GTE-BASE	✅	156 ms	149 ms
BGE-BASE	✅	157 ms	144 ms
DISTILROBERTA	✅	208 ms	187 ms
DISTILBERT	✅	136 ms	418 ms
BGE-SMALL	✅	117 ms	114 ms
REGNET	✅	1.2s	1.3s
GTE-SMALL	✅	119 ms	119 ms
E5-BASE	✅	142 ms	163 ms
CONVNEXT-SMALL	✅	1.6s	1.6s
Vision Models
EFFICIENTNET	✅	1.2s	1.2s
VIT	✅	1.6s	1.7s
RESNET	✅	1.3s	1.2s
MOBILENET	✅	1.2s	1.2s
CONVNEXT	✅	1.3s	1.4s
DEIT	✅	1.3s	1.4s
Multimodal Models
CLIP	✅	1.3s	1.4s
LLM Models
LLAMA-3.2-1B	✅	482 ms	10.3s
TINYLLAMA	✅	648 ms	8.8s
QWEN2-0.5B	✅	328 ms	6.4s
DEEPSEEK-CODER-1.3B	✅	3.2s	11.7s

Model	Status	Install Time	
NLP Models
ALBERT	✅	1.2 min	1.0s
E5-SMALL	✅	27.3s	730 ms
GPT2	✅	2.0 min	1.6s
SQUEEZEBERT	✅	1.5 min	1.3s
T5	✅	1.6 min	1.5s
MINILM	✅	1.7 min	1.3s
POOLFORMER	✅	57.9s	563 ms
BERT	✅	46.0s	1.8s
BEIT	✅	2.1 min	2.0s
ROBERTA	✅	3.5 min	3.6s
SENTENCE-TRANSFORMERS	✅	48.7s	611 ms
BART-BASE	✅	5.0 min	3.4s
GTE-BASE	✅	1.3 min	1.7s
BGE-BASE	✅	48.0s	1.6s
DISTILROBERTA	✅	2.3 min	3.1s
DISTILBERT	✅	1.7 min	1.5s
BGE-SMALL	✅	20.0s	709 ms
REGNET	✅	1.3 min	683 ms
GTE-SMALL	✅	28.0s	839 ms
E5-BASE	✅	1.2 min	1.8s
CONVNEXT-SMALL	✅	1.6 min	1.1s
Vision Models
EFFICIENTNET	✅	1.0 min	414 ms
VIT	✅	2.2 min	1.5s
RESNET	✅	1.2 min	815 ms
MOBILENET	✅	58.8s	337 ms
CONVNEXT	✅	1.3 min	733 ms
DEIT	✅	1.3 min	614 ms
Multimodal Models
CLIP	✅	2.6 min	2.5s
LLM Models
LLAMA-3.2-1B	✅	1.1 min	2.0s
TINYLLAMA	✅	53.1s	1.6s
QWEN2-0.5B	✅	31.9s	886 ms
DEEPSEEK-CODER-1.3B	✅	1.2 min	2.4s

Historical Statistics

26 RUNS

Performance data aggregated across 26 test runs (from 2025-12-26 18:50 to 2026-01-07 20:03)

Total Runs

26

Historical data points

Kernel Speedup (Historical)

1.01x

Median: 1.01x | StdDev: 0.03

Inference Time Statistics: Mean / Median / StdDev

Model	Mean	Median	StdDev	Range	Runs	Consistency
NLP Models
DEBERTA	102.0 ms	102.0 ms	0.0 ms	102.0 - 102.0 ms	1	Stable
DINOV2	1172.0 ms	1172.0 ms	0.0 ms	1172.0 - 1172.0 ms	1	Stable
EFFICIENTNET-B1	1173.0 ms	1173.0 ms	0.0 ms	1173.0 - 1173.0 ms	1	Stable
EFFICIENTNET-B2	1175.0 ms	1175.0 ms	0.0 ms	1175.0 - 1175.0 ms	1	Stable
ELECTRA	156.0 ms	156.0 ms	0.0 ms	156.0 - 156.0 ms	1	Stable
LEVIT	1283.0 ms	1283.0 ms	0.0 ms	1283.0 - 1283.0 ms	1	Stable
MPNET	128.0 ms	128.0 ms	0.0 ms	128.0 - 128.0 ms	1	Stable
XLM-ROBERTA	87.0 ms	87.0 ms	0.0 ms	87.0 - 87.0 ms	1	Stable
BEIT	1486.6 ms	1489.5 ms	40.5 ms	1407.0 - 1555.0 ms	14	Stable
CONVNEXT-SMALL	1536.1 ms	1543.5 ms	45.1 ms	1460.0 - 1601.0 ms	12	Stable
POOLFORMER	1276.9 ms	1263.5 ms	43.6 ms	1232.0 - 1380.0 ms	14	Stable
REGNET	1288.5 ms	1252.5 ms	81.0 ms	1202.0 - 1463.0 ms	14	Stable
GTE-SMALL	117.2 ms	119.0 ms	7.9 ms	100.0 - 129.0 ms	12	Stable
SQUEEZEBERT	137.1 ms	135.5 ms	9.9 ms	122.0 - 155.0 ms	14	Stable
BGE-SMALL	118.4 ms	121.0 ms	8.7 ms	98.0 - 128.0 ms	12	Stable
MINILM	173.9 ms	178.5 ms	13.4 ms	147.0 - 192.0 ms	14	Stable
BART-BASE	183.1 ms	185.5 ms	15.3 ms	153.0 - 219.0 ms	14	Stable
E5-SMALL	116.2 ms	117.5 ms	10.3 ms	92.0 - 130.0 ms	12	Stable
DISTILROBERTA	192.1 ms	195.5 ms	18.1 ms	163.0 - 217.0 ms	14	Stable
E5-BASE	139.2 ms	141.5 ms	13.8 ms	100.0 - 157.0 ms	12	Stable
ROBERTA	396.7 ms	379.5 ms	46.3 ms	322.0 - 510.0 ms	26	Moderate
BGE-BASE	146.4 ms	150.0 ms	17.2 ms	111.0 - 173.0 ms	12	Moderate
BERT	308.1 ms	295.5 ms	39.3 ms	273.0 - 447.0 ms	26	Moderate
GTE-BASE	151.6 ms	146.0 ms	24.6 ms	133.0 - 223.0 ms	12	Moderate
ALBERT	339.8 ms	307.5 ms	61.4 ms	271.0 - 502.0 ms	24	Moderate
T5	158.5 ms	148.5 ms	39.6 ms	118.0 - 295.0 ms	26	Variable
DISTILBERT	166.3 ms	149.0 ms	45.1 ms	127.0 - 308.0 ms	26	Variable
SENTENCE-TRANSFORMERS	187.5 ms	172.0 ms	53.9 ms	146.0 - 366.0 ms	26	Variable
GPT2	156.3 ms	141.5 ms	56.2 ms	90.0 - 383.0 ms	26	Variable
Vision Models
MOBILENET	1188.4 ms	1178.0 ms	42.8 ms	1150.0 - 1338.0 ms	26	Stable
EFFICIENTNET	1202.5 ms	1188.5 ms	47.3 ms	1142.0 - 1343.0 ms	26	Stable
VIT	1517.8 ms	1498.5 ms	66.6 ms	1342.0 - 1643.0 ms	26	Stable
CONVNEXT	1364.5 ms	1337.0 ms	75.3 ms	1280.0 - 1551.0 ms	26	Stable
DEIT	1326.4 ms	1306.0 ms	80.3 ms	1221.0 - 1550.0 ms	26	Stable
RESNET	1269.8 ms	1232.0 ms	77.6 ms	1185.0 - 1464.0 ms	26	Stable
Multimodal Models
CLIP	1342.8 ms	1319.0 ms	84.2 ms	1258.0 - 1672.0 ms	26	Stable
LLM Models
DEEPSEEK-CODER-1.3B	3330.3 ms	3314.0 ms	305.1 ms	2919.0 - 4118.0 ms	26	Stable
TINYLLAMA	670.2 ms	650.5 ms	70.6 ms	607.0 - 864.0 ms	26	Moderate
QWEN2-0.5B	360.5 ms	351.0 ms	44.3 ms	297.0 - 501.0 ms	26	Moderate
LLAMA-3.2-1B	533.6 ms	511.5 ms	70.6 ms	477.0 - 777.0 ms	26	Moderate
Total Models Tracked:					44
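
The per-model columns above can be reproduced from a list of per-run latencies with Python's `statistics` module. The sketch below is an assumption about the aggregation, not the report's actual pipeline: the report does not say whether StdDev is the sample or population form, so the sample form is used here, and a single run is reported as 0.0 to match the single-run rows. The sample values are made up for illustration.

```python
from statistics import mean, median, stdev


def summarize(samples_ms):
    """Aggregate per-run latencies into the columns of the statistics table."""
    if not samples_ms:
        raise ValueError("at least one run is required")
    runs = len(samples_ms)
    # stdev() needs two or more points; a single run is shown as 0.0 (assumption).
    spread = stdev(samples_ms) if runs > 1 else 0.0
    return {
        "mean": mean(samples_ms),
        "median": median(samples_ms),
        "stddev": spread,
        "min": min(samples_ms),
        "max": max(samples_ms),
        "runs": runs,
    }


# Hypothetical latencies in milliseconds, not taken from the report.
example = summarize([100.0, 110.0, 120.0])
print(example)
```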

Understanding Consistency

Stable (CV < 10%)

Highly consistent performance across runs

Moderate (CV 10-25%)

Some variance, generally acceptable

Variable (CV > 25%)

High variance, investigate causes
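
The coefficient of variation (CV) is the standard deviation divided by the mean. The thresholds above can be applied as in the following sketch. The handling of the exact boundary values is an assumption (10% counts as Moderate, and 25% also counts as Moderate), since the report only gives the ranges, and the function names are illustrative.

```python
def coefficient_of_variation(mean_ms: float, stddev_ms: float) -> float:
    """Return CV as a percentage: stddev / mean * 100."""
    if mean_ms <= 0:
        raise ValueError("mean must be positive")
    return stddev_ms / mean_ms * 100.0


def consistency(mean_ms: float, stddev_ms: float) -> str:
    """Map CV onto the report's three consistency labels."""
    cv = coefficient_of_variation(mean_ms, stddev_ms)
    if cv < 10.0:
        return "Stable"
    if cv <= 25.0:  # boundary handling is an assumption
        return "Moderate"
    return "Variable"


# (mean ms, stddev ms) copied from the statistics table.
print(consistency(1486.6, 40.5))  # BEIT: CV is about 2.7%
print(consistency(396.7, 46.3))   # ROBERTA: CV is about 11.7%
print(consistency(156.3, 56.2))   # GPT2: CV is about 36.0%
```

Rows close to a threshold may be labeled differently in the report, since the labels there were presumably computed from unrounded values.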

Time Distribution

Model Installation Time

46.6 min

Includes HuggingFace download + ONNX conversion (~99% of total)

