Axon v3.1.9 + MLOS Core 7.0.0
Performance metrics from E2E tests run with the MLOS kernel module (mlos-ml.ko) loaded. The kernel module provides zero-copy tensor operations, ML-aware scheduling, and optimized memory management.
Kernel faster: 8 models | Userspace faster: 22 models | Within rounding (0.0%): 2 models
| Model | Kernel | Userspace | Speedup |
|---|---|---|---|
| NLP Models | |||
| ALBERT | 331.0 ms ▼ | 456.0 ms ▲ | +38.0% |
| ROBERTA | 382.0 ms ▼ | 412.0 ms ▲ | +8.0% |
| GTE-BASE | 149.0 ms ▼ | 156.0 ms ▲ | +5.0% |
| GTE-SMALL | 116.0 ms ▼ | 119.0 ms ▲ | +3.0% |
| E5-SMALL | 124.0 ms ▼ | 125.0 ms ▲ | +1.0% |
| REGNET | 1243.0 ms ▲ | 1241.0 ms ▼ | 0.0% |
| POOLFORMER | 1272.0 ms ▲ | 1239.0 ms ▼ | -3.0% |
| BEIT | 1540.0 ms ▲ | 1485.0 ms ▼ | -4.0% |
| BGE-SMALL | 122.0 ms ▲ | 117.0 ms ▼ | -4.0% |
| DISTILROBERTA | 217.0 ms ▲ | 208.0 ms ▼ | -4.0% |
| SENTENCE-TRANSFORMERS | 172.0 ms ▲ | 165.0 ms ▼ | -4.0% |
| BGE-BASE | 166.0 ms ▲ | 157.0 ms ▼ | -5.0% |
| E5-BASE | 150.0 ms ▲ | 142.0 ms ▼ | -5.0% |
| BART-BASE | 206.0 ms ▲ | 193.0 ms ▼ | -6.0% |
| SQUEEZEBERT | 140.0 ms ▲ | 131.0 ms ▼ | -6.0% |
| MINILM | 187.0 ms ▲ | 174.0 ms ▼ | -7.0% |
| BERT | 303.0 ms ▲ | 273.0 ms ▼ | -10.0% |
| DISTILBERT | 156.0 ms ▲ | 136.0 ms ▼ | -13.0% |
| T5 | 168.0 ms ▲ | 145.0 ms ▼ | -14.0% |
| CONVNEXT-SMALL | 1849.0 ms ▲ | 1557.0 ms ▼ | -16.0% |
| GPT2 | 199.0 ms ▲ | 145.0 ms ▼ | -27.0% |
| Vision Models | |||
| RESNET | 1233.0 ms ▼ | 1295.0 ms ▲ | +5.0% |
| DEIT | 1284.0 ms ▼ | 1312.0 ms ▲ | +2.0% |
| EFFICIENTNET | 1186.0 ms ▼ | 1202.0 ms ▲ | +1.0% |
| MOBILENET | 1172.0 ms ▲ | 1168.0 ms ▼ | 0.0% |
| VIT | 1577.0 ms ▲ | 1554.0 ms ▼ | -1.0% |
| CONVNEXT | 1412.0 ms ▲ | 1344.0 ms ▼ | -5.0% |
| Multimodal Models | |||
| CLIP | 1369.0 ms ▲ | 1311.0 ms ▼ | -4.0% |
| LLM Models | |||
| DEEPSEEK-CODER-1.3B | 3289.0 ms ▲ | 3247.0 ms ▼ | -1.0% |
| LLAMA-3.2-1B | 505.0 ms ▲ | 482.0 ms ▼ | -5.0% |
| TINYLLAMA | 707.0 ms ▲ | 648.0 ms ▼ | -8.0% |
| QWEN2-0.5B | 368.0 ms ▲ | 328.0 ms ▼ | -11.0% |
| Average: | 0.97x | ||
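The Speedup column in the table above is consistent with the userspace latency divided by the kernel latency, minus one, expressed as a percentage, so a positive value means the kernel path was faster. The following is a minimal sketch of that calculation; the function name `speedup_percent` is hypothetical and the formula is inferred from the table rows, not taken from the benchmark's source.

```python
def speedup_percent(kernel_ms: float, userspace_ms: float) -> float:
    """Relative speedup of the kernel path over userspace, in percent.

    Assumed formula (inferred from the table): (userspace / kernel - 1) * 100.
    Positive means the kernel run was faster; negative means userspace was faster.
    """
    if kernel_ms <= 0:
        raise ValueError("kernel_ms must be positive")
    return (userspace_ms / kernel_ms - 1.0) * 100.0


# Three rows copied from the table above: (kernel ms, userspace ms).
rows = {
    "ALBERT": (331.0, 456.0),
    "ROBERTA": (382.0, 412.0),
    "GPT2": (199.0, 145.0),
}

for model, (kernel_ms, userspace_ms) in rows.items():
    # The table reports whole percentages, so round before printing.
    print(f"{model}: {round(speedup_percent(kernel_ms, userspace_ms)):+d}%")
# ALBERT: +38%
# ROBERTA: +8%
# GPT2: -27%
```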
Kernel faster: 15 models | Userspace faster: 16 models | Within rounding (0.0%): 1 model
| Model | Kernel | Userspace | Speedup |
|---|---|---|---|
| NLP Models | |||
| DISTILBERT | 268.0 ms ▼ | 418.0 ms ▲ | +56.0% |
| MINILM | 156.0 ms ▼ | 181.0 ms ▲ | +16.0% |
| E5-SMALL | 108.0 ms ▼ | 118.0 ms ▲ | +9.0% |
| ROBERTA | 1705.0 ms ▼ | 1862.0 ms ▲ | +9.0% |
| E5-BASE | 151.0 ms ▼ | 163.0 ms ▲ | +8.0% |
| BGE-BASE | 134.0 ms ▼ | 144.0 ms ▲ | +7.0% |
| REGNET | 1228.0 ms ▼ | 1289.0 ms ▲ | +5.0% |
| T5 | 208.0 ms ▼ | 219.0 ms ▲ | +5.0% |
| GTE-BASE | 144.0 ms ▼ | 149.0 ms ▲ | +3.0% |
| SENTENCE-TRANSFORMERS | 178.0 ms ▼ | 183.0 ms ▲ | +3.0% |
| GTE-SMALL | 117.0 ms ▼ | 119.0 ms ▲ | +2.0% |
| CONVNEXT-SMALL | 1590.0 ms ▲ | 1552.0 ms ▼ | -2.0% |
| POOLFORMER | 1320.0 ms ▲ | 1288.0 ms ▼ | -2.0% |
| SQUEEZEBERT | 146.0 ms ▲ | 141.0 ms ▼ | -3.0% |
| BEIT | 1517.0 ms ▲ | 1460.0 ms ▼ | -4.0% |
| BERT | 1314.0 ms ▲ | 1246.0 ms ▼ | -5.0% |
| BGE-SMALL | 120.0 ms ▲ | 114.0 ms ▼ | -5.0% |
| BART-BASE | 189.0 ms ▲ | 178.0 ms ▼ | -6.0% |
| GPT2 | 380.0 ms ▲ | 357.0 ms ▼ | -6.0% |
| ALBERT | 1546.0 ms ▲ | 1439.0 ms ▼ | -7.0% |
| DISTILROBERTA | 201.0 ms ▲ | 187.0 ms ▼ | -7.0% |
| Vision Models | |||
| VIT | 1591.0 ms ▼ | 1725.0 ms ▲ | +8.0% |
| DEIT | 1301.0 ms ▼ | 1359.0 ms ▲ | +4.0% |
| CONVNEXT | 1352.0 ms ▼ | 1367.0 ms ▲ | +1.0% |
| MOBILENET | 1186.0 ms ▲ | 1170.0 ms ▼ | -1.0% |
| EFFICIENTNET | 1205.0 ms ▲ | 1181.0 ms ▼ | -2.0% |
| RESNET | 1466.0 ms ▲ | 1232.0 ms ▼ | -16.0% |
| Multimodal Models | |||
| CLIP | 1360.0 ms ▼ | 1388.0 ms ▲ | +2.0% |
| LLM Models | |||
| QWEN2-0.5B | 6392.0 ms ▼ | 6414.0 ms ▲ | 0.0% |
| DEEPSEEK-CODER-1.3B | 11937.0 ms ▲ | 11691.0 ms ▼ | -2.0% |
| TINYLLAMA | 9022.0 ms ▲ | 8838.0 ms ▼ | -2.0% |
| LLAMA-3.2-1B | 10667.0 ms ▲ | 10322.0 ms ▼ | -3.0% |
| Average: | 1.02x | ||
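The Average row appears to be the arithmetic mean of the per-model userspace/kernel latency ratios, so a value above 1.0x means the kernel path was faster on average. The sketch below illustrates that reading on three rows only; `average_speedup` is a hypothetical name, and the averaging method is an assumption inferred from the tables rather than documented behavior.

```python
from statistics import mean


def average_speedup(rows):
    """Mean of per-model userspace/kernel latency ratios.

    `rows` is an iterable of (kernel_ms, userspace_ms) pairs.
    Assumption: the report averages ratios, not raw latencies.
    """
    return mean(userspace_ms / kernel_ms for kernel_ms, userspace_ms in rows)


# A small subset of the table above (DISTILBERT, RESNET, CLIP), so the
# result will not match the full-table average.
sample = [
    (268.0, 418.0),
    (1466.0, 1232.0),
    (1360.0, 1388.0),
]

print(f"{average_speedup(sample):.2f}x")
```

Averaging ratios rather than raw latencies keeps slow models such as the LLMs from dominating the figure, which may be why a single large outlier like DISTILBERT can pull the average above 1.0x even when more models favor userspace.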
Direct GPU memory access without CPU copies
Priority-based inference queue management
LRU eviction and memory pool optimization
Kernel-level inference isolation
Kernel test timestamp: 2026-01-07T20:03:57.868110Z
Not used (CPU-only inference)
Performance data aggregated across 26 test runs (from 2025-12-26 18:50 to 2026-01-07 20:03)
| Model | Mean | Median | StdDev | Range | Runs | Consistency |
|---|---|---|---|---|---|---|
| NLP Models | ||||||
| DEBERTA | 102.0 ms | 102.0 ms | 0.0 ms | 102.0 - 102.0 ms | 1 | Stable |
| DINOV2 | 1172.0 ms | 1172.0 ms | 0.0 ms | 1172.0 - 1172.0 ms | 1 | Stable |
| EFFICIENTNET-B1 | 1173.0 ms | 1173.0 ms | 0.0 ms | 1173.0 - 1173.0 ms | 1 | Stable |
| EFFICIENTNET-B2 | 1175.0 ms | 1175.0 ms | 0.0 ms | 1175.0 - 1175.0 ms | 1 | Stable |
| ELECTRA | 156.0 ms | 156.0 ms | 0.0 ms | 156.0 - 156.0 ms | 1 | Stable |
| LEVIT | 1283.0 ms | 1283.0 ms | 0.0 ms | 1283.0 - 1283.0 ms | 1 | Stable |
| MPNET | 128.0 ms | 128.0 ms | 0.0 ms | 128.0 - 128.0 ms | 1 | Stable |
| XLM-ROBERTA | 87.0 ms | 87.0 ms | 0.0 ms | 87.0 - 87.0 ms | 1 | Stable |
| BEIT | 1486.6 ms | 1489.5 ms | 40.5 ms | 1407.0 - 1555.0 ms | 14 | Stable |
| CONVNEXT-SMALL | 1536.1 ms | 1543.5 ms | 45.1 ms | 1460.0 - 1601.0 ms | 12 | Stable |
| POOLFORMER | 1276.9 ms | 1263.5 ms | 43.6 ms | 1232.0 - 1380.0 ms | 14 | Stable |
| REGNET | 1288.5 ms | 1252.5 ms | 81.0 ms | 1202.0 - 1463.0 ms | 14 | Stable |
| GTE-SMALL | 117.2 ms | 119.0 ms | 7.9 ms | 100.0 - 129.0 ms | 12 | Stable |
| SQUEEZEBERT | 137.1 ms | 135.5 ms | 9.9 ms | 122.0 - 155.0 ms | 14 | Stable |
| BGE-SMALL | 118.4 ms | 121.0 ms | 8.7 ms | 98.0 - 128.0 ms | 12 | Stable |
| MINILM | 173.9 ms | 178.5 ms | 13.4 ms | 147.0 - 192.0 ms | 14 | Stable |
| BART-BASE | 183.1 ms | 185.5 ms | 15.3 ms | 153.0 - 219.0 ms | 14 | Stable |
| E5-SMALL | 116.2 ms | 117.5 ms | 10.3 ms | 92.0 - 130.0 ms | 12 | Stable |
| DISTILROBERTA | 192.1 ms | 195.5 ms | 18.1 ms | 163.0 - 217.0 ms | 14 | Stable |
| E5-BASE | 139.2 ms | 141.5 ms | 13.8 ms | 100.0 - 157.0 ms | 12 | Stable |
| ROBERTA | 396.7 ms | 379.5 ms | 46.3 ms | 322.0 - 510.0 ms | 26 | Moderate |
| BGE-BASE | 146.4 ms | 150.0 ms | 17.2 ms | 111.0 - 173.0 ms | 12 | Moderate |
| BERT | 308.1 ms | 295.5 ms | 39.3 ms | 273.0 - 447.0 ms | 26 | Moderate |
| GTE-BASE | 151.6 ms | 146.0 ms | 24.6 ms | 133.0 - 223.0 ms | 12 | Moderate |
| ALBERT | 339.8 ms | 307.5 ms | 61.4 ms | 271.0 - 502.0 ms | 24 | Moderate |
| T5 | 158.5 ms | 148.5 ms | 39.6 ms | 118.0 - 295.0 ms | 26 | Variable |
| DISTILBERT | 166.3 ms | 149.0 ms | 45.1 ms | 127.0 - 308.0 ms | 26 | Variable |
| SENTENCE-TRANSFORMERS | 187.5 ms | 172.0 ms | 53.9 ms | 146.0 - 366.0 ms | 26 | Variable |
| GPT2 | 156.3 ms | 141.5 ms | 56.2 ms | 90.0 - 383.0 ms | 26 | Variable |
| Vision Models | ||||||
| MOBILENET | 1188.4 ms | 1178.0 ms | 42.8 ms | 1150.0 - 1338.0 ms | 26 | Stable |
| EFFICIENTNET | 1202.5 ms | 1188.5 ms | 47.3 ms | 1142.0 - 1343.0 ms | 26 | Stable |
| VIT | 1517.8 ms | 1498.5 ms | 66.6 ms | 1342.0 - 1643.0 ms | 26 | Stable |
| CONVNEXT | 1364.5 ms | 1337.0 ms | 75.3 ms | 1280.0 - 1551.0 ms | 26 | Stable |
| DEIT | 1326.4 ms | 1306.0 ms | 80.3 ms | 1221.0 - 1550.0 ms | 26 | Stable |
| RESNET | 1269.8 ms | 1232.0 ms | 77.6 ms | 1185.0 - 1464.0 ms | 26 | Stable |
| Multimodal Models | ||||||
| CLIP | 1342.8 ms | 1319.0 ms | 84.2 ms | 1258.0 - 1672.0 ms | 26 | Stable |
| LLM Models | ||||||
| DEEPSEEK-CODER-1.3B | 3330.3 ms | 3314.0 ms | 305.1 ms | 2919.0 - 4118.0 ms | 26 | Stable |
| TINYLLAMA | 670.2 ms | 650.5 ms | 70.6 ms | 607.0 - 864.0 ms | 26 | Moderate |
| QWEN2-0.5B | 360.5 ms | 351.0 ms | 44.3 ms | 297.0 - 501.0 ms | 26 | Moderate |
| LLAMA-3.2-1B | 533.6 ms | 511.5 ms | 70.6 ms | 477.0 - 777.0 ms | 26 | Moderate |
| Total Models Tracked: | 44 | |||||
Stable: Highly consistent performance across runs
Moderate: Some variance, generally acceptable
Variable: High variance, investigate causes
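The per-model statistics and the Consistency labels in the table above can be reproduced approximately from raw run latencies. The sketch below is illustrative only: the function names are hypothetical, the use of the sample standard deviation is an assumption, and the 10% and 20% cutoffs on StdDev/Mean are inferred from the table rows, not taken from the report's source.

```python
import statistics


def summarize(latencies_ms):
    """Aggregate one model's run latencies into the table's columns."""
    runs = len(latencies_ms)
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        # A single run has no spread; the table shows 0.0 ms in that case.
        "stddev": statistics.stdev(latencies_ms) if runs > 1 else 0.0,
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "runs": runs,
    }


def classify_consistency(mean_ms, stddev_ms, moderate_at=0.10, variable_at=0.20):
    """Label a model by its coefficient of variation (stddev / mean).

    The thresholds are assumptions that happen to fit every row of the table.
    """
    cv = stddev_ms / mean_ms
    if cv < moderate_at:
        return "Stable"
    if cv < variable_at:
        return "Moderate"
    return "Variable"


# Mean and StdDev values copied from the table above.
print(classify_consistency(1486.6, 40.5))  # BEIT
print(classify_consistency(396.7, 46.3))   # ROBERTA
print(classify_consistency(158.5, 39.6))   # T5

# Made-up latencies, only to show the shape of the summary.
print(summarize([100.0, 110.0, 120.0]))
```

Under this reading, the models listed with a single run are labeled Stable only because one sample has zero spread, not because repeated runs agreed.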
Includes HuggingFace download + ONNX conversion (~99% of total)