Complete transparency on test inputs, expected outputs, and validation results
Model documentation, tokenizer configs, and example inputs from official model cards.
huggingface.co/models
Validated ONNX models with test data and expected outputs for vision models.
github.com/onnx/models
Text generation model - validates output tensor exists
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_exists | output_exists | INFO |
| Output Elements | >= 10,000 | 12,288 | PASS |
| Inference Time | - | 100.11 ms | INFO |
Source: HuggingFace Model Hub
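The `output_exists` row above can be read as an element-count threshold on the output tensor. The following is a minimal sketch of such a check, assuming the count is the product of the output shape's dimensions. The function name `check_output_exists` and the flat shape `(12288,)` are illustrative assumptions, not taken from the pipeline itself.

```python
from math import prod


def check_output_exists(output_shape, min_elements=10_000):
    """Return (element_count, result) for a tensor with the given shape.

    Hypothetical helper: the real pipeline may count elements differently.
    """
    count = prod(output_shape)
    result = "PASS" if count >= min_elements else "FAIL"
    return count, result


# Shape is assumed for illustration; only the total (12,288) comes from the table.
count, result = check_output_exists((12288,))
print(f"Output Elements | >= 10,000 | {count:,} | {result}")
```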
Masked language model - validates inference produces output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 4,910,365 bytes | PASS |
| Inference Time | - | 205.49 ms | INFO |
Source: HuggingFace Model Hub
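The `status_success` tables in this report evaluate each row on its own: the Status row can pass while the Output Size row fails its byte threshold. A sketch of that row-by-row logic, under the assumption that the size check is a simple lower bound, might look like the following. `check_status_success` and its parameters are hypothetical names.

```python
def check_status_success(status, output_bytes, min_bytes=None):
    """Build (field, expected, actual, result) rows for a status_success test.

    Hypothetical helper: each row is judged independently, so a successful
    status does not hide an output that is smaller than the threshold.
    """
    rows = [
        ("Status", "success", status,
         "PASS" if status == "success" else "FAIL"),
    ]
    if min_bytes is not None:
        rows.append((
            "Output Size",
            f">= {min_bytes:,} bytes",
            f"{output_bytes:,} bytes",
            "PASS" if output_bytes >= min_bytes else "FAIL",
        ))
    return rows


# Values taken from the masked language model table above.
for row in check_status_success("success", 4_910_365, min_bytes=100_000):
    print(" | ".join(row))
```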
Robust BERT - validates inference produces output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 7,880,317 bytes | PASS |
| Inference Time | - | 288.49 ms | INFO |
Source: HuggingFace Model Hub
Seq2seq model - encoder-decoder architecture
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Inference Time | - | 69.83 ms | INFO |
Source: HuggingFace Model Hub
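Every table reports an Inference Time row marked INFO rather than PASS or FAIL. How the pipeline measures it is not stated here. One common approach, shown as a sketch only, is to wrap the inference call with a monotonic clock. `timed_ms` is a hypothetical helper, and `sum` stands in for a real inference call.

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in workload; a real test would call the model's inference function.
result, elapsed_ms = timed_ms(sum, range(1000))
print(f"Inference Time | - | {elapsed_ms:.2f} ms | INFO")
```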
Distilled BERT - validates inference produces output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 116,834 bytes | PASS |
| Inference Time | - | 52.16 ms | INFO |
Source: HuggingFace Model Hub
ALBERT - validates inference produces output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 4,783,292 bytes | PASS |
| Inference Time | - | 229.06 ms | INFO |
Source: HuggingFace Model Hub
Sentence embedding model - validates embedding output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 465,299 bytes | PASS |
| Inference Time | - | 65.51 ms | INFO |
Source: HuggingFace Model Hub
DistilRoBERTa - validates inference produces output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 1,487,536 bytes | PASS |
| Inference Time | - | 104.42 ms | INFO |
Source: HuggingFace Model Hub
SqueezeBERT - mobile-optimized BERT variant
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 21,894 bytes | FAIL |
| Inference Time | - | 31.41 ms | INFO |
Source: HuggingFace Model Hub
MiniLM - compact distilled model
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 872,157 bytes | PASS |
| Inference Time | - | 82.41 ms | INFO |
Source: HuggingFace Model Hub
BART base - denoising autoencoder model
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 116 bytes | FAIL |
| Inference Time | - | 96.16 ms | INFO |
Source: HuggingFace Model Hub
BGE Small - BAAI General Embedding (384-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 50,000 bytes | 10,940 bytes | FAIL |
| Inference Time | - | 22.04 ms | INFO |
Source: HuggingFace Model Hub
BGE Base - BAAI General Embedding (768-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 21,909 bytes | FAIL |
| Inference Time | - | 55.14 ms | INFO |
Source: HuggingFace Model Hub
E5 Small - Microsoft text embeddings (384-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 50,000 bytes | 10,969 bytes | FAIL |
| Inference Time | - | 23.57 ms | INFO |
Source: HuggingFace Model Hub
E5 Base - Microsoft text embeddings (768-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 21,903 bytes | FAIL |
| Inference Time | - | 49.92 ms | INFO |
Source: HuggingFace Model Hub
GTE Small - Alibaba text embeddings (384-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 50,000 bytes | 10,926 bytes | FAIL |
| Inference Time | - | 26.20 ms | INFO |
Source: HuggingFace Model Hub
GTE Base - Alibaba text embeddings (768-dim)
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Output Size | >= 100,000 bytes | 21,907 bytes | FAIL |
| Inference Time | - | 48.16 ms | INFO |
Source: HuggingFace Model Hub
ResNet-50 ImageNet classifier - validates classification output
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 168.24 ms | INFO |
Source: ONNX Model Zoo / ImageNet
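The `output_shape` check used for the vision models compares the shape of the classifier output against an expected shape such as `[1000]`. A minimal sketch, assuming an exact element-wise comparison, follows. `check_output_shape` is an illustrative name, not the pipeline's own function.

```python
def check_output_shape(actual_shape, expected_shape):
    """Return "PASS" when the two shapes match exactly, otherwise "FAIL"."""
    return "PASS" if list(actual_shape) == list(expected_shape) else "FAIL"


# ResNet-50 row from the table above.
print(check_output_shape([1000], [1000]))
```

Because the comparison is exact, a model that emits 1001 classes needs `[1001]` as its expected shape, as in the MobileNetV2 table in this report.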
Vision Transformer - validates transformer-based classification
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 514.10 ms | INFO |
Source: ONNX Model Zoo / ImageNet
ConvNeXt - modern CNN architecture
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 351.14 ms | INFO |
Source: HuggingFace Model Hub
MobileNetV2 - efficient mobile classifier
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1001] | [1001] | PASS |
| Inference Time | - | 90.60 ms | INFO |
Source: ONNX Model Zoo / ImageNet
DeiT - data-efficient ViT
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 223.86 ms | INFO |
Source: HuggingFace Model Hub
EfficientNet-B0 - compound scaled CNN
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 115.36 ms | INFO |
Source: HuggingFace Model Hub
RegNet - modern CNN architecture
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 177.80 ms | INFO |
Source: HuggingFace Model Hub
BEiT - BERT-style vision transformer
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 474.71 ms | INFO |
Source: HuggingFace Model Hub
PoolFormer - MetaFormer with pooling instead of attention
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 201.49 ms | INFO |
Source: HuggingFace Model Hub
ConvNeXt Small - larger ConvNeXt variant
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | output_shape | output_shape | INFO |
| Output Shape | [1000] | [1000] | PASS |
| Inference Time | - | 789.92 ms | INFO |
Source: HuggingFace Model Hub
CLIP - image-text similarity model
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | status_success | status_success | INFO |
| Status | success | success | PASS |
| Inference Time | - | 304.37 ms | INFO |
Source: HuggingFace Model Hub
TinyLlama 1.1B GGUF - validates text generation
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['Paris'] | Found: ['Paris'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': '\nYes, the capital of France is Paris.', 'tokens_generated': 10}" | MATCH |
| Inference Time | - | 614.08 ms | INFO |
Source: HuggingFace Model Hub
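The `generation_contains` check looks for expected keywords in the generated text. Some tables in this report show PASS with only part of the keyword list found, so the pass rule does not appear to require every keyword. The sketch below assumes case-sensitive substring matching and a configurable minimum number of matches. Both are assumptions, and `check_generation_contains` and `min_matches` are hypothetical names.

```python
def check_generation_contains(generated_text, expected_keywords, min_matches=1):
    """Return (found_keywords, result) for a keyword-based generation check.

    Assumptions: matching is a case-sensitive substring test, and the test
    passes once at least `min_matches` keywords are found.
    """
    found = [kw for kw in expected_keywords if kw in generated_text]
    result = "PASS" if len(found) >= min_matches else "FAIL"
    return found, result


# Text and keyword taken from the TinyLlama table above.
found, result = check_generation_contains(
    "\nYes, the capital of France is Paris.", ["Paris"]
)
print(f"Found: {found} | {result}")
```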
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['Einstein', 'relativity', 'physics', 'time', 'space'] | Found: ['Einstein', 'relativity', 'physics', 'time', 'space'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': '\n\nRelativity is a theory that describes the behavior of matter and energy in space and time. It is based on the principle of relativity, which states that the laws of physics are..." | MATCH |
| Inference Time | - | 8923.83 ms | INFO |
Source: HuggingFace Model Hub
Qwen2 0.5B GGUF - validates instruction following
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['4'] | Found: ['4'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': '4', 'tokens_generated': 1}" | MATCH |
| Inference Time | - | 252.01 ms | INFO |
Source: HuggingFace Model Hub
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['AI', 'learning', 'neural', 'model'] | Found: ['learning', 'neural', 'model'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': 'The key developments in artificial intelligence over the past decade include:\n\n1. Deep Learning: Deep learning is a type of artificial intelligence that uses neural networks to l..." | MATCH |
| Inference Time | - | 6317.84 ms | INFO |
Source: HuggingFace Model Hub
Llama 3.2 1B GGUF - validates instruction following
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['Tokyo'] | Found: ['Tokyo'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': 'Tokyo.', 'tokens_generated': 3}" | MATCH |
| Inference Time | - | 410.08 ms | INFO |
Source: HuggingFace Model Hub
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['data', 'learn', 'train', 'model', 'algorithm'] | Found: ['data', 'learn', 'train', 'model'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': "Machine learning is a way for computers to learn from data and make predictions or decisions on their own. Here are the simple principles of machine learning:\n\n**1. Data Collecti..." | MATCH |
| Inference Time | - | 10571.95 ms | INFO |
Source: HuggingFace Model Hub
DeepSeek Coder 1.3B GGUF - validates code generation
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['def add', 'return'] | Found: ['def add', 'return'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': 'def add(num1, num2):\n return num1 + num2\n\n<|assistant|>\nprint(add(5, 3))\n\n<|assistant|>\nprint(add(10, 20))\n\n<|assistant|>\n', 'tokens_generated': 64}" | MATCH |
| Inference Time | - | 3207.99 ms | INFO |
Source: HuggingFace Model Hub
| Field | Expected | Actual | Result |
|---|---|---|---|
| Validation Type | generation_contains | generation_contains | INFO |
| Expected Keywords | ['def', 'binary', 'return'] | Found: ['def', 'binary', 'return'] | PASS |
| Generated Text | (any containing keywords) | "{'generated_text': 'Sure, here is a Python function that implements binary search on a sorted list:\n\n```python\ndef binary_search(arr, low, high, x):\n \n if high >= low:\n \n mid = (high ..." | MATCH |
| Inference Time | - | 11848.73 ms | INFO |
Source: HuggingFace Model Hub
Semantic validation tests use real images from the ImageNet dataset to verify that vision models correctly classify known objects. These tests run in Phase 4 of the pipeline using actual image inference.
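A semantic check of this kind is often expressed as a top-k test: the known class of the image must appear among the k highest-scoring classes. The report does not say which criterion Phase 4 uses, so the following is only a sketch under that assumption. `top_k_indices`, `check_semantic`, and the toy score vector are all hypothetical.

```python
def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, highest first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def check_semantic(scores, expected_index, k=5):
    """Return "PASS" if the expected class index is within the top-k scores."""
    return "PASS" if expected_index in top_k_indices(scores, k) else "FAIL"


# Toy three-class score vector for illustration; a real run would use the
# model's 1000-class output for an ImageNet image with a known label.
scores = [0.1, 0.7, 0.2]
print(check_semantic(scores, expected_index=1, k=1))
```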