RAG Retrieval Benchmark

1116 scored results · 36 configurations (3 chunkers × 3 embedders × 4 retrieval strategies) · 31 questions across 4 types · corpus: 69 ArXiv papers · evaluated with RAGAS via Groq openai/gpt-oss-120b
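
The configuration count follows from a full Cartesian product of the three component axes. A minimal sketch of that arithmetic, assuming every chunker is paired with every embedder and every retrieval strategy (the list and variable names here are illustrative, not taken from the benchmark code):

```python
from itertools import product

# Component names as they appear in this report.
CHUNKERS = ["Fixed-size", "Sentence", "Semantic"]
EMBEDDERS = ["OpenAI", "BGE-M3", "MiniLM"]
RETRIEVERS = ["BM25", "Semantic", "Hybrid (RRF)", "Reranker"]
NUM_QUESTIONS = 31  # stated in the report header

# One tuple per pipeline configuration: (chunker, embedder, retriever).
configurations = list(product(CHUNKERS, EMBEDDERS, RETRIEVERS))

num_configs = len(configurations)
num_scored_results = num_configs * NUM_QUESTIONS

print(num_configs)         # 36
print(num_scored_results)  # 1116
```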

Top 3 Configurations

#1  Sentence chunker · OpenAI embedder · Semantic retrieval
Overall 0.818 · Context Precision 0.793 · Context Recall 0.769 · Faithfulness 0.894 · Answer Relevance 0.814 · Avg Retrieval 269 ms

#2  Fixed-size chunker · BGE-M3 embedder · Semantic retrieval
Overall 0.816 · Context Precision 0.810 · Context Recall 0.856 · Faithfulness 0.826 · Answer Relevance 0.773 · Avg Retrieval 215 ms

#3  Fixed-size chunker · OpenAI embedder · Hybrid (RRF) retrieval
Overall 0.814 · Context Precision 0.822 · Context Recall 0.778 · Faithfulness 0.837 · Answer Relevance 0.819 · Avg Retrieval 218 ms
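
The report does not say how the overall score is derived. The top-3 numbers are consistent with an unweighted mean of the four metrics, so the sketch below checks that reading against the reported values. Treat the formula as an inference from the published figures, not a documented definition; `overall_score` is a hypothetical helper name.

```python
from statistics import mean

# (context precision, context recall, faithfulness, answer relevance, reported overall)
TOP_3 = {
    "Sentence / OpenAI / Semantic": (0.793, 0.769, 0.894, 0.814, 0.818),
    "Fixed-size / BGE-M3 / Semantic": (0.810, 0.856, 0.826, 0.773, 0.816),
    "Fixed-size / OpenAI / Hybrid (RRF)": (0.822, 0.778, 0.837, 0.819, 0.814),
}


def overall_score(precision, recall, faithfulness, relevance):
    """Assumed definition: unweighted mean of the four metrics."""
    return mean([precision, recall, faithfulness, relevance])


# Largest gap between the assumed formula and the reported overall score.
max_gap = 0.0
for name, (*metrics, reported) in TOP_3.items():
    computed = overall_score(*metrics)
    max_gap = max(max_gap, abs(computed - reported))
    print(f"{name}: computed {computed:.4f}, reported {reported:.3f}")
```

For all three configurations the computed mean agrees with the reported score to within rounding of the published three-decimal values.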

Retrieval Quality vs. Per-Query Latency

All 36 configurations

[Scatter plot: overall score (y-axis, 0.50 to 0.85) against per-query retrieval latency in ms* (x-axis, 0 to 700). Legend: fixed-size chunker, sentence chunker, semantic chunker; markers 1, 2, and 3 are labeled on the plot.]
* Per-query retrieval latency only; excludes index-build time (chunking and embedding).

Aggregate Results by Pipeline Component

Scores below are averaged across all configurations sharing that component. The scatter plot above shows per-configuration variance.

Chunking Strategy

Fixed-size 0.770
Sentence 0.767
Semantic 0.586
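
As a sanity check on the averaging described above, the chunker aggregates can be reproduced from the Overall column of the first table under "All Configurations". This is a small sketch that works from the rounded, published scores rather than the raw per-question results, so agreement to three decimals is expected but not guaranteed in general.

```python
from statistics import mean

# Overall scores of the 12 configurations per chunker, copied from the
# first table under "All Configurations" (3 embedders x 4 retrievers each).
OVERALL_BY_CHUNKER = {
    "Fixed-size": [0.816, 0.814, 0.809, 0.798, 0.783, 0.781,
                   0.767, 0.755, 0.754, 0.746, 0.739, 0.677],
    "Sentence": [0.818, 0.806, 0.797, 0.795, 0.788, 0.787,
                 0.764, 0.762, 0.761, 0.760, 0.722, 0.641],
    "Semantic": [0.650, 0.627, 0.618, 0.615, 0.609, 0.608,
                 0.598, 0.562, 0.562, 0.531, 0.529, 0.518],
}

# Average across all configurations that share a chunker.
chunker_averages = {
    chunker: round(mean(scores), 3)
    for chunker, scores in OVERALL_BY_CHUNKER.items()
}

for chunker, average in chunker_averages.items():
    print(f"{chunker} {average:.3f}")
```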

Embedding Model

OpenAI 0.732
BGE-M3 0.712
MiniLM 0.677

Retrieval Strategy

Reranker 0.729  ·  465 ms avg
Hybrid (RRF) 0.716  ·  127 ms avg
Semantic 0.700  ·  153 ms avg
BM25 0.685  ·  27 ms avg
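
For readers unfamiliar with the "Hybrid (RRF)" label: reciprocal rank fusion merges several ranked lists by summing 1 / (k + rank) for each document across lists. The sketch below is a generic illustration, not the benchmark's implementation. The report does not state the constant k or which rankings are fused; k = 60 is a commonly used default, and fusing a BM25 ranking with a dense ranking is an assumption. The chunk IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first ranked lists of document IDs into one ranking.

    k=60 is a common default; the value used in this benchmark is not stated.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a lexical and a dense retriever.
bm25_ranking = ["chunk_a", "chunk_b", "chunk_c"]
dense_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['chunk_b', 'chunk_a', 'chunk_d', 'chunk_c']
```

Because RRF uses only ranks, it needs no score normalization between BM25 and embedding similarity, which is one reason it is a popular choice for hybrid retrieval.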

All Configurations

# Chunker Embedder Retriever Overall Context Precision Context Recall Faithfulness Answer Relevance Latency (ms)
1 Sentence OpenAI Semantic 0.818 0.793 0.769 0.894 0.814 269.8
2 Fixed-size BGE-M3 Semantic 0.816 0.810 0.856 0.826 0.773 215.9
3 Fixed-size OpenAI Hybrid (RRF) 0.814 0.822 0.778 0.837 0.819 218.8
4 Fixed-size OpenAI Semantic 0.809 0.786 0.778 0.861 0.811 249.5
5 Sentence OpenAI Hybrid (RRF) 0.806 0.771 0.754 0.881 0.817 197.4
6 Fixed-size OpenAI Reranker 0.798 0.753 0.756 0.874 0.810 606.4
7 Sentence BGE-M3 Hybrid (RRF) 0.797 0.803 0.737 0.883 0.767 88.8
8 Sentence OpenAI Reranker 0.795 0.770 0.765 0.840 0.806 664.7
9 Sentence MiniLM BM25 0.788 0.717 0.754 0.920 0.761 8.2
10 Sentence OpenAI BM25 0.787 0.714 0.706 0.906 0.821 6.4
11 Fixed-size BGE-M3 Hybrid (RRF) 0.783 0.838 0.689 0.837 0.768 96.8
12 Fixed-size MiniLM Reranker 0.781 0.727 0.735 0.898 0.764 377.1
13 Fixed-size BGE-M3 Reranker 0.767 0.736 0.747 0.816 0.771 533.9
14 Sentence BGE-M3 Reranker 0.764 0.720 0.725 0.846 0.766 414.0
15 Sentence BGE-M3 BM25 0.762 0.690 0.709 0.875 0.772 4.4
16 Sentence MiniLM Reranker 0.761 0.677 0.735 0.875 0.758 341.5
17 Sentence BGE-M3 Semantic 0.760 0.722 0.720 0.839 0.760 171.8
18 Fixed-size MiniLM BM25 0.755 0.725 0.709 0.830 0.757 9.3
19 Fixed-size OpenAI BM25 0.754 0.726 0.677 0.800 0.813 19.5
20 Fixed-size MiniLM Hybrid (RRF) 0.746 0.672 0.746 0.787 0.779 20.7
21 Fixed-size BGE-M3 BM25 0.739 0.721 0.620 0.858 0.759 11.9
22 Sentence MiniLM Hybrid (RRF) 0.722 0.598 0.701 0.794 0.794 22.9
23 Fixed-size MiniLM Semantic 0.677 0.538 0.604 0.756 0.810 20.6
24 Semantic OpenAI Reranker 0.650 0.621 0.280 0.888 0.811 560.3
25 Sentence MiniLM Semantic 0.641 0.473 0.483 0.823 0.784 13.3
26 Semantic BGE-M3 Reranker 0.627 0.597 0.342 0.821 0.748 411.1
27 Semantic MiniLM Reranker 0.618 0.511 0.280 0.869 0.810 279.2
28 Semantic OpenAI Hybrid (RRF) 0.615 0.551 0.247 0.874 0.789 289.6
29 Semantic BGE-M3 Semantic 0.609 0.498 0.317 0.867 0.753 186.6
30 Semantic OpenAI Semantic 0.608 0.448 0.289 0.884 0.811 233.5
31 Semantic BGE-M3 Hybrid (RRF) 0.598 0.544 0.223 0.883 0.742 134.8
32 Semantic MiniLM Hybrid (RRF) 0.562 0.325 0.205 0.882 0.837 72.3
33 Semantic MiniLM Semantic 0.562 0.364 0.205 0.849 0.831 16.2
34 Semantic OpenAI BM25 0.531 0.342 0.163 0.832 0.787 63.4
35 Semantic BGE-M3 BM25 0.529 0.388 0.168 0.840 0.722 57.2
36 Semantic MiniLM BM25 0.518 0.339 0.161 0.857 0.714 63.1
# Chunker Embedder Retriever Overall Context Precision Context Recall Faithfulness Answer Relevance Latency (ms)
1 Fixed-size BGE-M3 Semantic 0.858 0.887 0.922 0.844 0.779 399.4
2 Fixed-size OpenAI Hybrid (RRF) 0.842 0.880 0.781 0.890 0.816 222.9
3 Sentence OpenAI Semantic 0.838 0.832 0.796 0.910 0.815 372.2
4 Sentence BGE-M3 Hybrid (RRF) 0.833 0.841 0.797 0.915 0.777 85.5
5 Fixed-size BGE-M3 Hybrid (RRF) 0.828 0.864 0.760 0.913 0.776 88.1
6 Fixed-size OpenAI Reranker 0.827 0.799 0.831 0.869 0.810 725.1
7 Fixed-size OpenAI Semantic 0.821 0.789 0.806 0.887 0.803 368.9
8 Sentence OpenAI Hybrid (RRF) 0.805 0.762 0.768 0.884 0.807 199.9
9 Sentence BGE-M3 Semantic 0.797 0.763 0.769 0.904 0.752 306.6
10 Sentence OpenAI BM25 0.795 0.721 0.682 0.964 0.814 5.7
11 Fixed-size MiniLM Reranker 0.793 0.691 0.842 0.889 0.750 517.5
12 Sentence OpenAI Reranker 0.793 0.754 0.781 0.834 0.804 748.3
13 Fixed-size BGE-M3 Reranker 0.778 0.713 0.853 0.765 0.781 681.9
14 Fixed-size MiniLM BM25 0.778 0.801 0.732 0.836 0.743 7.3
15 Fixed-size BGE-M3 BM25 0.776 0.803 0.651 0.882 0.769 9.7
16 Fixed-size OpenAI BM25 0.776 0.833 0.668 0.789 0.814 17.3
17 Sentence MiniLM BM25 0.776 0.709 0.701 0.955 0.741 5.9
18 Fixed-size MiniLM Hybrid (RRF) 0.770 0.665 0.856 0.787 0.772 19.4
19 Sentence BGE-M3 Reranker 0.762 0.642 0.755 0.873 0.778 560.0
20 Sentence BGE-M3 BM25 0.761 0.670 0.668 0.930 0.776 3.5
21 Sentence MiniLM Reranker 0.747 0.618 0.752 0.862 0.754 478.6
22 Semantic BGE-M3 Semantic 0.684 0.602 0.479 0.893 0.764 351.4
23 Sentence MiniLM Hybrid (RRF) 0.680 0.607 0.581 0.754 0.777 22.1
24 Semantic OpenAI Reranker 0.661 0.617 0.313 0.931 0.782 649.2
25 Fixed-size MiniLM Semantic 0.654 0.483 0.567 0.760 0.805 27.1
26 Semantic BGE-M3 Reranker 0.639 0.598 0.395 0.833 0.731 541.5
27 Semantic OpenAI Semantic 0.626 0.454 0.383 0.873 0.795 312.3
28 Semantic BGE-M3 Hybrid (RRF) 0.617 0.616 0.277 0.848 0.725 126.3
29 Semantic MiniLM Reranker 0.617 0.542 0.292 0.858 0.777 377.3
30 Sentence MiniLM Semantic 0.615 0.421 0.378 0.869 0.792 12.5
31 Semantic OpenAI Hybrid (RRF) 0.574 0.382 0.241 0.869 0.805 316.3
32 Semantic OpenAI BM25 0.556 0.333 0.237 0.858 0.797 55.0
33 Semantic BGE-M3 BM25 0.536 0.321 0.237 0.868 0.719 49.2
34 Semantic MiniLM BM25 0.527 0.342 0.209 0.831 0.725 53.2
35 Semantic MiniLM Hybrid (RRF) 0.517 0.193 0.181 0.845 0.849 63.5
36 Semantic MiniLM Semantic 0.516 0.243 0.141 0.856 0.823 25.4
# Chunker Embedder Retriever Overall Context Precision Context Recall Faithfulness Answer Relevance Latency (ms)
1 Sentence OpenAI BM25 0.867 0.867 0.850 0.929 0.822 6.6
2 Sentence MiniLM BM25 0.860 0.872 0.841 0.932 0.795 8.8
3 Sentence BGE-M3 Hybrid (RRF) 0.850 0.900 0.831 0.892 0.777 91.2
4 Sentence BGE-M3 BM25 0.848 0.841 0.850 0.918 0.783 4.3
5 Fixed-size BGE-M3 Semantic 0.839 0.879 0.844 0.850 0.784 87.0
6 Sentence MiniLM Reranker 0.833 0.706 0.892 0.971 0.761 253.7
7 Fixed-size MiniLM Reranker 0.831 0.859 0.799 0.902 0.765 270.9
8 Sentence OpenAI Hybrid (RRF) 0.828 0.844 0.774 0.866 0.828 193.4
9 Fixed-size OpenAI Hybrid (RRF) 0.827 0.886 0.885 0.701 0.834 200.3
10 Sentence BGE-M3 Reranker 0.826 0.788 0.819 0.914 0.781 323.4
11 Fixed-size BGE-M3 Hybrid (RRF) 0.823 0.899 0.813 0.793 0.785 93.8
12 Fixed-size BGE-M3 Reranker 0.823 0.841 0.804 0.865 0.781 433.4
13 Sentence OpenAI Reranker 0.822 0.773 0.828 0.870 0.818 517.1
14 Sentence OpenAI Semantic 0.817 0.776 0.803 0.863 0.827 212.1
15 Fixed-size OpenAI Reranker 0.815 0.827 0.715 0.897 0.820 552.1
16 Fixed-size OpenAI Semantic 0.815 0.851 0.831 0.762 0.814 179.6
17 Sentence MiniLM Hybrid (RRF) 0.810 0.669 0.889 0.855 0.826 21.5
18 Fixed-size MiniLM BM25 0.805 0.751 0.811 0.873 0.787 8.5
19 Fixed-size OpenAI BM25 0.804 0.704 0.861 0.835 0.816 22.5
20 Fixed-size MiniLM Hybrid (RRF) 0.800 0.809 0.780 0.812 0.798 17.2
21 Sentence BGE-M3 Semantic 0.786 0.735 0.835 0.795 0.778 84.2
22 Fixed-size BGE-M3 BM25 0.784 0.729 0.736 0.896 0.776 11.3
23 Fixed-size MiniLM Semantic 0.736 0.677 0.655 0.782 0.832 14.8
24 Sentence MiniLM Semantic 0.681 0.485 0.635 0.826 0.779 10.0
25 Semantic OpenAI Reranker 0.651 0.569 0.325 0.869 0.843 503.5
26 Semantic OpenAI Hybrid (RRF) 0.650 0.620 0.283 0.928 0.770 264.8
27 Semantic MiniLM Semantic 0.627 0.467 0.327 0.846 0.869 10.7
28 Semantic MiniLM Hybrid (RRF) 0.621 0.388 0.327 0.928 0.840 72.6
29 Semantic MiniLM Reranker 0.603 0.436 0.238 0.878 0.860 202.1
30 Semantic BGE-M3 Reranker 0.598 0.504 0.310 0.798 0.781 351.8
31 Semantic OpenAI Semantic 0.598 0.294 0.291 0.954 0.854 183.3
32 Semantic BGE-M3 Hybrid (RRF) 0.593 0.454 0.246 0.906 0.767 136.1
33 Semantic BGE-M3 Semantic 0.555 0.324 0.270 0.863 0.762 82.9
34 Semantic BGE-M3 BM25 0.513 0.300 0.179 0.810 0.763 58.0
35 Semantic OpenAI BM25 0.512 0.198 0.179 0.848 0.821 64.8
36 Semantic MiniLM BM25 0.499 0.156 0.193 0.947 0.699 68.7
# Chunker Embedder Retriever Overall Context Precision Context Recall Faithfulness Answer Relevance Latency (ms)
1 Fixed-size OpenAI Semantic 0.782 0.698 0.663 0.946 0.821 173.5
2 Sentence OpenAI Semantic 0.781 0.747 0.678 0.904 0.796 206.7
3 Sentence OpenAI Hybrid (RRF) 0.779 0.691 0.706 0.897 0.821 198.9
4 Sentence OpenAI Reranker 0.764 0.794 0.656 0.811 0.796 768.6
5 Fixed-size OpenAI Hybrid (RRF) 0.750 0.640 0.634 0.921 0.805 238.6
6 Fixed-size OpenAI Reranker 0.728 0.580 0.682 0.851 0.799 540.6
7 Fixed-size BGE-M3 Semantic 0.714 0.588 0.756 0.765 0.748 98.5
8 Sentence MiniLM BM25 0.714 0.529 0.732 0.846 0.750 10.6
9 Fixed-size MiniLM Reranker 0.696 0.620 0.469 0.908 0.788 291.2
10 Sentence MiniLM Reranker 0.696 0.743 0.505 0.774 0.760 246.0
11 Sentence BGE-M3 Reranker 0.690 0.767 0.554 0.714 0.726 324.3
12 Sentence MiniLM Hybrid (RRF) 0.682 0.492 0.667 0.786 0.782 20.5
13 Fixed-size BGE-M3 Reranker 0.679 0.640 0.495 0.842 0.740 465.7
14 Sentence BGE-M3 Hybrid (RRF) 0.670 0.612 0.513 0.818 0.738 88.6
15 Sentence OpenAI BM25 0.669 0.505 0.562 0.776 0.833 7.7
16 Sentence BGE-M3 Semantic 0.665 0.637 0.487 0.786 0.748 84.6
17 Fixed-size BGE-M3 Hybrid (RRF) 0.655 0.714 0.408 0.763 0.735 97.3
18 Fixed-size MiniLM BM25 0.652 0.562 0.539 0.764 0.743 10.1
19 Fixed-size OpenAI BM25 0.652 0.570 0.455 0.775 0.807 21.2
20 Sentence BGE-M3 BM25 0.651 0.529 0.599 0.727 0.750 4.6
21 Fixed-size MiniLM Semantic 0.643 0.456 0.605 0.717 0.792 11.6
22 Semantic BGE-M3 Reranker 0.643 0.715 0.290 0.832 0.735 281.3
23 Semantic OpenAI Hybrid (RRF) 0.640 0.751 0.211 0.812 0.787 281.3
24 Semantic MiniLM Reranker 0.637 0.557 0.312 0.877 0.800 201.5
25 Fixed-size MiniLM Hybrid (RRF) 0.636 0.508 0.513 0.756 0.767 18.7
26 Sentence MiniLM Semantic 0.633 0.548 0.468 0.740 0.776 10.3
27 Semantic OpenAI Reranker 0.630 0.695 0.165 0.840 0.820 469.3
28 Fixed-size BGE-M3 BM25 0.618 0.570 0.418 0.767 0.718 15.7
29 Semantic OpenAI Semantic 0.589 0.636 0.124 0.814 0.784 188.4
30 Semantic BGE-M3 Hybrid (RRF) 0.572 0.536 0.101 0.911 0.738 144.8
31 Semantic MiniLM Hybrid (RRF) 0.565 0.471 0.089 0.886 0.812 78.4
32 Semantic MiniLM Semantic 0.559 0.438 0.157 0.843 0.798 9.5
33 Semantic BGE-M3 Semantic 0.548 0.545 0.100 0.826 0.722 81.2
34 Semantic BGE-M3 BM25 0.539 0.615 0.036 0.832 0.674 65.8
35 Semantic MiniLM BM25 0.527 0.571 0.036 0.787 0.714 69.3
36 Semantic OpenAI BM25 0.513 0.543 0.018 0.766 0.726 72.6
# Chunker Embedder Retriever Overall Context Precision Context Recall Faithfulness Answer Relevance Latency (ms)
1 Fixed-size BGE-M3 BM25 n/a n/a n/a n/a n/a 13.9
2 Fixed-size BGE-M3 Hybrid (RRF) n/a n/a n/a n/a n/a 138.8
3 Fixed-size BGE-M3 Reranker n/a n/a n/a n/a n/a 402.4
4 Fixed-size BGE-M3 Semantic n/a n/a n/a n/a n/a 143.0
5 Fixed-size MiniLM BM25 n/a n/a n/a n/a n/a 17.9
6 Fixed-size MiniLM Hybrid (RRF) n/a n/a n/a n/a n/a 41.6
7 Fixed-size MiniLM Reranker n/a n/a n/a n/a n/a 335.0
8 Fixed-size MiniLM Semantic n/a n/a n/a n/a n/a 33.2
9 Fixed-size OpenAI BM25 n/a n/a n/a n/a n/a 15.2
10 Fixed-size OpenAI Hybrid (RRF) n/a n/a n/a n/a n/a 211.1
11 Fixed-size OpenAI Reranker n/a n/a n/a n/a n/a 447.6
12 Fixed-size OpenAI Semantic n/a n/a n/a n/a n/a 159.5
13 Semantic BGE-M3 BM25 n/a n/a n/a n/a n/a 67.2
14 Semantic BGE-M3 Hybrid (RRF) n/a n/a n/a n/a n/a 141.3
15 Semantic BGE-M3 Reranker n/a n/a n/a n/a n/a 370.2
16 Semantic BGE-M3 Semantic n/a n/a n/a n/a n/a 84.0
17 Semantic MiniLM BM25 n/a n/a n/a n/a n/a 71.4
18 Semantic MiniLM Hybrid (RRF) n/a n/a n/a n/a n/a 92.3
19 Semantic MiniLM Reranker n/a n/a n/a n/a n/a 299.5
20 Semantic MiniLM Semantic n/a n/a n/a n/a n/a 11.2
21 Semantic OpenAI BM25 n/a n/a n/a n/a n/a 70.9
22 Semantic OpenAI Hybrid (RRF) n/a n/a n/a n/a n/a 277.0
23 Semantic OpenAI Reranker n/a n/a n/a n/a n/a 587.1
24 Semantic OpenAI Semantic n/a n/a n/a n/a n/a 173.9
25 Sentence BGE-M3 BM25 n/a n/a n/a n/a n/a 7.2
26 Sentence BGE-M3 Hybrid (RRF) n/a n/a n/a n/a n/a 95.2
27 Sentence BGE-M3 Reranker n/a n/a n/a n/a n/a 311.3
28 Sentence BGE-M3 Semantic n/a n/a n/a n/a n/a 98.7
29 Sentence MiniLM BM25 n/a n/a n/a n/a n/a 10.1
30 Sentence MiniLM Hybrid (RRF) n/a n/a n/a n/a n/a 36.3
31 Sentence MiniLM Reranker n/a n/a n/a n/a n/a 279.4
32 Sentence MiniLM Semantic n/a n/a n/a n/a n/a 33.4
33 Sentence OpenAI BM25 n/a n/a n/a n/a n/a 6.2
34 Sentence OpenAI Hybrid (RRF) n/a n/a n/a n/a n/a 195.2
35 Sentence OpenAI Reranker n/a n/a n/a n/a n/a 530.3
36 Sentence OpenAI Semantic n/a n/a n/a n/a n/a 179.9

Corpus: 69 ArXiv papers on RAG, dense retrieval, and embedding models.
Questions: 31 questions across four types: factual, conceptual, multi-hop, and unanswerable.
Configurations: 3 chunkers × 3 embedders × 4 retrieval strategies = 36 configurations, each evaluated across all 31 questions (1116 scored results total).
Metrics: Context Precision, Context Recall, and Faithfulness are LLM-judged via RAGAS; Answer Relevance is cosine similarity (no LLM call). All metrics are 0–1, higher is better. Green cells mark the best value per column. RAGAS scores are model-dependent: absolute values may shift with a different judge model, but relative rankings between configurations are stable.
Judge: Groq openai/gpt-oss-120b at temperature 0.
Generated 2026-04-10 13:53 UTC.
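
Since Answer Relevance is described as a cosine similarity, a minimal sketch of that computation may help when interpreting the column. The report does not say which texts are embedded or with which model; comparing a question embedding with an answer embedding is an assumption, and the vectors below are toy values rather than real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (hypothetical values).
question_vec = [1.0, 0.0, 1.0]
answer_vec = [1.0, 1.0, 0.0]

score = cosine_similarity(question_vec, answer_vec)
print(f"{score:.3f}")  # 0.500
```

Cosine similarity ranges from -1 to 1 in general, so a metric reported on a 0 to 1 scale presumably relies on embeddings whose similarities are non-negative in practice, or on clipping; the report does not specify which.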