All 36 configurations

| # | Chunker | Embedder | Retriever | Overall | Context Precision | Context Recall | Faithfulness | Answer Relevance | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Sentence | OpenAI | Semantic | 0.818 | 0.793 | 0.769 | 0.894 | 0.814 | 269.8 |
| 2 | Fixed-size | BGE-M3 | Semantic | 0.816 | 0.810 | 0.856 | 0.826 | 0.773 | 215.9 |
| 3 | Fixed-size | OpenAI | Hybrid (RRF) | 0.814 | 0.822 | 0.778 | 0.837 | 0.819 | 218.8 |
| 4 | Fixed-size | OpenAI | Semantic | 0.809 | 0.786 | 0.778 | 0.861 | 0.811 | 249.5 |
| 5 | Sentence | OpenAI | Hybrid (RRF) | 0.806 | 0.771 | 0.754 | 0.881 | 0.817 | 197.4 |
| 6 | Fixed-size | OpenAI | Reranker | 0.798 | 0.753 | 0.756 | 0.874 | 0.810 | 606.4 |
| 7 | Sentence | BGE-M3 | Hybrid (RRF) | 0.797 | 0.803 | 0.737 | 0.883 | 0.767 | 88.8 |
| 8 | Sentence | OpenAI | Reranker | 0.795 | 0.770 | 0.765 | 0.840 | 0.806 | 664.7 |
| 9 | Sentence | MiniLM | BM25 | 0.788 | 0.717 | 0.754 | 0.920 | 0.761 | 8.2 |
| 10 | Sentence | OpenAI | BM25 | 0.787 | 0.714 | 0.706 | 0.906 | 0.821 | 6.4 |
| 11 | Fixed-size | BGE-M3 | Hybrid (RRF) | 0.783 | 0.838 | 0.689 | 0.837 | 0.768 | 96.8 |
| 12 | Fixed-size | MiniLM | Reranker | 0.781 | 0.727 | 0.735 | 0.898 | 0.764 | 377.1 |
| 13 | Fixed-size | BGE-M3 | Reranker | 0.767 | 0.736 | 0.747 | 0.816 | 0.771 | 533.9 |
| 14 | Sentence | BGE-M3 | Reranker | 0.764 | 0.720 | 0.725 | 0.846 | 0.766 | 414.0 |
| 15 | Sentence | BGE-M3 | BM25 | 0.762 | 0.690 | 0.709 | 0.875 | 0.772 | 4.4 |
| 16 | Sentence | MiniLM | Reranker | 0.761 | 0.677 | 0.735 | 0.875 | 0.758 | 341.5 |
| 17 | Sentence | BGE-M3 | Semantic | 0.760 | 0.722 | 0.720 | 0.839 | 0.760 | 171.8 |
| 18 | Fixed-size | MiniLM | BM25 | 0.755 | 0.725 | 0.709 | 0.830 | 0.757 | 9.3 |
| 19 | Fixed-size | OpenAI | BM25 | 0.754 | 0.726 | 0.677 | 0.800 | 0.813 | 19.5 |
| 20 | Fixed-size | MiniLM | Hybrid (RRF) | 0.746 | 0.672 | 0.746 | 0.787 | 0.779 | 20.7 |
| 21 | Fixed-size | BGE-M3 | BM25 | 0.739 | 0.721 | 0.620 | 0.858 | 0.759 | 11.9 |
| 22 | Sentence | MiniLM | Hybrid (RRF) | 0.722 | 0.598 | 0.701 | 0.794 | 0.794 | 22.9 |
| 23 | Fixed-size | MiniLM | Semantic | 0.677 | 0.538 | 0.604 | 0.756 | 0.810 | 20.6 |
| 24 | Semantic | OpenAI | Reranker | 0.650 | 0.621 | 0.280 | 0.888 | 0.811 | 560.3 |
| 25 | Sentence | MiniLM | Semantic | 0.641 | 0.473 | 0.483 | 0.823 | 0.784 | 13.3 |
| 26 | Semantic | BGE-M3 | Reranker | 0.627 | 0.597 | 0.342 | 0.821 | 0.748 | 411.1 |
| 27 | Semantic | MiniLM | Reranker | 0.618 | 0.511 | 0.280 | 0.869 | 0.810 | 279.2 |
| 28 | Semantic | OpenAI | Hybrid (RRF) | 0.615 | 0.551 | 0.247 | 0.874 | 0.789 | 289.6 |
| 29 | Semantic | BGE-M3 | Semantic | 0.609 | 0.498 | 0.317 | 0.867 | 0.753 | 186.6 |
| 30 | Semantic | OpenAI | Semantic | 0.608 | 0.448 | 0.289 | 0.884 | 0.811 | 233.5 |
| 31 | Semantic | BGE-M3 | Hybrid (RRF) | 0.598 | 0.544 | 0.223 | 0.883 | 0.742 | 134.8 |
| 32 | Semantic | MiniLM | Hybrid (RRF) | 0.562 | 0.325 | 0.205 | 0.882 | 0.837 | 72.3 |
| 33 | Semantic | MiniLM | Semantic | 0.562 | 0.364 | 0.205 | 0.849 | 0.831 | 16.2 |
| 34 | Semantic | OpenAI | BM25 | 0.531 | 0.342 | 0.163 | 0.832 | 0.787 | 63.4 |
| 35 | Semantic | BGE-M3 | BM25 | 0.529 | 0.388 | 0.168 | 0.840 | 0.722 | 57.2 |
| 36 | Semantic | MiniLM | BM25 | 0.518 | 0.339 | 0.161 | 0.857 | 0.714 | 63.1 |
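
You can sanity-check the grid size and the Overall column with a short sketch. The tables list every chunker × embedder × retriever combination. In every row checked, the Overall value equals the unweighted mean of the four metric columns, rounded to three decimals. That formula is inferred from the numbers and is not stated in the source, so treat it as an assumption. The example reproduces the last row of the table above (Semantic / MiniLM / BM25).

```python
from itertools import product
from statistics import mean

# Component names as they appear in the tables.
CHUNKERS = ["Fixed-size", "Sentence", "Semantic"]
EMBEDDERS = ["OpenAI", "BGE-M3", "MiniLM"]
RETRIEVERS = ["BM25", "Semantic", "Hybrid (RRF)", "Reranker"]
NUM_QUESTIONS = 31

# Full cross product: every chunker paired with every embedder and retriever.
configs = list(product(CHUNKERS, EMBEDDERS, RETRIEVERS))
total_results = len(configs) * NUM_QUESTIONS


def overall(context_precision, context_recall, faithfulness, answer_relevance):
    """Assumed Overall score: unweighted mean of the four metrics, 3 decimals."""
    return round(mean([context_precision, context_recall,
                       faithfulness, answer_relevance]), 3)


print(len(configs), total_results)  # 36 1116
# Row 36 above: Semantic / MiniLM / BM25
print(overall(0.339, 0.161, 0.857, 0.714))  # 0.518
```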

| # | Chunker | Embedder | Retriever | Overall | Context Precision | Context Recall | Faithfulness | Answer Relevance | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Fixed-size | BGE-M3 | Semantic | 0.858 | 0.887 | 0.922 | 0.844 | 0.779 | 399.4 |
| 2 | Fixed-size | OpenAI | Hybrid (RRF) | 0.842 | 0.880 | 0.781 | 0.890 | 0.816 | 222.9 |
| 3 | Sentence | OpenAI | Semantic | 0.838 | 0.832 | 0.796 | 0.910 | 0.815 | 372.2 |
| 4 | Sentence | BGE-M3 | Hybrid (RRF) | 0.833 | 0.841 | 0.797 | 0.915 | 0.777 | 85.5 |
| 5 | Fixed-size | BGE-M3 | Hybrid (RRF) | 0.828 | 0.864 | 0.760 | 0.913 | 0.776 | 88.1 |
| 6 | Fixed-size | OpenAI | Reranker | 0.827 | 0.799 | 0.831 | 0.869 | 0.810 | 725.1 |
| 7 | Fixed-size | OpenAI | Semantic | 0.821 | 0.789 | 0.806 | 0.887 | 0.803 | 368.9 |
| 8 | Sentence | OpenAI | Hybrid (RRF) | 0.805 | 0.762 | 0.768 | 0.884 | 0.807 | 199.9 |
| 9 | Sentence | BGE-M3 | Semantic | 0.797 | 0.763 | 0.769 | 0.904 | 0.752 | 306.6 |
| 10 | Sentence | OpenAI | BM25 | 0.795 | 0.721 | 0.682 | 0.964 | 0.814 | 5.7 |
| 11 | Fixed-size | MiniLM | Reranker | 0.793 | 0.691 | 0.842 | 0.889 | 0.750 | 517.5 |
| 12 | Sentence | OpenAI | Reranker | 0.793 | 0.754 | 0.781 | 0.834 | 0.804 | 748.3 |
| 13 | Fixed-size | BGE-M3 | Reranker | 0.778 | 0.713 | 0.853 | 0.765 | 0.781 | 681.9 |
| 14 | Fixed-size | MiniLM | BM25 | 0.778 | 0.801 | 0.732 | 0.836 | 0.743 | 7.3 |
| 15 | Fixed-size | BGE-M3 | BM25 | 0.776 | 0.803 | 0.651 | 0.882 | 0.769 | 9.7 |
| 16 | Fixed-size | OpenAI | BM25 | 0.776 | 0.833 | 0.668 | 0.789 | 0.814 | 17.3 |
| 17 | Sentence | MiniLM | BM25 | 0.776 | 0.709 | 0.701 | 0.955 | 0.741 | 5.9 |
| 18 | Fixed-size | MiniLM | Hybrid (RRF) | 0.770 | 0.665 | 0.856 | 0.787 | 0.772 | 19.4 |
| 19 | Sentence | BGE-M3 | Reranker | 0.762 | 0.642 | 0.755 | 0.873 | 0.778 | 560.0 |
| 20 | Sentence | BGE-M3 | BM25 | 0.761 | 0.670 | 0.668 | 0.930 | 0.776 | 3.5 |
| 21 | Sentence | MiniLM | Reranker | 0.747 | 0.618 | 0.752 | 0.862 | 0.754 | 478.6 |
| 22 | Semantic | BGE-M3 | Semantic | 0.684 | 0.602 | 0.479 | 0.893 | 0.764 | 351.4 |
| 23 | Sentence | MiniLM | Hybrid (RRF) | 0.680 | 0.607 | 0.581 | 0.754 | 0.777 | 22.1 |
| 24 | Semantic | OpenAI | Reranker | 0.661 | 0.617 | 0.313 | 0.931 | 0.782 | 649.2 |
| 25 | Fixed-size | MiniLM | Semantic | 0.654 | 0.483 | 0.567 | 0.760 | 0.805 | 27.1 |
| 26 | Semantic | BGE-M3 | Reranker | 0.639 | 0.598 | 0.395 | 0.833 | 0.731 | 541.5 |
| 27 | Semantic | OpenAI | Semantic | 0.626 | 0.454 | 0.383 | 0.873 | 0.795 | 312.3 |
| 28 | Semantic | BGE-M3 | Hybrid (RRF) | 0.617 | 0.616 | 0.277 | 0.848 | 0.725 | 126.3 |
| 29 | Semantic | MiniLM | Reranker | 0.617 | 0.542 | 0.292 | 0.858 | 0.777 | 377.3 |
| 30 | Sentence | MiniLM | Semantic | 0.615 | 0.421 | 0.378 | 0.869 | 0.792 | 12.5 |
| 31 | Semantic | OpenAI | Hybrid (RRF) | 0.574 | 0.382 | 0.241 | 0.869 | 0.805 | 316.3 |
| 32 | Semantic | OpenAI | BM25 | 0.556 | 0.333 | 0.237 | 0.858 | 0.797 | 55.0 |
| 33 | Semantic | BGE-M3 | BM25 | 0.536 | 0.321 | 0.237 | 0.868 | 0.719 | 49.2 |
| 34 | Semantic | MiniLM | BM25 | 0.527 | 0.342 | 0.209 | 0.831 | 0.725 | 53.2 |
| 35 | Semantic | MiniLM | Hybrid (RRF) | 0.517 | 0.193 | 0.181 | 0.845 | 0.849 | 63.5 |
| 36 | Semantic | MiniLM | Semantic | 0.516 | 0.243 | 0.141 | 0.856 | 0.823 | 25.4 |

| # | Chunker | Embedder | Retriever | Overall | Context Precision | Context Recall | Faithfulness | Answer Relevance | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Sentence | OpenAI | BM25 | 0.867 | 0.867 | 0.850 | 0.929 | 0.822 | 6.6 |
| 2 | Sentence | MiniLM | BM25 | 0.860 | 0.872 | 0.841 | 0.932 | 0.795 | 8.8 |
| 3 | Sentence | BGE-M3 | Hybrid (RRF) | 0.850 | 0.900 | 0.831 | 0.892 | 0.777 | 91.2 |
| 4 | Sentence | BGE-M3 | BM25 | 0.848 | 0.841 | 0.850 | 0.918 | 0.783 | 4.3 |
| 5 | Fixed-size | BGE-M3 | Semantic | 0.839 | 0.879 | 0.844 | 0.850 | 0.784 | 87.0 |
| 6 | Sentence | MiniLM | Reranker | 0.833 | 0.706 | 0.892 | 0.971 | 0.761 | 253.7 |
| 7 | Fixed-size | MiniLM | Reranker | 0.831 | 0.859 | 0.799 | 0.902 | 0.765 | 270.9 |
| 8 | Sentence | OpenAI | Hybrid (RRF) | 0.828 | 0.844 | 0.774 | 0.866 | 0.828 | 193.4 |
| 9 | Fixed-size | OpenAI | Hybrid (RRF) | 0.827 | 0.886 | 0.885 | 0.701 | 0.834 | 200.3 |
| 10 | Sentence | BGE-M3 | Reranker | 0.826 | 0.788 | 0.819 | 0.914 | 0.781 | 323.4 |
| 11 | Fixed-size | BGE-M3 | Hybrid (RRF) | 0.823 | 0.899 | 0.813 | 0.793 | 0.785 | 93.8 |
| 12 | Fixed-size | BGE-M3 | Reranker | 0.823 | 0.841 | 0.804 | 0.865 | 0.781 | 433.4 |
| 13 | Sentence | OpenAI | Reranker | 0.822 | 0.773 | 0.828 | 0.870 | 0.818 | 517.1 |
| 14 | Sentence | OpenAI | Semantic | 0.817 | 0.776 | 0.803 | 0.863 | 0.827 | 212.1 |
| 15 | Fixed-size | OpenAI | Reranker | 0.815 | 0.827 | 0.715 | 0.897 | 0.820 | 552.1 |
| 16 | Fixed-size | OpenAI | Semantic | 0.815 | 0.851 | 0.831 | 0.762 | 0.814 | 179.6 |
| 17 | Sentence | MiniLM | Hybrid (RRF) | 0.810 | 0.669 | 0.889 | 0.855 | 0.826 | 21.5 |
| 18 | Fixed-size | MiniLM | BM25 | 0.805 | 0.751 | 0.811 | 0.873 | 0.787 | 8.5 |
| 19 | Fixed-size | OpenAI | BM25 | 0.804 | 0.704 | 0.861 | 0.835 | 0.816 | 22.5 |
| 20 | Fixed-size | MiniLM | Hybrid (RRF) | 0.800 | 0.809 | 0.780 | 0.812 | 0.798 | 17.2 |
| 21 | Sentence | BGE-M3 | Semantic | 0.786 | 0.735 | 0.835 | 0.795 | 0.778 | 84.2 |
| 22 | Fixed-size | BGE-M3 | BM25 | 0.784 | 0.729 | 0.736 | 0.896 | 0.776 | 11.3 |
| 23 | Fixed-size | MiniLM | Semantic | 0.736 | 0.677 | 0.655 | 0.782 | 0.832 | 14.8 |
| 24 | Sentence | MiniLM | Semantic | 0.681 | 0.485 | 0.635 | 0.826 | 0.779 | 10.0 |
| 25 | Semantic | OpenAI | Reranker | 0.651 | 0.569 | 0.325 | 0.869 | 0.843 | 503.5 |
| 26 | Semantic | OpenAI | Hybrid (RRF) | 0.650 | 0.620 | 0.283 | 0.928 | 0.770 | 264.8 |
| 27 | Semantic | MiniLM | Semantic | 0.627 | 0.467 | 0.327 | 0.846 | 0.869 | 10.7 |
| 28 | Semantic | MiniLM | Hybrid (RRF) | 0.621 | 0.388 | 0.327 | 0.928 | 0.840 | 72.6 |
| 29 | Semantic | MiniLM | Reranker | 0.603 | 0.436 | 0.238 | 0.878 | 0.860 | 202.1 |
| 30 | Semantic | BGE-M3 | Reranker | 0.598 | 0.504 | 0.310 | 0.798 | 0.781 | 351.8 |
| 31 | Semantic | OpenAI | Semantic | 0.598 | 0.294 | 0.291 | 0.954 | 0.854 | 183.3 |
| 32 | Semantic | BGE-M3 | Hybrid (RRF) | 0.593 | 0.454 | 0.246 | 0.906 | 0.767 | 136.1 |
| 33 | Semantic | BGE-M3 | Semantic | 0.555 | 0.324 | 0.270 | 0.863 | 0.762 | 82.9 |
| 34 | Semantic | BGE-M3 | BM25 | 0.513 | 0.300 | 0.179 | 0.810 | 0.763 | 58.0 |
| 35 | Semantic | OpenAI | BM25 | 0.512 | 0.198 | 0.179 | 0.848 | 0.821 | 64.8 |
| 36 | Semantic | MiniLM | BM25 | 0.499 | 0.156 | 0.193 | 0.947 | 0.699 | 68.7 |

| # | Chunker | Embedder | Retriever | Overall | Context Precision | Context Recall | Faithfulness | Answer Relevance | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Fixed-size | OpenAI | Semantic | 0.782 | 0.698 | 0.663 | 0.946 | 0.821 | 173.5 |
| 2 | Sentence | OpenAI | Semantic | 0.781 | 0.747 | 0.678 | 0.904 | 0.796 | 206.7 |
| 3 | Sentence | OpenAI | Hybrid (RRF) | 0.779 | 0.691 | 0.706 | 0.897 | 0.821 | 198.9 |
| 4 | Sentence | OpenAI | Reranker | 0.764 | 0.794 | 0.656 | 0.811 | 0.796 | 768.6 |
| 5 | Fixed-size | OpenAI | Hybrid (RRF) | 0.750 | 0.640 | 0.634 | 0.921 | 0.805 | 238.6 |
| 6 | Fixed-size | OpenAI | Reranker | 0.728 | 0.580 | 0.682 | 0.851 | 0.799 | 540.6 |
| 7 | Fixed-size | BGE-M3 | Semantic | 0.714 | 0.588 | 0.756 | 0.765 | 0.748 | 98.5 |
| 8 | Sentence | MiniLM | BM25 | 0.714 | 0.529 | 0.732 | 0.846 | 0.750 | 10.6 |
| 9 | Fixed-size | MiniLM | Reranker | 0.696 | 0.620 | 0.469 | 0.908 | 0.788 | 291.2 |
| 10 | Sentence | MiniLM | Reranker | 0.696 | 0.743 | 0.505 | 0.774 | 0.760 | 246.0 |
| 11 | Sentence | BGE-M3 | Reranker | 0.690 | 0.767 | 0.554 | 0.714 | 0.726 | 324.3 |
| 12 | Sentence | MiniLM | Hybrid (RRF) | 0.682 | 0.492 | 0.667 | 0.786 | 0.782 | 20.5 |
| 13 | Fixed-size | BGE-M3 | Reranker | 0.679 | 0.640 | 0.495 | 0.842 | 0.740 | 465.7 |
| 14 | Sentence | BGE-M3 | Hybrid (RRF) | 0.670 | 0.612 | 0.513 | 0.818 | 0.738 | 88.6 |
| 15 | Sentence | OpenAI | BM25 | 0.669 | 0.505 | 0.562 | 0.776 | 0.833 | 7.7 |
| 16 | Sentence | BGE-M3 | Semantic | 0.665 | 0.637 | 0.487 | 0.786 | 0.748 | 84.6 |
| 17 | Fixed-size | BGE-M3 | Hybrid (RRF) | 0.655 | 0.714 | 0.408 | 0.763 | 0.735 | 97.3 |
| 18 | Fixed-size | MiniLM | BM25 | 0.652 | 0.562 | 0.539 | 0.764 | 0.743 | 10.1 |
| 19 | Fixed-size | OpenAI | BM25 | 0.652 | 0.570 | 0.455 | 0.775 | 0.807 | 21.2 |
| 20 | Sentence | BGE-M3 | BM25 | 0.651 | 0.529 | 0.599 | 0.727 | 0.750 | 4.6 |
| 21 | Fixed-size | MiniLM | Semantic | 0.643 | 0.456 | 0.605 | 0.717 | 0.792 | 11.6 |
| 22 | Semantic | BGE-M3 | Reranker | 0.643 | 0.715 | 0.290 | 0.832 | 0.735 | 281.3 |
| 23 | Semantic | OpenAI | Hybrid (RRF) | 0.640 | 0.751 | 0.211 | 0.812 | 0.787 | 281.3 |
| 24 | Semantic | MiniLM | Reranker | 0.637 | 0.557 | 0.312 | 0.877 | 0.800 | 201.5 |
| 25 | Fixed-size | MiniLM | Hybrid (RRF) | 0.636 | 0.508 | 0.513 | 0.756 | 0.767 | 18.7 |
| 26 | Sentence | MiniLM | Semantic | 0.633 | 0.548 | 0.468 | 0.740 | 0.776 | 10.3 |
| 27 | Semantic | OpenAI | Reranker | 0.630 | 0.695 | 0.165 | 0.840 | 0.820 | 469.3 |
| 28 | Fixed-size | BGE-M3 | BM25 | 0.618 | 0.570 | 0.418 | 0.767 | 0.718 | 15.7 |
| 29 | Semantic | OpenAI | Semantic | 0.589 | 0.636 | 0.124 | 0.814 | 0.784 | 188.4 |
| 30 | Semantic | BGE-M3 | Hybrid (RRF) | 0.572 | 0.536 | 0.101 | 0.911 | 0.738 | 144.8 |
| 31 | Semantic | MiniLM | Hybrid (RRF) | 0.565 | 0.471 | 0.089 | 0.886 | 0.812 | 78.4 |
| 32 | Semantic | MiniLM | Semantic | 0.559 | 0.438 | 0.157 | 0.843 | 0.798 | 9.5 |
| 33 | Semantic | BGE-M3 | Semantic | 0.548 | 0.545 | 0.100 | 0.826 | 0.722 | 81.2 |
| 34 | Semantic | BGE-M3 | BM25 | 0.539 | 0.615 | 0.036 | 0.832 | 0.674 | 65.8 |
| 35 | Semantic | MiniLM | BM25 | 0.527 | 0.571 | 0.036 | 0.787 | 0.714 | 69.3 |
| 36 | Semantic | OpenAI | BM25 | 0.513 | 0.543 | 0.018 | 0.766 | 0.726 | 72.6 |

| # | Chunker | Embedder | Retriever | Overall | Context Precision | Context Recall | Faithfulness | Answer Relevance | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Fixed-size | BGE-M3 | BM25 | — | — | — | — | — | 13.9 |
| 2 | Fixed-size | BGE-M3 | Hybrid (RRF) | — | — | — | — | — | 138.8 |
| 3 | Fixed-size | BGE-M3 | Reranker | — | — | — | — | — | 402.4 |
| 4 | Fixed-size | BGE-M3 | Semantic | — | — | — | — | — | 143.0 |
| 5 | Fixed-size | MiniLM | BM25 | — | — | — | — | — | 17.9 |
| 6 | Fixed-size | MiniLM | Hybrid (RRF) | — | — | — | — | — | 41.6 |
| 7 | Fixed-size | MiniLM | Reranker | — | — | — | — | — | 335.0 |
| 8 | Fixed-size | MiniLM | Semantic | — | — | — | — | — | 33.2 |
| 9 | Fixed-size | OpenAI | BM25 | — | — | — | — | — | 15.2 |
| 10 | Fixed-size | OpenAI | Hybrid (RRF) | — | — | — | — | — | 211.1 |
| 11 | Fixed-size | OpenAI | Reranker | — | — | — | — | — | 447.6 |
| 12 | Fixed-size | OpenAI | Semantic | — | — | — | — | — | 159.5 |
| 13 | Semantic | BGE-M3 | BM25 | — | — | — | — | — | 67.2 |
| 14 | Semantic | BGE-M3 | Hybrid (RRF) | — | — | — | — | — | 141.3 |
| 15 | Semantic | BGE-M3 | Reranker | — | — | — | — | — | 370.2 |
| 16 | Semantic | BGE-M3 | Semantic | — | — | — | — | — | 84.0 |
| 17 | Semantic | MiniLM | BM25 | — | — | — | — | — | 71.4 |
| 18 | Semantic | MiniLM | Hybrid (RRF) | — | — | — | — | — | 92.3 |
| 19 | Semantic | MiniLM | Reranker | — | — | — | — | — | 299.5 |
| 20 | Semantic | MiniLM | Semantic | — | — | — | — | — | 11.2 |
| 21 | Semantic | OpenAI | BM25 | — | — | — | — | — | 70.9 |
| 22 | Semantic | OpenAI | Hybrid (RRF) | — | — | — | — | — | 277.0 |
| 23 | Semantic | OpenAI | Reranker | — | — | — | — | — | 587.1 |
| 24 | Semantic | OpenAI | Semantic | — | — | — | — | — | 173.9 |
| 25 | Sentence | BGE-M3 | BM25 | — | — | — | — | — | 7.2 |
| 26 | Sentence | BGE-M3 | Hybrid (RRF) | — | — | — | — | — | 95.2 |
| 27 | Sentence | BGE-M3 | Reranker | — | — | — | — | — | 311.3 |
| 28 | Sentence | BGE-M3 | Semantic | — | — | — | — | — | 98.7 |
| 29 | Sentence | MiniLM | BM25 | — | — | — | — | — | 10.1 |
| 30 | Sentence | MiniLM | Hybrid (RRF) | — | — | — | — | — | 36.3 |
| 31 | Sentence | MiniLM | Reranker | — | — | — | — | — | 279.4 |
| 32 | Sentence | MiniLM | Semantic | — | — | — | — | — | 33.4 |
| 33 | Sentence | OpenAI | BM25 | — | — | — | — | — | 6.2 |
| 34 | Sentence | OpenAI | Hybrid (RRF) | — | — | — | — | — | 195.2 |
| 35 | Sentence | OpenAI | Reranker | — | — | — | — | — | 530.3 |
| 36 | Sentence | OpenAI | Semantic | — | — | — | — | — | 179.9 |
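
The Hybrid (RRF) retriever likely merges a lexical (BM25) ranking with a dense (embedding) ranking using reciprocal rank fusion. The results do not say which lists are fused or which smoothing constant is used. The sketch below therefore makes three assumptions: it fuses two input lists, it uses k = 60 (a common default), and its chunk IDs are hypothetical. Each chunk scores the sum of 1 / (k + rank) over the lists it appears in, so chunks ranked well by both retrievers rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    A document's fused score is sum(1 / (k + rank)) over every list that
    contains it, with 1-based ranks. Lists that omit a document add nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if top_n is not None:
        fused = fused[:top_n]
    return [doc_id for doc_id, _ in fused]


# Hypothetical chunk rankings from a BM25 retriever and a dense retriever.
bm25_ranking = ["c3", "c1", "c7", "c2"]
dense_ranking = ["c1", "c2", "c3", "c9"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking], top_n=3))
# ['c1', 'c3', 'c2']
```

RRF uses only ranks, never raw scores. It therefore sidesteps the problem that BM25 scores and cosine similarities live on different scales.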
Corpus: 69 arXiv papers on RAG, dense retrieval, and embedding models.

Questions: 31 questions across four types: factual, conceptual, multi-hop, and unanswerable.

Configurations: 3 chunkers × 3 embedders × 4 retrieval strategies = 36 configurations, each evaluated on all 31 questions (1,116 scored results in total).

Metrics: Context Precision, Context Recall, and Faithfulness are LLM-judged via RAGAS. Answer Relevance is a cosine similarity and needs no LLM call. Overall is the unweighted mean of these four metrics. All scores range from 0 to 1, and higher is better. RAGAS scores depend on the judge model: absolute values may shift with a different judge, while relative rankings between configurations are expected to be more stable.

Judge: Groq `openai/gpt-oss-120b` at temperature 0.

Generated 2026-04-10 13:53 UTC.
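
Answer Relevance is described only as a cosine similarity computed without an LLM call. A minimal sketch follows, and it rests on three assumptions. First, the two vectors compared are embeddings of the question and of the generated answer. Second, the embedding model is unspecified, so the vectors below are toy values. Third, negative similarities are clipped to 0 to match the stated 0 to 1 range.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as no relevance
    return dot / (norm_a * norm_b)


def answer_relevance(question_embedding, answer_embedding):
    # Assumption: cosine(question, answer), clipped at 0 so scores stay in [0, 1].
    return max(0.0, cosine_similarity(question_embedding, answer_embedding))


# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
q = [0.2, 0.9, 0.1]
a = [0.25, 0.85, 0.05]
print(round(answer_relevance(q, a), 3))  # 0.996
```

Because this metric only measures how close the answer sits to the question in embedding space, it can stay high for answers that are on-topic but unsupported by the retrieved context. The LLM-judged Faithfulness metric is there to catch that case.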