Optimizing Intelligence: Why We Combined KANs, XGBoost, and Unbiased Sampling
In our quest to make Chuchu the most intelligent autonomous coding agent, we faced a classic dilemma: Speed vs. Intelligence.
We solved this by building a Model Recommender that dynamically chooses between a “Fast Model” (cheap, quick) and a “Smart Model” (expensive, capable) based on task complexity.
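In practice, the recommendation reduces to a threshold on an estimated complexity score. Here is a minimal sketch of that routing; the type names, identifiers, and threshold parameter are assumptions for this post, not Chuchu's actual internal API:

```go
// Illustrative sketch only: names and the threshold are assumptions, not Chuchu's real code.
package recommender

type ModelChoice string

const (
	FastModel  ModelChoice = "fast"  // cheap, quick
	SmartModel ModelChoice = "smart" // expensive, capable
)

// Recommend routes a task based on an ensemble-estimated complexity score in [0, 1].
func Recommend(complexity, threshold float64) ModelChoice {
	if complexity >= threshold {
		return SmartModel
	}
	return FastModel
}
```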
But how do we make that decision? And, more importantly, how do we combine different signals into a decision we can trust?
The Ensemble Approach
We didn’t want to rely on a single algorithm. Instead, we chose an Ensemble approach, combining two very different but powerful models:
- XGBoost: The industry standard for tabular data. It’s fast, robust, and incredibly accurate for feature-based classification.
- KAN (Kolmogorov-Arnold Networks): A novel neural network architecture based on the Kolmogorov-Arnold representation theorem [1]. Unlike traditional MLPs, KANs learn activation functions on edges, offering superior interpretability [2]: we want to know why a task is considered complex. (A sketch of how the two scorers can be blended follows this list.)
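The blend itself is just a weighted sum of the two scores. A minimal sketch follows; the Scorer interface, the struct, and its field names are assumptions for illustration rather than the actual implementation:

```go
// Illustrative sketch only: the interface and struct are assumptions for this post.
package recommender

// Scorer maps task features to a complexity score in [0, 1].
type Scorer interface {
	Score(features []float64) float64
}

// Ensemble blends the XGBoost and KAN scores with weights that sum to 1.
type Ensemble struct {
	XGB, KAN   Scorer
	WXGB, WKAN float64 // w1 + w2 = 1, chosen by the sampling procedure described below
}

func (e Ensemble) Score(features []float64) float64 {
	return e.WXGB*e.XGB.Score(features) + e.WKAN*e.KAN.Score(features)
}
```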
The Problem: How to Mix Them?
Having two models is great, but how do you combine their votes?
- 50% XGBoost + 50% KAN?
- 90% XGBoost because it’s older?
Guessing these weights ($w_1, w_2$) is prone to bias. We needed a mathematical way to find the optimal balance.
The Solution: Unbiased Sampling of N+1 Summative Weights
We implemented a sophisticated sampling algorithm (often called “stick-breaking”) to generate unbiased random weights that always sum to 1.0.
Mathematically, we are sampling uniformly from the standard $(N-1)$-simplex. This approach is based on the work by Henzi et al. (2021) on unbiased sampling of summative weights [3].
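Written out, the weight vector for $N$ models lives on the standard simplex, and the search described below simply looks for the point on it with the best validation accuracy (here $\mathrm{Acc}_{\text{val}}$ denotes accuracy on our validation set):

$$
\Delta^{N-1} = \left\{ (w_1, \dots, w_N) : w_i \ge 0,\ \sum_{i=1}^{N} w_i = 1 \right\}, \qquad w^{\ast} = \arg\max_{w \in \Delta^{N-1}} \mathrm{Acc}_{\text{val}}(w)
$$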
How it Works in Chuchu
- Generate Hypotheses: We generate 50+ sets of random, unbiased weights (e.g., [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]).
- Evaluate: We test each combination against our validation set.
- Optimize: We select the weight vector that maximizes accuracy.
```go
// internal/intelligence/recommender/sampling.go
func GenerateUnbiasedWeights(n int) []float64 {
	// ... implementation of uniform simplex sampling ...
}
```
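For readers who want the flavor of the sampling and search, here is a minimal, self-contained sketch under stated assumptions: the lowercase helper names, the rng parameter, and the evaluate callback are illustrative stand-ins, not the actual contents of sampling.go. It draws weights uniformly from the simplex by normalizing Exponential(1) draws (equivalent to breaking a unit stick at uniform random points) and keeps the best-scoring candidate.

```go
// Illustrative sketch only: names and signatures are assumptions, not Chuchu's code.
package recommender

import (
	"math"
	"math/rand"
)

// generateUnbiasedWeights draws one weight vector uniformly from the standard
// (n-1)-simplex by normalizing n Exponential(1) variates (a flat Dirichlet).
func generateUnbiasedWeights(n int, rng *rand.Rand) []float64 {
	w := make([]float64, n)
	sum := 0.0
	for i := range w {
		w[i] = -math.Log(1 - rng.Float64()) // Exponential(1) variate
		sum += w[i]
	}
	for i := range w {
		w[i] /= sum // weights now sum to 1.0
	}
	return w
}

// searchWeights samples `candidates` weight vectors and keeps the one that
// scores highest under evaluate (e.g., accuracy on the validation set).
func searchWeights(n, candidates int, rng *rand.Rand, evaluate func([]float64) float64) []float64 {
	best := generateUnbiasedWeights(n, rng)
	bestScore := evaluate(best)
	for i := 1; i < candidates; i++ {
		w := generateUnbiasedWeights(n, rng)
		if score := evaluate(w); score > bestScore {
			best, bestScore = w, score
		}
	}
	return best
}
```

With XGBoost and KAN today this runs with n = 2; adding a third model later (the $N+1$ case) just means passing n = 3, since nothing in the sampler is specific to two weights.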
Why This Matters
This isn’t just math for math’s sake. This approach gives us:
- Adaptability: If KAN starts performing better on our specific dataset (codebase metrics), the optimizer will automatically shift more weight to it.
- Robustness: By not relying on a single model, we reduce the risk of one model hallucinating complexity.
- Future-Proofing: We can easily add a 3rd or 4th model (e.g., a Graph Neural Network), and the sampling logic scales instantly from $N$ to $N+1$ weights.
Conclusion
By combining the raw performance of XGBoost, the explainability of KANs, and the mathematical rigor of Unbiased Sampling, Chuchu’s Intelligence Layer doesn’t just guess—it learns the optimal strategy for your specific codebase.
This is how we move from “Artificial Intelligence” to “Optimized Intelligence”.
References
1. Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., & Tegmark, M. (2024). KAN: Kolmogorov-Arnold Networks. arXiv:2404.19756 [cs.LG]. https://arxiv.org/abs/2404.19756
2. Liu, Z., Ma, P., Wang, Y., Matusik, W., & Tegmark, M. (2024). KAN 2.0: Kolmogorov-Arnold Networks Meet Science. arXiv:2408.10205 [cs.LG]. https://arxiv.org/abs/2408.10205
3. Henzi, A., Ziegel, J. F., & Gneiting, T. (2021). The pie sharing problem: Unbiased sampling of N+1 summative weights. Environmental Modelling & Software, 145, 105185. https://doi.org/10.1016/j.envsoft.2021.105185