
CDRAG: RAG with LLM-guided document retrieval
CDRAG (Clustered Dynamic Retrieval-Augmented Generation): LLM-selected cluster retrieval for RAG that outperforms standard cosine retrieval on legal QA. Standard RAG systems retrieve the top-K most
I’m Bart Amin, a data scientist working in industry, where I apply statistical modeling and machine learning to solve real-world problems. My work focuses on translating complex business challenges into well-defined analytical questions and delivering insights that drive practical decision-making.
I hold a Master’s degree in Methodology & Statistics (Behavioural Data Science) from the University of Amsterdam, where I developed a strong foundation in (Bayesian) statistical inference, machine learning, forecasting, and programming. During my time as a visiting student at Harvard University, I further strengthened my understanding of research methodology under the guidance of leading experts in the field.
I believe an effective data scientist combines two key strengths: the ability to rigorously frame problems using the right data and methods, and the ability to communicate results in a way that leads to clear, actionable outcomes.


Confidence intervals (CIs) are a powerful and elegant way to quantify uncertainty in estimates. Instead of providing a single-point estimate—such as an average revenue increase

Clustering is a popular (unsupervised) technique in data science, widely used for discovering hidden patterns in data without having prior understanding of what such patterns

*Disclaimer: The values and predictions presented in this blog post are entirely random and do not reflect actual findings. Additionally, this project was not conducted on behalf