R&D

The path to AGI is boring

Alex Chen


Aug 12, 2025

It's not about magic algorithms. It's about data pipelines, cleaning, and the boring work of scaling infrastructure. The real work of building intelligence happens in the trenches of data cleaning.

Garbage In, Garbage Out

We spent six months building a deduplication pipeline based on MinHash locality-sensitive hashing (LSH). It wasn't glamorous. No one tweeted about it. It didn't involve a new Transformer architecture. But it raised our model's evaluation scores by 12 points, more than any change to the Transformer blocks that we experimented with over the past year.
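The post doesn't describe how its pipeline is configured, so here is a minimal, illustrative sketch of MinHash LSH near-duplicate removal using only the Python standard library. Everything specific is an assumption: character 5-gram shingles, 128 hash functions derived from BLAKE2b with a per-function salt, 32 bands of 4 rows, and a 0.8 Jaccard threshold. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams of the text after lowercasing and collapsing whitespace."""
    norm = " ".join(text.lower().split())
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def minhash_signature(shingle_set: set[str], num_perm: int = 128) -> list[int]:
    """For each of num_perm seeded hash functions, keep the minimum hash over all shingles."""
    encoded = [s.encode("utf-8") for s in shingle_set]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # the BLAKE2b salt acts as the per-function seed
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s, digest_size=8, salt=salt).digest(), "big")
            for s in encoded
        ))
    return signature


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lsh_candidate_pairs(signatures: dict[int, list[int]], bands: int) -> set[tuple[int, int]]:
    """Split each signature into bands; documents sharing any identical band become candidates."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((min(ids[i], ids[j]), max(ids[i], ids[j])))
    return pairs


def deduplicate(docs: list[str], threshold: float = 0.8,
                num_perm: int = 128, bands: int = 32) -> list[int]:
    """Return indices of documents to keep; later near-duplicates of kept documents are dropped."""
    sigs = {i: minhash_signature(shingles(d), num_perm) for i, d in enumerate(docs)}
    neighbors = defaultdict(set)
    for a, b in lsh_candidate_pairs(sigs, bands):
        neighbors[a].add(b)
        neighbors[b].add(a)
    kept, kept_set = [], set()
    for i in range(len(docs)):
        # Verify only LSH candidates that were already kept, so the first copy always survives.
        if any(estimated_jaccard(sigs[i], sigs[j]) >= threshold
               for j in neighbors[i] if j in kept_set):
            continue
        kept.append(i)
        kept_set.add(i)
    return kept


if __name__ == "__main__":
    corpus = [
        "The quick brown fox jumps over the lazy dog.",
        "The quick brown fox jumps over the lazy dog!",
        "An entirely different sentence about GPU utilization.",
    ]
    print(deduplicate(corpus))  # prints [0, 2]: the second line is a near-duplicate of the first
```

The banding step is what keeps this from comparing every pair: with b bands of r rows, two documents with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b. With 32 bands of 4 rows, that curve rises around s ≈ 0.42, so candidate generation is deliberately generous and the explicit 0.8 check filters out the false positives.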

Synthetic Data & Alignment

Real-world data is messy and biased. We are moving toward training on synthetic data generated by larger, more capable reasoning models. This lets us curate the "curriculum" for our models, so that they learn reasoning patterns rather than memorizing internet noise. We treat data creation as a software engineering problem, complete with unit tests for data quality.
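The post doesn't say what its data-quality tests check. As one illustrative sketch, assume synthetic examples are stored as JSON Lines with hypothetical `prompt` and `response` fields. The checks below reject unparseable JSON, empty fields, very short responses, leftover `{{...}}` template placeholders, and exact duplicate lines. The field names, the 20-character minimum, and the specific checks are assumptions, not the author's pipeline.

```python
import json

# Hypothetical schema: each synthetic example is one JSONL line with these string fields.
REQUIRED_FIELDS = ("prompt", "response")


def validate_record(line: str, min_response_chars: int = 20) -> list[str]:
    """Return the data-quality violations for one JSONL line; an empty list means clean."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    response = record.get("response")
    if isinstance(response, str) and response.strip():
        if len(response.strip()) < min_response_chars:
            errors.append("response too short")
        if "{{" in response or "}}" in response:
            # Leftover template syntax usually means a generator prompt was never rendered.
            errors.append("unrendered template placeholder")
    return errors


def check_batch(lines: list[str]) -> dict[int, list[str]]:
    """Map each offending line index to its violations, including exact duplicate lines."""
    report: dict[int, list[str]] = {}
    seen: set[str] = set()
    for i, line in enumerate(lines):
        errors = validate_record(line)
        key = line.strip()
        if key in seen:
            errors.append("exact duplicate")
        seen.add(key)
        if errors:
            report[i] = errors
    return report


def test_clean_batch_has_no_violations():
    # pytest collects test_* functions; a failing assert keeps the batch out of training.
    batch = [
        '{"prompt": "What is 2 + 3?", "response": "2 + 3 = 5, since adding 3 to 2 gives 5."}',
        '{"prompt": "Name a prime.", "response": "7 is prime: its only divisors are 1 and 7."}',
    ]
    assert check_batch(batch) == {}
```

Returning a report instead of raising on the first problem lets a pipeline quarantine the offending rows and log why, while the test function turns "this batch is clean" into a check that fails loudly in CI.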

Conclusion

AGI won't be achieved by a single "eureka" moment. It will be achieved by millions of hours of cleaning CSV files, fixing JSON syntax errors, and optimizing GPU utilization. Boring is effective.
