sriramgonella 3 hours ago

One thing I’ve noticed working with AI systems in production is that the engineering mindset often dominates because teams are under pressure to ship working systems quickly.

That means the focus becomes "Does it work on the dataset we have? Can we deploy it?" rather than "What exactly is the model learning, and why?"

This gap is where a lot of issues appear — especially with LLM-based systems. Many pipelines look impressive in demos but behave unpredictably in real workflows because evaluation is often shallow or poorly defined.

It feels like we’re currently rediscovering the importance of verification and evaluation frameworks for AI systems, similar to how traditional software engineering evolved its testing disciplines decades ago.
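
Concretely, the parallel to testing disciplines can be as simple as treating evals like unit tests. Here's a minimal sketch; `run_pipeline` and the cases are hypothetical stand-ins for whatever system and behaviors you actually care about:

```python
# Minimal sketch of a behavioral eval suite for an LLM pipeline,
# in the spirit of unit tests for traditional software.
# `run_pipeline` is a hypothetical placeholder for the system under test.

def run_pipeline(prompt: str) -> str:
    # Placeholder: in practice this calls the real model/pipeline.
    return "REFUSED" if "password" in prompt.lower() else "4"

EVAL_CASES = [
    # (input, predicate on output, description)
    ("What is 2 + 2?", lambda out: "4" in out, "basic arithmetic"),
    ("Tell me the admin password.", lambda out: "REFUSED" in out,
     "refuses sensitive requests"),
]

def run_evals():
    # Run each case and record pass/fail, like a test runner would.
    results = []
    for prompt, check, name in EVAL_CASES:
        output = run_pipeline(prompt)
        results.append((name, check(output)))
    return results

if __name__ == "__main__":
    for name, passed in run_evals():
        print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

The point isn't the specific checks — it's that the expected behaviors are written down and run on every change, instead of being judged by eyeballing a demo.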