AI Engineering
·
May 9, 2026
·
14 min read
Build an eval pipeline with golden datasets, scoring, smoke/full modes, and CI gates to catch prompt regressions before release.
Deep Dive
·
Apr 12, 2026
·
18 min read
Revised with .NET examples: a newer version of this article, covering both Python and .NET, is available as part of the MAF v1: Python and .NET series: MAF v1 — 23-evaluation-framework. The newer version applies three substantive fixes to the framework below: canonical AgentRunResponse extraction (no more hasattr chain), word-boundary alias matching (the original produced a false positive by matching "profit" against the "price" alias), and a smoke/full tier split for CI versus nightly runs. Read this article for the conceptual ground; read the new one for the production-grade implementation.
Deep Dive
·
Jul 22, 2023
·
4 min read
Docker Hub is a cloud-based repository service for container images, allowing developers to store, share, and manage their Docker images. This article provides a step-by-step guide to building Docker images and pushing them to Docker Hub, making your containerized applications accessible to your team or the wider community.
Deep Dive
·
Jul 22, 2023
·
4 min read
Automating Docker image builds and deployments can significantly improve development workflows. This article demonstrates how to use GitHub Actions to automatically build and push Docker images to Docker Hub whenever changes are committed to your repository, saving time and ensuring consistent builds across your team.