Ludmal De Silva
Notes on AI, .NET, and the code I ship anyway. github.com/ludmal.
  • Apr 8, 2026·Future of AI·3 min read

    Agents That Pay for Themselves: The Economic Loop That Will Redefine Software in 2027

    Most agents lose money on every call. A handful have started covering their own inference cost by generating measurable value per run. The structural difference is worth understanding before you build the next one.

  • Mar 5, 2026·RAG·2 min read

    From RAG to Production — Observability, Cost Controls, and the Reality No Demo Shows

    Everything the tutorials skip. The instrumentation, the kill switches, and the 3am pager habits that turn a RAG demo into something you can keep on call.

  • Mar 4, 2026·Vibe Coding·3 min read

    The Vibe Coder's Dilemma: When to Read the Code You Just Generated

    Reading every AI-generated line is slow. Reading none is reckless. The honest answer is "it depends," and the dependency is more predictable than people admit.

  • Feb 14, 2026·Spec-Driven Development·3 min read

    OpenSpec in a Monorepo: Keeping AI-Generated Code Consistent Across 12 Services

    Specs work great for one feature. They start to fight each other in a monorepo with a dozen services, each prompting AI tools with its own conventions. Here's how to share specs without making every team write YAML by hand.

  • Jan 27, 2026·Microsoft Agent Framework·3 min read

    From Prototype to Production: Deploying Microsoft Agent Framework on Azure

    Your console app works on a developer laptop. Production needs auth, telemetry, secrets, scaling, and a way to deploy without `dotnet run`. Here's the smallest Azure setup that actually works.

  • Jan 14, 2026·RAG·2 min read

    GraphRAG in .NET — When Vector Search Can't Reason Across Documents

    Vector search is great at "find me a relevant chunk." It's bad at "find me chunks that mention X and Y in a particular relationship." That's what GraphRAG is for.

  • Jan 9, 2026·LLM Integration·2 min read

    Caching LLM Responses: When It Saves Money and When It Silently Breaks UX

    Caching LLM calls is irresistible. Same input, same output, free. Except outputs aren't always supposed to be the same, and a stale cache hit looks like a broken product.

  • Dec 22, 2025·Vibe Coding·3 min read

    I Vibe-Coded a SaaS in a Weekend. Here's What Broke in Week Two.

    Saturday: I shipped a working SaaS with AI tools and almost no manual code. Sunday: 14 paying users. The following Wednesday: my Stripe webhook had been creating duplicate subscriptions for 36 hours.

  • Dec 8, 2025·RAG·2 min read

    The Hidden Cost of Re-Ranking: Benchmarking Cross-Encoders in Production RAG

    Cross-encoder rerankers boost recall on paper. In production, they doubled our p95 latency and the lift didn't show up in user metrics. The benchmark we wish we'd run first.

  • Nov 24, 2025·Spec-Driven Development·3 min read

    Why I Stopped Writing Prompts and Started Writing Specs (with OpenSpec)

    A year of writing increasingly clever prompts ended with the same problem on every feature. I could not reliably reproduce what the AI built yesterday. Specs solved it. Prompts didn't.

  • Nov 19, 2025·RAG·2 min read

    RAG Caching Strategies — Semantic Caching, Embedding Reuse, and the Cost Math

    Three layers of caching that cut 60-80% of your LLM bill in a busy RAG system. Plus the one cache that will silently break your UX.

  • Nov 9, 2025·Microsoft Agent Framework·3 min read

    Building a Multi-Agent Workflow with Microsoft Agent Framework in C#

    A single agent can do a lot. Two agents that hand off cleanly can do more, and you don't have to invent message routing to make it work.

  • Oct 27, 2025·LLM Integration·3 min read

    The Feature Flag Playbook for Rolling Out LLM Features Safely

    LLM features fail differently to normal code. They get slow, they get expensive, they get weird. Three flag patterns that let you ship them without a 3am rollback.

  • Oct 13, 2025·Future of AI·2 min read

    Small Models, Big Impact: The Quiet Shift From Frontier Models to Specialized Ones

    Two years of "use the biggest model you can afford" is ending. Smaller, specialised models are quietly matching frontier performance on the narrow tasks that pay the bills.

  • Sep 30, 2025·Spec-Driven Development·3 min read

    Spec-Driven Development with OpenSpec: Killing the 'It Works on My Machine' of AI Coding

    AI coding tools generate code that compiles, looks right, and quietly disagrees with what you actually wanted. Specs in version control turn vibes into contracts.

  • Sep 17, 2025·RAG·3 min read

    From Naive RAG to Agentic RAG: A Migration Story in 5 Steps

    Naive RAG works fine until your users start asking compound questions. Here's how we turned a single-shot retriever into something that plans, retrieves, and verifies. Without a rewrite.

  • Sep 8, 2025·RAG·2 min read

    Building an Agentic RAG with Microsoft Semantic Kernel

    When single-shot retrieval isn't enough, an agent that decides whether to retrieve — and critiques its own answer — earns the extra latency. Here's how it looks in SK.

  • Sep 2, 2025·Vibe Coding·3 min read

    Vibe Coding Isn't a Skill Issue — It's a Verification Issue

    The popular dunk on "vibe coders" is that they can't read code. The real problem is they can't verify it. Two different failures, two very different fixes.

  • Aug 19, 2025·Microsoft Agent Framework·2 min read

    Microsoft Agent Framework vs Semantic Kernel: What Changed and Why It Matters

    If you bet on Semantic Kernel a year ago, you noticed Microsoft quietly drifting toward "Agent Framework." Here's what actually changed, what carries over, and where the migration hurts.

  • Aug 8, 2025·LLM Integration·3 min read

    Streaming LLM Responses Through Your Existing REST API: Patterns That Actually Work

    Your frontend wants tokens as they generate. Your API gateway only speaks JSON. Here's how to bridge the two without rewriting the stack you spent two years getting stable.

  • Jul 29, 2025·Future of AI·2 min read

    The Post-Chat Era: Why Conversational UIs Are a Local Maximum

    Every AI product has a chat box. Most of them shouldn't. Chat is the worst interface for most LLM use cases. We just defaulted there because OpenAI shipped it first.

  • Jul 26, 2025·RAG·2 min read

    Evaluating RAG Systems — RAGAS, Faithfulness, and Setting Up an Eval Harness in .NET

    "Looks right to me" is not evaluation. Four metrics that catch regressions before users do, and how to wire them into your test suite.

  • Jul 15, 2025·RAG·3 min read

    Hybrid Search in RAG: When Vector Similarity Alone Isn't Enough

    Pure vector search misses exact-match queries. Product SKUs, error codes, function names. Hybrid search fixes that without giving up the semantic recall you actually like.

  • Jun 22, 2025·LLM Integration·3 min read

    Bolting an LLM Onto a Legacy .NET App Without Breaking Production

    You've got a 12-year-old .NET Framework app, an SLA, and a director who wants AI features by Q3. Here's how to bolt on an LLM without touching the monolith.

  • Jun 18, 2025·RAG·2 min read

    Hybrid Search in RAG with Azure AI Search and BM25 — The .NET Implementation

    Vector search alone misses product codes, error messages, and proper nouns. Hybrid search with Reciprocal Rank Fusion fixes that. Here's how it looks in C#.

  • Jun 4, 2025·RAG·3 min read

    Why Your RAG Pipeline Returns Garbage (And It's Probably Your Chunking)

    Your retriever pulls the right documents and the answers still come out wrong. Nine times out of ten the problem isn't where you're looking. It's upstream.

  • Apr 30, 2025·RAG·2 min read

    Kernel Memory in .NET — Microsoft's Out-of-the-Box RAG Service Reviewed

    Kernel Memory packages "do RAG" into a service you can run in-process or stand alone. Useful at the start. Constraining when you grow out of it.

  • Mar 12, 2025·RAG·2 min read

    Naive RAG vs Advanced RAG vs Agentic RAG vs GraphRAG vs Adaptive RAG — Which One Do You Actually Need?

    Five RAG architectures. Each one earns its complexity in a specific situation. Most teams pick the wrong one because they read about the fanciest one first.

  • Feb 22, 2025·.NET·2 min read

    EF Core 10 — Named Query Filters, JSON Columns, and Bulk Updates That Actually Work

    Three EF Core 10 features that quietly fix long-running pain. Plus the migration trap on each one.

  • Feb 19, 2025·.NET·2 min read

    The New Built-in Validation in .NET 10 Minimal APIs — Goodbye, FluentValidation Boilerplate?

    .NET 10 finally wires data annotations into Minimal API endpoints with a single AddValidation call. FluentValidation still has a job — just a smaller one.

  • Feb 15, 2025·RAG·2 min read

    Embeddings Are Not Created Equal — Choosing the Right Model for Your RAG Domain

    A legal-tech RAG needs different embeddings to a customer-support one. The MTEB leaderboard hides this. Three things to check before you commit.

  • Feb 11, 2025·Career·2 min read

    From Senior Engineer to AI-Augmented Senior Engineer — What 20 Years of Experience Means Now

    Twenty years of code does not make you obsolete. It does change which parts of the job pay you back. An honest take from someone who's been around long enough to see a few of these waves.

  • Feb 8, 2025·.NET·2 min read

    Minimal APIs in .NET 10 — Everything That Actually Changed

    The headline features in one place, with honest takes on which ones you should adopt now and which can wait.

  • Feb 2, 2025·.NET·2 min read

    Native AOT for ASP.NET Core APIs in .NET 10 — When Cold-Start Latency Actually Matters

    Native AOT trades some flexibility for a 50ms cold start and a 30MB image. Worth it for serverless and edge. Not worth it for the average internal API.

  • Jan 30, 2025·Architecture·2 min read

    The .NET Aspire Production Reality Check

    A year of Aspire in real projects. It earns its keep at dev time. The "deploy to prod" story is better than people think and not as complete as Microsoft's slides suggest.

  • Jan 28, 2025·.NET·2 min read

    Server-Sent Events in .NET 10 — Streaming LLM Responses Without WebSockets

    .NET 10 ships SSE as a first-class Minimal API result. Streaming OpenAI tokens to a browser is now a one-line endpoint.

  • Jan 22, 2025·RAG·2 min read

    Why Your RAG Hallucinates — A Debugging Checklist

    Ten reasons RAG systems lie, in order of how often I see them. Each one has a symptom you can spot and a fix that takes less than a day.

  • Jan 12, 2025·.NET·2 min read

    Building Your Own MCP Server in .NET — Exposing Your APIs to Claude, ChatGPT, and Cursor

    MCP turns "my LLM client should be able to use my internal API" from a custom integration into a 50-line server. Here's how it looks in C#.

  • Dec 17, 2024·C#·2 min read

    C# 14 Field-Backed Properties and Null-Conditional Assignment — The Quietly Useful Upgrades

    Two C# 14 features that pay for themselves in your next PR. No grand pattern, just less boilerplate where you had it.

  • Dec 2, 2024·RAG·2 min read

    Vector Databases Compared — Qdrant, pgvector, Azure AI Search, Pinecone, Weaviate

    Five vector stores, one .NET engineer's honest take. The right choice depends on your scale, your ops appetite, and whether you already pay an Azure bill.

  • Nov 4, 2024·RAG·2 min read

    Chunking Strategies for RAG — Fixed-Size, Recursive, Semantic, and Document-Aware

    Four ways to split documents. Each one is the right answer for some doc type, and the wrong answer for others. The mistake is using the same chunker for everything.

  • Oct 28, 2024·Architecture·2 min read

    Designing AI Features That Survive Real Users

    Demos pass because there's one user, one prompt, and no outages. Real users break all three. The operational layer that turns an AI demo into something you can keep on call.

  • Oct 9, 2024·RAG·2 min read

    Reranking in RAG — When Top-K Vector Search Isn't Enough

    Vector search has a precision ceiling. A cross-encoder reranker breaks through it for the cost of one extra API call. Worth the 200ms more often than you'd guess.

  • Sep 3, 2024·.NET·2 min read

    FastEndpoints vs Minimal APIs vs Controllers in .NET — The Honest Comparison

    Three ways to write the same endpoint in 2024. Each has a real downside. Here's what they actually cost — beyond the marketing.

  • Aug 21, 2024·RAG·2 min read

    Building Your First RAG Pipeline in .NET with Semantic Kernel and Qdrant

    A document Q&A app in C# that doesn't depend on Python and doesn't take a week. Semantic Kernel, Qdrant in Docker, Azure OpenAI (or Ollama if you're cheap).

  • Jul 23, 2024·C#·2 min read

    LINQ Performance in .NET 9 / 10 — Where the JIT Helps You and Where It Doesn't

    The runtime quietly made LINQ a lot faster. The patterns that still cost you are the ones the JIT can't fix on your behalf.

  • Jun 11, 2024·Architecture·2 min read

    Microservices, Modular Monoliths, and 'Just Build a Monolith' — A Decision Framework

    The pendulum has swung back. Most teams should start with a modular monolith. Here's how to tell when you actually need microservices, and when you're just cosplaying as one of the FAANGs.

  • May 14, 2024·.NET·2 min read

    Structuring Minimal APIs Without the Program.cs Monster

    Minimal APIs are wonderful for the first 200 lines. Then Program.cs starts looking like a fanfic. Here's the pattern that keeps it sane.

  • Apr 9, 2024·C#·2 min read

    The Real Cost of async/await in C# — When You're Allocating More Than You Think

    Every async method generates a state machine. Most of the time it costs nothing. The hot path where it costs you a lot is smaller than people think, and bigger than they want to admit.

  • Mar 25, 2024·C#·1 min read

    Source Generators in C# — Stop Writing Boilerplate, Start Generating It

    Hand-writing DTO mappers and tool descriptors is a tax. Incremental source generators pay it for you, compile-time, with no reflection cost at runtime.

© 2026 Ludmal De Silva. Built late at night. Find more at github.com/ludmal.