Dev Systems
Build a multi-tenant configuration system with tagged storage patterns
In modern microservices architectures, configuration management remains one of the most challenging operational concerns. Two gaps emerge as organizations scale: handling tenant metadata that changes faster than cache TTL allows, and scaling the metadata service itself without creating a performance bottleneck. Traditional caching strategies force an uncomfortable trade-off: either accept stale tenant context (risking incorrect data isolation or feature flags), or implement aggressive cache inva
Trust But Canary: Configuration Safety at Scale
As AI increases developer speed and productivity, it also increases the need for safeguards. On this episode of the Meta Tech Podcast, Pascal Hartig sits down with Ishwari and Joe from Meta’s Configurations team to discuss how Meta makes config rollouts safe at scale. Listen in to learn about canarying and progressive rollouts, the health checks and monitoring signals used to catch regressions early, and how incident reviews focus on improving systems rather than blaming people. They also tal
Ably Python SDK v3: realtime for Python, built for AI
Python dominates AI development. It's where teams build their agents, orchestration layers, and the backend systems that turn LLM calls into products people actually use. Over the past year, those systems have matured rapidly. What used to live in notebooks and prototypes is now running in production, serving real users with real expectations around reliability and performance. That maturity brings infrastructure requirements. Tokens need to stream in order. Sessions need to survive refreshes, re
Laid Off from Oracle (OCI). Looking for Software Roles (USA)
10+ years of experience working on distributed backend systems (Java). Founding engineer at an early-stage cybersecurity startup. Worked on a tier 1 service in Oracle Cloud Infrastructure (OCI) that handled ~295 million requests/operations. Scaled services for a Series B startup.
Show HN: Composer – AI architect / MCP for software architecture diagrams
Hi everyone! I built Composer, a tool that turns your ideas into architecture diagrams. You can also use MCP to turn your EXISTING codebase into a visual diagram! It connects to all possible tools using MCP (Claude Code, Codex, OpenCode, etc.). The goal was to make system design easier and to be able to draw out what I wanted to make before I started, or to explain it to others. It's currently live at usecomposer.com and free! I’d love feedback on whether it feels useful for real projects and where
Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod
Amazon SageMaker HyperPod offers an end-to-end experience supporting the full lifecycle of AI development—from interactive experimentation and training to inference and post-training workflows. The SageMaker HyperPod Inference Operator is a Kubernetes controller that manages the deployment and lifecycle of models on HyperPod clusters, offering flexible deployment interfaces (kubectl, Python SDK, SageMaker Studio UI, or HyperPod CLI), advanced autoscaling with dynamic resource allocation, and com
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines
AI coding assistants are powerful but only as good as their understanding of your codebase. When we pointed AI agents at one of Meta’s large-scale data processing pipelines – spanning four repositories, three languages, and over 4,100 files – we quickly found that they weren’t making useful edits quickly enough. We fixed this by building a pre-compute engine: a swarm of 50+ specialized AI agents that systematically read every file and produced 59 concise context files encoding tribal
Ask HN: Academic study on AI's impact on software development – want to join?
Would you like to participate in a study on AI’s impact on software development? We are researchers at New York University and City, University of London conducting an interview study on how new AI tools are changing the work of software developers. We are looking to speak with developers of all seniority levels, including those in leadership roles, who can share their experiences and perspectives on using (or choosing not to use) AI in their day-to-day work. Interviews will last 45 to 60 minutes
Secure HTTP‑Only AKS Ingress with Azure Front Door Premium, Firewall DNAT, and Private AGIC
Reference architecture and runbook (Part 1: HTTP-only) for Hub-Spoke networking with private Application Gateway (AGIC), Azure Firewall DNAT, and Azure Front Door Premium (WAF). 0. When and Why to Use This Architecture. Series note: This document is Part 1 and uses HTTP to keep the focus on routing and control points. A follow-up Part 2 will extend the same architecture to HTTPS (end-to-end TLS) with the recommended certificate and policy configuration. What this document contains. Scope: Archite
Blue‑Green Strategy for Always‑On TCP Workloads on Azure Container Apps
Scenario: Always‑on workloads in Azure Container Apps continuously pull from a TCP source, process the stream, and push into Azure Managed Redis, which is then consumed by another always‑on Container Apps workload that writes to a database. Challenge: Standard revision traffic splitting isn’t a fit because there’s no HTTP ingress-based routing for this workload pattern as defined here; instead, the approach uses a flag‑controlled activation plus a temporary/mock Redis path to validate a new
AKS cluster with AGIC hits the Azure Application Gateway backend pool limit (100)
I’m writing this article to document a real-world scaling issue we hit while exposing many applications from an Azure Kubernetes Service (AKS) cluster using Application Gateway Ingress Controller (AGIC). The problem is easy to miss because Kubernetes resources keep applying successfully, but the underlying Azure Application Gateway has a hard platform limit of 100 backend pools—so once your deployment pattern requires the 101st pool, AGIC can’t reconcile the gateway configuration and traffic sto
Learn distributed systems by building real infrastructure on your laptop
DistSim is an open-source distributed systems simulator. Each machine is a real Docker container with Ubuntu, a terminal, and full networking. You install services (Nginx, PostgreSQL, Redis, Kafka, etc.), configure them, connect them, and then break them with chaos engineering. The motivation: I wanted to learn distributed systems hands-on but didn't want to pay for multiple VPS instances. Blog posts explain concepts well but you never actually configure a load balancer, set up replication,
Show HN: I couldn't compare storage topologies without 3 forks, so I built this
I was reading Designing Data-Intensive Applications and kept wanting to run the examples, not just read them. I understand systems through code. So I started building one. Sandstore is a hyperconverged distributed file system in Go. Every node runs control plane, data plane, and Raft consensus together. BoltDB metadata, full POSIX semantics, 2PC chunk lifecycle, gRPC, Kubernetes. The problem I kept hitting was simpler than any of that: I wanted to compare this design against a disaggregated one u
Show HN: Open-source distributed quantum compute network
Hey HN. I'm Colton (YC S21, ex-Acorns), one of the founders of Postquant Labs. My cofounder Richard is a cryptographer out of Draper Labs and DARPA. We're building Quip.Network, the first distributed quantum compute network. We just opened our testnet and wanted to share it here. The basic problem: quantum hardware is here and already competitive on certain optimization problems, but for most people, there's no way to access it. The machines cost millions and the hardware and resea
Show HN: Vulnerabilities in a Multi-Million ARR Corp at 17 (my 5-month journey)
Hi, I am Dhanush. I have a hard-tech infra project meant to be a future protocol; that's all. Basically, I am a poor, 17-year-old, self-taught (by piracy) solo developer. I made multiplayer 3D games, and now I have used Burp Suite, more or less at random, to understand how communication with services works while using a service from a company named "B" (a company that uses AI for neural phase locking; a multi-million ARR company with a premium-only model with trials). I saw some problems; here they are. Technical Findings: (All actions are for educational
Ask HN: Distributed data centers in our basements
This is likely a bit unrealistic, but why can't we make a half-rack server to go in someone's basement that can also heat their hot water and use the basement floor as a heat sink? It seems like a lot of the blight of data centers is the energy needed to remove the heat. By distributing them into cool basements, and even connecting them to the home heating system, we could reduce that energy use and make them more efficient.
Show HN: C4 – Separate file description from file content
C4 is a standardized identification system that separates file description from file content so they can be stored or transmitted independently. You can send a small text-based filesystem description by email while the file content is fulfilled by content ID from any source. Running c4 on any path produces a c4m file and stores the content:
$ c4 ./projects/HERO/
drwxr-xr-x 2026-03-20T14:30:00Z 12058624 renders/
-rw-r--r-- 2026-03-20T14:30:00Z 4718 scripts/ren
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure
This is the second post in the Ranking Engineer Agent blog series exploring the autonomous AI capabilities accelerating Meta’s Ads Ranking innovation. The previous post introduced Ranking Engineer Agent’s ML exploration capability, which autonomously designs, executes, and analyzes ranking model experiments. This post covers how to optimize the low-level infrastructure that makes those models run efficiently at scale. We introduce KernelEvolve, an agentic kernel authoring system used
Create, edit and share videos at no cost in Google Vids
New AI capabilities are coming to Google Vids, powered by Lyria 3 and Veo 3.1, like high-quality video generation at no cost and more.
Why we're betting on Durable Sessions
Over the past year, I've spoken to more than 40 engineering teams building production AI agents. Different companies, different frameworks, different use cases. The same conversation kept happening. "Our streams break when users switch tabs." "We can't tell if the agent crashed or is still thinking." "We built a custom reconnection layer and it took three months." "Our users can't switch from laptop to phone mid-conversation." Every team described it differently, but they were all describing the