Dev Systems

Nobody trusted our internal dashboards. Now they live in code

How we used AI to fix a data trust problem, and built a governed reporting system the whole company can contribute to. We audited our skills library a few months ago and found twelve dashboards hiding in it. Not dashboards. Skills that built dashboards. Someone needed a view of some data, asked Claude to put it together, got a long HTML page out of it, and then wrapped the whole thing in a skill so others could run it again. Twelve times over, by different people, for different questions. This is w

See Anthropic Orchestrate the Narrative

tl;dr FOSS is the biggest threat to the largest new economic sector, so everything that economic sector does should be viewed through the lens of trying to kill it. I occasionally see articles and sentiments along the lines of, Anthropic is or is not, "scare mongering to boost the perceived cultural impact of their AI/ML tools; a sort of underhanded advertisement". If your job is to defend Anthropic online, it's a good angle to fight from. It's a viral subject with no prac

Toward More Controllable AI Video Editing: An Early Research Exploration at Netflix

By Zhuoning Yuan, Ta-Ying Cheng, Benjamin Klein, Bahareh Azarnoush
Introduction
At Netflix, we build technology to help storytellers bring their creative visions to life and to help members discover the stories they love. To connect stories with diverse audiences around the world, we produce promotional assets, including trailers, teasers, and social short‑form videos, that build on and elevate the original footage. Through close collaboration with the teams crafting these assets, we identified a r

How Netflix Simplified Batch Compute with Kueue

By Alvin Bao, Alex Petrov, Jennifer Lai, Aidan Sherr, and Samartha Chandrashekar
As a part of the journey to transition Netflix’s compute infrastructure to be more Kubernetes-native, we have leaned into incorporating components from the Kubernetes ecosystem into our container platform Titus. One example of this is our use of Kueue, a cloud-native job queueing system for batch workloads, which has largely replaced the custom queuing and scheduling logic in our homegrown managed batch solution Comp

Secure multi-tenant RAG with Amazon Bedrock and Verified Permissions

Large organizations building internal generative AI applications face a recurring challenge: controlling which teams or departments can access which documents, without duplicating infrastructure for each group. Within a single tenant, employees from a specific department should only access material assigned to that department. However, executives, with a wider span of control, will require access to material across multiple departments. Retrieval Augmented Generation (RAG) is one of several comp

Modernizing financial analytics with Amazon SageMaker Unified Studio

Avanse Financial Services is one of India’s leading education loan providers. Their Data Engineering Team had built a data lake on AWS using Amazon Simple Storage Service (Amazon S3), Amazon Athena, and AWS Glue for data ingestion and processing. However, their analytics and reporting layer ran on an external analytics application that wasn’t integrated with AWS. Data had to be copied from Amazon S3 into this external application before analysts could run any report, its license consumed a signi

Architecting AI-powered resilience framework on AWS

When your production system goes down, you often discover the hard way that your resilience testing missed critical dependencies. Building an AI-powered resilience framework on AWS helps you find those weaknesses before your customers do. Your systems don’t fail because your infrastructure isn’t resilient. They fail because resilience is assumed, not proven. Every deployment introduces new dependencies, every configuration change creates untested paths, and every gap between design intent and ru

Adopting AV1 for Real-Time Communication (RTC) at Scale

Adopting AV1 for real-time communication at Meta has been a multi-year effort spanning codec selection, device eligibility, rate control, and error resilience. We’re sharing the technical and operational challenges while deploying AV1 and expanding coverage, and how we addressed them for real-time communication. We’re presenting several technologies for improving AV1 call quality, including rate control and error resilience. The AV1 video codec, first standardized by AOMedia in 2018, has rapidly ev

Vercel AI SDK in production: when DefaultChatTransport needs a session layer

You've built an AI chat app on the Vercel AI SDK. It works in development. The model responds, the stream comes through, and the UI updates cleanly. Then you ship to production, and the transport layer starts showing its edges. Most of these failures are quiet: things that work in demos and break in ways that are hard to pin down until you know where to look. They share a common cause: DefaultChatTransport is built for HTTP, and HTTP has structural properties that some production requirements exc

Show HN: Alpenglow, a Linux distribution that boots to login in 0.6s

Alpenglow is a general-purpose (focused on appliance use right now) Linux distribution focused on fast boot times, small system size, and minimal runtime overhead. The project supports both traditional root-on-disk installations and diskless immutable deployments from the same codebase. In diskless mode the entire system runs from an initramfs with a read-only root and optional persistent state. In rootfs mode it behaves like a conventional Linux installation with package management and writable

Show HN: Pacwich – lightweight new monorepo tooling on top of Bun, NPM, or pnpm

I developed a package simply called bun-workspaces that worked on top of Bun workspaces directly with zero required config, using plain package.json scripts for orchestration. I have re-developed this package into pacwich, which supports Bun, npm, or pnpm. I decided it would be a better direction for it to be decoupled from a particular package manager, so it needed a new name (but I wanted to keep my logo). I write about the development strategy and my engineering philosophy (including disclosing

Show HN: Intelligrade – EU Based Digital Exams

Hey HN! I am Kevin and together with my co-founder Steven I have built Intelligrade over the last 2 years. Steven is a teacher and he got sick of having to deal with outdated, overly expensive and inadequate tooling to create, conduct and grade exams. Most of them don't respect privacy either or are US based which generally disqualifies them from usage in many parts of the EU. So we set out with a mission: Create a tool for teachers and schools that covers exams E2E. The reality is that teach

Ask HN: How are you enabling your employees to do AI dev in the cloud?

Sure, us engineers can Claude Code up a storm locally on our laptops these days. But now with everyone trying to vibe code everything, there's quite a few people that don't have a "proper" local dev environment to do that same kind of development. Let's just take running a test suite. Our devs need a pretty beefy environment to run that. So ideally, these environments are just in the cloud. But Claude Code web, is so "environment lite" that it really isn't

The Data Canary: How Netflix Validates Catalog Metadata

By Celina Amados
At Netflix, our catalog metadata is crucial to our member experience, and a single corrupted data state can impact millions of viewers immediately. To protect streaming reliability, we built an automated data canary system that validates data transformations using production traffic. This canary detects issues in under 10 minutes, and blocks bad data from reaching our members.
Intro
Catalog metadata is what makes Netflix functional. It defines what titles exist, where they’re avail

Data Projects: Managing Data Assets at Netflix Scale

By Amer Hesson, Marcelo Mayworm, James Mulcahy, and Brittany Truong
The Problem: Managing Assets at Netflix Scale
Netflix’s Data Platform is vast. We have millions of tables in our data warehouse and tens of thousands of scheduled workloads running across our orchestration systems. Behind each of these assets sits an engineer, a team, or an initiative — and behind each of those sits a set of decisions about who can access what, and how those workloads execute day after day. For years, the tools we

Predicting Risk in Content Launches: How Data-Driven Insights can Transform Launch Planning

by Emily Gill
Each year, we bring the Analytics Engineering community together for an Analytics Summit — a multi-day internal conference to share analytical deliverables across Netflix, discuss analytic practice, and build relationships within the community. This post is one of several topics presented at the Summit highlighting the breadth and impact of Analytics work across different areas of the business.
Understanding Risk in Content Launches
Every title you see on Netflix goes through several

The Evolution of Cassandra Data Movement at Netflix

By Guil Pires, Jennifer Prince, Jose Camacho, Ken Kurzweil, Phanindra Chunduru
Background
In a previous post, we introduced Data Bridge, a unified management plane for batch Data Movement at Netflix. Historically, several bespoke Data Movement connectors were developed across different engineering organizations to fulfill their specific requirements. Over the last few years, the Data Movement team has started centralizing these offerings through an abstraction that provides a catalog of connectors

Thinking Fast & Slow for a Personalized Notification System

by Matthew Wood, Ishan Gupta, Kevin Mercurio, Devon Bryant, and Claire Dorman
In his seminal book “Thinking, Fast and Slow,” Daniel Kahneman describes two systems that drive human cognition: System 1, which operates automatically and quickly with little effort, and System 2, which allocates attention to more challenging mental activities requiring deliberate focus. This dual-process theory has profound implications not just for understanding human behavior, but for designing intelligent systems t

A Human-Augmenting Agentic Workflow for Causal Inference

By Winston Chou, Adrien Alexandre, Lars Olds, Yi Zhang, Garrett Hagemann, and Nathan Kallus
Introduction
Imagine asking a data agent to analyze the causal relationship between two variables, such as the effect of watching a popular Netflix show on long-term member retention. It queries your data, runs a regression, and confidently returns an answer. How much should you trust it? Can you be confident that the agent accounted for subtle biases — or does it treat passionate fans as if they were the a