Designing Dendrite: a Windows-native disk usage analyzer in Rust

Constraint

Why Windows, deliberately

Most existing disk-usage tools are cross-platform. They have to be: they were built by people who use Linux. On a tool like this, cross-platform means a portable file walker that behaves the same on every operating system. That in turn means enumerating directories one entry at a time, the same work the OS shell does, and inheriting the same performance ceiling. On a multi-terabyte NTFS volume, that ceiling is low.

Dendrite is Windows-only on purpose. Once you give yourself permission to use the platform's actual file-system APIs — and, where available, NTFS-specific fast paths — the design space changes. The job stops being "iterate every directory entry" and starts being "ask NTFS for what you want". A tool that behaves the same on every OS is a tool that doesn't take advantage of any of them.

Form factor

Why a desktop app

Disk usage is local data. A web app would mean a daemon that scans, a server that serves the result, and a browser that renders it, all on the same machine. The web detour adds latency, removes context, and means you can't use your file manager's right-click menu on any of the files the tool surfaces.

A desktop app keeps the data and the UI in the same process. "Show me the biggest folder under C:\Users\…\AppData" is one click; "open it in Explorer" is the next. There is no API surface, no auth, no firewall rule, no remote attack surface. The tool runs locally because the data is local.

Scanning

The NTFS-preferred path

The single biggest performance lever on Windows is using NTFS-specific enumeration paths instead of generic file walkers. Reading the Master File Table (MFT) or the USN change journal can be orders of magnitude faster than walking directories one at a time, because you stop paying for per-entry kernel transitions and start reading metadata in bulk.

Two consequences:

The fast path needs administrator privileges. The app asks for them, and explains why. Without them, the scanner falls back to the portable path and the user gets a slower scan; nothing breaks, but you can feel the difference.
The fast path is volume-level, not directory-level. You scan a drive, then derive subtree views from that scan. That happens to be the right shape for a tool whose primary job is "show me where the space went on this drive".
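One way to picture "scan the drive, then derive subtree views": a volume-level scan yields a flat list of records, each pointing at its parent, and subtree totals are folded up afterwards. The sketch below assumes a hypothetical `Record` shape (`id`, `parent`, `size`) and a hypothetical `subtree_sizes` helper. It is not the on-disk MFT record layout, and not necessarily how Dendrite does it.

```rust
use std::collections::HashMap;

/// Hypothetical flat record, as a volume-level scan might produce it.
/// `parent` is `None` for the volume root (an assumption of this sketch).
struct Record {
    id: u64,
    parent: Option<u64>,
    size: u64,
}

/// Folds every record's own size into each of its ancestors.
/// Assumes the parent links form a tree (no cycles).
fn subtree_sizes(records: &[Record]) -> HashMap<u64, u64> {
    let parent_of: HashMap<u64, Option<u64>> =
        records.iter().map(|r| (r.id, r.parent)).collect();

    let mut totals: HashMap<u64, u64> = HashMap::new();
    for record in records {
        // Walk from the record up to the root, adding its size at each step.
        let mut cursor = Some(record.id);
        while let Some(id) = cursor {
            *totals.entry(id).or_insert(0) += record.size;
            cursor = parent_of.get(&id).copied().flatten();
        }
    }
    totals
}
```

Looking up any directory's id in the returned map gives the size of that subtree, so a "view of one folder" needs no extra pass over the disk. The walk-up loop costs one step per ancestor per record; a real implementation would likely aggregate in a single bottom-up pass instead.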

"NTFS-preferred" is the right way to describe the choice: when NTFS is available and we have the privileges, we use it; otherwise we degrade gracefully.
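The "prefer, then degrade" rule can be sketched as a single decision function. `ScanStrategy`, `VolumeInfo`, and `choose_strategy` are names made up for illustration; how the file-system name and the elevation state are actually obtained is outside this sketch.

```rust
/// Hypothetical names for the two scan paths described above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ScanStrategy {
    /// NTFS-specific bulk enumeration (needs administrator privileges).
    NtfsBulk,
    /// Generic directory walk; slower, but works everywhere.
    PortableWalk,
}

/// Hypothetical facts the scanner knows about a volume before it starts.
struct VolumeInfo<'a> {
    file_system: &'a str,
    elevated: bool,
}

/// Use the fast path only when both conditions hold; otherwise fall back.
fn choose_strategy(volume: &VolumeInfo) -> ScanStrategy {
    let is_ntfs = volume.file_system.eq_ignore_ascii_case("NTFS");
    if is_ntfs && volume.elevated {
        ScanStrategy::NtfsBulk
    } else {
        ScanStrategy::PortableWalk
    }
}
```

Keeping the choice in one place means the fallback is a normal return value rather than an error path, which matches "nothing breaks, it is just slower".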

Data structure

The memory-optimized tree

Once a scan finishes, the entire filesystem's metadata is in memory as a tree. That is the single most consequential decision in the program. Every view (searches, sorts, category breakdowns, biggest files, biggest folders, comparisons) is derived from this one tree. The tree is the model; everything else is a projection.
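To make "projection" concrete, here is a minimal sketch of two views computed as pure functions over the same read-only data. `FileEntry`, `biggest`, and `bytes_by_extension` are hypothetical names, and a flat slice stands in for the real tree to keep the example short.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Hypothetical, simplified stand-in for a file node in the scanned model.
struct FileEntry {
    name: &'static str,
    size: u64,
}

/// "Biggest files" view: the n largest entries, largest first.
fn biggest(files: &[FileEntry], n: usize) -> Vec<&FileEntry> {
    let mut refs: Vec<&FileEntry> = files.iter().collect();
    refs.sort_by(|a, b| b.size.cmp(&a.size));
    refs.truncate(n);
    refs
}

/// "Category breakdown" view: total bytes per lower-cased extension.
/// Files without an extension are grouped under the empty string.
fn bytes_by_extension(files: &[FileEntry]) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for file in files {
        let ext = Path::new(file.name)
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_ascii_lowercase())
            .unwrap_or_default();
        *totals.entry(ext).or_insert(0) += file.size;
    }
    totals
}
```

Neither function mutates its input, so any number of views can be computed from one scan without rescanning or copying the model.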

Naive trees explode in memory on real volumes. Multi-terabyte drives can hold millions of entries. So the tree is built to be cheap per node:

Names are interned where possible; common prefixes share storage.
Sizes and metadata pack into fixed-width fields, not boxed structs.
Children are stored contiguously, not as Vec<Box<Node>>.
The tree is built bottom-up so the size of every interior node is known when the node is created — no second pass.
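The four properties above can be sketched together in a small arena. Everything here is illustrative: `Entry`, `Interner`, `Node`, and `Tree` are hypothetical names, the input is a hand-built nested enum rather than a real scan, and prefix sharing between names is left out.

```rust
use std::collections::HashMap;

/// Hypothetical scan output used as input for this sketch.
enum Entry {
    File { name: &'static str, size: u64 },
    Dir { name: &'static str, children: Vec<Entry> },
}

/// Stores each distinct name once and hands out a 4-byte id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl Interner {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_owned());
        self.ids.insert(name.to_owned(), id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.names[id as usize]
    }
}

/// Fixed-width node: 24 bytes, no heap pointers of its own.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct Node {
    size: u64,        // total bytes of this subtree
    name: u32,        // id from the interner
    first_child: u32, // index of the first child in the arena
    child_count: u32, // children occupy first_child..first_child + child_count
}

struct Tree {
    nodes: Vec<Node>,
    names: Interner,
}

impl Tree {
    /// Returns the arena plus the root node (the root itself is not stored).
    fn build(root: &Entry) -> (Tree, Node) {
        let mut tree = Tree { nodes: Vec::new(), names: Interner::default() };
        let root_node = tree.lower(root);
        (tree, root_node)
    }

    /// Bottom-up: children are finished before their parent node is created,
    /// so the parent's size is known at construction time.
    fn lower(&mut self, entry: &Entry) -> Node {
        match entry {
            Entry::File { name, size } => Node {
                size: *size,
                name: self.names.intern(name),
                first_child: 0,
                child_count: 0,
            },
            Entry::Dir { name, children } => {
                // Grandchildren are pushed into the arena during these calls.
                let lowered: Vec<Node> = children.iter().map(|c| self.lower(c)).collect();
                // The direct children are then appended as one contiguous run.
                let first_child = self.nodes.len() as u32;
                self.nodes.extend_from_slice(&lowered);
                Node {
                    size: lowered.iter().map(|n| n.size).sum(),
                    name: self.names.intern(name),
                    first_child,
                    child_count: lowered.len() as u32,
                }
            }
        }
    }

    fn children(&self, node: &Node) -> &[Node] {
        let start = node.first_child as usize;
        &self.nodes[start..start + node.child_count as usize]
    }
}
```

Calling `Tree::build` on a small `Entry::Dir` and then `tree.children(&root_node)` returns the root's children as one slice. Compared with `Vec<Box<Node>>`, listing a directory is a slice lookup instead of a pointer chase per child, and the per-node cost is a known constant. The temporary `Vec` per directory in `lower` is a simplification; a tuned builder would likely reuse a scratch buffer.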

The numerical goal is simple: a scan of a typical Windows volume fits in a fraction of system memory, with headroom to keep the UI responsive. That goal is what justifies all of the per-node fiddling.

Architecture

Orchestration / core / UI

The codebase is split into three layers:

Core — the scanner, the tree, the analyses. No UI, no orchestration. Pure functions over immutable inputs where possible.
Orchestration — the channel-based background-task layer. Scans, comparisons, exports, all run as background tasks and emit progress and results onto channels.
UI — a thin renderer over whatever the orchestration layer publishes. The UI never calls into the core directly; it asks orchestration to do something and waits for the result.
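A minimal sketch of the three layers, using only the standard library's `std::sync::mpsc` channel and a plain thread. The names (`ScanEvent`, `total_size`, `start_scan`, `render`) are hypothetical, and a list of file sizes stands in for a real scan; Dendrite's actual channel setup may well differ.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Messages the orchestration layer publishes for the UI.
#[derive(Debug, PartialEq)]
enum ScanEvent {
    Progress { entries_seen: u64 },
    Finished { total_bytes: u64 },
}

/// Core: a pure function. No threads, no channels, no UI.
fn total_size(file_sizes: &[u64]) -> u64 {
    file_sizes.iter().sum()
}

/// Orchestration: runs core work on a background thread and
/// reports progress and the final result over a channel.
fn start_scan(file_sizes: Vec<u64>) -> Receiver<ScanEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let count = file_sizes.len() as u64;
        for seen in 1..=count {
            // A failed send means the receiver is gone; stop quietly.
            if tx.send(ScanEvent::Progress { entries_seen: seen }).is_err() {
                return;
            }
        }
        let _ = tx.send(ScanEvent::Finished { total_bytes: total_size(&file_sizes) });
    });
    rx
}

/// UI: turns published events into display strings and never calls the core.
/// This version blocks until the task ends, which keeps the example short.
fn render(events: Receiver<ScanEvent>) -> Vec<String> {
    events
        .iter()
        .map(|event| match event {
            ScanEvent::Progress { entries_seen } => format!("scanned {} entries", entries_seen),
            ScanEvent::Finished { total_bytes } => format!("done: {} bytes", total_bytes),
        })
        .collect()
}
```

`render(start_scan(sizes))` yields one line per progress event followed by the final total. A real UI loop would poll with `try_recv` once per frame instead of blocking, which is what keeps the window responsive while a scan runs.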

That split exists for one reason: it makes "keep the UI responsive" a property of the architecture, not a discipline I have to remember. The next post in this series gets into how the channels actually work.

Roadmap

What's next

Duplicate-file detection across the tree, with a content-hash pass that runs as a background task on demand.
An optional native VHDX export for "snapshot today's state, compare to it next month".
A polished "first scan" experience that explains the privilege prompt clearly.

None of these change the core data structure; they all consume it.
