Constraint
Why Windows, deliberately
Most disk-usage tools that exist already are cross-platform. They have to be:
they were built by people who use Linux. Cross-platform on a tool like this
means a portable file walker that works the same way on every operating system,
which means doing the same work the OS shell does, and inheriting the same
performance ceiling. On a multi-terabyte NTFS volume, that ceiling is low.
Dendrite is Windows-only on purpose. Once you give yourself permission to use
the platform's actual file-system APIs — and, where available, NTFS-specific
fast paths — the design space changes. The job stops being "iterate every
directory entry" and starts being "ask NTFS for what you want". A tool that
behaves the same on every OS is a tool that doesn't take advantage of any of
them.
Form factor
Why a desktop app
Disk usage is local data. A web app would mean a daemon that scans, a server
that serves the result, and a browser that renders it, all on the same
machine. The web detour adds latency, removes context, and means you can't
use your file manager's right-click menu against any of the files the tool
surfaces.
A desktop app keeps the data and the UI in the same process. "Show me the
biggest folder under C:\Users\…\AppData" is one click; "open it in
Explorer" is the next. There is no API surface, no auth, no firewall rule, no
remote attack surface. The tool runs locally because the data is local.
Scanning
The NTFS-preferred path
The single biggest performance lever on Windows is using the NTFS-specific
enumeration paths instead of generic file walkers. Reading the Master File
Table or the change journal can be orders of magnitude faster than walking
directories one at a time, because you stop paying for per-entry kernel
transitions and start reading metadata in bulk.
Two consequences:
- The fast path needs administrator privileges. The app asks for them, and explains why. Without them, the scanner falls back to the portable path and the scan is slower; nothing breaks, but the difference is noticeable.
- The fast path is volume-level, not directory-level. You scan a drive, then derive subtree views from that scan. That happens to be the right shape for a tool whose primary job is "show me where the space went on this drive".
"NTFS-preferred" is the right way to describe the choice: when NTFS is
available and we have the privileges, we use it; otherwise we degrade
gracefully.
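The decision itself is small enough to sketch. The snippet below is a minimal illustration, not Dendrite's actual code: `ScanStrategy`, `VolumeInfo`, and `choose_strategy` are hypothetical names, and it assumes the caller has already asked Windows for the volume's file-system name and the process's elevation state.

```rust
/// Which enumeration path the scanner takes for a volume (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScanStrategy {
    /// Bulk metadata read (MFT or change journal); works on whole volumes only.
    NtfsBulk,
    /// Portable directory walk; slower, but needs no elevation.
    DirectoryWalk,
}

/// What the scanner knows about a volume before it starts (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct VolumeInfo<'a> {
    /// File-system name as reported by the OS, e.g. "NTFS" or "exFAT".
    pub file_system: &'a str,
    /// Whether the process is running with administrator privileges.
    pub elevated: bool,
}

/// Prefer the NTFS fast path when both preconditions hold; otherwise degrade
/// to the portable walk instead of failing.
pub fn choose_strategy(volume: VolumeInfo<'_>) -> ScanStrategy {
    let is_ntfs = volume.file_system.eq_ignore_ascii_case("NTFS");
    if is_ntfs && volume.elevated {
        ScanStrategy::NtfsBulk
    } else {
        ScanStrategy::DirectoryWalk
    }
}
```

Keeping the choice in one pure function means the fallback is a normal code path rather than an error handler, which is what "nothing breaks" depends on.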
Data structure
The memory-optimized tree
Once a scan finishes, the entire filesystem is in memory as a tree. That's the
most consequential single decision in the program. Every view — searches, sorts,
category breakdowns, biggest-files, biggest-folders, comparisons — is derived
from this one tree. The tree is the model; everything else is a projection.
Naive trees explode in memory on real volumes. Multi-terabyte drives can hold
millions of entries. So the tree is built to be cheap per node:
- Names are interned where possible; common prefixes share storage.
- Sizes and metadata pack into fixed-width fields, not boxed structs.
- Children are stored contiguously, not as Vec<Box<Node>>.
- The tree is built bottom-up so the size of every interior node is known when the node is created — no second pass.
The numerical goal is simple: a scan of a typical Windows volume fits in a
fraction of system memory, with headroom to keep the UI responsive. That goal
is what justifies all of the per-node fiddling.
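To make the per-node ideas concrete, here is a minimal sketch of one way such a tree could be laid out. It is an illustration under my own assumptions, not the real layout: `Interner`, `NameId`, `Node`, and `Tree` are hypothetical names, the interner deduplicates whole names only (it does not show prefix sharing), and the only metadata it carries is a size.

```rust
use std::collections::HashMap;

/// Index into the interner's name table; 4 bytes instead of an owned String.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NameId(u32);

/// Stores each distinct name once (whole-name interning only in this sketch).
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl Interner {
    pub fn intern(&mut self, name: &str) -> NameId {
        if let Some(&id) = self.ids.get(name) {
            return NameId(id);
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_owned());
        self.ids.insert(name.to_owned(), id);
        NameId(id)
    }

    pub fn resolve(&self, id: NameId) -> &str {
        &self.names[id.0 as usize]
    }
}

/// Fixed-width node: no heap pointers, no per-child allocation.
#[derive(Debug, Clone, Copy)]
pub struct Node {
    pub size: u64,        // bytes, including all descendants
    pub name: NameId,
    pub first_child: u32, // index of the first child in `Tree::nodes`
    pub child_count: u32, // children occupy one contiguous run
}

#[derive(Default)]
pub struct Tree {
    pub names: Interner,
    pub nodes: Vec<Node>, // flat arena; each directory's children are adjacent
}

impl Tree {
    /// A leaf: not stored until its parent directory is completed.
    pub fn file(&mut self, name: &str, size: u64) -> Node {
        Node { size, name: self.names.intern(name), first_child: 0, child_count: 0 }
    }

    /// Bottom-up: the children are finished first, so the directory's total
    /// size is known at the moment its node is created.
    pub fn dir(&mut self, name: &str, children: &[Node]) -> Node {
        let first_child = self.nodes.len() as u32;
        self.nodes.extend_from_slice(children);
        Node {
            size: children.iter().map(|c| c.size).sum(),
            name: self.names.intern(name),
            first_child,
            child_count: children.len() as u32,
        }
    }

    /// Children come back as one slice of the arena.
    pub fn children(&self, node: &Node) -> &[Node] {
        let start = node.first_child as usize;
        &self.nodes[start..start + node.child_count as usize]
    }
}
```

In this layout the caller holds the root node returned by the final `dir` call, and walking a directory is a slice lookup rather than a chase through boxed pointers. A real node would carry more fields (timestamps, attributes), but the same idea applies: keep them fixed-width and keep siblings adjacent.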
Architecture
Orchestration / core / UI
The codebase is split into three layers:
- Core — the scanner, the tree, the analyses. No UI, no orchestration. Pure functions over immutable inputs where possible.
- Orchestration — the channel-based background-task layer. Scans, comparisons, exports, all run as background tasks and emit progress and results onto channels.
- UI — a thin renderer over whatever the orchestration layer publishes. The UI never calls into the core directly; it asks orchestration to do something and waits for the result.
That split exists for one reason: it makes "keep the UI responsive" a property
of the architecture, not a discipline I have to remember. The next post in this
series gets into how the channels actually work.
Roadmap
What's next
- Duplicate-file detection across the tree, with a content-hash pass that runs as a background task on demand.
- An optional native VHDX export for "snapshot today's state, compare to it next month".
- A polished "first scan" experience that explains the privilege prompt clearly.
None of these change the core data structure; they all consume it.