Privacy model

No worker ever hears a full recording.

Before any audio reaches a worker, the coordinator slices it into 60-second fragments. Each fragment goes to a different machine. No single worker receives enough audio to reconstruct the original.

How sharding works

Dispatch without exposing the full picture.

How sharding protects you

  • A 30-minute meeting becomes roughly 30 one-minute shards.
  • Each shard includes only a small overlap for clean transcript stitching.
  • Adjacent shards are sent to different Macs, not the same machine twice in a row.
  • No worker hears enough contiguous audio to follow the full conversation.
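
The slicing step described above can be sketched as follows. This is a minimal illustration, not HiveCompute's actual implementation; the `shard_audio` helper and the 2-second overlap value are assumptions for the example.

```python
SHARD_SECONDS = 60
OVERLAP_SECONDS = 2  # assumed: a small overlap for clean transcript stitching

def shard_audio(total_seconds: float) -> list[tuple[float, float]]:
    """Return (start, end) times for each ~60-second shard of a recording."""
    shards = []
    start = 0.0
    while start < total_seconds:
        end = min(start + SHARD_SECONDS, total_seconds)
        shards.append((start, end))
        if end >= total_seconds:
            break
        # The next shard starts just before this one ends,
        # so adjacent shards share a short seam for stitching.
        start = end - OVERLAP_SECONDS
    return shards

# A 30-minute meeting (1800 s) yields roughly 30 one-minute shards.
shards = shard_audio(1800)
```

Each tuple describes one shard's time window; the coordinator then dispatches each window to a different machine.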

Sharding diagram

[Diagram: consecutive shards fanning out across workers Mac A, Mac B, Mac C, and Mac D]

Anti-correlation

The coordinator never assigns neighboring shards to the same worker.

Even if a worker is compromised, it should only ever see non-adjacent slices. That makes reconstruction materially harder.

Why adjacency matters

  • Shard 12 and shard 13 together reveal more context than shard 12 and shard 27.
  • Scheduling spreads neighboring audio across different machines on purpose.
  • A worker can be accurate on its clip without learning what was said before or after.
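
A minimal sketch of this scheduling rule, assuming a hypothetical `assign_shards` helper: each shard goes to a randomly chosen worker, excluding whoever handled the previous shard, so neighboring audio never lands on the same machine.

```python
import random

def assign_shards(num_shards: int, workers: list[str]) -> list[str]:
    """Assign each shard to a worker, never reusing the previous shard's worker."""
    if len(workers) < 2:
        raise ValueError("anti-correlation needs at least two workers")
    assignments: list[str] = []
    for _ in range(num_shards):
        # Exclude the worker that received the immediately preceding shard.
        candidates = [w for w in workers if not assignments or w != assignments[-1]]
        assignments.append(random.choice(candidates))
    return assignments
```

A real scheduler would also weigh load and reputation; this shows only the adjacency constraint.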

Anti-correlation visualization

Shard   Assigned worker
01      Mac A
02      Mac B
03      Mac C
04      Mac A
05      Mac D
06      Mac B

Validation

What the coordinator does to validate workers.

Hidden canaries and reputation scoring let HiveCompute validate output without assuming any worker is trustworthy on day one.

Double-check new workers

New workers start at zero trust. Their early output can be verified by a second independent worker before it is accepted.
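
One way to frame the double-check is a simple agreement test between two independently produced transcripts. This is an illustrative sketch, not the production check; the similarity threshold and helper names are assumptions (a real system might use word error rate instead).

```python
from difflib import SequenceMatcher

AGREEMENT_THRESHOLD = 0.85  # assumed similarity cutoff

def transcripts_agree(a: str, b: str) -> bool:
    """Rough agreement check between two independently produced transcripts."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= AGREEMENT_THRESHOLD

def accept_from_new_worker(primary: str, secondary: str) -> bool:
    # A zero-trust worker's output is accepted only when a second,
    # independent worker produces a closely matching transcript.
    return transcripts_agree(primary, secondary)
```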

Canary injection

The coordinator occasionally swaps in known-answer audio. Workers do not know when they are being tested, which makes gaming the system harder.
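
The canary mechanic can be sketched like this. The rate, field names, and helpers are assumptions for illustration; in practice the coordinator would keep the expected answer server-side so the worker sees an ordinary task.

```python
import random

CANARY_RATE = 0.05  # assumed: roughly 1 in 20 tasks is a known-answer clip

def pick_task(real_shards: list, canaries: list):
    """Occasionally substitute a known-answer canary clip for a real shard."""
    if canaries and random.random() < CANARY_RATE:
        clip, expected = random.choice(canaries)
        # "_expected" stays on the coordinator; the worker receives only audio.
        return {"audio": clip, "_expected": expected, "is_canary": True}
    return {"audio": real_shards.pop(0), "is_canary": False}

def score_result(task, transcript: str, normalize=str.lower):
    """For canaries, compare the worker's transcript against the known answer."""
    if not task["is_canary"]:
        return None  # real work is validated via reputation and spot checks
    return normalize(transcript) == normalize(task["_expected"])
```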

Reputation scoring

Workers build up from new to provisional to trusted to elite. Bad output lowers reputation and reduces future work.
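
A toy version of the reputation ladder, with assumed thresholds and scoring weights (the tier names come from the text; everything else is illustrative):

```python
# Assumed score floors for each tier named above.
TIERS = [(0, "new"), (10, "provisional"), (50, "trusted"), (200, "elite")]

class Reputation:
    def __init__(self):
        self.score = 0

    def record(self, passed: bool):
        # Good output builds reputation slowly; bad output costs more than
        # a success earns, so cheating is a losing strategy.
        self.score = max(0, self.score + (1 if passed else -5))

    @property
    def tier(self) -> str:
        return next(name for floor, name in reversed(TIERS) if self.score >= floor)
```

A worker's tier would then feed back into scheduling: lower tiers receive less work and more verification.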

Visibility

What a worker receives, and what stays hidden.

Workers see

  • A single 60-second audio clip
  • The language to transcribe
  • A result endpoint for that clip
  • Only the current shard timing
  • Transport over HTTPS

Workers do not see

  • Your name, company, or email
  • The full recording
  • Other shards from your job
  • The final merged transcript
  • Any account metadata about you

Comparison

How HiveCompute compares.

Your audio goes to

  • HiveCompute: Distributed Macs, one shard each
  • OpenAI Whisper API: OpenAI servers, full file
  • Self-hosted: Your own machine or cluster

Who can access it

  • HiveCompute: No single worker hears the full recording
  • OpenAI Whisper API: OpenAI systems handling the job
  • Self-hosted: Your team and infrastructure

Cost

  • HiveCompute: Around $0.003 per minute
  • OpenAI Whisper API: Around $0.006 per minute
  • Self-hosted: Hardware and ops overhead

Speed

  • HiveCompute: Parallel shards across a fleet
  • OpenAI Whisper API: Single hosted request path
  • Self-hosted: Bound by your own capacity

Retention

  • HiveCompute: Deleted after completion
  • OpenAI Whisper API: 30-day retention by default
  • Self-hosted: You decide

FAQ

Direct answers to the obvious questions.

Can a worker reconstruct my full recording?

No. Shards are short, adjacent clips are intentionally split across different workers, and a worker does not receive the merged transcript.

What if a worker saves the audio?

The worker still only has a short clip with no identifying metadata. Hidden canaries and reputation scoring make low-quality or malicious behavior easier to detect and remove.

Is the transcript encrypted?

In transit, yes. Audio and transcripts move over HTTPS. At rest, the transcript sits in the coordinator database behind your API access controls.

Can I use this for HIPAA-regulated audio?

Not yet. The pilot is designed for teams that want lower cost and good privacy boundaries, but it is not represented as HIPAA-certified today.

Open source

The worker is open. Audit it yourself.

The local worker code is public. You can read exactly what runs on your machine — how it polls, what it sends back, and how audio is handled.

View source on GitHub →