Local models

Local transcription and summary models, clear boundaries.

HushMemo separates speech-to-text models from summary models. Transcription models turn audio into text; summary models turn reviewed transcripts into notes, TODOs, emails, and templates.

Transcription

Speech-to-text models

Choose a transcription model based on storage, speed, and accuracy. Smaller models are faster and take less storage; larger models are slower but more accurate on long, noisy, or otherwise complex recordings.

Fastest transcription

Whisper Tiny

Small speech-to-text model for quick drafts and short voice notes.

Size: ~75 MB

Languages: Multilingual

Speed: Fastest

Download: Manual

Best for: Quick thoughts, short reminders, rough drafts, and devices with limited storage.

Fast and compact, but less accurate with noise, accents, long meetings, or domain vocabulary.

Balanced transcription

Whisper Base

Balanced local transcription model for everyday recordings.

Size: ~142 MB

Languages: Multilingual

Speed: Fast

Download: Automatic

Best for: Meetings, calls, lectures, interviews, and voice notes where speed still matters.

A practical default after download. Important transcripts should still be reviewed before summarizing or sharing.

Higher accuracy

Whisper Small

Higher-accuracy local transcription model for longer and messier audio.

Size: ~466 MB

Languages: Multilingual

Speed: Slower

Download: Manual

Best for: Longer meetings, lectures, interviews, mixed speakers, and recordings with more detail.

Uses more storage and processing time. Long recordings may still need section-by-section review.
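The storage, speed, and accuracy trade-off across the three transcription models can be sketched as a small selection helper. This is an illustration only, not how HushMemo chooses a model: the `pick_model` function, the `prefer` labels, and the rule of comparing free storage directly against download size are all assumptions, and a real app would likely want extra headroom beyond the file size. The sizes come from the cards above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionModel:
    name: str
    size_mb: int  # approximate download size listed on the model card


# Ordered from smallest and fastest to largest and most accurate.
MODELS = [
    TranscriptionModel("Whisper Tiny", 75),
    TranscriptionModel("Whisper Base", 142),
    TranscriptionModel("Whisper Small", 466),
]


def pick_model(free_storage_mb: float, prefer: str = "balanced") -> TranscriptionModel:
    """Pick a model that fits in free storage (hypothetical helper).

    prefer is one of "speed", "balanced", or "accuracy"; these labels
    are made up for this sketch.
    """
    fitting = [m for m in MODELS if m.size_mb <= free_storage_mb]
    if not fitting:
        raise ValueError("Not enough free storage for any transcription model")
    if prefer == "speed":
        return fitting[0]   # smallest model that fits
    if prefer == "accuracy":
        return fitting[-1]  # largest model that fits
    if prefer == "balanced":
        # Prefer Whisper Base when it fits, otherwise the smallest model.
        return next((m for m in fitting if m.name == "Whisper Base"), fitting[0])
    raise ValueError(f"Unknown preference: {prefer}")
```

For example, `pick_model(200, "accuracy")` selects Whisper Base, because Whisper Small does not fit in 200 MB.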

Summaries

Summary and template models

After transcription, local summary models organize the reviewed text into structured outputs such as meeting notes, tasks, emails, and study notes.

Default recommended

Gemma 3 1B

Balanced local summary model for everyday notes.

Size: 529 MB

Context: 2,048 tokens

Runtime: GPU preferred

Download: Automatic

Best for: Meetings, quick summaries, TODO extraction, and short-to-medium transcripts.

Runs locally after download. Long recordings may need shorter summaries or chunked review.

Chinese preferred

Qwen 2.5 1.5B

Higher-quality Chinese summaries with a larger context window.

Size: 1.49 GB

Context: 4,096 tokens

Runtime: CPU preferred

Download: Manual

Best for: Chinese meetings, consultation notes, and templates with more structure.

Optional manual download. Larger file size means more storage and longer processing time.

High quality

Gemma 4 E2B

Opt-in high-quality model for longer context and complex templates.

Size: 2.41 GB

Context: 8,192 tokens

Runtime: GPU preferred

Download: Manual

Best for: Longer transcripts, deeper speech analysis, and multi-section templates.

Optional manual download. It can be slower and should be used on devices with enough free storage and RAM.
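The context sizes above explain why long recordings may need chunked review: a transcript longer than the model's context has to be split before it can be summarized. The sketch below shows one rough way to do that. It is not HushMemo's implementation. The `chunk_transcript` function, the 512-token reserve for the prompt and output, and the 1.5 tokens-per-word estimate are all assumptions; real token counts depend on each model's tokenizer, and whitespace splitting does not work for Chinese text.

```python
def chunk_transcript(
    text: str,
    context_tokens: int,
    reserved_tokens: int = 512,    # assumed budget for prompt and output
    tokens_per_word: float = 1.5,  # rough assumption, not a measured ratio
) -> list[str]:
    """Split a transcript into word-based chunks sized for a context window."""
    budget = context_tokens - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")
    max_words = max(1, int(budget / tokens_per_word))
    words = text.split()
    # Slice the word list into consecutive groups of at most max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Under these assumptions, a 2,500-word transcript splits into three chunks for a 2,048-token context and fits in a single chunk for an 8,192-token context.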

Offline boundary

What uses the network, and what does not.

Network access is mainly for model downloads, store licensing, and explicit website link-outs. After a model is downloaded, transcription and summaries run locally unless you export or share content yourself.

Model downloads: Uses network

Recording and transcription: Local after model setup

Summaries and templates: Local after download

Export and share: User initiated