Local models

Local transcription and summary models with clear boundaries.

HushMemo separates speech-to-text models from summary models. Transcription models turn audio into text; summary models turn reviewed transcripts into notes, TODOs, emails, and templates.

Transcription

Speech-to-text models

Choose a transcription model based on storage, speed, and accuracy. Smaller models are faster and take less storage; larger models transcribe noisy, long, or complex recordings more accurately.

Fastest transcription

Whisper Tiny

Small speech-to-text model for quick drafts and short voice notes.

Size: ~75 MB
Languages: Multilingual
Speed: Fastest
Download: Manual

Best for: Quick thoughts, short reminders, rough drafts, and devices with limited storage.

Fast and compact, but less accurate with noise, accents, long meetings, or domain vocabulary.

Balanced transcription

Whisper Base

Balanced local transcription model for everyday recordings.

Size: ~142 MB
Languages: Multilingual
Speed: Fast
Download: Automatic

Best for: Meetings, calls, lectures, interviews, and voice notes where speed still matters.

A practical default after download. Important transcripts should still be reviewed before summarizing or sharing.

Higher accuracy

Whisper Small

Higher-accuracy local transcription model for longer and messier audio.

Size: ~466 MB
Languages: Multilingual
Speed: Slower
Download: Manual

Best for: Longer meetings, lectures, interviews, mixed speakers, and recordings with more detail.

Uses more storage and processing time. Long recordings may still need section-by-section review.
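
The trade-off between storage, speed, and accuracy across the three transcription models can be sketched as a small selection helper. This is an illustration only, not HushMemo's actual selection logic: the names `TranscriptionModel`, `MODELS`, and `pick_transcription_model` are hypothetical, the sizes are the approximate figures listed above, and the accuracy ranking simply follows the order Tiny, Base, Small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptionModel:
    name: str
    size_mb: int        # approximate download size listed for each model
    accuracy_rank: int  # 1 = least accurate of the three, 3 = most accurate


# Hypothetical catalog built from the figures on this page.
MODELS = [
    TranscriptionModel("Whisper Tiny", 75, 1),
    TranscriptionModel("Whisper Base", 142, 2),
    TranscriptionModel("Whisper Small", 466, 3),
]


def pick_transcription_model(
    free_storage_mb: float, prefer_speed: bool = False
) -> Optional[TranscriptionModel]:
    """Return a model that fits in free storage, or None if none fits.

    Assumption: a model "fits" when its download size is at most the free
    storage; a real app would likely keep extra headroom.
    """
    fitting = [m for m in MODELS if m.size_mb <= free_storage_mb]
    if not fitting:
        return None
    if prefer_speed:
        # Smaller models are faster, so take the smallest one that fits.
        return min(fitting, key=lambda m: m.size_mb)
    # Otherwise take the most accurate model that fits.
    return max(fitting, key=lambda m: m.accuracy_rank)
```

For example, with about 200 MB free the helper returns Whisper Base, because Whisper Small does not fit; with `prefer_speed=True` it returns Whisper Tiny whenever at least 75 MB is free.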

Summaries

Summary and template models

After transcription, local summary models organize the reviewed text into structured outputs such as meeting notes, tasks, emails, and study notes.

Default recommended

Gemma 3 1B

Balanced local summary model for everyday notes.

Size: 529 MB
Context: 2,048 tokens
Runtime: GPU preferred
Download: Automatic

Best for: Meetings, quick summaries, TODO extraction, and short-to-medium transcripts.

Runs locally after download. Long recordings may need shorter summaries or chunked review.
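
Chunked review can be pictured as splitting a long transcript into pieces that each fit the model's context window. The sketch below is a minimal illustration, not HushMemo's implementation. It assumes roughly 4 characters per token (a coarse heuristic, not a real tokenizer count) and reserves part of the window for the prompt and the generated summary; `chunk_transcript`, `reserved_tokens`, and `chars_per_token` are hypothetical names.

```python
from typing import List


def chunk_transcript(
    text: str,
    context_tokens: int = 2048,   # Gemma 3 1B context size from this page
    reserved_tokens: int = 512,   # assumed room for prompt + summary output
    chars_per_token: int = 4,     # rough heuristic, not a tokenizer count
) -> List[str]:
    """Split text on whitespace into chunks within an estimated token budget."""
    budget_chars = (context_tokens - reserved_tokens) * chars_per_token
    if budget_chars <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for word in text.split():
        # Count one extra character for the joining space after the first word.
        added = len(word) + (1 if current else 0)
        if current and current_len + added > budget_chars:
            chunks.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            # A single word longer than the budget still becomes its own chunk.
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarized separately and the partial summaries reviewed together. Splitting on whitespace keeps the sketch short; splitting on sentence or speaker boundaries would usually give cleaner summaries.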

Chinese preferred

Qwen 2.5 1.5B

Higher-quality Chinese summaries with a larger context window.

Size: 1.49 GB
Context: 4,096 tokens
Runtime: CPU preferred
Download: Manual

Best for: Chinese meetings, consultation notes, and templates with more structure.

Optional manual download. Larger file size means more storage and longer processing time.

High quality

Gemma 4 E2B

Opt-in high-quality model for longer context and complex templates.

Size: 2.41 GB
Context: 8,192 tokens
Runtime: GPU preferred
Download: Manual

Best for: Longer transcripts, deeper speech analysis, and multi-section templates.

Optional manual download. It can be slower and should be used on devices with enough free storage and RAM.
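
To show how the context sizes above relate to transcript length, here is a rough sketch that picks the smallest installed summary model whose context window fits a transcript, and flags when chunking would still be needed. It is an assumption-laden illustration, not the app's real behavior: `SummaryModel`, `SUMMARY_MODELS`, and `pick_summary_model` are hypothetical names, the reserved token count is a guess, and language preference (for example Qwen 2.5 1.5B for Chinese) is deliberately ignored.

```python
from typing import List, NamedTuple, Tuple


class SummaryModel(NamedTuple):
    name: str
    context_tokens: int


# Context sizes as listed on this page.
SUMMARY_MODELS = [
    SummaryModel("Gemma 3 1B", 2048),
    SummaryModel("Qwen 2.5 1.5B", 4096),
    SummaryModel("Gemma 4 E2B", 8192),
]


def pick_summary_model(
    transcript_tokens: int,
    installed: List[str],
    reserved_tokens: int = 512,  # assumed room for prompt + summary output
) -> Tuple[str, bool]:
    """Return (model name, needs_chunking) for an estimated transcript length."""
    candidates = sorted(
        (m for m in SUMMARY_MODELS if m.name in installed),
        key=lambda m: m.context_tokens,
    )
    if not candidates:
        raise ValueError("no summary model installed")

    needed = transcript_tokens + reserved_tokens
    for model in candidates:
        if needed <= model.context_tokens:
            # Smallest installed model that fits the whole transcript.
            return model.name, False
    # Nothing fits: fall back to the largest window and signal chunking.
    return candidates[-1].name, True
```

With only Gemma 3 1B installed, a transcript estimated at 3,000 tokens comes back as `("Gemma 3 1B", True)`, meaning it would need shorter summaries or chunked review.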