Frequently Asked Questions

Everything you need to know about Verbatim Studio — from individual use to enterprise deployment.

Privacy & Data Security

Does Verbatim Studio send my data to the cloud?

No. Verbatim Studio runs entirely on your machine — transcription, OCR, speaker identification, search, and AI are all processed locally. Nothing is sent to external servers. You can optionally connect cloud storage providers (Google Drive, OneDrive, Dropbox, S3-compatible, Azure Blob, Google Cloud Storage, SMB, or NFS) to back up your files, but these use your own accounts and credentials. Even the built-in AI runs a local model on your hardware.

Does Verbatim Enterprise send data outside my network?

No. Verbatim Enterprise runs entirely on your infrastructure. Transcription, OCR, search, and diarization are all processed within your Docker containers. For cloud storage, you can configure S3-compatible or Azure Blob storage — but these are your accounts on your network. The only optional outbound call is the LLM integration, which can be pointed at a self-hosted model server (Ollama, vLLM, LocalAI) for fully air-gapped deployments.

Is Verbatim Enterprise HIPAA compliant?

Verbatim Enterprise is designed to support HIPAA-compliant deployments. All processing happens on your infrastructure with no external data transmission. Audit logging tracks every API request with user, action, timestamp, and IP address. JWT authentication is enforced on every endpoint. That said, HIPAA compliance depends on your overall infrastructure and organizational policies — Verbatim provides the technical controls, and your team manages the environment.

How are stored credentials protected?

All stored credentials (cloud storage tokens, API keys) are encrypted using Fernet symmetric encryption (AES-128-CBC) before being written to disk. Encryption keys are stored in your operating system's native keychain (macOS Keychain or Windows Credential Manager), so they're protected by your OS-level security. No credentials are ever stored in plain text.

How are passwords and API keys secured?

User passwords are hashed with Argon2, a memory-hard algorithm resistant to brute-force attacks. API keys use a vst_ prefix for easy identification and are stored as SHA-256 hashes — the raw key is shown exactly once at creation and cannot be retrieved afterward. Keys are scoped (read, write) and can be revoked instantly. All API key usage is captured in the audit log.
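
The hash-only storage scheme described above can be sketched in a few lines. This is an illustrative example, not Verbatim's actual implementation — the function names are hypothetical, but the pattern (random key with a vst_ prefix, only the SHA-256 digest persisted, constant-time comparison on verify) matches the description:

```python
import hashlib
import secrets

def create_api_key() -> tuple[str, str]:
    """Generate a raw API key and the SHA-256 digest to store.

    The raw key is returned exactly once; only the digest is persisted,
    so the key cannot be recovered from the database afterward.
    """
    raw = "vst_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(raw.encode()).hexdigest()
    return raw, digest

def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)

raw_key, stored = create_api_key()  # show raw_key to the user once, store `stored`
```

A leaked database then exposes only digests, which cannot be replayed as keys.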

Can Verbatim work offline?

Yes, both editions work fully offline. The desktop app requires no internet after the initial model download — all models run locally. For Enterprise, transfer Docker images to your air-gapped network using docker save/load. Point the LLM integration at a self-hosted model server, and the entire stack operates without any internet connectivity.

Is my data encrypted at rest?

Credentials and tokens are encrypted with AES-128, with the encryption keys held in your OS keychain. Your transcripts and recordings are stored in a local SQLite database and file system — encryption at rest depends on your OS-level disk encryption (FileVault on macOS, BitLocker on Windows). We recommend enabling full-disk encryption for maximum protection.

Transcription & AI

What transcription engine does Verbatim use?

Verbatim uses OpenAI's Whisper models. On macOS with Apple Silicon, transcription runs through MLX Whisper with Metal GPU acceleration for fast, native performance. On Windows, it uses CTranslate2 (faster-whisper) with NVIDIA CUDA acceleration. CPU transcription is always available as a fallback on both platforms. Enterprise uses the same Whisper engine inside Docker containers.

Which transcription models are available?

Five Whisper model sizes are available: tiny (71 MB), base (137 MB), small (~460 MB), medium (~1.5 GB), and large-v3 (~3 GB). The base model is bundled by default and offers a good balance of speed and accuracy for most use cases. Larger models improve accuracy — especially for accented speech, technical jargon, and noisy audio — but require more memory and processing time. You can switch models anytime in Settings.

Does Verbatim use GPU acceleration?

Yes. On macOS, Apple Silicon GPUs are used via Metal acceleration (M1/M2/M3/M4 chips). On Windows, NVIDIA GPUs are used via CUDA. GPU acceleration significantly speeds up transcription — a one-hour recording might take 2–3 minutes with GPU versus 15–20 minutes on CPU. No special configuration is needed; Verbatim auto-detects your hardware and selects the optimal backend.
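
The auto-detection logic amounts to a platform check. Here is a minimal sketch of that decision — illustrative only, not Verbatim's actual code, and the GPU probe is passed in as a flag rather than detected:

```python
import platform

def pick_transcription_backend(system: str, machine: str,
                               has_nvidia_gpu: bool = False) -> str:
    """Illustrative backend selection: Metal on Apple Silicon,
    CUDA on Windows with an NVIDIA GPU, CPU everywhere else."""
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper (Metal)"
    if system == "Windows" and has_nvidia_gpu:
        return "faster-whisper (CUDA)"
    return "faster-whisper (CPU)"

# Inspect the current machine:
backend = pick_transcription_backend(platform.system(), platform.machine())
```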

What languages are supported?

Whisper supports 99+ languages including English, Spanish, French, German, Japanese, Chinese, Arabic, Hindi, Portuguese, and many more. Language can be auto-detected or manually selected in Settings. Accuracy varies by language — English, Spanish, and other widely-spoken languages have the highest accuracy, while less-resourced languages may benefit from using larger model sizes.

How does speaker identification work?

Automatic speaker diarization uses pyannote.audio to identify who said what in multi-speaker recordings. After transcription, each segment is labeled by speaker. You can rename speakers (e.g., 'Speaker 1' → 'Dr. Martinez'), merge accidentally-split speakers, and edit assignments — all from the transcript editor. Diarization models are downloaded automatically on first use.
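
Conceptually, rename and merge are just relabeling operations over timestamped segments. The sketch below assumes a simple list-of-dicts segment format for illustration — Verbatim's internal data model may differ:

```python
def rename_speaker(segments: list[dict], old: str, new: str) -> list[dict]:
    """Return a copy of the segments with one speaker label renamed."""
    return [{**s, "speaker": new if s["speaker"] == old else s["speaker"]}
            for s in segments]

def merge_speakers(segments: list[dict], merge_from: str, merge_into: str) -> list[dict]:
    """Merging an accidentally-split speaker is the same relabeling."""
    return rename_speaker(segments, merge_from, merge_into)

segments = [
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Good morning."},
    {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Morning, doctor."},
]
renamed = rename_speaker(segments, "Speaker 1", "Dr. Martinez")
```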

What AI features does the desktop app include?

The desktop app includes a built-in local AI powered by Granite 3.3 (8B parameters) running through llama.cpp. It supports GPU acceleration on both Apple Silicon (Metal) and NVIDIA (CUDA). Features include transcript summarization, action item extraction, key point identification, and conversational Q&A with your transcripts. Everything runs locally — no API keys or internet needed.

Which LLM providers does Enterprise support?

Enterprise supports OpenAI, Anthropic (Claude), and any OpenAI-compatible API endpoint — including self-hosted options like Ollama, vLLM, and LocalAI. Configure your preferred provider and model directly from the admin settings. All LLM calls are made server-side from your backend container, so API keys never reach the browser. For fully air-gapped deployments, point at a model server on your own network.
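
"OpenAI-compatible" means the provider accepts the standard chat-completions payload, so switching between OpenAI, Ollama, vLLM, or LocalAI is mostly a matter of changing the base URL and model name. A minimal sketch of that payload (the model name and prompt here are illustrative assumptions):

```python
import json

def build_summary_request(transcript: str, model: str = "llama3") -> dict:
    """Build a chat-completions payload accepted by any
    OpenAI-compatible server (e.g. POST <base_url>/v1/chat/completions)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize the meeting transcript."},
            {"role": "user", "content": transcript},
        ],
    }

payload = build_summary_request("Speaker 1: Let's review the budget...")
body = json.dumps(payload)  # the backend POSTs this server-side; keys never reach the browser
```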

Documents, Search & Export

What audio and video formats are supported?

Verbatim accepts all common audio formats (MP3, WAV, M4A, OGG, FLAC, AAC, WMA) and video formats (MP4, MKV, AVI, WebM, MOV). Audio is automatically extracted from video files before transcription. Just drag and drop files or use the file picker — no format conversion needed.

What document formats can Verbatim process?

Verbatim's OCR engine (powered by Qwen2-VL, a 2B parameter vision model) processes PDFs, DOCX, XLSX, PPTX, plain text, Markdown, and images (PNG, JPEG, TIFF, WebP). Extracted text is indexed alongside your audio transcripts, so everything is searchable from a single search bar. OCR runs locally with GPU acceleration support on both Apple Silicon and NVIDIA hardware.

How does search work?

Verbatim provides both keyword search and semantic search. Keyword search finds exact matches across all transcripts and documents. Semantic search uses the nomic-embed-text-v1.5 model to understand meaning — so searching 'budget discussion' will find segments about 'financial planning' or 'cost review' even if those exact words weren't used. Both search modes work across transcripts and OCR-processed documents simultaneously.
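
Under the hood, semantic search ranks documents by how close their embedding vectors are to the query's embedding, typically via cosine similarity. A minimal sketch with toy two-dimensional vectors standing in for real model embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_rank(query_vec: list[float], docs: list[tuple[str, list[float]]]):
    """docs: (doc_id, embedding) pairs; returns them sorted by similarity."""
    return sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)

docs = [("budget meeting", [0.9, 0.1]), ("weather chat", [0.1, 0.9])]
ranked = semantic_rank([1.0, 0.0], docs)  # "budget meeting" ranks first
```

In the real system the vectors come from the embedding model, so 'budget discussion' and 'financial planning' land near each other even without shared keywords.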

What export formats are supported?

Transcripts can be exported as plain text (TXT), subtitles (SRT, VTT for video captioning), Word documents (DOCX), and PDF. Exports include speaker labels and timestamps. Subtitle formats are compatible with YouTube, Vimeo, and other video platforms that accept caption files.
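
The SRT format is simple enough to sketch: numbered cues, an HH:MM:SS,mmm --> HH:MM:SS,mmm line, then the text. This illustrative converter (the segment structure is assumed, not Verbatim's internal format) shows how speaker labels and timestamps end up in the file:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues with speaker labels."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(blocks)
```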

Storage & Infrastructure

Where is my data stored in the desktop app?

All data is stored locally in a SQLite database on your machine. Transcripts, metadata, search indexes, and settings are all in this database. Media files (recordings, documents) are stored on your local file system by default. You can optionally configure cloud storage from Settings — Google Drive, OneDrive, Dropbox, S3-compatible, Azure Blob, Google Cloud Storage, SMB, or NFS — to sync your media files to your preferred provider.

What storage options does Enterprise support?

Enterprise supports S3-compatible storage (AWS S3, MinIO, Backblaze B2, Wasabi, etc.) and Azure Blob Storage for media file storage. This offloads large audio and video files from your Docker host while keeping metadata in PostgreSQL. Configure storage providers from the admin settings panel — no manual configuration file edits required.

What are the system requirements for the desktop app?

macOS with Apple Silicon (M1 or later) or Windows (x64). At least 4 GB of RAM is recommended, with 8 GB+ preferred for larger Whisper models. The base transcription model is ~137 MB; larger models range up to ~3 GB. GPU acceleration is automatic — no additional drivers or setup needed on macOS. On Windows, NVIDIA GPU users benefit from CUDA acceleration; the CUDA runtime is bundled with the app.

What are the system requirements for Enterprise?

Docker Engine 24+ with Docker Compose v2 is the only requirement. The backend container needs at least 4 GB RAM (8 GB+ recommended for larger Whisper models). For GPU-accelerated transcription, an NVIDIA GPU with CUDA support and the nvidia-container-toolkit is required. The stack runs on Linux, macOS, and Windows (via WSL2/Docker Desktop). Apple Silicon Metal acceleration is not available in Docker.

What database does Enterprise use?

Enterprise uses PostgreSQL for concurrent multi-user access. A PostgreSQL container is included in the Docker Compose stack for convenience, but you can point the application at any external PostgreSQL 16+ instance — managed services like AWS RDS, Google Cloud SQL, or Azure Database for PostgreSQL work perfectly. Database connection settings are configurable from the admin panel.

Can I deploy Enterprise on Kubernetes?

Yes. The Docker Compose stack maps directly to Kubernetes: each service (nginx, backend, PostgreSQL) becomes a Kubernetes deployment. Use persistent volume claims for PostgreSQL data and an external managed database for production. We provide the Docker images — orchestration is flexible to your infrastructure preferences.

Enterprise Teams & Administration

How do I manage users and teams?

The first user to register becomes the admin. From the admin panel, you can create teams, assign roles (admin or member), approve pending user registrations, and manage access. New users who sign up are placed in a pending state until an admin approves them — no one gains access without explicit authorization.

How does authentication work in Enterprise?

Enterprise uses JWT-based authentication with short-lived access tokens and refresh tokens for seamless session renewal. Passwords are hashed with Argon2. Every API request is validated by authentication middleware before processing. SSO/SAML integration is on the roadmap for organizations that need centralized identity management.
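
To make the JWT flow concrete, here is a stdlib-only sketch of issuing and verifying an HS256 access token with an expiry claim. This is illustrative — a production backend would use a maintained JWT library rather than hand-rolling this, and the 15-minute TTL is an assumption:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_access_token(claims: dict, secret: bytes, ttl: int = 900) -> str:
    """Issue a short-lived HS256 JWT (header.payload.signature)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(dict(claims, exp=int(time.time()) + ttl)).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str, secret: bytes):
    """Return the claims if the signature and expiry check out, else None."""
    header, payload, sig = token.split(".")
    expected = _b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                                hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims if claims["exp"] > time.time() else None
```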

Does Enterprise have an API?

Yes. Enterprise provides a full REST API covering recordings, transcripts, documents, teams, users, and admin functions. API keys with scoped permissions (read, write) let you integrate with CI/CD pipelines, internal tools, or custom applications. Webhooks deliver real-time event notifications (signed with HMAC-SHA256) to any HTTP endpoint — with automatic retries and exponential backoff for reliability.
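
HMAC-SHA256 webhook signing means your receiving endpoint can verify that a delivery really came from the server and wasn't tampered with. A minimal sketch of both sides (the secret here is hypothetical, and the header name used to carry the signature depends on your configuration):

```python
import hashlib
import hmac
import json

def sign_webhook(payload: bytes, secret: bytes) -> str:
    """Sender side: hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign_webhook(payload, secret)
    return hmac.compare_digest(expected, signature)

body = json.dumps({"event": "transcript.completed", "id": 42}).encode()
sig = sign_webhook(body, b"shared-webhook-secret")
```

Always verify against the raw bytes of the body — re-serializing parsed JSON can change whitespace and break the signature.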

Is there an audit trail?

Yes. Every mutating API request (creating, updating, or deleting any resource) is automatically logged with the user who performed the action, a timestamp, the endpoint accessed, and the originating IP address. Audit logs are accessible from the admin panel and can be used for compliance reporting, security reviews, and operational troubleshooting.

How many users can Enterprise support?

The backend is built on FastAPI with async I/O, backed by PostgreSQL for concurrent data access. A single backend container comfortably handles dozens of concurrent users. For larger organizations, scale horizontally by adding backend container replicas behind the included nginx reverse proxy — no architectural changes needed.

Licensing & Pricing

Is the desktop app really free?

Yes. The desktop app is MIT-licensed, fully functional, and completely free. It includes live transcription, speaker identification, OCR, semantic search, local AI, cloud storage integrations, and every feature listed on this page. No account, no trial period, no internet required after setup. Enterprise is a separate product for organizations that need multi-user access, team management, API integrations, and centralized deployment.

What's the difference between the desktop app and Enterprise?

Basic is a standalone desktop app for individual use — local SQLite database, local or personal cloud storage, built-in local AI, single user. Enterprise adds multi-user JWT authentication, team management with admin approval, PostgreSQL database, S3/Azure storage, REST API with scoped API keys, HMAC-signed webhooks, audit logging, admin dashboards, configurable LLM providers (OpenAI, Anthropic, self-hosted), and Docker-based deployment. Both editions share the same transcription engine, OCR, and search capabilities.

How does Enterprise licensing work?

Enterprise licenses are signed JWTs containing your organization name, seat count, expiry date, and feature set. License validation is double-gated: first when pulling Docker images from our container registry, and again at runtime by the application middleware on every request. This ensures only licensed deployments can operate.

What happens when my license expires?

When a license expires, the system enters a 14-day grace period with read-only access — your team can still view and export existing content, but new recordings and uploads are blocked. After the grace period, all API requests are blocked. To renew, update your license key from the admin settings and restart the application. No data is ever deleted due to license expiry.
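
The expiry policy reduces to a three-state check against the license's expiry date. A sketch of that logic, assuming the 14-day grace period described above (the function and state names are illustrative):

```python
from datetime import date, timedelta

GRACE_DAYS = 14  # read-only grace period after expiry

def license_state(expiry: date, today: date) -> str:
    """Return 'active', 'read-only' (within the grace period), or 'blocked'."""
    if today <= expiry:
        return "active"
    if today <= expiry + timedelta(days=GRACE_DAYS):
        return "read-only"
    return "blocked"
```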

Can I try Enterprise before buying?

Yes. Contact us for a trial license with a limited seat count and duration. The trial is fully functional with no features restricted — it's the same product your team would use in production. This lets you evaluate the full deployment, test integrations, and verify it meets your requirements before purchasing.

Updates & Support

How do updates work for the desktop app?

The app checks for updates automatically and notifies you when a new version is available. Updates are downloaded in the background and applied on next launch — one click, no manual steps. You can also download any version directly from our GitHub Releases page.

How do I update Enterprise?

Pull the latest images and restart: docker compose pull && docker compose up -d. Database migrations run automatically on startup. You can pin to specific versions to control your rollout schedule. Rollbacks work the same way — pin the previous version and restart. No data migration scripts needed between versions.

Can I migrate from the desktop app to Enterprise?

The desktop app uses SQLite while Enterprise uses PostgreSQL, so there's no direct database migration. However, both editions use the same file formats for recordings and documents — you can re-import your media files into Enterprise and they'll be re-processed with the same transcription engine. Your content is never locked into either edition.

What support is available?

The desktop app is community-supported via GitHub Issues — bug reports, feature requests, and discussions are all welcome. Enterprise licenses include priority support with faster response times. Full documentation is available at verbatimstudio.app/docs covering setup, configuration, architecture, API reference, and troubleshooting.

Can I contribute to Verbatim Studio?

Absolutely. The desktop app is MIT-licensed and contributions are welcome via GitHub pull requests. The core transcription engine, UI components, and backend are all open source. Enterprise is a separate, proprietary product — but the foundation it builds on is fully open.

Still have questions?

Check the docs or reach out — we're happy to help.