CLASSIFIED // PHANTOM VOICE -- ZERO-SHOT VOICE CLONING ATTACK SURFACE // The Digital Archive

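SECURITY TERMINAL // CLASSIFIED DOCUMENT VIEWER v3.1.7

[SYS] Verifying authorization code: PHANTOM-VOICE ... [VALID]

[SYS] Decrypting document archive ... [OK]

[SYS] Clearance level 10 -- restricted access

[SYS] Session logged. Monitoring active. Copying or distribution prohibited.

[SYS] Rendering document ...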

TOP SECRET -- LEVEL 10 CLEARANCE REQUIRED

Document ID: FZ-PHANTOM-VOICE-2026

Date: 2026-04-26

Division: FTC / FBI / CISA JOINT TASKFORCE -- AI VISHING

Status: ACTIVE -- DO NOT DISTRIBUTE

PHANTOM VOICE -- ZERO-SHOT VOICE CLONING ATTACK SURFACE

Active threat: zero-shot, real-time voice-cloning attacks against US households via VoIP. Verified Q1 2026 figures: 47,200,000 attempts, 2,100,000 successful conversions, USD 31,200,000,000 in directly attributable losses, and 0 federal prosecutions resulting in conviction within the reporting period. Average loss per conversion: USD 14,800.

Attack pipeline (fully autonomous, no human in loop): (1) scraper acquires 60,000+ voice clips/hour/instance from Instagram/TikTok/YouTube Shorts/Reddit/Ring public mirrors/cached voicemail dumps; (2) NLP graph-builder maps speaker -> family members via OSINT; (3) zero-shot cloner (VALL-E descendants, ElevenLabs commercial, OpenAI Voice Engine, open Hugging Face checkpoints) instantiates target voice from 3+ seconds; (4) dialer originates 4,000 calls/min/cluster via VoIP gateway.

Defensive technical surface remains negligible: telco-level deepfake detection accuracy is under 11% on adversarial samples, and consumer authentication frameworks rely on factors the attack already controls. The only validated mitigation is a pre-shared semantic credential (a 'safe word') verified before any financial action, a control the FTC declines to recommend at scale because doing so would imply acknowledging that all other controls have failed.

Cost asymmetry: GPU inference per cloned utterance costs approximately USD 0.0011. Attacker break-even conversion rate: 0.0001%. Observed attacker conversion rate, Q1 2026: 4.4%. Average loss per attempt (attacker gross proceeds): ~USD 660, i.e. USD 31.2B across 47.2M attempts. The economics support indefinite scaling against the entire English-speaking population.
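The derived ratios in this report can be recomputed directly from the Q1 2026 totals. The sketch below is a plain arithmetic check, not the taskforce's cost model; the function names are hypothetical. The source does not state how many cloned utterances a single attempt consumes, so `utterances_per_attempt` is an assumption. At one utterance per attempt the break-even rate works out to about 0.0000074%, and the quoted 0.0001% corresponds to roughly 13 to 14 utterances per attempt.

```python
# Headline Q1 2026 totals as stated in this document.
ATTEMPTS = 47_200_000
CONVERSIONS = 2_100_000
LOSSES_USD = 31_200_000_000
COST_PER_UTTERANCE_USD = 0.0011


def derived_metrics(attempts: int, conversions: int, losses_usd: float) -> dict:
    """Recompute the ratios quoted in the text from the raw totals."""
    return {
        "conversion_rate": conversions / attempts,        # ~0.0445, quoted as 4.4%
        "loss_per_conversion": losses_usd / conversions,  # ~14,857, quoted as USD 14,800
        "loss_per_attempt": losses_usd / attempts,        # ~661, quoted as ~USD 660
    }


def break_even_rate(cost_per_attempt: float, loss_per_conversion: float) -> float:
    """Conversion rate at which expected proceeds per attempt equal cost per attempt."""
    return cost_per_attempt / loss_per_conversion


if __name__ == "__main__":
    m = derived_metrics(ATTEMPTS, CONVERSIONS, LOSSES_USD)
    print(f"conversion rate:     {m['conversion_rate']:.2%}")
    print(f"loss per conversion: USD {m['loss_per_conversion']:,.0f}")
    print(f"loss per attempt:    USD {m['loss_per_attempt']:,.0f}")
    # Assumption: utterances per attempt is NOT given in the source.
    for utterances_per_attempt in (1, 14):
        cost = utterances_per_attempt * COST_PER_UTTERANCE_USD
        rate = break_even_rate(cost, m["loss_per_conversion"])
        print(f"break-even at {utterances_per_attempt} utterance(s)/attempt: {rate:.7%}")
```

Whatever value is assumed for utterances per attempt, the gap between break-even and the observed 4.4% spans several orders of magnitude, which is the asymmetry the paragraph describes.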

Recommendation, household level: establish a non-public, pre-shared semantic credential (safe word) verified out-of-band before any monetary transfer initiated by phone. Recommendation, platform level: mandatory provenance watermarking on all generative audio output. Recommendation, regulatory: voice-as-credential requires statutory deprecation. The human auditory system was not built for this.
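A minimal sketch of the household-level control follows. It assumes the check is automated by some service that holds the credential. The function names, the PBKDF2 work factor, and the `confirmed_via_known_number` flag are all hypothetical and do not describe a published FTC or industry scheme. The phrase is normalized, stored only as a salted PBKDF2 hash, and compared in constant time. A transfer is approved only when the credential matches and the caller has also been re-verified out-of-band, for example by calling back a number already on file.

```python
import hashlib
import hmac
import os
import unicodedata

# Hypothetical work factor chosen for illustration.
PBKDF2_ITERATIONS = 600_000


def _normalize(phrase: str) -> bytes:
    # Case- and whitespace-insensitive, so a spoken word typed by ear still matches.
    folded = unicodedata.normalize("NFKC", phrase).casefold()
    return " ".join(folded.split()).encode("utf-8")


def enroll(phrase: str) -> tuple[bytes, bytes]:
    """Store only a random salt and a slow hash, never the phrase itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(phrase), salt, PBKDF2_ITERATIONS)
    return salt, digest


def verify(phrase: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", _normalize(phrase), salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(candidate, digest)


def approve_transfer(spoken_phrase: str, salt: bytes, digest: bytes,
                     confirmed_via_known_number: bool) -> bool:
    # Both gates must pass: out-of-band re-verification AND a matching credential.
    return confirmed_via_known_number and verify(spoken_phrase, salt, digest)
```

Because only the salted hash is stored, a leaked record does not reveal the phrase itself. The normalization trades a little strictness for tolerance of how a spoken word gets transcribed.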

// SUBMIT TESTIMONY REPORT

If you have information about this document, submit it below. All submissions are monitored.

Officer name

Incident report / hypothesis