CLASSIFIED // PHANTOM VOICE -- ZERO-SHOT VOICE CLONING ATTACK SURFACE // The Digital Archive

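SECURE TERMINAL // CLASSIFIED DOCUMENT VIEWER v3.1.7

[SYS] Verifying authorization code: PHANTOM-VOICE ... [VALID]

[SYS] Decrypting document archive ... [OK]

[SYS] Clearance level 10: access restricted

[SYS] Session logged. Monitoring active. Copying or distribution prohibited.

[SYS] Rendering document ...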

CLASSIFIED -- LEVEL 10 CLEARANCE REQUIRED

Document ID: FZ-PHANTOM-VOICE-2026

Date: 2026-04-26

Department: FTC / FBI / CISA JOINT TASKFORCE -- AI VISHING

Status: ACTIVE -- DO NOT DISTRIBUTE

PHANTOM VOICE -- ZERO-SHOT VOICE CLONING ATTACK SURFACE

Active threat: zero-shot, real-time voice-cloning attacks against US households over VoIP. Verified Q1 2026 figures: 47,200,000 attempts; 2,100,000 successful conversions; USD 31,200,000,000 in directly attributable losses; 0 federal prosecutions resulting in conviction within the reporting period. Average loss per conversion: approximately USD 14,860.

Attack pipeline (fully autonomous, no human in the loop):
(1) A scraper acquires 60,000+ voice clips per hour per instance from Instagram, TikTok, YouTube Shorts, Reddit, public Ring mirrors, and cached voicemail dumps.
(2) An NLP graph builder maps each speaker to family members via OSINT.
(3) A zero-shot cloner (VALL-E descendants, ElevenLabs commercial, OpenAI Voice Engine, open Hugging Face checkpoints) instantiates the target's voice from as little as 3 seconds of audio.
(4) A dialer originates 4,000 calls per minute per cluster through a VoIP gateway.

The defensive technical surface remains negligible. Telco-level deepfake detection accuracy is under 11% on adversarial samples, and consumer authentication frameworks rely on factors the attack already controls. The only validated mitigation is a pre-shared semantic credential (a "safe word") verified before any financial action. The FTC declines to recommend this control at scale because doing so would amount to acknowledging the failure of all other controls.
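A minimal sketch of the safe-word check, assuming the household keeps only a salted PBKDF2 hash of the word rather than the plaintext, and that the person answering the call types in what the caller says. The names `enroll`, `verify`, and `approve_financial_action`, the iteration count, and the normalization rule are illustrative assumptions. They are not drawn from FTC guidance or any standard.

```python
from __future__ import annotations

import hashlib
import hmac
import os

# Illustrative work factor (assumption); tune to the device storing the credential.
ITERATIONS = 200_000


def normalize(word: str) -> str:
    """Case-fold and collapse whitespace so 'Blue  Heron' matches 'blue heron'."""
    return " ".join(word.casefold().split())


def _derive(word: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 over the normalized word.
    return hashlib.pbkdf2_hmac("sha256", normalize(word).encode("utf-8"), salt, ITERATIONS)


def enroll(safe_word: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store these instead of the plaintext safe word."""
    salt = os.urandom(16)
    return salt, _derive(safe_word, salt)


def verify(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Constant-time comparison of a candidate against the stored digest."""
    return hmac.compare_digest(_derive(candidate, salt), digest)


def approve_financial_action(candidate: str, salt: bytes, digest: bytes) -> bool:
    # Hypothetical gate: the caller's voice is never an input, only the credential.
    # An empty answer is rejected outright rather than hashed.
    if not normalize(candidate):
        return False
    return verify(candidate, salt, digest)
```

For example, after `salt, digest = enroll("Blue Heron 42")`, the call `approve_financial_action("blue heron 42", salt, digest)` succeeds. The design point is that no input depends on how the caller sounds, so a perfect voice clone confers nothing.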

Cost asymmetry: GPU inference costs approximately USD 0.0011 per cloned utterance. Attacker break-even conversion rate: 0.0001%. Observed attacker conversion rate, Q1 2026: 4.4%. Average attacker revenue per attempt: ~USD 660 (USD 31.2 billion in losses across 47.2 million attempts). The economics support indefinite scaling against the entire English-speaking population.
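The ratios behind the Q1 2026 figures and the break-even claim can be checked with a few lines of arithmetic. This sketch uses only the reported totals. `QuarterFigures`, `break_even_rate`, and `implied_cost_per_attempt` are hypothetical helper names, and the cost per attempt is inferred rather than reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterFigures:
    """Reported quarterly totals: attempts, successful conversions, and losses in USD."""

    attempts: int
    conversions: int
    losses_usd: float

    @property
    def conversion_rate(self) -> float:
        return self.conversions / self.attempts

    @property
    def loss_per_conversion(self) -> float:
        return self.losses_usd / self.conversions

    @property
    def revenue_per_attempt(self) -> float:
        # Total losses spread over every attempt, successful or not.
        return self.losses_usd / self.attempts


def break_even_rate(cost_per_attempt: float, revenue_per_conversion: float) -> float:
    """Conversion rate at which expected revenue per attempt equals cost per attempt."""
    return cost_per_attempt / revenue_per_conversion


def implied_cost_per_attempt(break_even: float, revenue_per_conversion: float) -> float:
    """Invert break_even_rate: the per-attempt cost that a given break-even rate implies."""
    return break_even * revenue_per_conversion


q1 = QuarterFigures(attempts=47_200_000, conversions=2_100_000, losses_usd=31_200_000_000)

if __name__ == "__main__":
    print(f"conversion rate:     {q1.conversion_rate:.2%}")
    print(f"loss per conversion: USD {q1.loss_per_conversion:,.0f}")
    print(f"revenue per attempt: USD {q1.revenue_per_attempt:,.0f}")
    # A break-even rate of 0.0001% is 1e-6 as a fraction.
    cost = implied_cost_per_attempt(1e-6, q1.loss_per_conversion)
    print(f"implied cost per attempt at 0.0001% break-even: USD {cost:.4f}")
```

Run as a script, it reports a conversion rate of about 4.45%, a loss per conversion near USD 14,857, and revenue per attempt near USD 661. It also shows an implied break-even cost of about USD 0.015 per attempt, which is roughly 13 to 14 cloned utterances at USD 0.0011 each. The document does not state how many utterances a call uses, so that last figure is an inference.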

Recommendation, household level: establish a non-public, pre-shared semantic credential (safe word) and verify it out-of-band before any monetary transfer initiated by phone.

Recommendation, platform level: mandatory provenance watermarking on all generative audio output.

Recommendation, regulatory: statutory deprecation of voice as an authentication credential.

The human auditory system was not built for this.
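The platform-level recommendation can be illustrated with a simplified provenance tag. This is a hedged sketch, and it is neither an in-signal watermark nor C2PA. It assumes a hypothetical generator that attaches a detached, HMAC-signed manifest containing its generator ID and the SHA-256 of the audio bytes, and a verifier that holds the same key. `make_provenance_tag` and `check_provenance_tag` are illustrative names.

```python
import hashlib
import hmac
import json

# Assumption: generator and verifier share a symmetric key. Real provenance
# schemes typically use public-key signatures so verifiers cannot forge tags.


def _mac(manifest: dict, key: bytes) -> str:
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_provenance_tag(audio: bytes, generator_id: str, key: bytes) -> str:
    """Bind the generator ID to a hash of the exact audio bytes and sign the pair."""
    manifest = {"generator": generator_id, "sha256": hashlib.sha256(audio).hexdigest()}
    return json.dumps({"manifest": manifest, "mac": _mac(manifest, key)}, sort_keys=True)


def check_provenance_tag(audio: bytes, tag: str, key: bytes) -> bool:
    """True only if the tag is authentic AND matches these exact audio bytes."""
    record = json.loads(tag)
    manifest = record["manifest"]
    expected = _mac(manifest, key).encode("utf-8")
    if not hmac.compare_digest(expected, str(record["mac"]).encode("utf-8")):
        return False
    return manifest["sha256"] == hashlib.sha256(audio).hexdigest()
```

A detached tag is lost whenever audio is re-encoded or carried over a phone line. It also says nothing about output from untagged open checkpoints. A deployable scheme would pair signed manifests with a watermark embedded in the signal itself.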

// SUBMIT WITNESS REPORT

If you have information related to this document, submit a statement below. All submissions are monitored.

Agent designation

Incident report / theory