A deepfake-enabled social-engineering attack follows a recognizable chain. First, the attacker selects a target organization and identifies a high-trust impersonation target, typically an executive or finance officer whose voice and face are publicly available through earnings calls, conference talks, podcast appearances, or company videos. Some modern voice-cloning tools are reported to need as little as three seconds of clean audio to produce a convincing clone, and video deepfakes require only modest amounts of source footage.
Second, the attacker builds a pretext that fits the impersonated person's role. The most common pattern is a confidential, time-sensitive financial transaction (an acquisition, a settlement, a regulatory matter) that justifies bypassing normal approval procedures and asking the employee not to discuss it with colleagues. This pretext exploits two psychological levers simultaneously: authority (the request comes from someone senior) and urgency (there is no time for normal verification). The demand for secrecy reinforces both by cutting the employee off from colleagues who might question the request.
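The pretext described above has a recognizable shape: a payment request combined with authority, urgency, and a demand for secrecy. The following is a minimal sketch of how a mail or chat filter might flag that combination for out-of-band verification. The keyword lists, the names `LEVER_PATTERNS`, `pretext_levers`, and `needs_out_of_band_check`, and the "two or more levers" threshold are all illustrative assumptions, not a vetted detection ruleset.

```python
import re

# Hypothetical keyword patterns, one per lever named in the text.
# These are illustrative assumptions, not a tested detection ruleset.
LEVER_PATTERNS = {
    "authority": re.compile(r"\b(ceo|cfo|chief|director|chairman)\b", re.I),
    "urgency": re.compile(r"\b(urgent|immediately|today|right away|asap)\b", re.I),
    "secrecy": re.compile(r"\b(confidential|do not discuss|discreet)\b", re.I),
    "payment": re.compile(r"\b(wire|transfer|payment|account details)\b", re.I),
}


def pretext_levers(message: str) -> set[str]:
    """Return the names of the levers whose keywords appear in the message."""
    return {name for name, pattern in LEVER_PATTERNS.items() if pattern.search(message)}


def needs_out_of_band_check(message: str) -> bool:
    """Flag a payment request that also pulls at least two pressure levers.

    The threshold of two is an assumption chosen for illustration.
    """
    levers = pretext_levers(message)
    pressure = levers - {"payment"}
    return "payment" in levers and len(pressure) >= 2


if __name__ == "__main__":
    suspicious = (
        "This is the CFO. I need an urgent wire transfer today for a "
        "confidential acquisition. Do not discuss this with anyone."
    )
    routine = "Please process the payment for last month's catering."
    print(needs_out_of_band_check(suspicious))
    print(needs_out_of_band_check(routine))
```

Keyword matching of this kind is easy to evade and prone to false positives, so a flag here is best treated as a prompt to verify through a separately established channel, not as a verdict.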
Third, the attacker initiates contact, often starting with text-based channels and escalating to voice or video when the target hesitates. The Hong Kong Arup attack and the Singapore variant that followed both used this pattern: an initial email raised suspicion, the target asked for verification, and the attacker offered a video call. The deepfake on the video call provided the verification the target was looking for, which is precisely why it was effective. The defensive instinct that should have caught the attack was the very thing the attacker had prepared for.
Fourth, the actual exploit happens after the deepfake has established trust. The target executes the requested wire transfer, shares credentials, or grants access. Funds move quickly through correspondent banks and money mules, and by the time the fraud is discovered, the money is typically beyond recovery. In the Arup case, 15 separate transactions totaling roughly US$25 million (HK$200 million) were processed before the fraud was identified.
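Because the loss in a case like this accumulates across many separate transfers, a per-transaction limit alone may never trigger. The sketch below shows one way to compute the largest total sent within any rolling time window, so that a cumulative threshold can be applied. The function names, the `(timestamp, amount)` tuple layout, the seven-day default window, and the sample figures are assumptions for illustration and do not describe any real payment system or the Arup transactions.

```python
from datetime import datetime, timedelta


def cumulative_exposure(transfers, window=timedelta(days=7)):
    """Return the largest total amount sent within any rolling window.

    `transfers` is a list of (timestamp, amount) tuples; this layout is a
    hypothetical choice for the sketch.
    """
    ordered = sorted(transfers, key=lambda t: t[0])
    best = 0.0
    running = 0.0
    start = 0
    for timestamp, amount in ordered:
        running += amount
        # Drop transfers that fall outside the window ending at `timestamp`.
        while timestamp - ordered[start][0] > window:
            running -= ordered[start][1]
            start += 1
        best = max(best, running)
    return best


def exceeds_limit(transfers, limit, window=timedelta(days=7)):
    """True if the total within any rolling window is above `limit`."""
    return cumulative_exposure(transfers, window) > limit


if __name__ == "__main__":
    day0 = datetime(2024, 1, 1)
    # Three hypothetical transfers, each below a per-transaction limit of 500.
    transfers = [(day0 + timedelta(days=i), 400.0) for i in range(3)]
    print(exceeds_limit(transfers, limit=1000.0))
    print(exceeds_limit(transfers, limit=1000.0, window=timedelta(days=1)))
```

With the sample data, each transfer passes an individual check, but the seven-day total exceeds the cumulative limit, while a one-day window does not catch it. The window length is therefore a design choice that trades sensitivity against false alarms.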