Privacy & compliance

PII masking

Definition

PII masking is the practice of replacing personal identifiers such as names, email addresses, phone numbers and credit card numbers with placeholders before the data ever leaves the visitor's browser. In session replay and analytics tools, masking is the primary technical control for keeping recordings compliant with GDPR, CCPA and PCI-DSS.

Also called: data masking, PII redaction, input scrubbing

What does PII masking actually replace?

PII masking intercepts text content in the DOM before it is serialized into a session recording or event payload and substitutes a placeholder. The replacement happens client-side, so the raw value never reaches the analytics server. Typical placeholders are asterisks that preserve the original length (********), block characters, or a hash of the value.
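Here is a minimal TypeScript sketch of two of those placeholder styles. The function names are hypothetical, and the hash is 32-bit FNV-1a, chosen only because it fits in a few lines. It is neither cryptographic nor salted, and low-entropy values such as phone numbers can be recovered from an unsalted hash by brute force, so a real deployment would want a keyed hash or plain asterisks.

```typescript
// Two placeholder styles for a masked value. maskWithAsterisks and
// maskWithHash are hypothetical helper names, not a library API.

// Replace every character with "*", so the placeholder keeps the value's length.
function maskWithAsterisks(value: string): string {
  return "*".repeat(value.length);
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Non-cryptographic and unsalted: illustrative only, not safe for real PII.
function maskWithHash(value: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

console.log(maskWithAsterisks("jane@example.com")); // "****************"
console.log(maskWithHash("jane@example.com")); // same input always yields the same 8 hex chars
```

The asterisk form leaks only the value's length. The hash form is deterministic, so the same email produces the same placeholder in every session: handy for joining records, and also the reason a hashed value is still pseudonymous rather than anonymous.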

The categories most tools mask by default:

Form input values (<input>, <textarea>, <select>)
Text content inside specific elements marked with a class or attribute
Whole subtrees flagged as "sensitive"
Network request bodies and URL query strings

What most tools do not mask by default: plain text rendered in ordinary elements such as divs and spans, image alt text, tooltip content, and anything written to the DOM after the masker has finished walking the tree.
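URL query strings, one of the default-masked categories listed above, are the easiest case to show in isolation. This sketch keeps parameter names, which are usually useful for analytics, and replaces every value. maskQueryString and the REDACTED placeholder are hypothetical; the code uses only the standard WHATWG URL and URLSearchParams APIs, and it leaves the fragment (#...) untouched, which a real masker may also need to handle.

```typescript
// Mask every query-parameter value in a URL before it is recorded, keeping
// parameter names. maskQueryString is a hypothetical helper; it relies only on
// the standard WHATWG URL API available in browsers and Node.js.
function maskQueryString(rawUrl: string, placeholder = "REDACTED"): string {
  const url = new URL(rawUrl);
  const masked = new URLSearchParams();
  // forEach visits pairs in order, so repeated keys and ordering are preserved.
  url.searchParams.forEach((_value, key) => {
    masked.append(key, placeholder);
  });
  // Assigning an empty string removes the "?" entirely when there were no params.
  url.search = masked.toString();
  return url.toString();
}

console.log(maskQueryString("https://shop.example/checkout?email=jane%40example.com&step=2"));
// https://shop.example/checkout?email=REDACTED&step=REDACTED
```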

Which inputs are masked out of the box?

Many tools, including several built on rrweb, mask these inputs automatically by default:

Input	Why it is masked
`type="password"`	Always sensitive
`type="hidden"`	May contain CSRF tokens or user IDs
Inputs with `autocomplete="cc-number"`	Credit card primary account numbers
Inputs with `autocomplete="cc-csc"`	Card security codes
Inputs with `autocomplete="one-time-code"`	OTP / 2FA codes
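The table translates into a small predicate. This sketch works on a plain object rather than a live element so it runs anywhere; InputAttributes and isMaskedByDefault are hypothetical names, and a recorder would read the same two attributes from the element it is serializing. Email, name and phone inputs fall through to false, which is exactly the gap that leads to accidental leaks.

```typescript
// The default rules from the table, as a predicate over an input's attributes.
// InputAttributes and isMaskedByDefault are hypothetical names.
interface InputAttributes {
  type?: string; // value of the type attribute; a missing type behaves as "text"
  autocomplete?: string; // value of the autocomplete attribute, if any
}

const MASKED_TYPES = new Set(["password", "hidden"]);
const MASKED_AUTOCOMPLETE = new Set(["cc-number", "cc-csc", "one-time-code"]);

function isMaskedByDefault(input: InputAttributes): boolean {
  const type = (input.type ?? "text").trim().toLowerCase();
  if (MASKED_TYPES.has(type)) return true;
  // autocomplete may hold several space-separated tokens,
  // e.g. "section-checkout billing cc-number", so check each one.
  const tokens = (input.autocomplete ?? "").trim().toLowerCase().split(/\s+/);
  return tokens.some((token) => MASKED_AUTOCOMPLETE.has(token));
}

console.log(isMaskedByDefault({ type: "password" })); // true
console.log(isMaskedByDefault({ type: "email", autocomplete: "email" })); // false
```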

Everything else, including email addresses, names, postal addresses and phone numbers, typically requires explicit configuration. That gap is where most accidental PII leaks happen.

Selector-based masking vs regex masking

There are two strategies for masking everything beyond the defaults.

Selector-based masking marks specific DOM nodes as sensitive using a class, attribute, or CSS selector. The recorder checks each node against the selector list and masks the matching subtrees.

<input name="email" data-mask="true" />
<div class="lt-mask">{{ user.full_name }}</div>

Pros: deterministic, fast, easy to audit. Cons: requires you to know in advance which fields are sensitive.
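Selector-based masking amounts to a tree walk that carries a "sensitive" flag down into children. This sketch models elements as plain objects instead of DOM nodes; VNode, isSensitive and maskTree are hypothetical names, not a recorder's API, and the selector list is just the two markers from the example, data-mask="true" and the lt-mask class.

```typescript
// Minimal stand-in for a DOM element; a real recorder walks live nodes.
interface VNode {
  tag: string;
  attrs: Record<string, string>;
  text?: string; // the element's own text, if any
  children: VNode[];
}

// Hypothetical selector list: data-mask="true" or the lt-mask class.
function isSensitive(node: VNode): boolean {
  if (node.attrs["data-mask"] === "true") return true;
  const classes = (node.attrs["class"] ?? "").split(/\s+/);
  return classes.includes("lt-mask");
}

// Return a masked copy of the tree. Once a node matches, its whole subtree
// is treated as sensitive, so descendants are masked too.
function maskTree(node: VNode, insideSensitive = false): VNode {
  const sensitive = insideSensitive || isSensitive(node);
  const copy: VNode = {
    tag: node.tag,
    attrs: { ...node.attrs },
    children: node.children.map((child) => maskTree(child, sensitive)),
  };
  if (node.text !== undefined) {
    copy.text = sensitive ? "*".repeat(node.text.length) : node.text;
  }
  // Form controls keep their current value in the value attribute.
  const value = copy.attrs["value"];
  if (sensitive && value !== undefined) {
    copy.attrs["value"] = "*".repeat(value.length);
  }
  return copy;
}
```

Returning a copy rather than editing in place mirrors the requirement that masking happen on the serialized output, not on the page the visitor is looking at.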

Regex masking scans text content for patterns that look like PII (email regex, Luhn-valid card numbers, E.164 phone numbers) and redacts matches.

Pros: catches PII in fields you forgot to tag. Cons: false positives are common (order numbers that look like phone numbers, support tickets that contain example emails, internal IDs that happen to pass the Luhn check). Regex masking is also expensive on large recordings and can miss split tokens, such as a card number rendered across two adjacent spans.
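A regex masker with a Luhn filter might look like the following. The three patterns are simplified assumptions (real email and phone formats are messier), and regexMask and passesLuhn are hypothetical names. The Luhn check only trims card-number false positives; it cannot tell a card from an internal ID that happens to pass, and because the function sees one string at a time it also misses a number split across two spans.

```typescript
// Regex masking with a Luhn check on card-number candidates. The patterns are
// deliberately simple assumptions, not a vetted PII-detection library.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// 13 to 19 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;
// E.164: "+", a non-zero first digit, at most 15 digits in total.
const E164_PHONE = /\+[1-9]\d{1,14}\b/g;

function passesLuhn(digits: string): boolean {
  let sum = 0;
  let double = false;
  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

const hide = (match: string): string => "*".repeat(match.length);

function regexMask(text: string): string {
  return text
    .replace(EMAIL, hide)
    // Only mask digit runs that are Luhn-valid once separators are stripped.
    .replace(CARD_CANDIDATE, (m) => (passesLuhn(m.replace(/[ -]/g, "")) ? hide(m) : m))
    .replace(E164_PHONE, hide);
}
```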

Most production setups use selector-based masking as the primary defense and regex as a safety net for free-text fields like search bars and comments.

What counts as personal data under GDPR?

GDPR Article 4(1) defines personal data as any information relating to an identified or identifiable natural person. That definition is broader than most engineers expect:

An IP address is generally personal data
A device fingerprint is generally personal data
A user-agent combined with a timestamp can be personal data
A pseudonymous user ID linked to an account is personal data

Masking the obvious fields (name, email, phone) is necessary but not sufficient. You also need to consider what personal data the recording captures indirectly: a user's billing ZIP code displayed in their account settings is personal data even if it is not in a form field.

Recital 26 explicitly says pseudonymisation does not take data outside the scope of GDPR. Masking reduces risk; it does not exempt you from controller obligations.

When does PII masking break?

Common gotchas worth flagging:

Autofill: the browser populates an input after the masker walks the tree. Selector masks usually catch this because they fire on input mutations; regex masks may miss it.
Server-side rendering: PII baked into the initial HTML payload (e.g. a user's name in a header) is in the DOM before any JavaScript runs, including the recorder's masker. The mask must apply on the initial snapshot, not just on subsequent mutations.
Shadow DOM: components inside closed shadow roots are invisible to most maskers.
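Canvas and SVG: text drawn onto a <canvas>, converted to SVG paths, or embedded in an image is graphics rather than DOM text, so DOM walkers cannot mask it. SVG <text> elements, by contrast, are ordinary DOM nodes and can be masked.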
Iframes: cross-origin iframes are inaccessible; same-origin iframes need explicit recursion.
Copy-paste: a user pasting their card number into a search box that you didn't think of as sensitive.
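The autofill and server-side rendering items share one remedy: send the initial snapshot and every later mutation through the same masking step. This is a minimal sketch with hypothetical event types (not rrweb's actual event format); the sensitive flag stands in for whatever selector or regex decision the recorder has already made.

```typescript
// Hypothetical, simplified event shapes (not rrweb's real event format).
interface TextRecord {
  id: number;
  sensitive: boolean; // result of the selector or regex decision
  text: string;
}

type RecorderEvent =
  | { kind: "full-snapshot"; nodes: TextRecord[] }
  | { kind: "text-mutation"; node: TextRecord };

function maskRecord(record: TextRecord): TextRecord {
  return record.sensitive ? { ...record, text: "*".repeat(record.text.length) } : record;
}

// Every event passes through this one function before serialization, so text
// present in the initial HTML (server-side rendering) is masked exactly like
// text that arrives later (autofill, client-side rendering).
function sanitizeEvent(event: RecorderEvent): RecorderEvent {
  switch (event.kind) {
    case "full-snapshot":
      return { kind: "full-snapshot", nodes: event.nodes.map(maskRecord) };
    case "text-mutation":
      return { kind: "text-mutation", node: maskRecord(event.node) };
  }
}
```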

The defensive default is "mask everything, then opt in to capturing specific safe fields" rather than "capture everything, then opt out of sensitive ones."
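The difference between the two defaults shows up as soon as someone adds a field nobody classified. This sketch, with hypothetical field and function names, applies an opt-out denylist and an opt-in allowlist to the same new field.

```typescript
// Opt-out (denylist) vs opt-in (allowlist) capture policies.
// Field names and helper names are hypothetical.
type CapturePolicy = (field: string, value: string) => string;

const stars = (value: string): string => "*".repeat(value.length);

// Capture everything, then opt out of fields known to be sensitive.
const denylistPolicy =
  (deny: ReadonlySet<string>): CapturePolicy =>
  (field, value) =>
    deny.has(field) ? stars(value) : value;

// Mask everything, then opt in to fields known to be safe.
const allowlistPolicy =
  (allow: ReadonlySet<string>): CapturePolicy =>
  (field, value) =>
    allow.has(field) ? value : stars(value);

// "referral_email" was added after both lists were written.
const optOut = denylistPolicy(new Set(["password", "email"]));
const optIn = allowlistPolicy(new Set(["plan", "country"]));
console.log(optOut("referral_email", "bob@example.com")); // "bob@example.com" (leaked)
console.log(optIn("referral_email", "bob@example.com")); // "***************"
```

The denylist fails open and the allowlist fails closed: forgetting to update the allowlist costs some analytics detail, while forgetting to update the denylist leaks PII.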

How it relates to CloseTrace

CloseTrace masks all input fields (password fields included) and any element marked with data-lt-mask by default, applies the mask before serialization so raw values never touch the network, and gives you a per-project allowlist for fields you explicitly want unmasked. PII redaction is on by default, not an upsell.

Related terms

Session replay

Session replay is a technique that reconstructs a website visitor's browsing session by recording DOM mutations, inputs, and network events as a serialized event stream, then replaying them inside a sandboxed iframe. Because it is not a video recording, its payloads are much smaller than equivalent screen video.

rrweb

rrweb (record and replay the web) is an open-source, MIT-licensed JavaScript library that records DOM mutations, user inputs, and network metadata as a serialized event stream and replays them inside a sandboxed iframe. Created by Yuyz0112, it powers the session replay features of PostHog, Sentry, OpenReplay, Highlight, and CloseTrace.

Form abandonment rate

Form abandonment rate is the percentage of visitors who start filling out a form but never submit it, calculated as (forms started − forms submitted) / forms started × 100. A form is considered started the moment the visitor focuses any field, and benchmark studies place the cross-industry average between 67 and 81 percent.
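The formula is simple enough to state directly as code. This is a minimal sketch with a hypothetical function name; returning 0 when no forms were started is a choice made here, not part of the definition.

```typescript
// Form abandonment rate = (forms started − forms submitted) / forms started × 100.
function formAbandonmentRate(started: number, submitted: number): number {
  if (started <= 0) return 0; // no started forms: report 0 instead of dividing by zero
  return ((started - submitted) / started) * 100;
}

console.log(formAbandonmentRate(400, 100)); // 75
```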