PROOF Series № 1 — Forthcoming Q3 2026
A pre-registered quantitative study of 200 California law firms, tracked across ChatGPT, Perplexity, Google AI Overviews, and Claude over 60 days. The methodology is committed before the data is in. The result is publishable either way.
The question
AI engines retrieve content from a curated set of high-trust sources, weighted by entity consolidation, citation diversity, recency, and intent matching. The mechanics are increasingly well understood. What's less clear is whether the engines' training via reinforcement learning from human feedback (RLHF) also penalizes content that pattern-matches to promotional or non-compliant legal marketing, and if so, by how much.
The premise of this study is that California firms whose public-facing content materially complies with Rule 7.1 of the California Rules of Professional Conduct (Communications Concerning a Lawyer's Services) are cited more frequently in AI-search answers than firms with documented compliance gaps, controlling for domain authority, content velocity, practice-area mix, and firm size. The premise is grounded in pilot observations and in the general direction of RLHF reward modeling, but it is a hypothesis, not a finding. The study tests it.
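As a rough illustration of the outcome being compared, the sketch below computes a per-firm citation rate and the unadjusted gap between compliant and non-compliant firms. The `Observation` record, the field names, and the function names are assumptions made for this example. The binding outcome measures and the covariate adjustment (domain authority, content velocity, practice-area mix, firm size) are defined in the pre-registration and are not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    """One engine answer checked for one firm (hypothetical record layout)."""
    firm: str
    engine: str
    cited: bool  # True if the firm was cited in this answer


def citation_rates(observations):
    """Return {firm: share of checked answers in which the firm was cited}."""
    totals, hits = {}, {}
    for obs in observations:
        totals[obs.firm] = totals.get(obs.firm, 0) + 1
        hits[obs.firm] = hits.get(obs.firm, 0) + int(obs.cited)
    return {firm: hits[firm] / totals[firm] for firm in totals}


def unadjusted_gap(rates, is_compliant):
    """Mean citation rate of compliant firms minus that of non-compliant firms.

    This is a raw difference only; it does NOT control for any covariate.
    """
    compliant = [r for firm, r in rates.items() if is_compliant[firm]]
    non_compliant = [r for firm, r in rates.items() if not is_compliant[firm]]
    if not compliant or not non_compliant:
        raise ValueError("both groups need at least one firm")
    return mean(compliant) - mean(non_compliant)
```

A positive gap would point in the direction of the hypothesis, but only the adjusted analysis in the pre-registered plan can support or falsify it.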
If the hypothesis is supported, firms have a market reason, in addition to the regulatory reason, to take Rule 7.1 seriously. If it is not, the null result is itself useful: it would refocus the practitioner conversation on the retrieval mechanics rather than on compliance. Either way, the paper publishes.
Study design at a glance
What's already published
Before data collection begins, the methodology is committed in writing and made public. That's the discipline that distinguishes pre-registered empirical research from post-hoc storytelling. Two documents are published as of May 14, 2026:
1. The Pre-Registration Document. Hypothesis, sample frame, inclusion and exclusion criteria, the prompt set, the analysis plan, the primary and secondary outcome measures, the falsification criteria, the power analysis, the timeline, and the ethics protocol. Read the pre-registration →
2. The Compliance Coding Rubric. The operational definition of "materially compliant" versus "materially non-compliant." Five flag categories with weights and exception criteria. Two independent coders apply this rubric to each firm's public-facing content; inter-rater reliability is reported. Read the coding rubric →
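To make the rubric and the reliability check concrete, here is a minimal sketch of how weighted flags might be turned into a label and how agreement between two coders might be measured. Everything specific in it is an assumption for illustration: the flag names, the weights, the threshold, and the choice of Cohen's kappa as the reliability statistic. The published rubric and pre-registration define the actual categories, weights, exception criteria, and statistic.

```python
from collections import Counter

# Hypothetical weights for five flag categories; the real ones are in the rubric.
HYPOTHETICAL_WEIGHTS = {
    "flag_1": 3, "flag_2": 2, "flag_3": 2, "flag_4": 1, "flag_5": 1,
}
HYPOTHETICAL_THRESHOLD = 4  # assumed cut-off, not taken from the rubric


def classify(flags_raised):
    """Label a firm from the set of flags one coder raised."""
    score = sum(HYPOTHETICAL_WEIGHTS[flag] for flag in flags_raised)
    return "non_compliant" if score >= HYPOTHETICAL_THRESHOLD else "compliant"


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each coder's marginal share per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and near 0 when agreement is no better than chance, which is why it is often preferred over a raw percentage of matching labels.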
An optional pilot study (20 firms, 7 days, one practice area) may run between May and June 2026 to validate the methodology before the full data collection begins. Read the pilot study plan →
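The pre-registration lists a power analysis among its components. As a rough illustration of what such a calculation involves, the sketch below uses the standard normal-approximation formula for comparing two proportions. It assumes a simplified binary outcome per firm (cited at least once or not) and entirely hypothetical citation shares; the study's actual power analysis, effect-size assumptions, and outcome definition are those in the pre-registration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def firms_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate firms needed per group to detect p1 vs p2 (two-sided test).

    p1, p2: assumed shares of firms cited in each group (hypothetical inputs).
    """
    if p1 == p2:
        raise ValueError("the two assumed shares must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)
```

Calling `firms_per_group(0.30, 0.15)` with those made-up shares shows how quickly the required sample grows as the assumed gap between groups shrinks.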
Timeline
Any deviation from this timeline of more than 30 days will be publicly disclosed as soon as it is known. The pre-registration is the binding methodology; if findings during data collection require changes to it, those revisions are versioned and disclosed in the final paper.
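The disclosure rule above amounts to a simple date comparison. A minimal sketch, assuming a hypothetical `needs_disclosure` helper that treats slippage in either direction the same way and takes "more than 30 days" as strictly greater than 30:

```python
from datetime import date

DISCLOSURE_THRESHOLD_DAYS = 30  # from the stated rule: "more than 30 days"


def needs_disclosure(planned: date, actual: date) -> bool:
    """True if a milestone moved by more than 30 days from its planned date.

    Treating early and late deviations alike is an assumption of this sketch.
    """
    return abs((actual - planned).days) > DISCLOSURE_THRESHOLD_DAYS
```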
Get notified
The pre-publication mailing list receives exactly two emails: one when the pilot results post (~July 2026) and one when the full paper publishes (~September 2026). No marketing, no upsell, no other emails between those two sends. The list is for researchers, peer reviewers, journalists covering legal-tech regulation, and operators inside law firms who want the data when it's ready.
Subscribe via the audit form on the contact page with the note "PROOF Series #1." If you have methodology questions, replication interest, or want to peer-review the pre-registration before data collection begins, email [email protected] directly.
Conflict-of-interest disclosure
The principal investigator (Shawn Lai) is the founder of WTT Digital, an agency that provides AI-search and compliance-review services to law firms. WTT Digital's commercial positioning would benefit more from a finding that supports the hypothesis than from a null result. This conflict is disclosed up front.
Mitigations: the methodology is pre-registered, so the analysis plan is fixed before any data is in; the analysis plan explicitly specifies what would falsify the hypothesis and commits to publishing the null result; compliance coding is performed by coders blind to the hypothesis; the raw dataset and analysis code will be published at the time of the paper, enabling independent replication. The full disclosure protocol is documented in Section 11 of the pre-registration.