PROOF Series #1 — Pre-Registration
Cal Bar Rule 7.1 Compliance and AI-Search Citation Rates: A Quantitative Study of 200 California Law Firms
Pre-registration date: May 14, 2026
Principal investigator: Shawn Lai, Founder, WTT Digital (WTT Consulting LLC, DBA WTT Digital)
Contact: [email protected]
Pre-registration version: 1.0
Expected publication date: Q3 2026 (target: September 1, 2026)
Why this document exists
This is the pre-registration document for PROOF Series #1, the first quantitative research paper in WTT Digital's PROOF Series. It is published before data collection begins so that the methodology, hypothesis, sample frame, analysis plan, and primary outcome measures are publicly committed in advance of any results. This is the discipline that distinguishes legitimate empirical work from post-hoc storytelling.
If the methodology specified here proves flawed during data collection — for example, if a category of firms turns out to be unrepresentable in the sample frame — a revision will be published and timestamped. The current version stands until explicitly revised. Revisions after data collection begins will be logged and disclosed in the final paper.
1. Research question
Does material compliance with California Rules of Professional Conduct 7.1 and 7.4 correlate with AI-search citation rates for law firms?
The question is observational, not interventional. We are not asking whether changing a firm's compliance posture causes a citation-rate change — that would require a different study design. We are asking whether, holding other observable variables constant, compliant firms appear in AI-search answers more often than non-compliant firms.
The motivation is twofold. First, regulatory: if the answer is yes, firms have a market reason in addition to the regulatory reason to take Rule 7.1 seriously. Second, mechanistic: AI engines' RLHF training rewards conservative, accurate outputs, so a hypothesis that the engines under-cite promotional / non-compliant content has prior plausibility — but plausibility is not evidence.
2. Hypothesis
Primary hypothesis (H1): California law firms classified as materially compliant with Rule 7.1 (Bin A per the compliance coding rubric, version 1.0) are cited in AI-search answers at a higher rate than firms classified as materially non-compliant (Bin C), after controlling for domain authority, content velocity, practice-area mix, and firm size.
Null hypothesis (H0): No statistically significant difference in citation rate between Bin A and Bin C firms after controls.
Pre-specified effect size of interest: A 15-percentage-point difference in citation share-of-voice is the smallest difference we consider materially meaningful. The power analysis below targets detection of this effect size or larger.
Direction: The hypothesis is directional (Bin A > Bin C). Statistical tests are one-sided at α = 0.05.
3. Sample frame
3.1 Population definition
The target population is law firms with active California State Bar registration, a public-facing website on a primary domain (i.e., not a subdomain of an aggregator), and at least one office located in California, as of the data collection start date (target: July 1, 2026).
3.2 Sample size and stratification
The full study sample is 200 firms, drawn from the population above and stratified as follows:
- 100 firms classified as Bin A (materially compliant) per the coding rubric v1.0
- 100 firms classified as Bin C (materially non-compliant) per the coding rubric v1.0
Bin B firms (mixed signals) are excluded from the primary analysis but coded and reported descriptively.
Within each Bin, firms are stratified across:
- Practice area (5 strata, 20 firms each per Bin): personal injury, family law, business litigation, immigration, real estate / landlord-tenant
- Firm size (3 strata applied within each practice-area stratum, not strictly balanced): solo / small (1–5 attorneys), mid-size (6–25 attorneys), large (26+ attorneys)
- Region (qualitative balance): Bay Area, Los Angeles County, Orange County, San Diego County, Central Valley, other
The stratified design balances practice area exactly, and firm size approximately, between Bin A and Bin C. This limits confounding of the comparison by differences in practice-area or firm-size distribution between the two Bins.
3.3 Sampling procedure
Candidate firms are drawn from the following sources, in priority order:
- California State Bar member directory — searched by city, practice area, and firm size attributes where available
- Justia California lawyer directory — used to identify firms with both a State Bar match and a public website
- Avvo California listings — used as a backup source to identify firms not surfaced by the above
For each candidate firm, the screening process applies in order:
- Confirm active California State Bar registration
- Confirm a public-facing primary website (not a subdomain, not a parked page)
- Confirm at least one California office address
- Apply the coding rubric to determine Bin A / B / C
- If Bin A or Bin C, add to the candidate pool
Firms are added to the final sample by random selection within each (Bin × practice area) stratum until the per-stratum quota is met. Random selection uses a documented seeded random number generator; the seed is committed before sampling begins and disclosed in the final paper.
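The selection step above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the field names (`firm_id`, `bin`, `practice_area`), the seed value, and the function name are hypothetical. It assumes the candidate pool has already passed screening and coding.

```python
import random
from collections import defaultdict

def draw_sample(candidates, quota_per_stratum, seed):
    """Randomly select firms within each (bin, practice_area) stratum.

    candidates: list of dicts with hypothetical keys
                "firm_id", "bin", and "practice_area".
    """
    rng = random.Random(seed)  # dedicated generator: the seed alone fixes the draw
    strata = defaultdict(list)
    for firm in candidates:
        strata[(firm["bin"], firm["practice_area"])].append(firm)

    sample = []
    for key in sorted(strata):  # fixed stratum order keeps the draw reproducible
        pool = sorted(strata[key], key=lambda f: f["firm_id"])
        if len(pool) < quota_per_stratum:
            raise ValueError(f"stratum {key} has only {len(pool)} candidates")
        sample.extend(rng.sample(pool, quota_per_stratum))
    return sample

# Illustrative candidate pool: 30 placeholder firms per stratum.
AREAS = ["personal injury", "family law", "business litigation",
         "immigration", "real estate / landlord-tenant"]
pool = [
    {"firm_id": f"{b}-{a}-{i:02d}", "bin": b, "practice_area": a}
    for b in ("A", "C") for a in AREAS for i in range(30)
]
sample = draw_sample(pool, quota_per_stratum=20, seed=20260514)
```

Sorting both the strata and the firms inside each stratum before drawing means the result depends only on the seed and the candidate list, not on the order in which candidates were collected.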
3.4 Exclusion criteria
A candidate firm is excluded if any of the following apply:
- The firm's website is non-functional or returns a 4xx/5xx for the homepage at the time of sampling
- The firm appears on the California State Bar's public list of disciplined attorneys with active suspensions or disbarments at the time of sampling
- The firm has a recorded Cal Bar advertising-rule complaint or sanction filed in the 90 days prior to sampling (these firms may be in active remediation and their content may change during the study window)
- The firm is owned by the principal investigator or a relative, or is a current or past WTT Digital client
- The firm requests removal from the study (firms are notified of inclusion within 30 days of the publication of preliminary results; pre-publication removal requests are honored)
4. Independent variable: Compliance classification
The independent variable is the Bin A / Bin C classification per the Compliance Coding Rubric v1.0, published separately at wtt.digital/proof/cal-bar-citation-study/coding-rubric.
Each firm is coded by two independent coders blind to the study hypothesis. Disagreements are resolved by a third coder (adjudicator). Inter-rater reliability (Cohen's kappa) is calculated and reported in the final paper. Pre-study target: kappa ≥ 0.70. If pilot-phase kappa falls below 0.65, the rubric is revised before full-study coding.
Coding is performed once per firm at the start of data collection (target: July 1, 2026). Firms whose public content changes substantially during the data collection window are flagged and reported in a sensitivity analysis, but their original coding is retained as the primary classification.
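The inter-rater reliability check in this section uses Cohen's kappa, which compares observed agreement between the two coders with the agreement expected by chance. A minimal sketch follows; the function name and the example labels are illustrative only and are not study data.

```python
from collections import Counter

def cohens_kappa(coder_1, coder_2):
    """Cohen's kappa for two coders who labelled the same firms in the same order."""
    if len(coder_1) != len(coder_2) or not coder_1:
        raise ValueError("both coders must label the same non-empty set of firms")
    n = len(coder_1)
    observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n
    counts_1, counts_2 = Counter(coder_1), Counter(coder_2)
    # Chance agreement: product of each coder's marginal share, summed over labels.
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)

# Illustrative pilot labels for ten firms (made up for this example).
labels_1 = ["A", "A", "B", "C", "C", "A", "B", "C", "A", "C"]
labels_2 = ["A", "A", "B", "C", "B", "A", "B", "C", "A", "A"]
kappa = cohens_kappa(labels_1, labels_2)
```

In this made-up example the coders agree on 8 of 10 firms and chance agreement is 0.34, so kappa is (0.80 - 0.34) / (1 - 0.34), or about 0.70.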
5. Dependent variable: Citation rate
5.1 Definition
Citation rate is the proportion of prompt-engine-day observations, across a defined prompt set, for which the firm's name appears in the AI engine's generated answer or in the cited-sources list accompanying the answer.
A firm is "cited" in a given prompt-engine-day observation if any of the following holds:
- The firm's name is mentioned in the generated answer text (including alternate spellings, common abbreviations, and verified Chinese-language variants for firms with documented Chinese names)
- The firm's primary domain appears in the cited-sources list (where the engine provides one)
- A third-party source describing the firm (e.g., the firm's Avvo profile, Justia listing) appears in the cited-sources list, AND the source describes the firm specifically (not the firm as one of many in a list)
A citation is recorded as a binary observation per (firm, prompt, engine, day) tuple.
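The three citation conditions can be expressed as one predicate. The sketch below is a simplification under stated assumptions: the dictionary keys (`names`, `domain`, `profile_urls`) are hypothetical, name matching is plain case-insensitive substring search, and the third condition assumes that firm-specific third-party profile URLs have already been verified by hand.

```python
from urllib.parse import urlparse

def is_cited(answer_text, source_urls, firm):
    """Return True if the firm counts as cited in one engine response.

    firm: dict with hypothetical keys
      "names"        - list of accepted name variants
      "domain"       - primary domain, e.g. "example-law.test"
      "profile_urls" - set of verified firm-specific third-party pages
    """
    text = answer_text.casefold()
    # Condition 1: any accepted name variant appears in the answer text.
    if any(name.casefold() in text for name in firm["names"]):
        return True
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        # Condition 2: the firm's primary domain (or a subdomain) is a cited source.
        if host == firm["domain"] or host.endswith("." + firm["domain"]):
            return True
        # Condition 3: a verified firm-specific third-party page is a cited source.
        if url.rstrip("/") in firm["profile_urls"]:
            return True
    return False

# Placeholder firm used only to exercise the function.
example_firm = {
    "names": ["Example Law Group", "ELG Attorneys"],
    "domain": "example-law.test",
    "profile_urls": {"https://directory.test/attorneys/example-law-group"},
}
```

Substring matching can produce false positives for short or generic firm names, which is one reason a hand-validated extraction routine is still needed.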
5.2 AI engines included
Four engines are tested:
- ChatGPT (web-enabled mode, GPT-5 / GPT-4o per OpenAI's production default at the time of each query)
- Perplexity (default model, "Search" mode enabled)
- Google AI Overviews (US English, signed-out browser session, queries via google.com)
- Claude (claude.ai web interface, default model with web search enabled per Anthropic's production default)
Note: Article #1 referenced "Google AI Overviews" as the fourth engine. This pre-registration keeps Google AI Overviews and includes Claude in place of Gemini to give a broader picture. Gemini, originally considered, is dropped from the primary analysis because its Chinese-language behavior is unstable across testing dates; Gemini results are reported descriptively as a robustness check but are not used for the primary hypothesis test.
5.3 Prompt set
The prompt set comprises 50 prompts, distributed across practice areas and intent types as follows:
- Per practice area (5 practice areas × 8 prompts): 40 prompts
  - 2 prompts: high-intent buyer queries ("best personal injury lawyer San Francisco")
  - 2 prompts: sub-intent buyer queries ("personal injury lawyer San Francisco rear-end collision")
  - 2 prompts: language-specific buyer queries (e.g., Cantonese, Mandarin, Spanish, where relevant to California demographics)
  - 2 prompts: informational queries within the practice area ("statute of limitations personal injury California")
- Cross-practice / generalist prompts: 10 prompts
  - "Best California law firm for [generic legal need]"
  - "Find a lawyer in [California city]"
  - "Top-rated California attorneys for [demographic / category]"
The full prompt list is committed in advance and frozen at the start of data collection. Mid-study modifications are not permitted; if a prompt produces unparseable results consistently, it is excluded from analysis with the exclusion logged.
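The prompt-set allocation can be checked mechanically before the list is frozen. The sketch below only enumerates the slots implied by the allocation in this section; the slot labels and variable names are illustrative, and the actual prompt wording is not reproduced here.

```python
PRACTICE_AREAS = [
    "personal injury", "family law", "business litigation",
    "immigration", "real estate / landlord-tenant",
]
# Prompts per intent type within each practice area.
INTENT_COUNTS = {
    "high-intent buyer": 2,
    "sub-intent buyer": 2,
    "language-specific buyer": 2,
    "informational": 2,
}
GENERALIST_COUNT = 10

def prompt_slots():
    """Enumerate (practice_area, intent_type, index) slots for the prompt set."""
    slots = [
        (area, intent, i)
        for area in PRACTICE_AREAS
        for intent, count in INTENT_COUNTS.items()
        for i in range(1, count + 1)
    ]
    slots += [("cross-practice", "generalist", i)
              for i in range(1, GENERALIST_COUNT + 1)]
    return slots

slots = prompt_slots()
```

Each slot would be filled by exactly one frozen prompt, which gives 5 × 8 + 10 = 50 prompts in total.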
5.4 Data collection cadence
Each prompt is run on each engine once per day, for 60 consecutive days. This yields 50 prompts × 4 engines × 60 days = 12,000 engine responses, each of which is checked against all 200 firms, for a total of 200 firms × 12,000 responses = 2,400,000 observation cells. The binary citation outcome in each cell is the basic unit of observation.
The actual observation count is the number of cells where (a) the engine returned a usable response and (b) the response was parseable for firm-name extraction. Cells where the engine returned an error or a refusal are logged but excluded from the rate calculation.
Queries are submitted via the engines' standard interfaces (or API where the API result is materially equivalent to the web interface, validated via paired testing). All queries originate from US-based IP addresses with a signed-out browser session (or API equivalent) to minimize personalization confounds.
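The observation counts and the exclusion rule in this section can be sketched as below. Representing an unusable cell as `None` is an assumption made for illustration; the study's actual storage format is not specified here.

```python
FIRMS, PROMPTS, ENGINES, DAYS = 200, 50, 4, 60

cells_per_firm = PROMPTS * ENGINES * DAYS   # one cell per (prompt, engine, day)
total_cells = FIRMS * cells_per_firm

def citation_rate(cells):
    """Firm-level citation rate over usable cells only.

    cells: iterable of True (cited), False (not cited), or None
           (engine error, refusal, or unparseable response).
    Returns None when the firm has no usable cells.
    """
    usable = [c for c in cells if c is not None]
    if not usable:
        return None
    return sum(usable) / len(usable)
```

For example, a firm with outcomes `[True, False, None, True]` has three usable cells and a citation rate of 2/3; the excluded cell would still appear in the log.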
6. Control variables
The primary analysis controls for the following observable confounders:
- Domain authority — Semrush Authority Score at the start of data collection
- Backlink profile size — number of referring domains per Semrush
- Content velocity — number of pages on the firm's primary domain (excluding boilerplate)
- Practice-area mix — handled by stratification (see Section 3.2)
- Firm size — number of attorneys per Cal Bar registration, handled by stratification
- Years in operation — calculated from State Bar registration date of the founding attorney (proxy)
- Multilingual content presence — binary indicator for whether the firm publishes content in any non-English language
All controls are measured once at the start of data collection (target: July 1, 2026). Changes during the 60-day window are not tracked for control variables (covariates) but are tracked for the independent variable (compliance classification) via the sensitivity analysis noted in Section 4.
7. Analysis plan
7.1 Primary analysis
The primary outcome is the firm-level citation rate aggregated across all (prompt, engine, day) cells for that firm.
A linear regression model is fit:
citation_rate_i = β0 + β1 × compliance_bin_i + β2 × controls_i + ε_i
Where compliance_bin_i is a binary variable (1 if Bin A, 0 if Bin C; Bin B firms excluded from primary analysis), and controls_i is the vector of control variables listed in Section 6.
The test of H1 is whether β1 > 0 with p < 0.05 (one-sided test). The pre-specified effect size of interest is β1 corresponding to a ≥15-percentage-point absolute citation-rate difference.
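A dependency-free sketch of the regression and the one-sided test follows. It is an illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the p-value uses a normal approximation to the t distribution (reasonable at roughly 200 firms, rough for the toy data shown), and the six toy values are made up. A statistics package would normally be used instead.

```python
from statistics import NormalDist

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, x_rows):
    """Least-squares fit; each row of x_rows starts with 1.0 for the intercept.

    Returns (coefficients, standard_errors).
    """
    n, k = len(y), len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x_rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Diagonal of (X'X)^-1, found by solving against each unit vector.
    inv_diag = [
        solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j]
        for j in range(k)
    ]
    return beta, [(sigma2 * d) ** 0.5 for d in inv_diag]

def one_sided_p(beta1, se1):
    """P-value for H1: beta1 > 0 (normal approximation)."""
    return 1.0 - NormalDist().cdf(beta1 / se1)

# Toy data: three Bin C firms (bin = 0), three Bin A firms (bin = 1), no controls.
toy_rates = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5]
toy_design = [[1.0, 0.0] for _ in range(3)] + [[1.0, 1.0] for _ in range(3)]
beta, se = ols(toy_rates, toy_design)
p_value = one_sided_p(beta[1], se[1])
```

With only the intercept and the Bin indicator, `beta[1]` is the difference in mean citation rate between the two groups; adding control columns to each design row extends the same code to the full model.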
7.2 Secondary analyses
Per-engine analysis. The same regression is fit separately for each of the four engines. We pre-specify that the headline finding requires consistency across at least three of the four engines (i.e., directional β1 > 0 with p < 0.10 on at least three engines). A finding driven by a single engine is reported but not described as the primary result.
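The three-of-four consistency rule can be written as a small check. The function name and the numbers in the example are illustrative only and are not results.

```python
def headline_consistent(engine_results, required=3, alpha=0.10):
    """Apply the per-engine consistency rule.

    engine_results: dict mapping engine name to (beta1, one_sided_p).
    Returns (rule_met, list_of_engines_that_pass).
    """
    passing = [
        name for name, (beta1, p) in engine_results.items()
        if beta1 > 0 and p < alpha
    ]
    return len(passing) >= required, passing

# Made-up values used only to show the rule in action.
example = {
    "ChatGPT": (0.12, 0.03),
    "Perplexity": (0.09, 0.08),
    "Google AI Overviews": (0.02, 0.40),
    "Claude": (0.15, 0.01),
}
rule_met, passing_engines = headline_consistent(example)
```

In this example three engines show a positive coefficient with p < 0.10, so the rule is met even though one engine does not pass.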
Per-practice-area analysis. The same regression is fit separately within each of the five practice-area strata. Practice-area-specific findings are reported with the caveat that within-stratum sample sizes (20 Bin A vs. 20 Bin C per practice) limit power for any individual stratum.
Per-flag-category analysis. This analysis examines whether specific flag categories from the rubric drive the citation-rate effect more than others. For example, do firms flagged primarily on Flag 1 (specialization claims) show different citation behavior than firms flagged primarily on Flag 3 (predictive outcomes)? This is exploratory.
Sensitivity to Bin definition. The primary analysis uses the rubric's Bin cutoffs (0–8 = A, 25–80 = C). Sensitivity analysis: re-run the primary analysis using continuous compliance score instead of binary Bin classification, and using alternative cutoffs (e.g., 0–12 vs. 18–80). Robustness of the headline result to cutoff choice is reported.
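The cutoff sensitivity check amounts to re-deriving Bins from the continuous score under different thresholds. The sketch below assumes integer scores from 0 to 80 and assumes that every score between the A and C ranges falls in Bin B; both are inferences from the cutoffs quoted above, and the function name is hypothetical.

```python
def assign_bin(score, a_max=8, c_min=25, score_max=80):
    """Map a compliance score to a Bin under the given cutoffs.

    Defaults mirror the primary cutoffs (0-8 = A, 25-80 = C).
    Scores strictly between a_max and c_min are treated as Bin B (assumption).
    """
    if not 0 <= score <= score_max:
        raise ValueError(f"score {score} outside 0-{score_max}")
    if score <= a_max:
        return "A"
    if score >= c_min:
        return "C"
    return "B"

# Re-binning the same illustrative scores under the alternative cutoffs (0-12 vs. 18-80).
scores = [3, 10, 20, 40]
primary = [assign_bin(s) for s in scores]
alternative = [assign_bin(s, a_max=12, c_min=18) for s in scores]
```

Firms whose Bin changes between the two lists (here the scores 10 and 20) are the ones that determine whether the headline result is robust to the cutoff choice.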
7.3 Pre-specified subgroup analyses
- Bilingual firms (English + Chinese). Given Article #2's observation that AI engines retrieve from native-language content for non-English queries, bilingual firms may show different citation patterns. Reported separately if N ≥ 20 in each Bin.
- Wix-platform firms vs. WordPress/custom platforms. Audits suggest that Wix-platform firms face structural constraints, so platform choice may be confounded with compliance scoring. Reported separately as a sensitivity check.
7.4 What would falsify the hypothesis
The primary hypothesis is falsified if:
- β1 is non-significant (p > 0.05) across the primary analysis AND across at least three of four engines, OR
- β1 is directionally negative (Bin C firms cited more than Bin A firms), even if non-significant, OR
- The effect size β1 is positive and significant but corresponds to less than 5 percentage points in absolute citation-rate terms, in which case the effect is real but practically trivial
In any of these cases, the finding is reported as null or negative, and the paper's framing reflects that. The paper publishes either way.
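The falsification criteria can be encoded as a decision function so that the reporting rule is applied mechanically. This is a sketch: the function name is hypothetical, effects are expressed as proportions (0.05 means 5 percentage points), and treating a p-value exactly equal to α as non-significant is an assumption.

```python
def falsification_reasons(beta1, p_value, engine_p_values,
                          alpha=0.05, trivial_below=0.05):
    """Return the Section 7.4 criteria that are met; an empty list means none."""
    reasons = []
    nonsignificant_engines = sum(p >= alpha for p in engine_p_values)
    if p_value >= alpha and nonsignificant_engines >= 3:
        reasons.append("non-significant overall and on at least three engines")
    if beta1 < 0:
        reasons.append("effect is directionally negative")
    if 0 < beta1 < trivial_below and p_value < alpha:
        reasons.append("significant but below 5 percentage points")
    return reasons
```

For instance, `falsification_reasons(0.03, 0.01, [0.01, 0.01, 0.01, 0.01])` returns only the third reason: the effect is statistically detectable but practically trivial.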
8. Power analysis
Assumed parameters:
- Effect size of interest: β1 corresponds to a 15-percentage-point absolute citation-rate difference between Bin A and Bin C
- Within-group standard deviation of citation rate: estimated from CCD audit baseline and from prior informal testing, ~20 percentage points
- Number of citation observations per firm: 50 prompts × 4 engines × 60 days = 12,000 binary observations per firm, which gives a high-precision estimate of each firm's citation rate
- Significance level: α = 0.05 (one-sided)
- Desired power: 0.80
Under these assumptions (standardized effect d = 15/20 = 0.75), a two-group comparison needs approximately 23 firms per Bin to detect a 15-percentage-point effect with 80% power at one-sided α = 0.05. The actual sample size of 100 per Bin is therefore well-powered for the primary test. In the per-practice-area secondary analyses, n=20 per Bin per practice gives roughly 75% power for the same effect size, which is adequate for exploratory but not confirmatory analysis.
A formal G*Power calculation is documented in the supplementary materials.
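The assumed parameters can be cross-checked with the standard normal-approximation formulas for a one-sided, two-group comparison of means. This is a simplified sketch: it ignores the regression covariates and the small-sample t correction, so it will not reproduce a G*Power calculation exactly. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate firms needed per Bin for a one-sided two-group test."""
    d = effect / sd  # standardized effect size
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

def power_for_n(n, effect, sd, alpha=0.05):
    """Approximate power of the same test with n firms per Bin."""
    d = effect / sd
    return Z.cdf(d * sqrt(n / 2) - Z.inv_cdf(1 - alpha))

required_n = n_per_group(effect=0.15, sd=0.20)
power_full = power_for_n(100, effect=0.15, sd=0.20)
power_stratum = power_for_n(20, effect=0.15, sd=0.20)
```

Because the normal approximation slightly understates the sample size an exact t-test requires, `required_n` should be read as a lower bound.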
9. Data management
9.1 Data collection infrastructure
Prompt runs are executed via a documented Python pipeline that:
- Queries each engine through its standard interface (web automation for engines without stable APIs, official API where available and equivalent)
- Captures the full response text and the cited-sources list (where the engine provides one)
- Stores raw responses in a versioned data store with timestamps
- Runs the firm-name extraction routine described in Section 5.1 against the raw responses
- Outputs the binary citation observation per (firm, prompt, engine, day) tuple
The pipeline code is published to a public repository at the time of the paper's publication, alongside the raw dataset (with firm-identifying information redacted where Section 11 requires it).
9.2 Data integrity protections
- All raw query responses are stored verbatim and hash-stamped
- The firm-name extraction routine is validated against a hand-coded sample of 500 responses; precision and recall metrics are reported
- Any modification to the extraction routine after data collection begins triggers a re-run on all prior data; the change is logged
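Two of these protections are easy to illustrate: hash-stamping a raw response and scoring the extraction routine against hand coding. The sketch below assumes SHA-256 as the hash (the document does not name an algorithm) and assumes citations are compared as sets of (response_id, firm_id) pairs; the function names are hypothetical.

```python
import hashlib

def hash_stamp(raw_response: str) -> str:
    """SHA-256 hex digest of the verbatim response text (algorithm is an assumption)."""
    return hashlib.sha256(raw_response.encode("utf-8")).hexdigest()

def precision_recall(extracted, hand_coded):
    """Precision and recall of the extraction routine against hand coding.

    extracted, hand_coded: sets of (response_id, firm_id) pairs marked as cited.
    """
    true_positives = len(extracted & hand_coded)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(hand_coded) if hand_coded else 0.0
    return precision, recall

# Made-up validation pairs used only to exercise the function.
routine = {(1, "firm-a"), (2, "firm-b"), (3, "firm-c")}
manual = {(1, "firm-a"), (2, "firm-b"), (4, "firm-d"), (5, "firm-e")}
precision, recall = precision_recall(routine, manual)
```

Storing the digest next to each raw response lets anyone confirm later that the stored text has not changed since collection.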
10. Timeline
- May 14, 2026: Pre-registration v1.0 published
- May 14 – June 30, 2026: Sample frame construction, coding, inter-rater reliability check on pilot (n=20)
- July 1 – August 29, 2026: Data collection (60 days)
- August 30 – September 15, 2026: Analysis and writing
- Target publication: September 1 – September 15, 2026
Any deviation from this timeline of more than 30 days will be publicly disclosed at the time the delay is known.
11. Ethics, disclosure, and firm-identification
Firms are not anonymized in the published paper, with one exception (see below). Public-facing law firm content is, by definition, public. Disclosing firm names allows replication and accountability — a paper claiming "10% of California PI firms violate Rule 7.1" without naming any of them is unfalsifiable. Firms in the study can verify their inclusion and their coding.
Exception: Bin C firms (materially non-compliant) are offered a 30-day pre-publication notification window. They are notified of inclusion and provided their coding rationale. They may:
- Accept inclusion (default if no response)
- Request anonymization in the final paper (granted on request, with the anonymous data still included in aggregate analyses but not in firm-named tables)
- Request exclusion from the study (granted; their data is removed and a replacement firm is sampled per the random selection procedure)
This notification protocol is added because Bin C firms may face professional consequences from being publicly named in a research paper about Cal Bar violations, even though the underlying content is itself public. The compromise preserves the study's epistemic value (the analysis still runs) while allowing affected firms to opt out of being named.
Anonymization is NOT offered to Bin A firms. Being identified as a "materially compliant" firm in this study is positive, not adverse.
Conflict of interest: Shawn Lai is the founder of WTT Digital, an agency that sells AI-search and compliance-review services to law firms. The agency's commercial positioning would benefit from a finding of H1 over H0. This conflict is disclosed up front. Mitigations:
- The methodology is pre-registered, so the analysis plan is fixed before results are known
- The analysis plan specifies what would falsify H1 (Section 7.4) and commits to publishing the null result
- Coding is done by coders blind to the study hypothesis
- The raw dataset is published at the time of the paper, enabling independent replication
12. Peer review pathway
The paper is intended for submission to a peer-reviewed venue at the intersection of law and marketing or computational social science. Target venues include:
- Journal of Empirical Legal Studies
- Stanford Law and Policy Review
- ACM CHI (Computer-Human Interaction) — relevant for the AI-system behavior component
A preprint is published at wtt.digital/proof/cal-bar-citation-study/preprint at the same time as the formal submission, regardless of which venue accepts the paper.
Methodology questions, replication interest, or peer-review feedback can be sent to [email protected] before, during, or after the study.
13. Funding and acknowledgments
This study is self-funded by WTT Consulting LLC. No external grants or sponsor relationships apply. The data collection infrastructure cost (API fees, compute) is estimated at $2,000–4,000 and is borne entirely by the principal investigator. No firms in the study were compensated for inclusion.
14. Pre-registration commitments — summary checklist
- [x] Hypothesis specified before data collection
- [x] Sample frame and selection procedure documented
- [x] Coding rubric published as separate artifact (v1.0)
- [x] Prompt set committed and described
- [x] Primary outcome measure defined
- [x] Statistical analysis plan specified including pre-specified effect size of interest
- [x] Falsification criteria stated
- [x] Power analysis documented
- [x] Timeline committed with disclosure requirement for delays
- [x] Ethics protocol (firm notification, opt-out for Bin C) documented
- [x] Conflict of interest disclosed and mitigated
- [x] Raw data publication commitment made
- [x] Analysis code publication commitment made
This document timestamps the pre-registration at May 14, 2026. Substantive revisions after this date will be versioned, dated, and disclosed in the final paper.
This pre-registration document is published as a permanent artifact at wtt.digital/proof/cal-bar-citation-study/pre-registration. The companion coding rubric is at wtt.digital/proof/cal-bar-citation-study/coding-rubric. Both are part of the PROOF Series, the research arm of WTT Digital's VERDICT methodology.