// Margin notes

Scribbles

Passages from papers I've been reading. Highlighted, underlined, argued with. Not summaries — just the lines that made me stop.

// Paper Cast-R1 · Tao et al. arXiv:2602.13802

In practice, effective time series forecasting rarely follows a single-pass procedure. Instead, experienced practitioners treat forecasting as a sequential decision process. They examine historical patterns and contextual information, identify informative features, select forecasting models, and reason over intermediate results to assess forecast reliability. As new contextual evidence emerges, predictions are often revised, highlighting that high-quality forecasting involves a series of interdependent decisions rather than a one-shot model inference.

→ this is literally just describing what analysts do. why has nobody built this before?
Every architecture paper I've read in the last three years optimises the model. This one says the model isn't the bottleneck — the decision process around it is. If that's right, we've been solving the wrong problem.
// Paper Zhang & Zhang · Hedge Fund LLM Review arXiv:2605.05211
this one kept me up. contaminated test sets are everywhere in this literature

LLMs pretrained on internet-scale text corpora have likely been exposed to financial news, analyst reports, and market commentary from periods that overlap with academic test sets. This creates a form of data leakage that is distinct from look-ahead bias and considerably harder to detect. The model may pattern-match on absorbed associations between companies, sectors, and outcomes — associations that constitute implicit knowledge of the test period — without any single piece of future information being directly accessible. Reported directional accuracy figures may be substantially inflated as a result.

The insidious thing is you can't fix this with a train/test split. The contamination happened during pretraining, before you ever touched the dataset. Standard backtesting hygiene doesn't catch it. You'd need to know exactly what was in the pretraining corpus and when — which nobody publishes.
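One partial mitigation, sketched under loud assumptions: only score the model on test samples dated after its stated training cutoff. The cutoff date, the `Sample` type and the `post_cutoff_only` helper are all hypothetical, and a vendor-stated cutoff is not the same as knowing what was in the corpus, so this narrows the leak rather than closing it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """One evaluation example: the day it refers to and its true label."""
    as_of: date
    label: int  # 1 = price went up, 0 = price went down


def post_cutoff_only(samples, cutoff, buffer_days=0):
    """Keep samples dated more than `buffer_days` after `cutoff`."""
    return [s for s in samples if (s.as_of - cutoff).days > buffer_days]


# Hypothetical cutoff and test set, for illustration only.
ASSUMED_CUTOFF = date(2024, 6, 30)
test_set = [
    Sample(date(2023, 11, 1), 1),
    Sample(date(2024, 6, 30), 0),
    Sample(date(2024, 7, 15), 1),
    Sample(date(2025, 1, 10), 0),
]

clean = post_cutoff_only(test_set, ASSUMED_CUTOFF, buffer_days=30)
print(f"{len(clean)} of {len(test_set)} samples fall safely after the assumed cutoff")
```

The buffer is there because stated cutoffs tend to be fuzzy at the edges. The cost is obvious: most published test sets would shrink to almost nothing.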
// Report Flagright · AI and the Future of AML Compliance flagright.com · 2026

AI-powered solutions reduce false positives by 90–95%, automate labor-intensive compliance tasks, and detect sophisticated money laundering patterns in real time. Financial institutions using AI for AML achieve faster detection, lower compliance costs (from $180+ billion annually), and better regulatory outcomes.

← vendor report so grain of salt — but the direction is right
90–95% is a wild number. The question is what the baseline false positive rate was — AML systems are famously terrible so even a 70% reduction would be transformative. Need to find primary source on this. Also: reducing false positives is only half the problem. False negatives are the ones that get you fined.
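To see why the baseline matters, some arithmetic with made-up numbers. The 10,000 alerts, the 95% baseline false positive share and the `after_reduction` helper are my assumptions, not Flagright's figures; the only thing taken from the report is the 90–95% range. The sketch also assumes the reduction removes false alerts only and leaves true hits untouched, which is the generous reading.

```python
def after_reduction(alerts, fp_share, fp_reduction):
    """Return (remaining alerts, precision) after cutting false positives.

    alerts       -- total alerts raised by the old system (hypothetical)
    fp_share     -- fraction of those alerts that are false positives
    fp_reduction -- fraction of false positives the new system removes
    Assumes true positives are untouched, i.e. no new false negatives.
    """
    false_pos = alerts * fp_share
    true_pos = alerts - false_pos
    remaining_fp = false_pos * (1 - fp_reduction)
    remaining = true_pos + remaining_fp
    return remaining, true_pos / remaining


for reduction in (0.70, 0.90, 0.95):
    remaining, precision = after_reduction(10_000, 0.95, reduction)
    print(f"{reduction:.0%} fewer false positives: "
          f"{remaining:,.0f} alerts left, {precision:.1%} are real")
```

With that assumed baseline, even the 70% cut takes the queue from 10,000 alerts to about 3,350. Nothing in this calculation says anything about false negatives, which is the half the report's headline number skips.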
// Paper Park · Reflexivity as Prompt arXiv:2606.00061

Standard AI-assisted forecasting treats the market as an exogenous system. Reflexivity theory holds otherwise: prices shape fundamentals, and every forecaster is a participative agent in the loop it analyzes. We evaluate three frontier models — GPT-5, Claude Sonnet 4.6, and Gemini 3 Pro — under four accumulating zero-shot conditions across two historically distinct episodes: the dot-com bubble (1996–2001) and the global financial crisis (2004–2009).

→ Soros in a prompt. of course someone did this
The framing is genuinely interesting — most forecasting papers pretend markets are physics. This one asks what happens when you give the model a theory of its own participation in the thing it's predicting. Whether the improvement is real or an artefact of the LLM recognising the narrative shape of a boom-bust cycle from training data is a separate question. Writing this one up properly.
// Paper Khanvilkar et al. · Regulatory Graphs and GenAI arXiv:2506.01093

The system constructs dynamic transaction graphs, extracts structural and contextual features, and classifies suspicious behavior using a graph neural network. A retrieval-augmented generation module generates natural language explanations aligned with regulatory clauses for each flagged transaction. Experiments conducted on a simulated stream of financial data show that the proposed method achieves superior results, with 98.2% F1-score, 97.8% precision, and 97.0% recall.

← simulated stream. always simulated.
The RAG-to-regulatory-clause piece is the actually novel bit here — generating an explanation that cites the specific rule being violated rather than just flagging anomalous behavior. That's what a compliance officer actually needs to file a SAR. 98.2% F1 on synthetic data tells you almost nothing about real transaction networks, but the architecture direction is right.
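A quick sanity check on the three quoted numbers. Under the standard single-class definition, F1 is the harmonic mean of precision and recall, so it has to land between them. The sketch below just plugs in the reported 97.8% and 97.0%; `f1_score` is my own helper, and I'm assuming all three figures describe the same class and the same run, which the abstract doesn't say.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard binary F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as quoted in the abstract.
reported_precision = 0.978
reported_recall = 0.970
reported_f1 = 0.982

implied_f1 = f1_score(reported_precision, reported_recall)
print(f"implied F1:  {implied_f1:.3f}")
print(f"reported F1: {reported_f1:.3f}")
print(f"gap:         {reported_f1 - implied_f1:+.3f}")
```

The implied value comes out around 97.4%, below the reported 98.2%. The figures may come from different classes, runs or thresholds rather than a typo, but as quoted they don't reconcile.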
// Report BCG · For Banks, the AI Reckoning Has Arrived bcg.com · 2025
"pilot purgatory" — best phrase I've read this year

Most banks are stuck in what the industry has come to call "pilot purgatory" — running dozens of isolated experiments that never scale. The conventional wisdom says the only way out is "rip and replace" transformation: tear out the legacy core, rebuild from scratch, accept 18-month procurement cycles and eight-figure budgets. But this narrative is both paralyzing and wrong. Banks that cannot reason and act in real time across the entire customer journey will not merely fall behind — they risk becoming operationally irrelevant.

→ the rip-and-replace story is used to justify inaction. most useful compliance AI doesn't need greenfield infra
Only 1 in 4 banks is actively using AI to gain competitive advantage per BCG's own survey. The rest are running the same three POCs they started in 2022. The bottleneck isn't technology — it's that compliance teams still have to submit IT tickets to get data access and wait months.
// Report McKinsey · State of AI Trust 2026 mckinsey.com · Mar 2026

Only about one-third of organizations report maturity levels of three or higher in strategy, governance, and agentic AI governance. This imbalance suggests that while technical and risk management capabilities are advancing, organizational alignment and oversight structures are struggling to keep pace with the rapid expansion of AI use. Security and risk concerns are the top barrier to scaling agentic AI. Inaccuracy and cybersecurity remain the most frequently cited AI risks as adoption expands.

← deploying models faster than building governance. that ends badly in a regulated industry.
In finance the consequences of miscalibrated compliance AI aren't just operational — they're regulatory. A model that generates a wrong SAR narrative doesn't just waste analyst time. It can constitute a filing failure. Institutions are building the car before the brakes.
// Report Norton Rose Fulbright · AML Enforcement on the Rise amlnetwork.org · 2025

Global AML fines hit $10.4 billion in 2024, surpassing previous records set in 2023. Enforcement is projected to exceed $15 billion in 2026. Norton Rose Fulbright's Boon predicts focus on AI misuse in laundering. Boon states: "Regulators will demand explainable AI in AML systems."

→ $15B in projected fines and the industry is running rules written in 2003. the business case writes itself.
The explainability demand is the one that matters most architecturally. You can build a GNN that detects suspicious patterns with 98% precision on synthetic data — but if you can't tell the regulator why it flagged the transaction in language that maps to a specific rule, it's useless in a real compliance context. Detection is the easy part. Explanation is the product.
// Report Capco · AI Transforming Payments & Financial Crime Monitoring capco.com · Feb 2026

Most institutions still run separate stacks and teams for fraud, AML and sanctions. Fraud and sanctions screening typically happen pre-transaction while AML often triggers post-event. Tools remain predominantly rule-based: banks must pre-define scenarios, which leads to poor detection of emerging patterns and a constant trade-off between missing risk (false negatives) and overwhelming operations teams (false positives).

← three separate systems, three separate teams. criminals operate across all three simultaneously.
Fraud typologies that start as card fraud become AML concerns when funds are moved, then sanctions issues if they hit a designated jurisdiction. A model that only sees one slice of the transaction lifecycle misses this by design. The architecture argument for unification is clear. The organisational argument is the one nobody wants to have.
// Analysis Convergences · The Compliance Bill for AI in Asset Management convergences.substack.com · Mar 2026
compliance cost of the compliance AI — nobody models this

Building a compliant AI governance program is not a technology project. It is an organizational transformation. The most important thing to understand about AI regulation in financial services is that it is simultaneously more complex and more imminent than most firms appreciate. The EU AI Act classifies AI systems used for credit scoring as "high risk" and introduces additional safeguards — with full application expected by August 2026.

→ "not a technology project" is doing a lot of work here. most firms are treating it as one.
EU AI Act, DORA, GDPR, and SR 11-7 model risk guidance can all apply simultaneously to a large asset manager deploying LLMs in compliance workflows. None were written with each other in mind. The compliance cost of the compliance AI is real and almost nobody is accounting for it in ROI calculations.