1 demand signal from X — ranked by buildability (May 24)

The primary 28.5-hour window (May 23 13:30 → May 24 18:00 UTC) yielded zero posts above the 10-engagement threshold — the sixth consecutive run triggering the 72-hour fallback. The fallback window (May 21–24 UTC) surfaced five candidates across roughly 180 tweets searched; one survives full validation as a genuine competitive gap. The remaining four are saturated, partially-solved, or category recurrences of signals already covered in earlier runs.

Ranking criteria: poster credibility (follower count, verification status, independently confirmed professional background), pain-point specificity, competitive landscape, and buildability score (1–5, where 5 = weekend MVP, 1 = multi-year venture). Raw engagement is a secondary factor. The actionable signal's engagement (23 total: likes + retweets + replies) is the highest of the five candidates and the highest on this channel this week.

Actionable signal

Brier score pundit tracker

Tier: HIGH — genuine competitive gap, independently confirmed founder credibility, buildable in a weekend for a focused vertical

Jake Kozloski @jakozloski·2d

Someone should build an AI product that keeps a running Brier score for every pundit

Poster: @jakozloski / Jake Kozloski, verified, 6,794 followers, account since 2008. Founder and CEO of Keeper (an AI-powered matchmaking startup) and Scout at Headline VC. Previously sold Tycho Fitness to Shapelog. AlleyWatch (December 2025) confirmed Keeper raised $4M. [1][2]
Credibility tier: HIGH — verified repeat founder with a documented exit and active funded company. Self-reported background is fully corroborated by external sources.
Engagement: 18 likes · 1 retweet · 4 replies (23 total) · 798 views · 4 bookmarks [1]
Posted: May 23, 2026 15:03 UTC (falls in the 72-hour fallback window)

The ask: an AI product that maintains a running Brier score for every public pundit. The Brier score is a standard accuracy metric for probabilistic forecasts: the mean squared difference between the stated probability and the actual outcome (1 if the event happened, 0 if it did not). For yes/no predictions it runs from 0 to 1, where 0 is perfect and lower is better. Applied to pundits, it would require three steps: extracting specific, verifiable claims from podcast transcripts, public statements, and social media; assigning implicit or explicit probabilities; and updating scores as outcomes resolve. Kozloski framed the concept as "epistemic humility as a service."
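As a rough illustration of the metric itself, the sketch below computes a Brier score as BS = (1/N) Σ (p_i − o_i)², where p_i is the forecast probability and o_i is 1 or 0 depending on whether the event happened. The `ResolvedClaim` type and the sample numbers are hypothetical and are not taken from the tweet or from any real pundit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedClaim:
    """One forecast whose outcome is known (hypothetical record type)."""
    probability: float  # stated or inferred probability that the event happens
    occurred: bool      # whether the event actually happened


def brier_score(claims: list[ResolvedClaim]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not claims:
        raise ValueError("need at least one resolved claim")
    for c in claims:
        if not 0.0 <= c.probability <= 1.0:
            raise ValueError(f"probability out of range: {c.probability}")
    return sum((c.probability - float(c.occurred)) ** 2 for c in claims) / len(claims)


# Illustrative numbers only, not real pundit data.
confident_and_right = [ResolvedClaim(0.9, True), ResolvedClaim(0.1, False)]
coin_flipper = [ResolvedClaim(0.5, True), ResolvedClaim(0.5, False)]

print(brier_score(confident_and_right))  # roughly 0.01
print(brier_score(coin_flipper))         # 0.25
```

A forecaster who always says 50% scores 0.25 no matter what happens, which makes 0.25 a useful baseline: a pundit scoring above it is doing worse than a coin flip.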

What the replies said: None of the four replies named an existing dedicated product. @Flipcoin_arena argued that prediction markets "already do this," but pointed to markets where traders score their own positions, not tools that track external pundits who never bet anything. @ryanchern linked a prediction market pundit ranking page. @GuvMaster flagged the operational difficulty of converting pundit opinions into "dated, resolvable probabilities." @mjzellinger acknowledged that most of the target audience wouldn't know what probabilistic calibration means.

The competitive gap: The market has been empty since roughly 2013. PunditTracker.com, the closest direct predecessor, launched around 2013 with an explicit goal of cataloguing and grading pundit predictions, using boldness-adjusted accuracy rather than Brier scores. The site has been inactive for approximately 13 years. [3] Adjacent products exist but don't compete:

Good Judgment Inc (goodjudgment.com) runs the Superforecaster network; it scores its own registered community members, not external public pundits. [4]
Metaculus is a community forecasting platform where users submit their own forecasts.
Polymarket has a Dune dashboard showing Brier scores for prediction-market traders — not pundits who never put money on the line.

No active product in 2026 automatically tracks public pundit statements, assigns probabilistic scores, and publishes a persistent leaderboard.

Why now: LLMs make the hardest part — claim extraction from unstructured text — tractable. A model can parse a podcast transcript, identify "I think X will happen by [date]" statements, flag the claim for outcome tracking, and compute the Brier score when the outcome resolves. The math is trivial; the claim-to-outcome matching is the real engineering work, and modern language models can handle it at usable accuracy for a focused vertical.
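One way to picture the extraction step is as a prompt that asks the model for structured output, followed by strict validation of whatever comes back. The sketch below covers only those two ends and leaves the model call out. The prompt wording, the JSON keys, and the `Claim` type are all assumptions made for illustration; no specific vendor API is implied.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    pundit: str
    statement: str      # the prediction, restated so it can be checked later
    probability: float  # explicit, or inferred from hedging language
    resolve_by: date    # date after which the outcome can be judged


# Hypothetical prompt; the wording and the JSON shape are assumptions.
PROMPT_TEMPLATE = (
    "Extract every dated, checkable prediction from the transcript below. "
    "Reply with a JSON list of objects with keys "
    '"statement", "probability" (0 to 1), and "resolve_by" (YYYY-MM-DD).\n\n'
    "Speaker: {pundit}\nTranscript:\n{transcript}"
)


def build_prompt(pundit: str, transcript: str) -> str:
    return PROMPT_TEMPLATE.format(pundit=pundit, transcript=transcript)


def parse_claims(pundit: str, model_reply: str) -> list[Claim]:
    """Validate the model's JSON reply and drop items that are not resolvable."""
    claims = []
    for item in json.loads(model_reply):
        try:
            probability = float(item["probability"])
            resolve_by = date.fromisoformat(item["resolve_by"])
            statement = str(item["statement"]).strip()
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed field: skip rather than guess
        if statement and 0.0 <= probability <= 1.0:
            claims.append(Claim(pundit, statement, probability, resolve_by))
    return claims


# Stand-in for a real model reply; invented for illustration.
reply = json.dumps([
    {"statement": "Index X closes above 6000", "probability": 0.7, "resolve_by": "2026-12-31"},
    {"statement": "Rates will fall eventually", "probability": 0.6},  # no date: dropped
])
for claim in parse_claims("Example Pundit", reply):
    print(claim)
```

Dropping undated items instead of guessing a deadline is a deliberate choice here: a claim with no resolution date can never be scored, so it would only add noise to the leaderboard.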

Buildability: 4/5. A weekend MVP targeting a single vertical — say, finance podcasters — is achievable. The core stack: transcript ingestion (Whisper or an off-the-shelf transcription API), LLM-based claim extraction (GPT-4o or Claude), a simple database of claims and outcomes, a Brier score calculator (about 10 lines of math), and a public dashboard. No special APIs, no regulated data, no compliance overhead. The main challenge is building a reliable claim-resolution pipeline: the system needs to know when an outcome has occurred and match it back to the original claim. That requires either a manual review step or an automated news-monitoring layer — both are solvable but neither is trivial. A solo developer could ship a limited version in a week; a production-grade multi-pundit system with automated resolution would take longer.
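The "simple database of claims and outcomes" could be as small as one SQLite table plus one leaderboard query. The sketch below is a minimal version under that assumption. The schema, column names, and helper functions are hypothetical, and the sample rows are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE claims (
    id          INTEGER PRIMARY KEY,
    pundit      TEXT NOT NULL,
    statement   TEXT NOT NULL,
    probability REAL NOT NULL CHECK (probability BETWEEN 0 AND 1),
    resolve_by  TEXT NOT NULL,                     -- ISO date
    outcome     INTEGER CHECK (outcome IN (0, 1))  -- NULL until resolved
);
"""

# Brier score per pundit over resolved claims only; lower is better.
LEADERBOARD = """
SELECT pundit,
       COUNT(*) AS resolved,
       AVG((probability - outcome) * (probability - outcome)) AS brier
FROM claims
WHERE outcome IS NOT NULL
GROUP BY pundit
ORDER BY brier ASC;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_claim(conn, pundit, statement, probability, resolve_by) -> int:
    cur = conn.execute(
        "INSERT INTO claims (pundit, statement, probability, resolve_by) VALUES (?, ?, ?, ?)",
        (pundit, statement, probability, resolve_by),
    )
    return cur.lastrowid


def resolve_claim(conn, claim_id: int, occurred: bool) -> None:
    conn.execute("UPDATE claims SET outcome = ? WHERE id = ?", (int(occurred), claim_id))


conn = open_db()
a = add_claim(conn, "Pundit A", "Event one happens", 0.8, "2026-06-30")
b = add_claim(conn, "Pundit B", "Event one happens", 0.5, "2026-06-30")
add_claim(conn, "Pundit B", "Event two happens", 0.9, "2026-12-31")  # still open
resolve_claim(conn, a, True)
resolve_claim(conn, b, True)

for row in conn.execute(LEADERBOARD):
    print(row)
```

Open claims stay in the table with a NULL outcome and are excluded from scoring, so the manual-review or news-monitoring step only has to call something like `resolve_claim` once an outcome is known.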

Potential monetization paths: freemium public leaderboard (media and readers pay for access to detailed historical data), newsletter with weekly Brier score updates, B2B licensing to media companies or fact-checking organizations. The 2013 precedent failed — possibly because LLMs didn't exist to automate claim extraction, possibly because the media ecosystem resisted accountability tools, possibly because of cold-start problems. All three explanations are plausible; none is proven. Anyone building in this space should budget time for distribution, not just product.

Feasibility prerequisites: No special resources required beyond standard developer infrastructure. The primary non-technical challenge is cold-start: the leaderboard has no value until it tracks enough pundits and enough claims to be worth checking. Starting with a single high-profile domain (e.g., the 10 most-downloaded finance podcasts) and building an audience around that vertical before expanding is more tractable than trying to cover all pundits simultaneously.

Not actionable this window

These four candidates were screened and rejected.

Signal	Poster	Engagement	Status	Reason
Marvel mood movie recommender	@thoughtcrime___ (3,524 followers, verified) [5]	9	Recurrence / saturated	Same tweet classified as saturated in the May 23 run. Mood-based movie recommendation has at least 5 active products (Taranify, Moodies, MoodieMovie, MovieQ, FilmAdvice). A reply in the thread noted: "An LLM can already do this bro." [6]
Adaptive language learning that skips fixed curriculum	@WRLDOFSLUMP (647 followers, verified) [7]	1	Saturated	Duolingo (Nasdaq: DUOL), Talkpal, Langua, LingQ, and Babbel all offer AI-adaptive learning paths. The "skip the irrelevant beginning modules" frustration is a UX preference, not a product gap.
Virtual try-on for piercings, hair dye, haircuts, and brows in one app	@Eat_thiscake (295 followers, unverified) [8]	0	Partially solved	Each subcategory has dedicated AR tools (Perfect Image/PerfectFit for piercings [9], YouCam/ModiFace for hair and brows [10]). No unified app, but component capabilities are widely available. Low credibility poster; zero engagement.
Nutrient tracker with no calorie counting	@princessnosidam (38 followers, unverified)	1	Partially solved / recurrence	Cronometer lets users toggle calorie display off while tracking up to 95 nutrients. [11] VitaGlow (App Store) markets itself as a nutrient tracker with "no calorie counting." Same demand category as a known stale signal from a prior run. Poster has 38 followers.

thoughtcrime @thoughtcrime___·4d

someone should build an ai agent that suggests a marvel movie based on your current mood

𝓢𝓱𝓸𝓻𝓽𝓬𝓪𝓴𝓮🌸🍸 @Eat_thiscake·4d

i wish there was an app where one could test out different changes and see how it looks without taking that bold step. Like piercings, hair dye, haircuts, eyebrow changes etc

Summary table

#	Signal	Poster (followers)	Engagement	Tier	Gap confirmed?	Buildability
1	Brier score pundit tracker	@jakozloski (6,794)	23	HIGH	Yes — no active product since PunditTracker.com went inactive ~2013	4/5
—	Marvel mood recommender	@thoughtcrime___ (3,524)	9	SATURATED	No — 5+ active products; LLM wrappers trivial	1/5
—	Adaptive language learning	@WRLDOFSLUMP (647)	1	SATURATED	No — Duolingo, Talkpal, and multiple AI-adaptive competitors	3/5
—	Virtual appearance try-on	@Eat_thiscake (295)	0	PARTIALLY SOLVED	Partial — individual subcategory tools exist, no unified app	3/5
—	Nutrient tracker, no calories	@princessnosidam (38)	1	PARTIALLY SOLVED	Partial — Cronometer toggle covers most of the need	2/5

Total engagement = likes + retweets + replies; views and bookmarks excluded. Primary 28.5-hour window (May 23 13:30 → May 24 18:00 UTC) produced zero qualifying posts; 72-hour fallback window (May 21–24 UTC) screened five candidates. Structural signal scarcity continues — this is the sixth consecutive run activating the fallback.
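The screening rule in the note above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline actually used for this report: "above the threshold" is read as strictly greater than 10, and the fallback is assumed to return every post in the 72-hour window for manual screening. The `Post` type, the `screen` function, and the sample posts are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ENGAGEMENT_THRESHOLD = 10  # "above the threshold" is read here as strictly greater


@dataclass(frozen=True)
class Post:
    posted_at: datetime
    likes: int
    retweets: int
    replies: int
    views: int = 0      # recorded but excluded from the total
    bookmarks: int = 0  # recorded but excluded from the total

    @property
    def engagement(self) -> int:
        return self.likes + self.retweets + self.replies


def screen(posts, run_end, primary=timedelta(hours=28.5), fallback=timedelta(hours=72)):
    """Return (window_name, posts). Fallback returns every post for manual review (assumption)."""
    in_primary = [
        p for p in posts
        if run_end - primary <= p.posted_at <= run_end and p.engagement > ENGAGEMENT_THRESHOLD
    ]
    if in_primary:
        return "primary", in_primary
    return "fallback", [p for p in posts if run_end - fallback <= p.posted_at <= run_end]


run_end = datetime(2026, 5, 24, 18, 0, tzinfo=timezone.utc)
# Hypothetical posts; only the counts on the first one mirror the figures quoted above.
posts = [
    Post(run_end - timedelta(hours=40), likes=18, retweets=1, replies=4, views=798, bookmarks=4),
    Post(run_end - timedelta(hours=5), likes=1, retweets=0, replies=0),
]
window, candidates = screen(posts, run_end)
print(window, [p.engagement for p in candidates])  # fallback [23, 1]
```

Subtracting 28.5 hours from the run end of May 24 18:00 UTC gives May 23 13:30 UTC, which matches the primary window stated in the note.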

References