Unresolved problems in identity, reputation, and curation for trustworthy AI infrastructure

Billy’s recent essays outline a thesis: AI is creating a triple crisis—intelligence scarcity, curation collapse, and trust vacuum—and Intuition is infrastructure for the latter two. But theses invite challenges. This thread is for stress-testing these ideas.


The Claims

1. Curation is the new scarcity. Content generation costs → 0. Discovery remains medieval. The bottleneck shifts from “what exists” to “what deserves attention.” Whoever controls surfacing controls reality.

2. Stake is the last honest signal. When AI can generate infinite fake reviews, fake engagement, fake consensus—economic commitment becomes the only signal that’s expensive to fake. Backing your claims with something you can lose — that’s the root of credibility.

3. Decentralization is epistemologically necessary. Centralized reputation = single point of epistemic failure. Robust knowledge requires multiple independent sources, cross-validated, where no single observer can corrupt the whole.

4. The trust stack is unsolved. Agents → MCP Servers → Tools → Data Sources. Without identity and reputation at every layer, trust collapses—and the window to build this stack before AI development accelerates beyond control is closing.


The Open Problems

1. The Plutocracy Problem

Billy argues: “Rich people already control curation invisibly. Intuition makes it legible.”

But legibility isn’t neutrality. A whale with $10M can still stake more than 10,000 people with $1K each. Yes, their judgment becomes visible. Yes, bad judgment eventually costs them. But “eventually” might be long enough to manipulate outcomes that matter.

Question: Is stake-weighted curation just plutocracy with extra steps? What mechanisms actually prevent capital concentration from dominating signal?

Possible directions:

  • Quadratic staking (square root of stake = influence)

  • Reputation-weighted stake (track record multiplier)

  • Time-weighted conviction (longer hold = more weight)

  • Social graph weighting (trusted sources count more)

What’s the right design? What are we missing?
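To make the design space concrete, here is a toy sketch of how those four directions might compose into a single influence function. Everything in it is an assumption for illustration: the functional forms, the constants, and the choice to multiply the factors together are invented here, not taken from Billy’s essays or from Intuition’s design.

```python
import math

def influence(stake: float, reputation: float, hold_days: float,
              graph_weight: float) -> float:
    """Toy influence score combining the four directions above.

    stake        -- tokens committed; sqrt dampens whale dominance
    reputation   -- track-record multiplier (e.g. 0.5 to 2.0)
    hold_days    -- how long the stake has been held
    graph_weight -- social-graph factor (1.0 baseline, higher if vouched for)
    """
    quadratic = math.sqrt(stake)                # quadratic staking
    conviction = math.log1p(hold_days / 30.0)   # diminishing returns on time
    return quadratic * reputation * (1.0 + conviction) * graph_weight

# A $10M whale vs. 10,000 people staking $1K each, all else equal:
whale = influence(10_000_000, 1.0, 30, 1.0)
crowd = 10_000 * influence(1_000, 1.0, 30, 1.0)
print(crowd / whale)  # ~100x: the crowd outweighs the whale under sqrt,
                      # where linear stake-weighting would score them equal.
```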


2. The Cold Start Problem

Reputation systems need history. New entities have none. This creates a bootstrapping paradox:

  • New MCP servers can’t get trusted without reputation

  • They can’t build reputation without being trusted

  • Early movers accumulate permanent advantage

  • Innovation gets penalized

Question: How do you bootstrap trust for new entities without recreating the incumbent advantages we’re trying to escape?

Possible directions:

  • Inheritance from known developers/organizations

  • Provisional trust with higher monitoring

  • Staking bonds that new entities post as collateral

  • Reputation escrow from established vouchers
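As a sketch of how the last two directions might combine with provisional trust: a slashable bond plus escrowed vouching stands in for missing history, then fades in relevance as a real track record accrues. All names, thresholds, and the blending formula below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ProvisionalEntity:
    """A new MCP server that buys provisional trust with a slashable bond."""
    name: str
    bond: float                 # collateral posted at registration
    vouched_stake: float = 0.0  # reputation escrowed by established backers
    interactions: int = 0
    failures: int = 0

    def trust_score(self) -> float:
        # Collateral substitutes for missing history early on, then matters
        # less and less as a real track record accrues.
        history_weight = min(self.interactions / 1000, 1.0)
        earned = 1.0 - self.failures / self.interactions if self.interactions else 0.0
        collateral = min((self.bond + self.vouched_stake) / 10_000, 1.0)
        return history_weight * earned + (1 - history_weight) * collateral

    def record(self, ok: bool) -> None:
        self.interactions += 1
        if not ok:
            self.failures += 1
            self.bond *= 0.9  # slash 10% of the bond on each failure

new_server = ProvisionalEntity("calendar-mcp-v0", bond=5_000, vouched_stake=2_000)
print(new_server.trust_score())  # 0.7 from pure collateral, no history yet
```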


3. The Context Collapse Problem

Billy writes: “An MCP server that’s trustworthy for calendar queries isn’t necessarily trustworthy for financial transactions.”

But current reputation systems collapse context. A five-star rating doesn’t tell you what it’s five stars for. Even domain-specific reputation (e.g., “trusted for security auditing”) might not capture the granularity needed.

Question: How granular does contextual reputation need to be? How do you query “trusted for X in context Y at stake level Z” without the complexity becoming unusable?
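One strawman answer: store reputation as contextual claims rather than scalars, so that the query becomes a filter. The schema below is entirely hypothetical; it just shows that five stars for calendar reads say nothing about payments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReputationClaim:
    subject: str      # e.g. "mcp://acme/calendar"
    capability: str   # X: "calendar.read", "payments.send", ...
    context: str      # Y: "personal", "enterprise", "testnet", ...
    stake: float      # Z: economic weight behind this claim
    score: float      # 0..1 assessment within that context only

def trusted_for(claims, capability, context, min_stake, min_score=0.8):
    """Query: trusted for `capability` in `context` at stake >= `min_stake`."""
    return [
        c for c in claims
        if c.capability == capability
        and c.context == context
        and c.stake >= min_stake
        and c.score >= min_score
    ]

claims = [
    ReputationClaim("mcp://acme/calendar", "calendar.read", "personal", 500, 0.95),
    ReputationClaim("mcp://acme/calendar", "payments.send", "personal", 50, 0.60),
]
# Five stars for calendar reads say nothing about payments:
print(trusted_for(claims, "payments.send", "personal", min_stake=100))  # []
```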


4. The Sybil Problem for Agents

When agents can be spun up cheaply, traditional Sybil resistance breaks. Proof-of-personhood doesn’t apply. Proof-of-stake helps but doesn’t prevent well-funded attackers from creating fleets.

Question: What does Sybil resistance look like for autonomous agents? Is identity even the right frame, or do we need something else entirely?


5. The Recursive Trust Problem

“Show me all MCP servers vouched for by entities I trust” sounds clean. But:

  • What if your trusted entities are wrong?

  • What if trust networks become echo chambers?

  • What if sophisticated attackers build long reputations before striking?

Trust graphs can have failure modes that flat ratings don’t. They can also propagate errors through networks.

Question: How do you build trust traversal that’s robust to both isolated bad actors AND coordinated long-game attacks?
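For concreteness, here is a toy traversal of “servers vouched for by entities I trust” with per-hop decay and a hop cap, so distant endorsements count less. The decay constant, hop limit, and pruning threshold are arbitrary choices, not a proposed standard.

```python
def trust_reachable(edges, root, decay=0.5, max_hops=3, threshold=0.1):
    """edges: {truster: [(trustee, weight in 0..1), ...]}.

    Propagates trust outward from `root`. Each hop multiplies by `decay`,
    so distant endorsements count less, which bounds how far a long-game
    attacker's accumulated reputation can radiate through the graph.
    """
    scores = {root: 1.0}
    frontier = [(root, 1.0, 0)]
    while frontier:
        node, score, hops = frontier.pop()
        if hops == max_hops:
            continue
        for trustee, weight in edges.get(node, []):
            propagated = score * weight * decay
            # Multi-path trust takes the best path; tiny signals are pruned.
            if propagated > threshold and propagated > scores.get(trustee, 0.0):
                scores[trustee] = propagated
                frontier.append((trustee, propagated, hops + 1))
    del scores[root]
    return scores

edges = {
    "me":    [("alice", 0.9), ("bob", 0.8)],
    "alice": [("mcp://foo", 0.9)],
    "bob":   [("mcp://foo", 0.4), ("mcp://bar", 0.9)],
}
print(trust_reachable(edges, "me"))
# {'alice': 0.45, 'bob': 0.4, 'mcp://bar': 0.18, 'mcp://foo': 0.2025}
```

Note that decay alone doesn’t defeat a patient attacker who builds direct, first-hop trust; it only bounds how cheaply reputation can be laundered through intermediaries.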


6. The Oracle Problem for Curation

Prediction markets have resolution—the event happens or it doesn’t. Curation markets don’t. “Is this valuable?” has no external oracle.

These are subjective, context-dependent, continuous assessments; no external event ever resolves them to ‘valuable.’

But if there’s no resolution, what prevents curation markets from becoming pure speculation disconnected from actual value? What makes stake accumulation meaningful rather than just a popularity contest with money?

Question: Without external resolution, what mechanisms keep curation markets grounded in actual quality rather than reflexive speculation?


7. The Speed Problem

Agents operate in milliseconds. Human curation operates in hours or days. If an agent needs to query a server’s reputation before executing, and that reputation is built from human observation…

Question: Can human-generated reputation signals operate at agent speed? Or do we need agent-generated reputation—and if so, how do we trust the agents doing the evaluating?
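One possible shape for an answer, sketched under the assumption that reputation writes stay human-slow while reads must be agent-fast: fold human-generated signals into a precomputed snapshot out-of-band, and let agents do O(1) reads with explicit staleness. The class and numbers below are illustrative only.

```python
import time

class ReputationCache:
    """Slow human/curation writes; fast agent reads.

    Humans update claims on a timescale of hours; a background job folds
    them into a snapshot; agents only ever hit the snapshot, so the
    millisecond path is a dict lookup, not a graph computation.
    """
    def __init__(self):
        self._snapshot = {}          # entity -> (score, computed_at)

    def recompute(self, claims):
        # Runs out-of-band (cron / indexer), at human-curation speed.
        snapshot = {}
        for entity, scores in claims.items():
            snapshot[entity] = (sum(scores) / len(scores), time.time())
        self._snapshot = snapshot    # atomic swap

    def read(self, entity, max_age_s=3600.0):
        # Agent-speed path: O(1), with staleness made explicit so the
        # agent can decide whether an hour-old score is acceptable.
        score, computed_at = self._snapshot.get(entity, (None, 0.0))
        age = time.time() - computed_at
        return score, age

cache = ReputationCache()
cache.recompute({"mcp://foo": [0.9, 0.8, 0.95]})
print(cache.read("mcp://foo"))  # (0.8833..., ~0.0 seconds old)
```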


The Meta-Question

All of the above assumes stake-weighted knowledge graphs are the right primitive.

What if they’re not?

What alternative architectures for trust infrastructure should we be considering? What are the blind spots in the current approach?


The Invitation

If you’ve thought about mechanism design, epistemic security, identity systems, or agent architectures—these problems need your attention.

What breaks? What holds? What are we not seeing?


Source Material

  • AI Credit Wars - Jan 25, 2026

  • InfoFi Is Dead. Long live InfoFi. - Jan 26, 2026

  • We’re Giving AI the Keys Before We’ve Built the Locks - Jan 27, 2026

  • Read the full articles here: https://x.com/0xbilly/articles

1 Like

On plutocracy: I don’t think stake is meant to decide truth, it’s meant to price conviction with downside. A whale staking $10M and a thousand people staking $1K are making different statements, but neither should be “the answer” on their own. The important shift vs today is legibility. Capital already shapes curation, just invisibly and without consequence. Making it explicit at least gives downstream consumers (humans or agents) the choice of how to weight it, or whether to ignore it entirely.

On context collapse: this feels like the most underappreciated part. Scalar reputation is basically information destruction. If reputation is modeled as claims in context (“trusted for X under Y conditions”), then the hard part isn’t theory, it’s UX. Humans will need projections; agents can handle the full tensor. Collapsing too early is the real failure mode.

And zooming out: I don’t read the thesis as “stake-weighted graphs solve truth.” It’s more “belief needs to become costly, contextual, and queryable again.” Without that, we’re stuck with free generation and opaque ranking systems deciding reality by default.

Plenty here still feels open, but this is the first framing I’ve seen that at least attacks the right failure modes head-on.

2 Likes
1. The Plutocracy Problem

If I were to rank these four in order:

1- Reputation-weighted stake (we could use Relics and other reputation signals from other protocols, e.g. Talent Protocol) - 500 $TRUST
2- Social graph weighting (trusted sources count more) - we call this intuition, so it should be about our social environment, right? 🙂 - 250 $TRUST
3- Time-weighted conviction (longer hold = more weight) - it’ll be hard to game the system with this - 250 $TRUST
4- Quadratic staking (square root of stake = influence) - 100 $TRUST

7. The Speed Problem

Question: Can human-generated reputation signals operate at agent speed? Or do we need agent-generated reputation—and if so, how do we trust the agents doing the evaluating?
My answer: I don’t think we (humans) need to operate at agent speed, and we don’t need agent-generated reputation. Agents should only be used to check the reputation that we create.

1 Like

One thought building on this thread: maybe part of the problem is that we’re still treating opinions / beliefs as first-class objects, instead of treating contextualized claims as the primitive.

In the approach we’ve been exploring, opinions don’t disappear, but they’re not step one. Step one is anchoring anything to a minimal shared context.

Concretely, instead of free-floating claims, every assertion would attach to an anchor:

  • a historical or internet event

  • a specific artifact (URL, tweet, paper, dataset, MCP server version, incident, etc.)

Each anchor carries a minimal, standardized context:

  • time (range + uncertainty)

  • place (country / region / point, with precision)

  • domain (security, finance, history, infra, culture…)

  • scope (what the claim applies to)

  • version / period (esp. for tools, servers, datasets)

  • source references

Then opinions, reviews, stakes, and reputation all attach to that anchor, not to an abstract entity.
This does a few useful things mechanically:

  • reduces context collapse (reputation becomes “trusted for X in Y during Z”, not a scalar)

  • limits plutocracy blast radius (stake is visible and bounded to a context)

  • helps with cold start (new actors can anchor to existing events/artifacts before having social rep)

  • raises the cost of Sybil attacks (claims have to remain coherent across time/place/version)

  • grounds curation without needing a hard oracle (even if value is subjective, anchoring isn’t)
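A rough sketch of what I mean by an anchor record; every field name and value here is made up:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    """Minimal shared context that claims, stakes, and reviews attach to."""
    artifact: str              # URL, tweet, paper, dataset, server version...
    time_range: tuple          # (start, end), ideally with uncertainty bounds
    place: str                 # country / region / point, with precision
    domain: str                # security, finance, history, infra, culture...
    scope: str                 # what the claim applies to
    version: str = ""          # esp. for tools, servers, datasets
    sources: tuple = ()        # source references

@dataclass
class Assertion:
    anchor: Anchor             # never free-floating
    author: str
    claim: str
    stake: float               # conviction, priced against this context only

incident = Anchor(
    artifact="mcp://acme/payments@1.4.2",
    time_range=("2026-01-20", "2026-01-22"),
    place="global",
    domain="security",
    scope="auth bypass in token refresh",
    version="1.4.2",
    sources=("https://example.com/advisory",),
)
review = Assertion(incident, "some-auditor", "patched and verified", stake=250.0)
```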
Importantly, this doesn’t replace stake-weighted belief graphs; it feeds them. Stake still prices conviction, but conviction is always about something concrete, with inherited context.

Another upside is that this opens the door beyond web3-native users. Anchored graphs map very naturally to cultural, historical, journalistic, and archival workflows, where source, time, and place already matter. Those domains could become a human-scale on-ramp into the trust stack, instead of everything starting from agents and protocols.

Not a full solution, just a possible direction: belief graphs sitting on top of a reality / context layer, rather than floating free.

Curious if others have thought about similar anchoring approaches, or see obvious failure modes here.

1 Like

Reading through this, I kept noticing that #2 (Cold Start), #3 (Context Collapse), and #5 (Recursive Trust) feel like the same problem wearing different masks.

Let me explain what I mean.

Context Collapse happens 'cause we’re treating trust like it’s this monolithic thing. But the people I trust for smart contract security aren’t the same people I’d trust for music takes or NFT picks. A single reputation score flattens all of that. What if we had different trust contexts for different domains - kinda like switching between different lenses? “Security lens,” “DeFi lens,” “Social lens.” Same data, different filters.
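Rough sketch of the lens idea, with made-up names; the point is just that it’s the same edge set under different filters:

```python
def apply_lens(edges, lens):
    """Same trust data, filtered down to one domain's edges."""
    return {
        src: [(dst, w) for (dst, w, domain) in out if domain == lens]
        for src, out in edges.items()
    }

edges = {
    "me": [("alice", 0.9, "security"),   # trust alice's audit takes...
           ("alice", 0.3, "nft-picks"),  # ...but not her NFT picks
           ("bob", 0.8, "nft-picks")],
}
security_view = apply_lens(edges, "security")  # {'me': [('alice', 0.9)]}
nft_view = apply_lens(edges, "nft-picks")      # alice at 0.3, bob at 0.8
```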

Cold Start gets easier once you have these lenses. Yeah, new entities have no history. But they’re not starting from zero - they’re interacting with the same infra as everyone else. Bridges, core contracts, shared dapps. You can bootstrap an “implicit network” from on-chain interaction patterns before anyone explicitly trusts anyone. Then layer in explicit trust as it builds. Show trusted network signals first, fall back to broader aggregates when your network hasn’t touched that claim yet.

Recursive Trust - the echo chamber and long-game attacker stuff - gets better when your trust propagation logic is explicit and auditable. The problem isn’t multi-hop trust. It’s multi-hop trust where you can’t see the assumptions baked in. If decay functions, hop weights, and source filtering are all legible, you can at least reason about failure modes. You can ask “why am I seeing this?” and get a real answer.

Look, what I’m really getting at is that these three issues all stem from the same gap. We need contextual, transparent trust filtering - where users pick their lens, see why they’re seeing what they’re seeing, and can switch without friction.

I’m not claiming this fixes everything, but it could be one of those solutions that tackles multiple problems simultaneously.

1 Like

Interesting perspective… I almost see it the other way around (in the mid term, not the near term), where there is more agent-generated reputation purely because agents can report and audit 24/7/365 at lightning speed. They get many more reputational iterative cycles than we do as humans. With that said, it raises the question of how we might weight a human-generated reputational signal against an agent-generated one.

I like your rankings, though, and they inspire a composite signal in my mind: time-weighted social-graph reputation, meaning how much stake, from how many others, for how much time.
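Something like this, say, with a completely made-up functional form, just to show the three ingredients interacting:

```python
import math

def composite_signal(total_stake: float, num_stakers: int,
                     avg_hold_days: float) -> float:
    """How much stake, from how many others, for how much time."""
    breadth = math.log1p(num_stakers)            # many backers beat one whale
    duration = math.log1p(avg_hold_days / 30.0)  # conviction compounds slowly
    return math.sqrt(total_stake) * breadth * (1.0 + duration)

# Same total stake, very different signals:
print(composite_signal(1_000_000, 1, 7))        # one whale, fresh: ~838
print(composite_signal(1_000_000, 5_000, 180))  # broad + seasoned: ~25,000
```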

Addressing some things sequentially:

#1 - “I don’t think we (humans) need to operate at agent speed, and we don’t need agent-generated reputation. Agents should only be used to check the reputation that we create.” - I think it’s both. But I very strongly believe we are years away from programmatically computing reputation in a way that could be completely offloaded to machines; there are too many variables at play.

As such, I think human intuition is an essential piece of the puzzle - we need humans in the loop in the domain of reputation. I could make an argument for always, but definitely right now.

This doesn’t negate the need to analyze how we, as humans, derive intrinsic reputation scores for things; it’s actually the opposite. We should analyze our own behavior and try to understand ‘why’ we think the things we think, so that we can make progress on outsourcing reputation generation to machines and come up with a meaningful variable set.

#2 - “every assertion would attach to an anchor:” - I don’t necessarily think EVERY assertion needs to attach to an anchor (there need to be ‘root anchors’, for example), but I REALLY like this idea of most claims attaching to an anchor! This is quite interesting and is definitely a pattern that we should all explore more… Every action we take / every decision we make can be tied back to SOMETHING… or some SET of THINGS… This feels like a very powerful pattern to follow… Especially in the context of AI, where we need to trace and learn from decision paths…

#3 - @repboiz I think everyone here is bought into the concept of contextual reputation / understands its importance. And that is definitely where any sort of architecture needs to start - we cannot ask ‘what is the set of criteria that makes Actor X reputable’; we need to be asking ‘what is the set of criteria that makes Actor X reputable in Y Context’. BUT, universal reputation CAN be a useful concept, despite its reductionist nature. This universal reputation - or rather, these higher-order reputations (smaller-context reputations composing into higher-context reputations in a kind of infinite fractal) - just needs to be a composition of lower-order contextual reputations that logically roll up.
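A tiny illustration of that roll-up (contexts and weights invented):

```python
def roll_up(contextual_scores, weights):
    """Compose lower-order contextual reputations into a higher-order one.

    The aggregate is always derived, never stored as a primitive, so you
    can drill back down into the components that produced it.
    """
    total = sum(weights.get(ctx, 0.0) for ctx in contextual_scores)
    if total == 0:
        return 0.0
    return sum(score * weights.get(ctx, 0.0)
               for ctx, score in contextual_scores.items()) / total

actor_x = {"security-auditing": 0.95, "defi-strategy": 0.40, "music-takes": 0.70}

# A "technical" higher-order reputation weights technical contexts only;
# music-takes contributes nothing here but still exists one level down.
technical = roll_up(actor_x, {"security-auditing": 0.8, "defi-strategy": 0.2})
print(technical)  # 0.84
```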

3 Likes

The fractal framing clicks. So universal reputation isn’t wrong - it’s just that it should be the output of composed contextual reputations, not the starting primitive. You can always roll up, but you can’t roll down if you started flat.

That also gives users a choice: look at the aggregate if you want a quick signal, drill into the components if you need to understand why.

2 Likes