Reputation Should Be Queryable

A lot of people aren’t deep into how reputation works in Web3.

But after ~4 years building across DeFi, privacy, and now the Intuition MCP… one pattern keeps showing up:

Every system gives you a number - and calls it trust.


Most systems answer the wrong question.

They try to tell you:

“Is this person trusted?”

But that’s not how decisions work.


The Real Question Is Contextual

You’re not asking:

“Is Luda trusted?”

You’re asking:

  • Is Luda a trusted Solidity dev?

  • Is Luda a trusted trader?

  • Is Luda trusted by people I trust?

Same person.
Different answers.


Why Most Systems Fail

They collapse everything into:

  • one score

  • one label

  • one dimension

So you lose:

  • why someone is trusted

  • who trusts them

  • for what they’re trusted

A single number flattens everything that matters.


Intuition Changes the Primitive

Reputation isn’t a score.
It’s attestations

Example:

Billy → Luda
“strong Solidity developer” (0.9)

Zet → Luda
“reliable trader” (0.8)


Luda doesn’t have a reputation.

Luda has multiple reputations
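To make that concrete, here's a minimal sketch of those two attestations as plain data. The field names are my own illustration, not the actual Intuition atom/triple schema:

```python
# Two attestations from the example above, as plain data. Field names are
# illustrative only, not the actual Intuition schema.
attestations = [
    {"source": "Billy", "target": "Luda",
     "predicate": "strong Solidity developer", "weight": 0.9},
    {"source": "Zet", "target": "Luda",
     "predicate": "reliable trader", "weight": 0.8},
]

# Luda's "reputation" is not one number: each predicate is its own dimension.
reputations = {
    a["predicate"]: a["weight"] for a in attestations if a["target"] == "Luda"
}
print(reputations)
# {'strong Solidity developer': 0.9, 'reliable trader': 0.8}
```

Collapsing that dict into a single average is exactly the flattening step most systems make.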


Reputation Becomes Queryable

Instead of:

“Give me Luda’s trust score”

You can ask:

  • Who are the most trusted Solidity devs?

  • Who is trusted by Zet for trading?

  • Who is trusted by people Billy trusts?

Reputation stops being something you read.

It becomes something you query


Predicate Filtering = Signal Control

This is where things clicked for me while building https://mcp.intuition.box.

We implemented predicate filtering with weighted scoring across multiple predicate types.

The difference?

Night and day.


You can filter by:

  • predicate

  • source

  • weight

Example:

Ignore generic endorsements

Only use:

→ “is Solidity developer”
→ from high-trust devs

Now you get:

high signal, low noise
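Here's a hedged sketch of what that filtering could look like. The `source_trust` table and `score` helper are hypothetical, not the MCP's actual API:

```python
# Hypothetical source-trust table: how much we trust each attester.
source_trust = {"Billy": 1.0, "Zet": 0.9, "anon": 0.1}

attestations = [
    {"source": "Billy", "target": "Luda",
     "predicate": "is Solidity developer", "weight": 0.9},
    {"source": "anon", "target": "Luda",
     "predicate": "great person", "weight": 1.0},
    {"source": "Zet", "target": "Mira",
     "predicate": "is Solidity developer", "weight": 0.7},
]

def score(target, predicate, min_source_trust=0.5):
    """Sum attestation weights for one predicate, filtered and scaled by source trust."""
    return sum(
        a["weight"] * source_trust[a["source"]]
        for a in attestations
        if a["target"] == target
        and a["predicate"] == predicate                     # predicate filter
        and source_trust[a["source"]] >= min_source_trust   # drop low-trust sources
    )

print(score("Luda", "is Solidity developer"))  # 0.9 -- generic endorsement ignored
```

The generic "great person" claim never touches the score: it fails both the predicate filter and the source threshold.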


Why This Matters for Agents

Agents don’t need generic answers.

They need relevant answers

A hiring agent shouldn’t care about:

  • trading reputation

  • social popularity

It should filter for:

→ dev attestations
→ from credible sources


And this is why I also built a trust lens system into the Intuition MCP.

Each lens = a filtered view of the graph

Agents pick a lens → get only the signal they need
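One way a lens might be modeled (purely illustrative; the MCP's actual trust lens internals aren't shown here) is as a reusable predicate + source filter:

```python
from dataclasses import dataclass

@dataclass
class Lens:
    predicates: set           # which claims count for this view
    min_source_trust: float   # how credible an attester must be

def apply_lens(lens, attestations, source_trust):
    """Return only the attestations visible through this lens."""
    return [
        a for a in attestations
        if a["predicate"] in lens.predicates
        and source_trust.get(a["source"], 0.0) >= lens.min_source_trust
    ]

# Toy data for illustration only.
source_trust = {"Billy": 1.0, "anon": 0.1}
attestations = [
    {"source": "Billy", "target": "Luda",
     "predicate": "is Solidity developer", "weight": 0.9},
    {"source": "anon", "target": "Luda",
     "predicate": "is Solidity developer", "weight": 1.0},
    {"source": "Billy", "target": "Luda",
     "predicate": "reliable trader", "weight": 0.8},
]

hiring_lens = Lens(predicates={"is Solidity developer"}, min_source_trust=0.8)
visible = apply_lens(hiring_lens, attestations, source_trust)
# Only Billy's Solidity attestation survives the lens.
```

A trading agent would swap in a different `Lens` instance over the same graph; the data never changes, only the view.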


Builder Opportunities

This unlocks primitives that feel underexplored:

1. Reputation Query Engines
The query becomes the product

2. Domain Leaderboards
Separate trust per domain – not one global list

3. Predicate Marketplaces
Communities define what trust means

4. Agent-Specific Filters
Each agent defines its own trust logic


Where It Gets Interesting

Combine:

  • predicate filtering

  • graph traversal

  • trust propagation

And you can ask:

“Find Solidity devs trusted by people Billy trusts, weighted by multi-hop trust”


That’s not a score.

That’s logic
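That combined query can be sketched with a toy two-hop propagation. The `trusts` edges and the 0.5 per-hop decay factor are assumptions for illustration:

```python
# Toy trust graph: who trusts whom, and with what weight.
trusts = {"Billy": {"Zet": 0.9}, "Zet": {"Luda": 0.8}}
# Attestations on the target predicate: (attester, dev) -> claim weight.
dev_claims = {("Zet", "Luda"): 0.9}  # Zet: "Luda is a Solidity developer"

def propagated_trust(root, target, decay=0.5, depth=2):
    """Best trust path from root to target within `depth` hops, decayed per hop."""
    best = trusts.get(root, {}).get(target, 0.0)
    if depth > 1:
        for mid, w in trusts.get(root, {}).items():
            best = max(best, w * decay * propagated_trust(mid, target, decay, depth - 1))
    return best

def find_devs(root):
    """Rank devs by (trust in the attester) x (attestation weight)."""
    scores = {}
    for (attester, dev), w in dev_claims.items():
        t = propagated_trust(root, attester)
        if t > 0:
            scores[dev] = scores.get(dev, 0.0) + t * w
    return scores

print(round(find_devs("Billy")["Luda"], 2))  # 0.81
```

Billy never attested anything about Luda directly; the score is pure graph logic, composed from Billy's trust in Zet and Zet's claim about Luda.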


Open Questions

  • Who defines the “right” predicates?

  • Should they standardize or evolve?

  • Can agents learn what matters over time?

  • How do we reduce spam + low-signal attestations?


Final Thought

Reputation shouldn’t be something you check.

It should be something you ask


What’s the first reputation query you’d run on the graph?


This resonates hard. Especially the point about domain leaderboards and predicate filtering as separate primitives.

I’ve been building exactly this — contextual trust scoring where each [Agent] [hasAgentSkill] [Skill] triple has its own vault and its own independent score. So you don’t ask “is this agent trusted” but “is this agent trusted FOR this specific capability.” The overall score becomes a weighted average of domain scores.

Flipped the query too — instead of “what can this agent do” you ask “who is the best agent for this domain” and get a ranked leaderboard per skill. Different domains, different rankings, different top agents.
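In toy form (made-up agents and numbers, not the actual vault data), the per-skill leaderboard and the derived overall score look something like:

```python
# Hypothetical per-domain scores; in practice each would come from its own vault.
domain_scores = {
    "agentA": {"solidity": 0.9, "trading": 0.3},
    "agentB": {"solidity": 0.6, "trading": 0.8},
}

def leaderboard(skill):
    """Who is the best agent FOR this domain?"""
    return sorted(
        ((agent, s[skill]) for agent, s in domain_scores.items() if skill in s),
        key=lambda kv: kv[1], reverse=True,
    )

def overall(agent, weights):
    """Derived global score: weighted average of domain scores."""
    s = domain_scores[agent]
    return sum(s[d] * w for d, w in weights.items()) / sum(weights.values())

print(leaderboard("solidity"))  # [('agentA', 0.9), ('agentB', 0.6)]
print(round(overall("agentA", {"solidity": 2, "trading": 1}), 3))  # 0.7
```

Note the ordering flips per domain: agentA tops solidity, agentB tops trading. The global number is derived, never stored.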

On your open question about reducing low-signal attestations — one approach I’m exploring is accuracy-weighted staking. Your track record as an evaluator determines how much your signal weighs. Consistently back agents that maintain trust → your weight goes up. Back agents that crash → it goes down. It’s a natural spam filter because low-quality evaluators lose influence over time without needing manual moderation.
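A minimal sketch of that update loop, assuming a simple exponential-moving-average rule (the exact rule is still an open design choice on my end):

```python
def update_weight(weight, correct, lr=0.1):
    """Move evaluator weight toward 1.0 on a correct call, toward 0.0 on a bad one."""
    target = 1.0 if correct else 0.0
    return weight + lr * (target - weight)

# A mixed track record: two good calls, one crash, one good call.
w = 0.5
for outcome in [True, True, False, True]:
    w = update_weight(w, outcome)
# Weight ends slightly above where it started; a run of bad calls would
# decay it toward zero with no manual moderation.
```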

Your point about agents picking a lens is interesting. Each lens could map to a predicate + evaluator weight filter. High-accuracy evaluators only, specific domain only → clean signal for agent decision-making.


Out of curiosity how do you guys foresee the atom / triple structure looking? My use case is slightly different since my extension will be focused on who users trust on specific topics like politics, tech, crypto, sports, etc. Our users will browse X.com and be able to apply different lenses (topical trust circles). It’s a bit tricky since a typical X feed includes Tweets from a variety of topics but I can probably handle that by giving each lens a different color, etc.

I originally figured just an I - trust - smilingkylan.eth type triple would work, and we’d be able to extend with a (I - trust - smilingkylan.eth) - for topic - technology but I’m not sure how much value the initial (inner) triple truly has.

So if we go with something like smilingkylan.eth - is trusted for topic - technology that could definitely work. You could still figure out their aggregate trustworthiness by querying the first 2 of 3 atoms of the triple… if that’s of any value.

The big step that needs to be taken if we move forward with a scheme like this is figuring out how to divvy up the different topics. Maybe start off with broad topics (like the ones I mentioned earlier) then let more specific topics pop up in response to community demand? We could theoretically vote on it as a list / ranking as well although any vote is subject to manipulation. Still, such a critical list would likely get a lot of staking on it from whales so maybe they’d be able to drown out any attempt at manipulation.

These are good conversations and it sounds like most of the dapp developers are starting to converge towards a network-wide convention, which is nice.

Edit: another important question is whether you guys expect to use a user’s stake amount as a weight for their trust circle? Weighted scores can be a bit harder for users to audit (i.e. “are claims by my trust circle being weighted correctly by the amount of stake I’ve placed on them?”)

Edit #2: the other implication of this scheme is that new users will start off with empty trust circles, which is a terrible user experience. This means we will need to find a (decentralized) way to get new users a large list of people to trust. I kinda expect different dapps / individuals to put together lists of people they recommend users trust. The cool thing is that with batched staking a new user can follow dozens of EVM accounts as trustworthy for a given topic… but I’m not sure we have much of that infrastructure built within the community just yet.

And how would a dapp like Hive Mind decide which EVM accounts to trust for which topics? Should EVM accounts nominate themselves for given topics? What would that self-nomination process look like? Where would it take place? “Topical trust lists” (or whatever we want to call them) may end up playing a critical role in the future of Intuition. Or maybe the network will just look at how much stake is organically staked onto EVM accounts for specific topics and encourage new users to also trust them for that specific topic? That would probably create an echo effect though :thinking:


Yeah this resonates

The “trusted FOR something” framing is exactly the shift

And the accuracy-weighted staking idea is kinda powerful ngl
It basically makes bad signal decay on its own

The cold start problem feels tricky though

How do you stop early bad actors from shaping the initial weights?

This is a really thoughtful breakdown

I think you’re circling the right problem:
not just how to store trust - but how to make it usable across different contexts


On the triple structure, I’d lean strongly toward:

smilingkylan.eth → trusted_for → technology

instead of nesting like:

(I → trust → you) → for topic → X

The nested version feels harder to reason about and query

If the goal is composability and clean queries, keeping the predicate contextual from the start feels cleaner


The interesting part is you can still recover “global-ish” trust by aggregating across domains

So instead of:

global score → primary

It becomes:

domain scores → primary
global → derived


On topics — yeah this is where it gets tricky

I wouldn’t over-optimize it early

Start with broad domains:

  • tech

  • crypto

  • politics

  • sports

Then let more granular ones emerge on top of that

Trying to standardize too early might actually slow things down


Your point on accuracy-weighted evaluators + stake is :fire:

Feels like two separate axes though:

  • stake = economic weight

  • accuracy = informational weight

I wouldn’t collapse them too quickly

Because high stake ≠ high signal


On the empty trust circle problem - this is the real UX bottleneck imo

Cold start kills everything if not handled properly

Your idea of “topical trust lists” makes sense

But I think the key is:

→ make them forkable + composable

So instead of one canonical list, you get:

  • Billy tech trust list

  • Zet dev trust list

  • curated DAO lists

New users just subscribe to a base layer

Then refine over time


Also agree on the echo chamber risk

If we purely follow “most staked = most trusted”
we’ll just recreate popularity loops

So maybe:

→ discovery should bias diversity
→ not just weight


On your lens idea - I like the direction a lot

Feels like:

lens = predicate + source filter + weighting logic

Different lenses = different “views of trust”


One thing I’m still thinking about:

Should topics be:

  1. fixed primitives (clean, comparable)

  2. or fully emergent (flexible, messy)

Feels like the right answer might be a hybrid


Also curious how you’re thinking about:

→ topic overlap

Like tech vs crypto vs AI

Do you see those as separate graphs or overlapping layers?