Atom Classification & Enrichment (Intuition Improvement Proposal)

IIP-1: Onchain Atom Classification and Offchain Enrichment

To scale Intuition, a global onchain social knowledge graph, we need to expand upon the “things” we can contextualize and reason about.

I am proposing new Atom Classification and Atom Enrichment data structures to help with that goal. Below are the proposed data structures, examples, and the reasons we should adopt this approach.

You can find the proposed data structures in the Intuition Data Structures repo.

GitHub: https://github.com/0xIntuition/intuition-data-structures

Summary

IIP-1 proposes a simple rule for Intuition data structures:

  • Keep onchain atoms minimal and durable.
  • Move rich, changing metadata into offchain enrichment artifacts.

The goal is to make atoms easy to create, stable over time, and scalable across the knowledge graph.

Why This Matters

If we store too much data in base atoms, it becomes stale quickly. Descriptions change, URLs move, and media disappears.

Instead, the atom should contain only the smallest viable representation of an idea, such as a person, place, thing, company, or software project. Additional context can be attached later as enrichment.

This keeps the core graph clean while still supporting rich experiences in apps and APIs.

Design Principles

  1. Minimal by default: every atom starts with only required identifying fields.
  2. Durability first: prefer fields that are less likely to change over time.
  3. Separation of concerns:
    • Onchain atom = identity
    • Offchain enrichment = context
  4. Composable growth: add metadata as artifacts without mutating the core identity model.

Scope of IIP-1

IIP-1 introduces:

  1. Atom classification structures for flat, first-class categories.
  2. Enrichment envelope structures for attaching metadata artifacts offchain.
  3. A standard flow from URL input to classified atom to enriched entity.

Flat Classification Model

IIP-1 treats each schema type as a top-level classification in docs and folder structure.

This means we do not group types under umbrella labels like “Song” or “Ethereum” in the classification catalog. For example:

  • MusicRecording, MusicAlbum, and MusicGroup are separate first-class classifications.
  • EthereumAccount, EthereumSmartContract, and EthereumERC20 are separate first-class classifications.

Minimal Atom Classification

Atoms are classified using Schema.org types where applicable (plus protocol-specific types like Ethereum), with only a minimal subset of fields stored in the base atom.

Person (Person)

{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Brad Pitt",
  "sameAs": ["https://www.wikidata.org/wiki/Q35332"]
}

Location (Place)

{
  "@context": "https://schema.org/",
  "@type": "Place",
  "name": "Golden Gate Bridge",
  "address": "San Francisco, CA"
}

Thing (Thing)

{
  "@context": "https://schema.org/",
  "@type": "Thing",
  "name": "Apple"
}

Company (Organization)

{
  "@context": "https://schema.org/",
  "@type": "Organization",
  "name": "OpenAI",
  "url": "https://openai.com"
}

Software (SoftwareSourceCode)

{
  "@context": "https://schema.org/",
  "@type": "SoftwareSourceCode",
  "name": "Bun",
  "codeRepository": "https://github.com/oven-sh/bun"
}

Music Recording (MusicRecording)

{
  "@context": "https://schema.org/",
  "@type": "MusicRecording",
  "name": "Bohemian Rhapsody",
  "byArtist": "Queen"
}

Music Album (MusicAlbum)

{
  "@context": "https://schema.org/",
  "@type": "MusicAlbum",
  "name": "A Night at the Opera",
  "byArtist": "Queen"
}

Music Group (MusicGroup)

{
  "@context": "https://schema.org/",
  "@type": "MusicGroup",
  "name": "Queen"
}

Ethereum Account (EthereumAccount)

{
  "address": "0x1234567890abcdef1234567890abcdef12345678"
}

Ethereum Smart Contract (EthereumSmartContract)

{
  "chainId": "1",
  "address": "0xA0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"
}

Ethereum ERC20 (EthereumERC20)

{
  "chainId": "1",
  "address": "0xA0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",
  "name": "USD Coin",
  "symbol": "USDC",
  "decimals": "6"
}
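
A simple way to keep these field sets honest is to encode them as a per-classification table and check every payload against it before anything is committed. The Python sketch below does that. The `REQUIRED_FIELDS` and `OPTIONAL_FIELDS` tables are hypothetical and are inferred only from the examples above, not from the reference table in the intuition-data-structures repo. `validate_atom` is an illustrative name.

```python
from typing import Any

SCHEMA_ORG = "https://schema.org/"

# Hypothetical minimal-field tables, inferred from the examples above only.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "Person": {"name"},
    "Place": {"name"},
    "Thing": {"name"},
    "Organization": {"name"},
    "SoftwareSourceCode": {"name", "codeRepository"},
    "MusicRecording": {"name", "byArtist"},
    "MusicAlbum": {"name", "byArtist"},
    "MusicGroup": {"name"},
    "EthereumAccount": {"address"},
    "EthereumSmartContract": {"chainId", "address"},
    "EthereumERC20": {"chainId", "address", "name", "symbol", "decimals"},
}
OPTIONAL_FIELDS: dict[str, set[str]] = {
    "Person": {"sameAs"},
    "Place": {"address"},
    "Organization": {"url"},
}
# In the examples, the protocol-specific types carry no JSON-LD @context/@type.
PROTOCOL_TYPES = {"EthereumAccount", "EthereumSmartContract", "EthereumERC20"}


def validate_atom(classification: str, payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means complete and minimal."""
    if classification not in REQUIRED_FIELDS:
        return [f"unknown classification: {classification}"]
    errors: list[str] = []
    fields = set(payload)
    if classification not in PROTOCOL_TYPES:
        if payload.get("@context") != SCHEMA_ORG:
            errors.append("missing or wrong @context")
        if payload.get("@type") != classification:
            errors.append("@type does not match classification")
        fields -= {"@context", "@type"}
    required = REQUIRED_FIELDS[classification]
    allowed = required | OPTIONAL_FIELDS.get(classification, set())
    errors += [f"missing required field: {f}" for f in sorted(required - fields)]
    errors += [f"non-minimal field: {f}" for f in sorted(fields - allowed)]
    return errors


brad = {
    "@context": SCHEMA_ORG,
    "@type": "Person",
    "name": "Brad Pitt",
    "sameAs": ["https://www.wikidata.org/wiki/Q35332"],
}
print(validate_atom("Person", brad))  # []
print(validate_atom("Person", {**brad, "description": "Actor"}))  # ['non-minimal field: description']
```

Rejecting extra fields rather than ignoring them is the point of the design. A description that slips into an atom would be frozen there, which is exactly what the proposal wants to avoid.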

Offchain Enrichment Artifacts

Enrichment artifacts attach additional metadata to atoms. These artifacts are:

  • offchain
  • provider-specific
  • replaceable and refreshable
  • linked to a canonical atom identity
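
Together, these properties suggest keying artifacts by atom identity plus the plugin that produced them. A refresh then replaces the previous artifact instead of accumulating next to it. Below is a minimal in-memory sketch in Python. It assumes a string `atom_id` and the envelope shape described in this section. `ArtifactStore` is a hypothetical stand-in for whatever backend actually holds artifacts.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


def fetched_at(artifact: dict[str, Any]) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(artifact["meta"]["fetchedAt"].replace("Z", "+00:00"))


@dataclass
class ArtifactStore:
    """In-memory sketch: at most one current artifact per (atom_id, pluginId)."""

    items: dict[tuple[str, str], dict[str, Any]] = field(default_factory=dict)

    def put(self, atom_id: str, artifact: dict[str, Any]) -> bool:
        """Insert or replace; refuse to overwrite a fresher artifact from the same plugin."""
        key = (atom_id, artifact["meta"]["pluginId"])
        current = self.items.get(key)
        if current is not None and fetched_at(current) > fetched_at(artifact):
            return False
        self.items[key] = artifact
        return True

    def for_atom(self, atom_id: str) -> list[dict[str, Any]]:
        return [a for (aid, _), a in self.items.items() if aid == atom_id]

    def drop(self, atom_id: str, plugin_id: str) -> None:
        # Deleting an artifact never touches the atom itself.
        self.items.pop((atom_id, plugin_id), None)
```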

Generic enrichment envelope:

{
  "artifact_type": "<classification_slug>",
  "data": {},
  "meta": {
    "pluginId": "<plugin_slug>",
    "provider": "<provider_name>",
    "fetchedAt": "<iso_datetime>",
    "sourceUrl": "<source_url_if_available>",
    "confidence": "<number_0_to_1_if_available>"
  }
}
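
The envelope maps naturally onto a typed structure with a validation step at ingest. Below is a sketch using Python dataclasses. Two parts of it are my own reading of the placeholders, not a confirmed part of the spec: `sourceUrl` and `confidence` are treated as optional (the placeholders say "if available"), and `confidence` must be a number in [0, 1].

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class EnrichmentMeta:
    # Field names mirror the JSON keys of the envelope.
    pluginId: str
    provider: str
    fetchedAt: str
    sourceUrl: Optional[str] = None
    confidence: Optional[float] = None


@dataclass(frozen=True)
class EnrichmentArtifact:
    artifact_type: str
    data: dict[str, Any]
    meta: EnrichmentMeta


def parse_artifact(raw: dict[str, Any]) -> EnrichmentArtifact:
    """Validate a decoded envelope, raising ValueError if it is malformed."""
    try:
        meta = EnrichmentMeta(**raw["meta"])
        artifact = EnrichmentArtifact(raw["artifact_type"], raw["data"], meta)
    except (KeyError, TypeError) as exc:  # missing or unexpected keys
        raise ValueError(f"malformed envelope: {exc!r}") from exc
    if not isinstance(artifact.data, dict):
        raise ValueError("data must be a JSON object")
    if not isinstance(meta.fetchedAt, str):
        raise ValueError("fetchedAt must be an ISO 8601 string")
    # fromisoformat raises ValueError on bad input; "Z" is normalized for Python < 3.11.
    datetime.fromisoformat(meta.fetchedAt.replace("Z", "+00:00"))
    c = meta.confidence
    if c is not None and (isinstance(c, bool) or not isinstance(c, (int, float)) or not 0 <= c <= 1):
        raise ValueError("confidence must be a number between 0 and 1")
    return artifact
```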

Example enrichment for Brad Pitt:

{
  "artifact_type": "person",
  "data": {
    "image": "https://upload.wikimedia.org/example.jpg",
    "description": "American actor and film producer",
    "notableWorks": ["Fight Club", "Se7en", "Moneyball"]
  },
  "meta": {
    "pluginId": "wikidata",
    "provider": "wikidata",
    "fetchedAt": "2026-02-26T12:00:00Z",
    "sourceUrl": "https://www.wikidata.org/wiki/Q35332",
    "confidence": 0.98
  }
}

URL to Atom to Enrichment Flow

Step 1: Input URL

User submits a URL:

https://en.wikipedia.org/wiki/Brad_Pitt

Step 2: Classification

The classifier resolves the canonical type and minimal identity fields:

{
  "type": "Person",
  "data": {
    "@context": "https://schema.org/",
    "@type": "Person",
    "name": "Brad Pitt",
    "sameAs": ["https://www.wikidata.org/wiki/Q35332"]
  }
}
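
Step 2 could start as a set of URL rules, with anything the rules cannot resolve handed to a provider lookup. Below is a rule-based Python sketch. The GitHub rule is an illustrative assumption. The Wikipedia branch deliberately leaves the type unresolved, because turning an article title into a type and a Wikidata `sameAs` link requires a provider call (for example, to Wikidata) that is out of scope here. `classify_url` and the `lookup` key are hypothetical.

```python
from typing import Any, Optional
from urllib.parse import unquote, urlparse


def classify_url(url: str) -> Optional[dict[str, Any]]:
    """Return a classification result, a lookup request, or None if no rule applies."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]

    # Rule: github.com/<owner>/<repo>[/...] identifies a repository directly.
    if host == "github.com" and len(parts) >= 2:
        return {
            "type": "SoftwareSourceCode",
            "data": {
                "@context": "https://schema.org/",
                "@type": "SoftwareSourceCode",
                "name": parts[1],  # repo slug; a nicer display name could come from enrichment
                "codeRepository": f"https://github.com/{parts[0]}/{parts[1]}",
            },
        }

    # Rule: <lang>.wikipedia.org/wiki/<Title> names an entity, but its type and its
    # Wikidata sameAs link must come from a provider lookup that is not shown here.
    if host.endswith(".wikipedia.org") and len(parts) == 2 and parts[0] == "wiki":
        title = unquote(parts[1]).replace("_", " ")
        return {"type": None, "lookup": {"provider": "wikidata", "title": title}}

    return None
```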

Step 3: Atom Creation (Onchain)

Only the minimal atom payload is committed onchain.
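
If two clients classify the same entity, they should ideally commit byte-identical atomData. Otherwise the graph ends up with near-duplicate atoms. One way to get there is to serialize the minimal payload deterministically (sorted keys, no insignificant whitespace) and derive a digest from those bytes. The sketch below assumes JSON bytes and uses SHA-256 purely for illustration. It is not Intuition's actual encoding: the discussion below notes that IPFS CIDs are the current default for most atoms. It also only approximates the JSON Canonicalization Scheme (RFC 8785) and does not implement it.

```python
import hashlib
import json
from typing import Any


def canonical_atom_bytes(payload: dict[str, Any]) -> bytes:
    """Deterministic JSON: key order and whitespace in the input do not affect the output."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def atom_digest(payload: dict[str, Any]) -> str:
    """Illustrative content key; SHA-256 is an assumption, not the protocol's hash."""
    return hashlib.sha256(canonical_atom_bytes(payload)).hexdigest()


a = {"@context": "https://schema.org/", "@type": "MusicGroup", "name": "Queen"}
b = {"name": "Queen", "@type": "MusicGroup", "@context": "https://schema.org/"}
print(canonical_atom_bytes(a))
# b'{"@context":"https://schema.org/","@type":"MusicGroup","name":"Queen"}'
print(atom_digest(a) == atom_digest(b))  # True
```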

Step 4: Enrichment (Offchain)

Background workers fetch provider metadata and attach artifacts to the atom in backend systems.
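
Step 4 can be sketched as a worker that runs each configured plugin against the minimal atom payload and wraps every result in the generic enrichment envelope. The names `run_enrichment` and `Plugin` are assumptions for illustration, as is the plugin interface itself (a function returning `(data, sourceUrl, confidence)` or `None`). None of these are a documented Intuition plugin API.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical plugin interface: given the minimal atom payload, return
# (data, sourceUrl, confidence), or None if the provider knows nothing about it.
PluginResult = Optional[tuple[dict[str, Any], Optional[str], Optional[float]]]
Plugin = Callable[[dict[str, Any]], PluginResult]


def run_enrichment(
    artifact_type: str,
    atom: dict[str, Any],
    plugins: dict[str, tuple[str, Plugin]],  # pluginId -> (provider name, fetch function)
) -> list[dict[str, Any]]:
    """Run every plugin for one atom and wrap each result in the generic envelope."""
    envelopes = []
    for plugin_id, (provider, fetch) in plugins.items():
        try:
            result = fetch(atom)
        except Exception:
            # One failing provider must not block the others; the atom already exists,
            # so a missed enrichment can simply be retried on a later run.
            continue
        if result is None:
            continue
        data, source_url, confidence = result
        meta: dict[str, Any] = {
            "pluginId": plugin_id,
            "provider": provider,
            "fetchedAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        if source_url is not None:
            meta["sourceUrl"] = source_url
        if confidence is not None:
            meta["confidence"] = confidence
        envelopes.append({"artifact_type": artifact_type, "data": data, "meta": meta})
    return envelopes
```

Swallowing plugin errors is deliberate in this sketch. Enrichment is not required before an atom can be created, so a failed fetch only means the atom stays unenriched until the next run.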

Step 5: Query-Time Composition

Clients query both:

  • the stable onchain atom identity
  • current offchain enrichment artifacts

This gives users rich UI context without bloating onchain data.
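
At query time, a client (or an API layer) has to decide how the stable identity and the current artifacts combine. The Python sketch below shows one possible policy. It flattens them into a single view: higher-confidence artifacts override lower-confidence ones, and the onchain identity fields override everything. Both precedence rules and the `_sources` key are assumptions, not part of IIP-1.

```python
from typing import Any


def compose_view(atom: dict[str, Any], artifacts: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten an atom and its current artifacts into one read model for a client."""
    view: dict[str, Any] = {}
    # Apply lower-confidence artifacts first so higher-confidence values overwrite them;
    # artifacts without a confidence score are treated as 0.
    for artifact in sorted(artifacts, key=lambda a: a["meta"].get("confidence") or 0.0):
        view.update(artifact["data"])
    # Identity fields from the onchain atom always win over enrichment.
    view.update(atom)
    view["_sources"] = [a["meta"]["pluginId"] for a in artifacts]
    return view
```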

Benefits

  1. Scalability: smaller atoms mean faster indexing and less storage pressure.
  2. Stability: core identity data does not need constant updates.
  3. Flexibility: enrichment can evolve independently by provider and use case.
  4. Interoperability: schema-based typing keeps classification predictable.

Non-Goals

  1. Storing every available metadata field directly in base atoms.
  2. Defining a single permanent enrichment provider format.
  3. Requiring enrichment to exist before an atom can be created.

Initial Classification Coverage

IIP-1 starts with:

  • Thing
  • Person
  • Organization (Company)
  • Place (Location)
  • Product
  • Service
  • SoftwareSourceCode
  • MusicRecording
  • MusicAlbum
  • MusicGroup
  • Book
  • Article
  • SocialMediaPosting
  • ImageObject
  • VideoObject
  • Movie
  • TVSeries
  • Event
  • DefinedTerm
  • EthereumAccount
  • EthereumSmartContract
  • EthereumERC20
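
The starting set can be pinned down as a closed enumeration. Anything outside it then fails loudly, including umbrella labels like "Song" or "Ethereum" that the flat classification model rules out. Below is a Python sketch. The member names are my own; the values are the type names listed above.

```python
from enum import Enum


class Classification(str, Enum):
    """Flat IIP-1 starting set: one member per classification, no umbrella groups."""

    THING = "Thing"
    PERSON = "Person"
    ORGANIZATION = "Organization"
    PLACE = "Place"
    PRODUCT = "Product"
    SERVICE = "Service"
    SOFTWARE_SOURCE_CODE = "SoftwareSourceCode"
    MUSIC_RECORDING = "MusicRecording"
    MUSIC_ALBUM = "MusicAlbum"
    MUSIC_GROUP = "MusicGroup"
    BOOK = "Book"
    ARTICLE = "Article"
    SOCIAL_MEDIA_POSTING = "SocialMediaPosting"
    IMAGE_OBJECT = "ImageObject"
    VIDEO_OBJECT = "VideoObject"
    MOVIE = "Movie"
    TV_SERIES = "TVSeries"
    EVENT = "Event"
    DEFINED_TERM = "DefinedTerm"
    ETHEREUM_ACCOUNT = "EthereumAccount"
    ETHEREUM_SMART_CONTRACT = "EthereumSmartContract"
    ETHEREUM_ERC20 = "EthereumERC20"


def parse_classification(value: str) -> Classification:
    """Umbrella labels such as "Song" or "Ethereum" are rejected, not mapped."""
    try:
        return Classification(value)
    except ValueError:
        raise ValueError(f"not a first-class classification: {value!r}") from None
```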

Open Questions for Community Feedback

  1. Which identity fields should be considered acceptable optional disambiguators per type?
  2. Should sameAs be recommended for all types, or only when strong canonical sources exist?
  3. What artifact retention and refresh strategy should we standardize for enrichment?
  4. Should we define confidence thresholds for automatically accepted enrichment artifacts?

Proposed Next Steps

  1. Align classification docs and plugins to a single minimal-field reference table.
  2. Add validation and examples for URL classification outputs.
  3. Publish recommended enrichment provider mappings by category.
  4. Open implementation and migration checklist for services consuming enriched atoms.

Call for Feedback

Please review this proposal with a focus on:

  • whether the minimal atom philosophy is clear and practical
  • whether the current field sets are truly minimal
  • where classification and enrichment boundaries should be tightened

If this direction is accepted, future IIPs can define refinement rules per category and provider.

Each Atom must, of course, have enough metadata to differentiate it from other Atoms and solve the Father Father problem effectively - but if we are committing this data on-chain and it is truly immutable, we need to be very careful about stale links, etc.

Additionally - I do not like storing things directly on-chain, as they are then truly ‘forever’… At least if ‘bad data’ is an IPFS CID, for instance, everyone can stop pinning the respective IPFS object. If it’s on-chain, we’d have to… take down the chain? Of course, anyone is free to do whatever they want - but I am not sure if this should be our recommended approach…

I believe we should start this conversation one level zoomed out from this - i.e., ‘what is the recommended data to put in atomData’?

To date, we’ve been recommending and defaulting to a few different things:

  • Eth addresses for EOAs
  • CAIP-10 prefixed addresses for smart contracts
  • IPFS CIDs for most other things

For something like an ETH address, storing the address as the atomData is fine, of course. Or for any other thing where the atomData is wholly self-contained and unique, storing it as the atomData is likely fine.

However, for most ‘things’ in the world / for most of the entity types listed above… should the atomData instead be… something else?

One option for what it could be - an ENS domain, where the ENS domain of each Atom is controlled by the respective Atom’s Atom Wallet (this is one of the purposes of the Atom Wallet - to allow for controllership over the Atom’s metadata).

So, for example, take an Atom representing the Music Group ‘Queen’. We create an ENS domain/subdomain for Queen, controlled by that Atom’s Atom Wallet. The Atom Warden controls each Atom Wallet until it is transferred, so the protocol has full control over the Atom’s metadata. The protocol can upload whatever initial ‘artifacts’ it needs so that the Atom resolves nicely (our Entity Type Schemas can be used for this, i.e. the initial completion of the fields needed to resolve the Atom nicely), and it can update the Atom later if it needs to.

Then, let’s say Queen comes and proves ‘Hey I am Queen! Please give me the controllership of the Atom that represents me!’ → we transfer them controllership of the Atom Wallet, they can update the ENS for ‘Queen’ to a new picture if they’d like, for example → so we enable decentralized control over the ‘Artifacts’.
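
The controllership flow above can be reduced to a small state model. Records are editable only by whoever currently controls the Atom Wallet. That starts as the Atom Warden, and control can be transferred once an entity proves who it is. This is a toy offchain model for reasoning about the flow, not ENS or Atom Wallet contract logic. `AtomWalletRecords`, `ATOM_WARDEN`, the record keys, and the `0xQueenWallet` placeholder are all hypothetical.

```python
from dataclasses import dataclass, field

ATOM_WARDEN = "atom-warden"  # hypothetical identifier for the protocol-held warden


@dataclass
class AtomWalletRecords:
    """Toy model: only the current controller of the Atom Wallet may edit its records."""

    controller: str = ATOM_WARDEN
    records: dict[str, str] = field(default_factory=dict)

    def _require_controller(self, caller: str) -> None:
        if caller != self.controller:
            raise PermissionError(f"{caller} does not control this Atom Wallet")

    def set_record(self, caller: str, key: str, value: str) -> None:
        self._require_controller(caller)
        self.records[key] = value

    def transfer(self, caller: str, new_controller: str) -> None:
        """Hand control to the verified entity; the warden loses edit rights."""
        self._require_controller(caller)
        self.controller = new_controller


queen = AtomWalletRecords()
queen.set_record(ATOM_WARDEN, "avatar", "queen-initial.png")  # protocol bootstraps artifacts
queen.transfer(ATOM_WARDEN, "0xQueenWallet")  # after Queen proves who they are
queen.set_record("0xQueenWallet", "avatar", "queen-new.png")  # Queen updates the picture
print(queen.records)  # {'avatar': 'queen-new.png'}
```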

I think there are a lot of potential benefits we get with this approach, so curious to get your / people’s thoughts.

Also - maybe ENS is not flexible enough to accommodate this right now in a way that is efficient / makes sense - and that is where TNS / our own custom fork comes into play (if we decide this is a viable path)? (I am not sure, I am not up to date on ENS).

Also, maybe there is a ‘right now’ solution and a ‘later, more optimal’ solution? If so, I would love to have a known contingency path for migrating to the more optimal solution later (if we decide to opt for a known suboptimal approach right now).

This also raises another question that I’ll start in another IIP.

Also please make sure to move this proposal to the IIP section of the forum once it feels like this conversation has reached some conclusion!

There is a lot to unpack here, but I’ll try to answer everything.

I agree it could make sense to continue to use IPFS for storing this data. It doesn’t necessarily need to be written directly on chain in this JSON format. Like you said, there might be content that is not something that we want to “live forever” and we want to stop pinning it.

Now, to your point about every atom also having an ENS name: that is also possible, and we could do something similar to Base, which uses an off-chain registry that still integrates with the on-chain smart contracts. This way, we can give everything a domain name without writing on-chain, and each atom gets a persistent, human-readable identifier that can also be owned by a wallet.

We could also do something clever here: if an atom has a domain name attached, we allow the domain owner to post a TXT record proving they own the domain, and then, off-chain, we sign over ownership automatically.
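
The TXT-record idea can be sketched as a check over the TXT strings already fetched for a domain. The record format `intuition-atom=<atom id>;wallet=<address>` is purely hypothetical, since the thread does not specify one. The DNS lookup itself is left out, and so is the step that signs over ownership after a successful check.

```python
from typing import Optional

# Hypothetical TXT record format: "intuition-atom=<atom id>;wallet=<0x address>"


def parse_claim(record: str) -> Optional[tuple[str, str]]:
    """Extract (atom id, lowercased wallet) from one TXT string, or None if it is not a claim."""
    fields = dict(part.strip().split("=", 1) for part in record.split(";") if "=" in part)
    atom_id, wallet = fields.get("intuition-atom"), fields.get("wallet")
    if atom_id and wallet:
        return atom_id, wallet.lower()
    return None


def verify_domain_claim(txt_records: list[str], atom_id: str, wallet: str) -> bool:
    """True if any TXT record on the domain names this atom and this wallet."""
    # Fetching the records (e.g. with a DNS resolver library) is outside this sketch.
    return any(parse_claim(r) == (atom_id, wallet.lower()) for r in txt_records)
```

Comparing addresses in lowercase sidesteps mixed-case checksum formatting, so a record written with a checksummed address still matches.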

Getting back to what’s published, though: these data structures are generally as minimal as possible, with the type field doing a lot of the heavy lifting. Most of them just have:

  • name
  • description
  • optional URLs
  • a few other fields depending on the type

And when this reaches consensus, I’ll be sure to move it over to the governance section.

Sounds good! Raising this because I think that ‘what gets published’ is a function of our approach here.

If atomData is a pointer to some mutable piece of data, then we can be a bit more liberal about our ‘minimal data per entity type to distinguish Atoms from one another’.

But if it’s immutable, whether onchain or offchain, I believe we need to be more careful. Fields like URL, Code Repository, and Address do seem ‘good enough’ to provide the context needed to answer the question ‘what is being referenced by this Atom?’, but this data can go stale. So if it is immutable, I would recommend putting more thought into these fields for each entity type before standardizing them.

Also, we don’t need to use ENS for this; it was just one option. I’m just raising the question of whether we store an immutable pointer to some mutable data as the atomData, or store a bunch of detailed immutable data as the atomData.
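
A toy resolver makes the difference between the two options concrete. Inline atomData always resolves to what was committed at creation. Pointer atomData resolves to whatever the mutable record currently says. The `registry` dict, the `pointer` key, and the `queen.example` name are hypothetical.

```python
from typing import Any

# Hypothetical mutable registry, e.g. names whose records an Atom Wallet controls.
registry: dict[str, dict[str, Any]] = {
    "queen.example": {"name": "Queen", "image": "queen-1975.png"},
}

# Option A: the immutable atomData *is* the identity data.
inline_atom_data = {"@context": "https://schema.org/", "@type": "MusicGroup", "name": "Queen"}
# Option B: the immutable atomData is only a pointer to mutable data.
pointer_atom_data = {"pointer": "queen.example"}


def resolve(atom_data: dict[str, Any]) -> dict[str, Any]:
    """Inline atomData resolves to itself; a pointer resolves to the registry's current record."""
    if "pointer" in atom_data:
        return registry[atom_data["pointer"]]
    return atom_data


registry["queen.example"]["image"] = "queen-1986.png"  # the controller updates the record
print(resolve(pointer_atom_data)["image"])  # queen-1986.png
print(resolve(inline_atom_data) is inline_atom_data)  # True
```

Option B keeps the pointer itself immutable, so the guarantee about *which* record an Atom references survives while the record's contents can be corrected over time.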

Question: if we go with the immutable Atom approach (which also has a lot of properties we DO want, e.g. perfect guarantees around what the Atom references), do we care about this potential ‘staleness’? Or are we just looking to get enough data at the beginning of an Atom’s life to create a set of artifacts for it, after which the initial atomData becomes less relevant?

^ this approach actually feels pretty good imo…

i.e. just using the initial set of atomData to kind of ‘bootstrap’ the Atom…

Also need Predicate entity type(s) as part of the initial defined set
