The problem
Each office records a company however it was filed, whether as a different legal entity, in a different language, or at a different level of detail. Resolving these is hard in both directions: the same company often looks completely different across offices, and different companies often look identical.

The same company, filed three different ways. Procter & Gamble’s records share almost nothing a string match could use:

| Owner as filed | Country | Linked by |
|---|---|---|
| The Procter & Gamble Company | US | shared public company |
| Procter & Gamble Manufacturing Cologne GmbH | DE | Madrid registration |
| Procter & Gamble International Operations S.A. | CH | Madrid registration |
Limited Liability Company «DVA MYACHA», as a truncated dva y, and in Cyrillic as «DVA МYАСНА». Nothing but a shared Madrid registration number connects them.
And the reverse is just as dangerous. Two companies can share an identical name and not be the same:
| Owner | Country | Business | LEI |
|---|---|---|---|
| EQT Corp | US | Natural gas | 4NT01YGM4X7ZX86ISY52 |
| EQT AB | SE | Private equity | 213800U7P9GOIRKCTB34 |
Owners and entities
Signa models this with two layers:

| Layer | ID | What it is |
|---|---|---|
| Owner | own_* | A single office’s applicant/registrant record, exactly as that office filed it. |
| Entity | ent_* | The cross-office identity that links the owner records belonging to the same company. |
own_ IDs are stable and never break, even as resolution improves and links are added or corrected over time.
Resolved vs. derived entities
Every owner is reachable as an entity, so you can always traverse from a mark to its company-level identity:
- Resolved (entity_id_type: "resolved"): a materialized entity linking two or more owner records.
- Derived (entity_id_type: "derived"): a singleton ent_<owner-uuid> for an owner that isn’t linked to anything yet. It has exactly one member (itself).
id may differ from the one you requested, and cached derived IDs never 404.
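As a minimal sketch of how a client might handle these ID semantics, the function below picks the entity ID to store after a GET /v1/entities/{id} call. It assumes the response body is JSON with a top-level id field on success and a top-level merged_into field on 410 Gone (the status described in the FAQ); the exact response shape is an assumption, not a documented contract.

```python
from typing import Optional


def canonical_entity_id(status: int, body: dict) -> Optional[str]:
    """Return the entity ID a client should store, or None if unknown.

    Assumes (hypothetically) that `id` and `merged_into` are top-level
    fields of the JSON response body.
    """
    if status == 200:
        # The returned id may differ from the requested derived ID,
        # so always prefer the id in the response.
        return body.get("id")
    if status == 410:
        # The entity was fused into another one; store its successor.
        return body.get("merged_into")
    return None
```

Storing the returned value rather than the requested one keeps cached references pointing at the canonical entity.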
What a resolved entity gives you
One identity across offices
Member owners (one per office) with the evidence that justified each link.
Global portfolio
Every mark across all member owners, in one paginated, fully filterable list.
Corporate family
The GLEIF parent and direct subsidiaries behind a brand.
Public-company facts
Ticker and LEI, aggregated across the company’s office records.
signal, tier, confidence, and whether it was decided_by auto, llm, or human:
The link evidence is deliberately narrow. The model id, prompt hash, and any LLM rationale behind a link are operational data and are never returned through the API.
How resolution works
1. Name normalization
Every party name runs through a canonical normalization pass: case folding, diacritic stripping, Unicode-aware punctuation/whitespace handling, common-abbreviation expansion, and legal-form extraction (recognizing Inc., GmbH, S.A. and friends as detachable suffixes). The result is a canonical_name that’s stable across the spelling variants one company gets filed under, plus the original display name.
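The pass can be illustrated with a small sketch. This is not Signa's implementation: the legal-form list is a tiny hypothetical sample, abbreviation expansion is omitted, and the function names are invented for illustration. It strips diacritics from Latin letters only, leaving other scripts intact.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny sample of detachable legal forms.
LEGAL_FORMS = {"inc", "gmbh", "sa", "ab", "corp", "llc", "ltd"}


def strip_latin_diacritics(text: str) -> str:
    """Remove combining marks from Latin letters; keep other scripts as-is."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if "LATIN" in unicodedata.name(decomposed[0], ""):
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
        else:
            out.append(ch)
    return "".join(out)


def normalize(display_name: str) -> dict:
    """Return a canonical name, the detached legal form, and the original."""
    text = strip_latin_diacritics(display_name).casefold()
    text = text.replace(".", "")          # "s.a." -> "sa", "inc." -> "inc"
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware for str patterns
    tokens = text.split()
    legal_form = None
    if len(tokens) > 1 and tokens[-1] in LEGAL_FORMS:
        legal_form = tokens.pop()
    return {
        "canonical_name": " ".join(tokens),
        "legal_form": legal_form,
        "display_name": display_name,
    }
```

Under this sketch, "EQT Corp" and "EQT AB" both reduce to the canonical name "eqt", which is exactly why a name match on its own is not enough to link two owners.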
2. Linking signals
Pairs of owners are linked only on strong, specific evidence. Name similarity alone never links; there must be a second corroborating signal:

| Signal (tier) | What links the owners |
|---|---|
| office_identifier | The same applicant identifier reported by an office. |
| madrid_ir | A shared Madrid international registration number, behind a distinctive-token name guard to prevent mass false merges. |
| shared_pco | Both owners link to the same public company (SEC/GLEIF) at high confidence. |
| portfolio_overlap | Matching name plus overlapping trademark portfolios — the signature of one company across offices. |
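To make the "name alone never links" rule concrete, here is a rough sketch of a pairwise check that returns the first matching signal or nothing. The Owner fields, the generic-token list, the order of checks, and the definition of "overlap" as a shared mark identifier are all assumptions made for illustration; the real signals are richer than this.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of tokens too generic to count as distinctive.
GENERIC_TOKENS = {"the", "company", "international", "group", "holding"}


@dataclass
class Owner:
    canonical_name: str
    office_identifier: Optional[str] = None
    madrid_ir: Optional[str] = None
    public_company: Optional[str] = None   # e.g. an LEI or CIK
    marks: frozenset = frozenset()         # identifiers of owned marks


def shares_distinctive_token(a: Owner, b: Owner) -> bool:
    """Name guard: the two names must share at least one non-generic token."""
    tokens_a = set(a.canonical_name.split()) - GENERIC_TOKENS
    tokens_b = set(b.canonical_name.split()) - GENERIC_TOKENS
    return bool(tokens_a & tokens_b)


def linking_signal(a: Owner, b: Owner) -> Optional[str]:
    """Return the signal that would justify a link, or None."""
    if a.office_identifier and a.office_identifier == b.office_identifier:
        return "office_identifier"
    if a.madrid_ir and a.madrid_ir == b.madrid_ir and shares_distinctive_token(a, b):
        return "madrid_ir"
    if a.public_company and a.public_company == b.public_company:
        return "shared_pco"
    if a.canonical_name == b.canonical_name and a.marks & b.marks:
        return "portfolio_overlap"
    # An identical name with no corroborating evidence is not a link.
    return None
```

With these rules, two owners both named "eqt" but tied to different LEIs and with no shared marks produce no signal, while two differently named Procter & Gamble records sharing a Madrid registration number do.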
3. Adjudication
Ambiguous candidates are resolved by a combination of deterministic rules and model-assisted review, and every decision is durable: a confirmed “different” verdict is recorded so the same pair is never silently re-linked later. Links can also be split when new evidence contradicts an earlier decision, again without ever rewriting the underlying owner records.
4. Accuracy and validation
Linking two companies that aren’t the same is the worst mistake an entity resolver can make, far more damaging than a missed link, so Signa is built to earn every link:
- Adjudicated, not guessed. Ambiguous pairs are never auto-linked on a hunch. More than 57,000 owner pairs have been individually adjudicated by AI judges and human reviewers.
- LLM-as-judge, cross-checked across models. Each ambiguous pair is evaluated by a large language model acting as an impartial judge, then independently cross-checked across multiple frontier models — with disagreements escalated to web research and human review. A link only stands when the evidence agrees.
- Held to hard gates. New links must clear strict precision and recall thresholds before they’re written, and the system is tuned aggressively against false merges.
- Decisions are durable. A confirmed “different” verdict — like keeping EQT Corp and EQT AB apart — permanently blocks that merge from ever recurring.
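The durability property in the last bullet can be pictured as a small verdict log consulted before any link is written. The class and method names below are hypothetical, and the in-memory set stands in for whatever persistent store is actually used.

```python
class VerdictLog:
    """Remembers owner pairs that were confirmed to be different companies."""

    def __init__(self) -> None:
        self._different: set = set()

    @staticmethod
    def _key(owner_a: str, owner_b: str) -> frozenset:
        # Unordered pair: (a, b) and (b, a) are the same decision.
        return frozenset((owner_a, owner_b))

    def record_different(self, owner_a: str, owner_b: str) -> None:
        self._different.add(self._key(owner_a, owner_b))

    def may_link(self, owner_a: str, owner_b: str) -> bool:
        # A confirmed "different" verdict blocks the pair, whatever
        # new candidate signal proposes it again.
        return self._key(owner_a, owner_b) not in self._different
```

Keying on an unordered pair means the block holds regardless of which owner a later pass encounters first.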
Resolution runs continuously as data is ingested, not in real time per request.
Newly ingested owners may briefly appear unlinked until the next pass connects
them.
Public-company enrichment
Signa maintains a public_companies table from two authoritative sources, and links owners to it:
| Source | Coverage | Identifier |
|---|---|---|
| SEC EDGAR | ~10,000 US-listed companies | CIK |
| GLEIF | ~2.5M legal entities worldwide | LEI |

Once linked, entities and owners can be filtered by these public-company facts:

| Filter | Description |
|---|---|
| ticker=AAPL | Entities/owners linked to this stock ticker |
| lei=HWUPKR0… | Linked to this LEI |
| publicly_traded=true | Has a confirmed active SEC ticker match |
| has_lei=true | Has a confirmed GLEIF LEI match |
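A short sketch of building a filtered request URL from these parameters follows. The base URL is a placeholder, the helper name is invented, and whether filters can be combined in a single request is an assumption rather than documented behavior.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder, not the real host


def entities_url(**filters) -> str:
    """Build a GET /v1/entities URL from keyword filters."""
    params = {
        # Render Python booleans as the lowercase literals shown in the table.
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in filters.items()
    }
    return f"{BASE_URL}/v1/entities?{urlencode(params)}"
```

For example, `entities_url(ticker="AAPL")` yields a URL ending in `/v1/entities?ticker=AAPL`.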
Corporate families
Using GLEIF Level 2 relationship data (460,000+ parent–subsidiary records), Signa connects an entity to its direct corporate parent and subsidiaries.
GLEIF Level 2 covers LEI-reporting companies only. An absent edge does not imply the absence of a corporate relationship; see coverage_caveat on the response. See Entity Family.
Endpoints
| Endpoint | Purpose |
|---|---|
| GET /v1/entities | Search and filter resolved entities |
| GET /v1/entities/{id} | One entity with members + link evidence |
| GET /v1/entities/{id}/trademarks | Global portfolio across all member owners |
| GET /v1/entities/{id}/family | GLEIF corporate parent and subsidiaries |
| GET /v1/owners/{id} | A single per-office owner record |
FAQ
What's the difference between an owner and an entity?
An owner (own_) is one office’s record of an applicant/registrant. An entity (ent_) links the owner records that belong to the same company across offices. Use entities for company-level questions (“everything Apple owns, everywhere”); use owners when you need the exact per-office record.
Why did the entity ID I requested come back with a different id?
You requested a derived singleton ID for an owner that has since been linked into a real entity. Signa transparently resolves it to the canonical entity and returns that id. This is expected; store the returned id.
Do owner IDs ever break?
No. Owners are linked, not merged, so own_ IDs are stable. Only entities are ever fused; when that happens the old ent_ ID returns 410 Gone with a pointer to its successor in merged_into, so handle that response in your integration.
Can I look up a company by stock ticker?
Yes. GET /v1/entities?ticker=AAPL returns the resolved entity, with public-company facts aggregated across its office records. The same filter works on /v1/owners.
How often is the data refreshed?
SEC data refreshes daily (~10,000 US companies); GLEIF and its Level 2 corporate-parent relationships refresh weekly (~2.5M entities). Entity resolution runs continuously as new records are ingested.
Does normalization handle non-Latin scripts?
Yes. Normalization preserves CJK, Cyrillic, Arabic, Thai, and other scripts. Only Latin diacritics are stripped, and punctuation handling is Unicode-aware.