Technical Architecture (Validated)

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                                 │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────────────┐  │
│  │ Web App  │  │   CLI    │  │ Browser  │  │  Findable MCP     │  │
│  │ (Next.js)│  │  (Node)  │  │Extension │  │  Server           │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬──────────┘  │
│       │              │              │                  │             │
└───────┼──────────────┼──────────────┼──────────────────┼─────────────┘
        │              │              │                  │
        ▼              ▼              ▼                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         API LAYER                                   │
│              (Fastify/Hono, Cloudflare Workers)                     │
│           Rate limiting, Auth, Routing, Caching                     │
└────────────────────────────┬────────────────────────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   DISCOVERY  │   │   SECURITY   │   │  ENTERPRISE  │
│   SERVICES   │   │   SERVICES   │   │  SERVICES    │
│              │   │              │   │              │
│ Search API   │   │ Scanner      │   │ Private Reg  │
│ Catalog API  │   │ Trust Scorer │   │ Policy Eng   │
│ Registry API │   │ Alert Engine │   │ Audit Log    │
│ Review API   │   │ Verify Svc   │   │ SSO/SCIM     │
└──────┬───────┘   └──────┬───────┘   └──────┬───────┘
       │                  │                   │
       ▼                  ▼                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        DATA LAYER                                   │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
│  │PostgreSQL│  │ Redis    │  │   R2     │  │   pgvector       │   │
│  │(Primary) │  │ (Cache)  │  │ (Assets) │  │  (Semantic)      │   │
│  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     INGESTION PIPELINE                               │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │
│  │ Official │  │ Smithery │  │ PulseMCP │  │  GitHub  │          │
│  │ MCP Reg  │  │  Sync    │  │  Sync    │  │  Sync    │          │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │
│  │ Glama    │  │ MCP.so   │  │skills.sh │  │ SkillsMP │          │
│  │  Sync    │  │  Sync    │  │  Sync    │  │  Sync    │          │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Core Services

Catalog Service

Purpose: Unified skill index across all registries

Data model:

Skill {
  id: UUID
  name: string
  description: string
  version: string (semver)
  protocol: enum (mcp, skill_md, both)
  source_registry: enum (official_mcp, smithery, pulsemcp, glama,
                         mcp_so, skills_sh, skillsmp, clawhub,
                         github, direct)
  source_url: string
  publisher_id: UUID (FK → Publisher)
  content_hash: string (SHA-256)
  manifest_content: text
  tags: string[]
  categories: string[]
  platform_compatibility: string[]
  trust_score: float (0-100)
  security_grade: enum (A, B, C, D, F)
  install_count: int
  avg_rating: float (0-5)
  last_scanned_at: timestamp
  last_updated_at: timestamp
  created_at: timestamp
}

Publisher {
  id: UUID
  name: string
  github_id: string
  verification_level: enum (unverified, verified, organization)
  reputation_score: float (0-100)
  skills_published: int
  avg_trust_score: float
  created_at: timestamp
}

ScanResult {
  id: UUID
  skill_id: UUID (FK → Skill)
  scanner_version: string
  risk_level: enum (critical, high, medium, low, clean)
  findings: JSONB[]
  permission_map: JSONB
  trust_score_contribution: float
  scanned_at: timestamp
}
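
The value ranges in the data model can be checked at write time. The following is a minimal sketch, assuming a TypeScript mirror of a subset of the Skill fields; the names `SkillRecord` and `validateSkill`, and the hex encoding of `content_hash`, are illustrative assumptions rather than part of the schema above.

```typescript
// Hypothetical TypeScript mirror of a subset of the Skill record.
type SkillProtocol = "mcp" | "skill_md" | "both";
type SecurityGrade = "A" | "B" | "C" | "D" | "F";

interface SkillRecord {
  id: string;               // UUID
  version: string;          // semver
  protocol: SkillProtocol;
  content_hash: string;     // SHA-256, assumed hex-encoded (64 characters)
  trust_score: number;      // 0-100
  security_grade: SecurityGrade;
  install_count: number;
  avg_rating: number;       // 0-5
}

// Returns a list of problems; an empty list means the record satisfies
// the ranges stated in the data model.
function validateSkill(skill: SkillRecord): string[] {
  const problems: string[] = [];
  if (!/^[0-9a-f]{64}$/i.test(skill.content_hash)) {
    problems.push("content_hash must be 64 hex characters (SHA-256)");
  }
  if (!/^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/.test(skill.version)) {
    problems.push("version must look like semver (MAJOR.MINOR.PATCH)");
  }
  // The negated form also rejects NaN.
  if (!(skill.trust_score >= 0 && skill.trust_score <= 100)) {
    problems.push("trust_score must be in [0, 100]");
  }
  if (!(skill.avg_rating >= 0 && skill.avg_rating <= 5)) {
    problems.push("avg_rating must be in [0, 5]");
  }
  if (!(skill.install_count >= 0) || Math.floor(skill.install_count) !== skill.install_count) {
    problems.push("install_count must be a non-negative integer");
  }
  return problems;
}
```

Returning every problem instead of throwing on the first one lets an ingestion job log all defects in a record from an upstream registry in a single pass.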

Search Service

Purpose: Semantic + full-text search across all indexed skills

Technology: PostgreSQL pgvector (semantic) + full-text search (keyword)

Approach:

Skill name + description are embedded with an embedding model (e.g., text-embedding-3-small)
Stored as vectors in pgvector
Full-text index for keyword search (PostgreSQL tsvector)
Hybrid search: combine vector similarity + keyword relevance + trust score
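
One way to retrieve candidates for the hybrid search is a single parameterized query that returns both signals per row. This is a sketch under assumptions: the table `skills` and the columns `embedding` and `search_tsv` are hypothetical names, and candidates are fetched by vector distance only, so a production version would likely union in keyword-only matches as well. It relies on pgvector's `<=>` cosine-distance operator and PostgreSQL's `ts_rank` and `plainto_tsquery`.

```typescript
// Sketch only: table and column names are hypothetical.
interface HybridQuery {
  text: string;                      // SQL with $1..$3 placeholders
  values: [string, string, number];  // vector literal, query text, limit
}

function buildHybridSearchQuery(
  queryEmbedding: number[],
  queryText: string,
  limit: number = 20,
): HybridQuery {
  // pgvector accepts a text literal of the form '[0.1,0.2,...]'.
  const vectorLiteral = "[" + queryEmbedding.join(",") + "]";
  const text = [
    "SELECT id,",
    "       1 - (embedding <=> $1::vector) AS semantic_similarity,",
    "       ts_rank(search_tsv, plainto_tsquery('english', $2)) AS keyword_relevance,",
    "       trust_score / 100.0 AS trust_score_normalized",
    "FROM skills",
    "ORDER BY embedding <=> $1::vector",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [vectorLiteral, queryText, limit] };
}
```

The returned object has the `{ text, values }` shape accepted by common PostgreSQL clients; the final weighted score can then be computed in the application from the returned columns.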

Search ranking formula:

score = 0.35 * semantic_similarity
      + 0.20 * keyword_relevance
      + 0.25 * trust_score_normalized
      + 0.10 * install_count_log_normalized
      + 0.10 * recency_score

Note: Trust score is weighted at 25%, which is the key differentiator from other registries. A higher trust score yields a better ranking, which gives developers an incentive to improve their security posture.
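
The ranking formula can be sketched as a pure function. The weights come from the formula above; the normalization choices (a log scale that saturates at one million installs, and a 180-day half-life for recency) are assumptions made for illustration, since the document does not define them.

```typescript
interface RankingSignals {
  semanticSimilarity: number; // 0-1, e.g. cosine similarity
  keywordRelevance: number;   // 0-1, assumed already normalized
  trustScore: number;         // 0-100
  installCount: number;
  ageInDays: number;          // days since last update
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function rankingScore(s: RankingSignals): number {
  const trust = clamp01(s.trustScore / 100);
  // Assumption: log10 scale saturating at 1,000,000 installs.
  const installs = clamp01(Math.log(1 + s.installCount) / Math.log(10) / 6);
  // Assumption: exponential decay with a 180-day half-life.
  const recency = clamp01(Math.pow(0.5, s.ageInDays / 180));
  return (
    0.35 * clamp01(s.semanticSimilarity) +
    0.20 * clamp01(s.keywordRelevance) +
    0.25 * trust +
    0.10 * installs +
    0.10 * recency
  );
}
```

Because the weights sum to 1.0 and every signal is clamped to [0, 1], the score also stays in [0, 1]. Two otherwise identical skills with trust scores of 90 and 40 differ by 0.25 * 0.5 = 0.125 in final score.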

Security Services (Findable Shield)

Scanner Engine

Purpose: Automated security analysis of MCP server manifests and SKILL.md files

Scanning pipeline:

Input (SKILL.md / MCP manifest + supporting files)
    │
    ├─→ [Stage 1] Static Analysis
    │     ├─ Regex patterns for API keys, tokens, passwords
    │     ├─ Known malware signature matching
    │     ├─ File type validation (reject suspicious binaries)
    │     └─ Hardcoded credential detection
    │
    ├─→ [Stage 2] Semantic Analysis
    │     ├─ Prompt injection detection (LLM-based)
    │     ├─ Instruction analysis (what does the skill tell the agent to do?)
    │     ├─ Data exfiltration pattern detection
    │     └─ Privilege escalation checks
    │
    ├─→ [Stage 3] Dependency Analysis
    │     ├─ Script dependency scanning (npm audit; pip-audit or Safety for Python)
    │     ├─ Known vulnerability matching (CVE database)
    │     └─ Supply chain risk assessment
    │
    ├─→ [Stage 4] Permission Mapping
    │     ├─ File/directory access patterns
    │     ├─ Network call destinations
    │     ├─ System command execution
    │     └─ Data read/write scope
    │
    └─→ [Output] Security Report
          ├─ Grade: A/B/C/D/F
          ├─ Risk level: CRITICAL / HIGH / MEDIUM / LOW / CLEAN
          ├─ Findings with severity + remediation
          ├─ Permission map
          └─ Trust score contribution
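
As an illustration of Stage 1, the sketch below runs a few regex rules over a file line by line and derives an overall risk level from the findings. The three rules are examples only (a real rule set would be far larger), and the names `SECRET_RULES`, `scanForSecrets`, and `riskLevel` are hypothetical.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  rule: string;
  severity: Severity;
  line: number; // 1-based line number in the scanned text
}

// Illustrative rules only. No global flag, so RegExp.test stays stateless.
const SECRET_RULES: { rule: string; severity: Severity; pattern: RegExp }[] = [
  { rule: "aws-access-key-id", severity: "critical", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { rule: "github-personal-token", severity: "critical", pattern: /\bghp_[A-Za-z0-9]{36}\b/ },
  {
    rule: "hardcoded-password",
    severity: "high",
    pattern: /\b(password|passwd|secret)\s*[:=]\s*["'][^"']{4,}["']/i,
  },
];

function scanForSecrets(content: string): Finding[] {
  const findings: Finding[] = [];
  content.split(/\r?\n/).forEach((text, index) => {
    for (const { rule, severity, pattern } of SECRET_RULES) {
      if (pattern.test(text)) findings.push({ rule, severity, line: index + 1 });
    }
  });
  return findings;
}

// The overall risk level is the most severe finding, or "clean" if none.
function riskLevel(findings: Finding[]): Severity | "clean" {
  const order: Severity[] = ["critical", "high", "medium", "low"];
  for (const level of order) {
    if (findings.some((f) => f.severity === level)) return level;
  }
  return "clean";
}
```

Nothing from the scanned skill is executed here: the content is treated purely as text, which matches the static-analysis intent of this stage.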

Trust Score Algorithm

trust_score = weighted_sum(
  security_scan_score      * 0.30,   # Scanner result expressed as a 0-100 score
  publisher_reputation     * 0.20,   # Verified status, history, other skills
  community_signals        * 0.15,   # Install count, ratings, reviews
  code_quality_signals     * 0.15,   # Update frequency, docs, tests
  age_and_stability        * 0.10,   # Time since first publish, version count
  transparency_score       * 0.10    # Open source, clear permissions, changelog
)
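
The weighted sum can be sketched as follows, assuming each component has already been normalized to a 0-100 scale (the document does not specify how the individual components are computed). The weights are taken from the algorithm above; the field and function names are illustrative.

```typescript
interface TrustInputs {
  securityScanScore: number;   // each input assumed to be on a 0-100 scale
  publisherReputation: number;
  communitySignals: number;
  codeQualitySignals: number;
  ageAndStability: number;
  transparencyScore: number;
}

const TRUST_WEIGHTS: Record<keyof TrustInputs, number> = {
  securityScanScore: 0.30,
  publisherReputation: 0.20,
  communitySignals: 0.15,
  codeQualitySignals: 0.15,
  ageAndStability: 0.10,
  transparencyScore: 0.10,
};

function trustScore(inputs: TrustInputs): number {
  let total = 0;
  for (const key of Object.keys(TRUST_WEIGHTS) as (keyof TrustInputs)[]) {
    // Clamp so a bad upstream value cannot push the score outside 0-100.
    const value = Math.min(100, Math.max(0, inputs[key]));
    total += TRUST_WEIGHTS[key] * value;
  }
  // Round to one decimal place for display and storage.
  return Math.round(total * 10) / 10;
}

const exampleTrust = trustScore({
  securityScanScore: 90,
  publisherReputation: 80,
  communitySignals: 60,
  codeQualitySignals: 70,
  ageAndStability: 50,
  transparencyScore: 100,
});
// 27 + 16 + 9 + 10.5 + 5 + 10 = 77.5
```

Since the weights sum to 1.0, the result stays on the same 0-100 scale as the inputs.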

Differentiation vs. Snyk/Invariant Labs:

Snyk scans and reports. Findable scores and ranks.
Trust scores are visible at discovery time (before installation), not after deployment.
Integrated with search ranking — security directly affects discoverability.

Continuous Monitoring

Re-scan all indexed skills every 7 days
Immediate re-scan on version updates
Alert publishers on new vulnerabilities
Revoke trust scores for skills that fail scans
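
The first two monitoring rules can be expressed as a small decision function that a scheduler calls for each skill. This is a sketch; the `ScanState` shape and the reason labels are assumptions, not part of the specification.

```typescript
const RESCAN_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface ScanState {
  lastScannedAt: Date | null;        // null if the skill was never scanned
  lastScannedVersion: string | null; // version seen by the last scan
  currentVersion: string;
}

type RescanReason = "never_scanned" | "version_changed" | "interval_elapsed" | null;

// Returns why a skill should be re-scanned now, or null if it is up to date.
function rescanReason(state: ScanState, now: Date): RescanReason {
  if (state.lastScannedAt === null) return "never_scanned";
  if (state.lastScannedVersion !== state.currentVersion) return "version_changed";
  if (now.getTime() - state.lastScannedAt.getTime() >= RESCAN_INTERVAL_MS) {
    return "interval_elapsed";
  }
  return null;
}
```

Returning a reason instead of a boolean lets the job queue prioritize version-triggered scans ahead of routine weekly ones.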

Findable MCP Server

Purpose: Allow AI agents to discover and evaluate skills through Findable directly.

This is a critical strategic asset — it makes Findable the discovery layer INSIDE the agent, not just a website.

MCP Tools exposed:

findable_search
  - Input: query (string), filters (object)
  - Output: List of skills with trust scores, descriptions, compatibility

findable_get_details
  - Input: skill_id or source_url (string)
  - Output: Full skill details, security report, reviews

findable_check_trust
  - Input: skill_id or skill_url (string)
  - Output: Trust score, security grade, findings, recommendation

findable_get_alternatives
  - Input: skill_id (string)
  - Output: Similar skills ranked by trust score + relevance
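
To make the shape of one tool concrete, here is a sketch of the logic behind findable_check_trust, written as a plain function and independent of any MCP SDK. The recommendation thresholds, the in-memory `CATALOG`, and the example entries are hypothetical placeholders; the document does not define how a recommendation is derived.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";
type Recommendation = "install" | "review_first" | "avoid";

interface TrustRecord {
  trust_score: number;   // 0-100
  security_grade: Grade;
  findings: string[];
}

interface CheckTrustOutput extends TrustRecord {
  recommendation: Recommendation;
}

// Hypothetical policy: thresholds are placeholders, not values from the spec.
function recommend(trustScore: number, grade: Grade): Recommendation {
  if (grade === "D" || grade === "F" || trustScore < 40) return "avoid";
  if ((grade === "A" || grade === "B") && trustScore >= 70) return "install";
  return "review_first";
}

// In-memory stand-in for the catalog; keys may be a skill_id or a skill_url.
const CATALOG: Record<string, TrustRecord> = {
  "skill-123": { trust_score: 86, security_grade: "A", findings: [] },
  "https://example.com/skills/risky": {
    trust_score: 31,
    security_grade: "D",
    findings: ["Hardcoded credential in setup script"],
  },
};

// Returns null when the skill is not indexed.
function findableCheckTrust(skillIdOrUrl: string): CheckTrustOutput | null {
  const record = CATALOG[skillIdOrUrl];
  if (record === undefined) return null;
  return {
    trust_score: record.trust_score,
    security_grade: record.security_grade,
    findings: record.findings,
    recommendation: recommend(record.trust_score, record.security_grade),
  };
}
```

Keeping the policy in a separate `recommend` function means the same rule could back both the MCP tool and the web UI.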

Distribution: Published on Smithery, MCP.so, and documented for direct integration with Claude Code, OpenClaw, Codex CLI, and Gemini CLI.

Technology Stack

Layer	Technology	Rationale
Frontend	Next.js + Tailwind	SSR for SEO/GEO, fast iteration
CLI	Node.js (TypeScript)	Cross-platform, npm distribution
API	Hono / Fastify	Performance; Hono is edge-compatible (Workers), Fastify suits Node.js origin servers
Database	PostgreSQL + pgvector	Relational + vector search in one DB
Search	PostgreSQL tsvector (start) → Elasticsearch (scale)	Simple start, scale when needed
Cache	Redis / Upstash	Session, rate limiting, hot data
Object Storage	Cloudflare R2	$0 egress, cost-effective
Queue	BullMQ (Redis)	Async scanning, ingestion jobs
Auth	Clerk or Auth0	OAuth, SSO, SCIM for enterprise
Hosting	Cloudflare Workers + AWS (scanning workers)	Edge for API, compute for scanning
CI/CD	GitHub Actions	Standard
Monitoring	Grafana Cloud or Datadog	Observability
Embedding	text-embedding-3-small	Semantic search vectors
LLM (scanning)	Claude Haiku or GPT-4o-mini	Prompt injection detection

Infrastructure Requirements

Phase 1 (Months 1-4) — Lean

Resource	Spec	Monthly Cost
API (Cloudflare Workers)	Free tier → $5/mo	$0-5
PostgreSQL	Neon or Supabase free tier → Pro	$0-25
Redis	Upstash free tier	$0-10
R2 storage	10GB	$0.15
Scanning workers	1x small VM (Hetzner/Fly.io)	$20-50
Total		~$20-90/mo

Phase 2-3 (Months 4-14) — Growth

Resource	Spec	Monthly Cost
API servers	CF Workers Pro + origin servers	$200-500
PostgreSQL	Managed, r5.large equivalent	$200-500
Redis	Upstash Pro	$50-100
R2 storage	500GB	$7.50
Scanning workers	2-4x dedicated VMs	$200-500
Elasticsearch (if needed)	2-node cluster	$400
Total		~$1,000-2,000/mo
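
The totals can be cross-checked by summing the low and high ends of each row. The sketch below does this for both phase tables, treating fixed-price rows as ranges with equal bounds and counting the optional Elasticsearch cluster in full; the helper name `sumRanges` is illustrative.

```typescript
type CostRange = [number, number]; // [low, high] in USD per month

function sumRanges(rows: CostRange[]): CostRange {
  let low = 0;
  let high = 0;
  for (const [rowLow, rowHigh] of rows) {
    low += rowLow;
    high += rowHigh;
  }
  // Round to cents to avoid floating-point noise.
  return [Math.round(low * 100) / 100, Math.round(high * 100) / 100];
}

// Phase 1 rows: API, PostgreSQL, Redis, R2, scanning workers.
const phase1Total = sumRanges([[0, 5], [0, 25], [0, 10], [0.15, 0.15], [20, 50]]);
// low 20.15, high 90.15

// Phase 2-3 rows: API, PostgreSQL, Redis, R2, scanning workers, Elasticsearch.
const phase2to3Total = sumRanges([
  [200, 500], [200, 500], [50, 100], [7.5, 7.5], [200, 500], [400, 400],
]);
// low 1057.5, high 2007.5
```

Without Elasticsearch, the Phase 2-3 range drops by $400 at both ends.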

Key insight: Cloudflare infrastructure (Workers, R2, KV) keeps costs dramatically lower than an AWS-first approach. R2’s $0 egress is significant for serving trust score badges and API responses globally.

Security Architecture

Platform Security

All data encrypted at rest (AES-256) and in transit (TLS 1.3)
SOC 2 Type II compliance target by Month 18
GDPR-compliant data handling
Regular penetration testing (quarterly from Phase 3)
Bug bounty program (launch at Phase 2)

Skill Isolation

Skills are stored and analyzed in sandboxed environments
Scanner runs in ephemeral containers (no persistent state)
No skill code is executed during scanning (static + semantic analysis only)
User-installed skills run in the user’s own environment

Supply Chain Security

All scanner dependencies pinned and hash-verified
SBOM generated for every release
Transparent scanning methodology published

Build vs. Buy

Component	Decision	Rationale
Security Scanner	BUILD	Core IP, competitive moat
Trust Score Algorithm	BUILD	Core differentiation
Search Engine	BUY (pgvector + tsvector)	Proven, no need to reinvent
Payments (Phase 4)	BUY (Stripe Connect)	Complex compliance handled
Auth/SSO	BUY (Clerk/Auth0)	Faster time-to-market, SCIM
Embedding Model	BUY (OpenAI API)	Commodity
MCP Server Framework	BUILD	Custom discovery logic
Ingestion Pipeline	BUILD	Custom per-registry logic

Scalability Targets

Dimension	Year 1	Year 3	Approach
Skills indexed	100K	1M+	Horizontal scaling, read replicas
Searches/day	10K	500K	Edge caching, CDN
Scans/day	1K	50K	Worker auto-scaling
API requests/sec	50	2,000	CF Workers auto-scaling
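
A quick capacity check relates these targets to the 7-day re-scan cadence under Continuous Monitoring. It assumes every indexed skill is re-scanned once per interval and ignores version-triggered scans; under that assumption the implied scan volume is higher than the Scans/day row, so the cadence or the target may need adjusting. The function names are illustrative.

```typescript
// Scans per day needed to re-scan every indexed skill once per interval.
function impliedScansPerDay(skillsIndexed: number, rescanIntervalDays: number): number {
  return Math.ceil(skillsIndexed / rescanIntervalDays);
}

// Average rate per second for a daily volume (peaks will be higher).
function averagePerSecond(perDay: number): number {
  return perDay / 86400;
}

const year1Scans = impliedScansPerDay(100000, 7);  // 14286
const year3Scans = impliedScansPerDay(1000000, 7); // 142858
const year3SearchRate = averagePerSecond(500000);  // about 5.8 searches/sec
```

The Year 3 search target averages only a few requests per second, so the 2,000 requests/sec API target leaves ample headroom for bursts and non-search traffic.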