Technical Architecture (Validated)

┌─────────────────────────────────────────────────────────────────────┐
│                            CLIENT LAYER                             │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌───────────────────┐ │
│  │ Web App  │   │   CLI    │   │ Browser  │   │   Findable MCP    │ │
│  │ (Next.js)│   │  (Node)  │   │Extension │   │      Server       │ │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └────────┬──────────┘ │
│       │              │              │                  │            │
└───────┼──────────────┼──────────────┼──────────────────┼────────────┘
        │              │              │                  │
        ▼              ▼              ▼                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                              API LAYER                              │
│                 (Fastify/Hono, Cloudflare Workers)                  │
│                Rate limiting, Auth, Routing, Caching                │
└──────────────────────────────────┬──────────────────────────────────┘
              ┌────────────────────┼────────────────────┐
              ▼                    ▼                    ▼
       ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
       │  DISCOVERY   │     │   SECURITY   │     │  ENTERPRISE  │
       │   SERVICES   │     │   SERVICES   │     │   SERVICES   │
       │              │     │              │     │              │
       │ Search API   │     │ Scanner      │     │ Private Reg  │
       │ Catalog API  │     │ Trust Scorer │     │ Policy Eng   │
       │ Registry API │     │ Alert Engine │     │ Audit Log    │
       │ Review API   │     │ Verify Svc   │     │ SSO/SCIM     │
       └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
              │                    │                    │
              ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                             DATA LAYER                              │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐  │
│  │PostgreSQL│   │  Redis   │   │    R2    │   │     pgvector     │  │
│  │(Primary) │   │ (Cache)  │   │ (Assets) │   │    (Semantic)    │  │
│  └──────────┘   └──────────┘   └──────────┘   └──────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                         INGESTION PIPELINE                          │
│                                                                     │
│      ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│      │ Official │   │ Smithery │   │ PulseMCP │   │  GitHub  │      │
│      │ MCP Reg  │   │   Sync   │   │   Sync   │   │   Sync   │      │
│      └──────────┘   └──────────┘   └──────────┘   └──────────┘      │
│      ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│      │  Glama   │   │  MCP.so  │   │skills.sh │   │ SkillsMP │      │
│      │   Sync   │   │   Sync   │   │   Sync   │   │   Sync   │      │
│      └──────────┘   └──────────┘   └──────────┘   └──────────┘      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Purpose: Unified skill index across all registries

Data model:

Skill {
    id: UUID
    name: string
    description: string
    version: string (semver)
    protocol: enum (mcp, skill_md, both)
    source_registry: enum (official_mcp, smithery, pulsemcp, glama,
                           mcp_so, skills_sh, skillsmp, clawhub,
                           github, direct)
    source_url: string
    publisher_id: UUID (FK → Publisher)
    content_hash: string (SHA-256)
    manifest_content: text
    tags: string[]
    categories: string[]
    platform_compatibility: string[]
    trust_score: float (0-100)
    security_grade: enum (A, B, C, D, F)
    install_count: int
    avg_rating: float (0-5)
    last_scanned_at: timestamp
    last_updated_at: timestamp
    created_at: timestamp
}

Publisher {
    id: UUID
    name: string
    github_id: string
    verification_level: enum (unverified, verified, organization)
    reputation_score: float (0-100)
    skills_published: int
    avg_trust_score: float
    created_at: timestamp
}

ScanResult {
    id: UUID
    skill_id: UUID (FK → Skill)
    scanner_version: string
    risk_level: enum (critical, high, medium, low, clean)
    findings: JSONB[]
    permission_map: JSONB
    trust_score_contribution: float
    scanned_at: timestamp
}
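
As an illustration of how the `Skill` record and its `content_hash` field might be represented in application code, here is a minimal sketch using Python dataclasses. It covers only a subset of the fields. The helper names `compute_content_hash` and `same_manifest`, and the choice to hash the UTF-8 bytes of `manifest_content`, are assumptions made for the example and are not part of the data model above.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Protocol(Enum):
    MCP = "mcp"
    SKILL_MD = "skill_md"
    BOTH = "both"


def compute_content_hash(manifest_content: str) -> str:
    """Return the SHA-256 hex digest of the manifest text.

    Hashing the UTF-8 bytes of the raw manifest is an assumption; the data
    model only states that the field holds a SHA-256 value.
    """
    return hashlib.sha256(manifest_content.encode("utf-8")).hexdigest()


@dataclass
class Skill:
    """Subset of the Skill record, limited to what the example needs."""

    name: str
    version: str
    protocol: Protocol
    source_registry: str
    source_url: str
    manifest_content: str
    tags: List[str] = field(default_factory=list)
    trust_score: float = 0.0  # 0-100, as in the data model
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    content_hash: str = field(init=False, default="")
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        if not 0.0 <= self.trust_score <= 100.0:
            raise ValueError("trust_score must be between 0 and 100")
        # Derive the hash from the manifest so the two can never disagree.
        self.content_hash = compute_content_hash(self.manifest_content)


def same_manifest(a: Skill, b: Skill) -> bool:
    """True when two index entries carry byte-identical manifests."""
    return a.content_hash == b.content_hash
```

Comparing hashes in this way is one possible approach to spotting the same manifest listed in more than one source registry; whether the index deduplicates like this is not stated above.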

Purpose: Semantic + full-text search across all indexed skills

Technology: PostgreSQL pgvector (semantic) + full-text search (keyword)

Approach:

  1. Skill name + description → embedded with an embedding model (e.g., text-embedding-3-small)
  2. Stored as vectors in pgvector
  3. Full-text index for keyword search (PostgreSQL tsvector)
  4. Hybrid search: combine vector similarity + keyword relevance + trust score

Search ranking formula:

score = 0.35 * semantic_similarity
      + 0.20 * keyword_relevance
      + 0.25 * trust_score_normalized
      + 0.10 * install_count_log_normalized
      + 0.10 * recency_score

Note: Trust score carries 25% of the ranking weight, which is the key differentiator versus other registries. A higher trust score ranks a skill higher, giving developers a direct incentive to improve their security posture.
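
The ranking formula can be sketched as a small function. The five weights come from the formula above; everything else is an assumption made for illustration: inputs `semantic_similarity` and `keyword_relevance` are taken to be already scaled to 0-1, the install count is log-scaled against a hypothetical cap (`INSTALL_CAP`), and recency is modeled as an exponential decay with a hypothetical half-life (`RECENCY_HALF_LIFE_DAYS`).

```python
import math
from datetime import datetime, timedelta, timezone

# Weights copied from the ranking formula above.
W_SEMANTIC, W_KEYWORD, W_TRUST, W_INSTALLS, W_RECENCY = 0.35, 0.20, 0.25, 0.10, 0.10

INSTALL_CAP = 1_000_000       # assumption: installs saturate at this count
RECENCY_HALF_LIFE_DAYS = 180  # assumption: recency halves every 180 days


def install_count_log_normalized(install_count: int) -> float:
    """Log-scale the install count into the range 0-1."""
    capped = min(max(install_count, 0), INSTALL_CAP)
    return math.log1p(capped) / math.log1p(INSTALL_CAP)


def recency_score(last_updated_at: datetime, now: datetime) -> float:
    """Exponential decay: 1.0 for a skill updated now, 0.5 after one half-life."""
    age_days = max((now - last_updated_at).total_seconds() / 86_400, 0.0)
    return 0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)


def search_score(
    semantic_similarity: float,  # assumed to be in 0-1
    keyword_relevance: float,    # assumed to be in 0-1
    trust_score: float,          # 0-100, as stored on the Skill record
    install_count: int,
    last_updated_at: datetime,
    now: datetime,
) -> float:
    """Combine the five signals with the weights from the formula."""
    return (
        W_SEMANTIC * semantic_similarity
        + W_KEYWORD * keyword_relevance
        + W_TRUST * (trust_score / 100.0)
        + W_INSTALLS * install_count_log_normalized(install_count)
        + W_RECENCY * recency_score(last_updated_at, now)
    )
```

Under these assumptions, two otherwise identical skills with trust scores of 90 and 40 differ by 0.25 * 0.5 = 0.125 in final score, which is larger than the entire contribution of either install count or recency.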


Purpose: Automated security analysis of MCP server manifests and SKILL.md files

Scanning pipeline:

Input (SKILL.md / MCP manifest + supporting files)
├─→ [Stage 1] Static Analysis
│     ├─ Regex patterns for API keys, tokens, passwords
│     ├─ Known malware signature matching
│     ├─ File type validation (reject suspicious binaries)
│     └─ Hardcoded credential detection
├─→ [Stage 2] Semantic Analysis
│     ├─ Prompt injection detection (LLM-based)
│     ├─ Instruction analysis (what does the skill tell the agent to do?)
│     ├─ Data exfiltration pattern detection
│     └─ Privilege escalation checks
├─→ [Stage 3] Dependency Analysis
│     ├─ Script dependency scanning (npm audit, pip safety)
│     ├─ Known vulnerability matching (CVE database)
│     └─ Supply chain risk assessment
├─→ [Stage 4] Permission Mapping
│     ├─ File/directory access patterns
│     ├─ Network call destinations
│     ├─ System command execution
│     └─ Data read/write scope
└─→ [Output] Security Report
      ├─ Grade: A/B/C/D/F
      ├─ Risk level: CRITICAL / HIGH / MEDIUM / LOW / CLEAN
      ├─ Findings with severity + remediation
      ├─ Permission map
      └─ Trust score contribution
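
To make Stage 1 concrete, here is a minimal sketch of regex-based secret detection over a manifest's text. The three patterns are illustrative examples only, not the scanner's actual rule set, and the `Finding` type, the `scan_text` function, and the fixed `"high"` severity are hypothetical names and choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns only; a production rule set would be far larger.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "hardcoded_password": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"\s]{4,}['\"]"
    ),
}


@dataclass(frozen=True)
class Finding:
    rule: str      # which pattern matched
    line: int      # 1-based line number in the scanned text
    severity: str  # assumption: every secret match is reported as "high"


def scan_text(text: str) -> List[Finding]:
    """Return one finding per (line, rule) match, in line order."""
    findings: List[Finding] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(rule=rule, line=line_no, severity="high"))
    return findings
```

The text is only read, never executed, so this stage stays within a static-analysis-only approach. Line numbers are recorded so that a report can point a publisher at the exact location to remediate.
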

Trust score formula:

trust_score = weighted_sum(
    security_scan_score  * 0.30,  # Grade from scanner (0-100)
    publisher_reputation * 0.20,  # Verified status, history, other skills
    community_signals    * 0.15,  # Install count, ratings, reviews
    code_quality_signals * 0.15,  # Update frequency, docs, tests
    age_and_stability    * 0.10,  # Time since first publish, version count
    transparency_score   * 0.10   # Open source, clear permissions, changelog
)
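
A short sketch of the trust score calculation follows. The six weights are taken from the formula above and sum to 1.00. The assumption that every signal (not only `security_scan_score`) is already expressed on a 0-100 scale, and the function signature itself, are illustrative choices.

```python
from typing import Dict

# Weights copied from the trust score formula above; they sum to 1.00.
TRUST_WEIGHTS: Dict[str, float] = {
    "security_scan_score": 0.30,
    "publisher_reputation": 0.20,
    "community_signals": 0.15,
    "code_quality_signals": 0.15,
    "age_and_stability": 0.10,
    "transparency_score": 0.10,
}


def trust_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the six signals, each assumed to be on a 0-100 scale."""
    missing = sorted(TRUST_WEIGHTS.keys() - signals.keys())
    if missing:
        raise KeyError("missing signals: " + ", ".join(missing))
    total = 0.0
    for name, weight in TRUST_WEIGHTS.items():
        value = signals[name]
        if not 0.0 <= value <= 100.0:
            raise ValueError(name + " must be between 0 and 100")
        total += weight * value
    return total
```

For example, signals of 90, 80, 60, 70, 50, and 100 (in the order listed) give 27 + 16 + 9 + 10.5 + 5 + 10 = 77.5.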

Differentiation vs. Snyk/Invariant Labs:

  • Snyk scans and reports. Findable scores and ranks.
  • Trust scores are visible at discovery time (before installation), not after deployment.
  • Integrated with search ranking — security directly affects discoverability.

Re-scanning and monitoring:

  • Re-scan all indexed skills every 7 days
  • Immediate re-scan on version updates
  • Alert publishers on new vulnerabilities
  • Revoke trust scores for skills that fail scans
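
The re-scan rules (a 7-day cadence plus an immediate re-scan when the version changes) can be expressed as a single predicate. This is a sketch: the function name `needs_rescan` and the idea of storing the last scanned version next to `last_scanned_at` are assumptions, not something the data model specifies.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESCAN_INTERVAL = timedelta(days=7)  # from the 7-day re-scan rule


def needs_rescan(
    last_scanned_at: Optional[datetime],
    scanned_version: Optional[str],  # assumption: version seen by the last scan
    current_version: str,
    now: datetime,
) -> bool:
    """Decide whether a skill should be queued for scanning."""
    if last_scanned_at is None:
        return True  # never scanned
    if scanned_version != current_version:
        return True  # version update triggers an immediate re-scan
    return now - last_scanned_at >= RESCAN_INTERVAL
```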

Purpose: Allow AI agents to discover and evaluate skills through Findable directly.

This is a critical strategic asset: it makes Findable the discovery layer inside the agent, not just a website.

MCP Tools exposed:

findable_search
  - Input: query (string), filters (object)
  - Output: List of skills with trust scores, descriptions, compatibility

findable_get_details
  - Input: skill_id or source_url (string)
  - Output: Full skill details, security report, reviews

findable_check_trust
  - Input: skill_id or skill_url (string)
  - Output: Trust score, security grade, findings, recommendation

findable_get_alternatives
  - Input: skill_id (string)
  - Output: Similar skills ranked by trust score + relevance
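
MCP describes each tool with a `name`, a `description`, and an `inputSchema` expressed as JSON Schema. The sketch below shows what two of the tools above might look like in that shape, together with a deliberately tiny argument checker. The description strings, the choice of which fields are required, and the `validate_arguments` helper are assumptions for illustration; a real server would rely on an MCP SDK and a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Tool names and parameters come from the list above; descriptions and the
# choice of required fields are assumptions.
FINDABLE_TOOLS: List[Dict[str, Any]] = [
    {
        "name": "findable_search",
        "description": "Search indexed skills and return trust scores.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "filters": {"type": "object"},
            },
            "required": ["query"],
        },
    },
    {
        "name": "findable_get_alternatives",
        "description": "List similar skills ranked by trust score and relevance.",
        "inputSchema": {
            "type": "object",
            "properties": {"skill_id": {"type": "string"}},
            "required": ["skill_id"],
        },
    },
]

# Only the two JSON Schema types used above are supported in this sketch.
_JSON_TYPES = {"string": str, "object": dict}


def validate_arguments(tool_name: str, arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call is well formed."""
    tool = next((t for t in FINDABLE_TOOLS if t["name"] == tool_name), None)
    if tool is None:
        return ["unknown tool: " + tool_name]
    schema = tool["inputSchema"]
    errors: List[str] = []
    for key in schema["required"]:
        if key not in arguments:
            errors.append("missing required argument: " + key)
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append("unexpected argument: " + key)
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append("argument " + key + " must be of type " + spec["type"])
    return errors
```

Tools that accept either an ID or a URL (`findable_get_details`, `findable_check_trust`) are left out here because the "one of two inputs" rule needs a richer schema than this checker handles.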

Distribution: Published on Smithery, MCP.so, and documented for direct integration with Claude Code, OpenClaw, Codex CLI, and Gemini CLI.


| Layer          | Technology                                          | Rationale                            |
|----------------|-----------------------------------------------------|--------------------------------------|
| Frontend       | Next.js + Tailwind                                  | SSR for SEO/GEO, fast iteration      |
| CLI            | Node.js (TypeScript)                                | Cross-platform, npm distribution     |
| API            | Hono / Fastify                                      | Performance, edge-compatible         |
| Database       | PostgreSQL + pgvector                               | Relational + vector search in one DB |
| Search         | PostgreSQL tsvector (start) → Elasticsearch (scale) | Simple start, scale when needed      |
| Cache          | Redis / Upstash                                     | Session, rate limiting, hot data     |
| Object Storage | Cloudflare R2                                       | $0 egress, cost-effective            |
| Queue          | BullMQ (Redis)                                      | Async scanning, ingestion jobs       |
| Auth           | Clerk or Auth0                                      | OAuth, SSO, SCIM for enterprise      |
| Hosting        | Cloudflare Workers + AWS (scanning workers)         | Edge for API, compute for scanning   |
| CI/CD          | GitHub Actions                                      | Standard                             |
| Monitoring     | Grafana Cloud or Datadog                            | Observability                        |
| Embedding      | text-embedding-3-small                              | Semantic search vectors              |
| LLM (scanning) | Claude Haiku or GPT-4o-mini                         | Prompt injection detection           |

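| Resource                 | Spec                             | Monthly Cost |
|--------------------------|----------------------------------|--------------|
| API (Cloudflare Workers) | Free tier → $5/mo                | $0-5         |
| PostgreSQL               | Neon or Supabase free tier → Pro | $0-25        |
| Redis                    | Upstash free tier                | $0-10        |
| R2 storage               | 10 GB                            | $0.15        |
| Scanning workers         | 1x small VM (Hetzner/Fly.io)     | $20-50       |
| Total                    |                                  | ~$50-100/mo  |

| Resource                  | Spec                            | Monthly Cost     |
|---------------------------|---------------------------------|------------------|
| API servers               | CF Workers Pro + origin servers | $200-500         |
| PostgreSQL                | Managed, r5.large equivalent    | $200-500         |
| Redis                     | Upstash Pro                     | $50-100          |
| R2 storage                | 500 GB                          | $7.50            |
| Scanning workers          | 2-4x dedicated VMs              | $200-500         |
| Elasticsearch (if needed) | 2-node cluster                  | $400             |
| Total                     |                                 | ~$1,000-2,000/mo |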

Key insight: Cloudflare infrastructure (Workers, R2, KV) keeps costs dramatically lower than an AWS-first approach. R2’s $0 egress is significant for serving trust score badges and API responses globally.


  • All data encrypted at rest (AES-256) and in transit (TLS 1.3)
  • SOC2 Type II compliance target by Month 18
  • GDPR-compliant data handling
  • Regular penetration testing (quarterly from Phase 3)
  • Bug bounty program (launch at Phase 2)
  • Skills are stored and analyzed in sandboxed environments
  • Scanner runs in ephemeral containers (no persistent state)
  • No skill code is executed during scanning (static + semantic analysis only)
  • User-installed skills run in user’s own environment
  • All scanner dependencies pinned and hash-verified
  • SBOM generated for every release
  • Transparent scanning methodology published

| Component             | Decision                  | Rationale                    |
|-----------------------|---------------------------|------------------------------|
| Security Scanner      | BUILD                     | Core IP, competitive moat    |
| Trust Score Algorithm | BUILD                     | Core differentiation         |
| Search Engine         | BUY (pgvector + tsvector) | Proven, no need to reinvent  |
| Payments (Phase 4)    | BUY (Stripe Connect)      | Complex compliance handled   |
| Auth/SSO              | BUY (Clerk/Auth0)         | Faster time-to-market, SCIM  |
| Embedding Model       | BUY (OpenAI API)          | Commodity                    |
| MCP Server Framework  | BUILD                     | Custom discovery logic       |
| Ingestion Pipeline    | BUILD                     | Custom per-registry logic    |

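| Dimension        | Year 1 | Year 3 | Approach                          |
|------------------|--------|--------|-----------------------------------|
| Skills indexed   | 100K   | 1M+    | Horizontal scaling, read replicas |
| Searches/day     | 10K    | 500K   | Edge caching, CDN                 |
| Scans/day        | 1K     | 50K    | Worker auto-scaling               |
| API requests/sec | 50     | 2,000  | CF Workers auto-scaling           |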