Technical Architecture (Validated)
Architecture Overview
    ┌─────────────────────────────────────────────────────────────────────┐
    │                            CLIENT LAYER                             │
    │                                                                     │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────────────┐    │
    │  │ Web App  │  │   CLI    │  │ Browser  │  │   Findable MCP    │    │
    │  │ (Next.js)│  │  (Node)  │  │Extension │  │      Server       │    │
    │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬──────────┘    │
    │       │             │             │                 │               │
    └───────┼─────────────┼─────────────┼─────────────────┼───────────────┘
            │             │             │                 │
            ▼             ▼             ▼                 ▼
    ┌─────────────────────────────────────────────────────────────────────┐
    │                              API LAYER                              │
    │                 (Fastify/Hono, Cloudflare Workers)                  │
    │                Rate limiting, Auth, Routing, Caching                │
    └────────────────────────────┬────────────────────────────────────────┘
                                 │
            ┌────────────────────┼────────────────────┐
            ▼                    ▼                    ▼
     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
     │  DISCOVERY   │     │   SECURITY   │     │  ENTERPRISE  │
     │   SERVICES   │     │   SERVICES   │     │   SERVICES   │
     │              │     │              │     │              │
     │ Search API   │     │ Scanner      │     │ Private Reg  │
     │ Catalog API  │     │ Trust Scorer │     │ Policy Eng   │
     │ Registry API │     │ Alert Engine │     │ Audit Log    │
     │ Review API   │     │ Verify Svc   │     │ SSO/SCIM     │
     └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
            │                    │                    │
            ▼                    ▼                    ▼
    ┌─────────────────────────────────────────────────────────────────────┐
    │                             DATA LAYER                              │
    │                                                                     │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐     │
    │  │PostgreSQL│  │  Redis   │  │    R2    │  │     pgvector     │     │
    │  │(Primary) │  │ (Cache)  │  │ (Assets) │  │    (Semantic)    │     │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘     │
    │                                                                     │
    └─────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
    ┌─────────────────────────────────────────────────────────────────────┐
    │                         INGESTION PIPELINE                          │
    │                                                                     │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
    │  │ Official │  │ Smithery │  │ PulseMCP │  │  GitHub  │             │
    │  │ MCP Reg  │  │   Sync   │  │   Sync   │  │   Sync   │             │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
    │  │  Glama   │  │  MCP.so  │  │skills.sh │  │ SkillsMP │             │
    │  │   Sync   │  │   Sync   │  │   Sync   │  │   Sync   │             │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             │
    │                                                                     │
    └─────────────────────────────────────────────────────────────────────┘

Core Services
Catalog Service
Purpose: Unified skill index across all registries
Data model:
    Skill {
      id: UUID
      name: string
      description: string
      version: string (semver)
      protocol: enum (mcp, skill_md, both)
      source_registry: enum (official_mcp, smithery, pulsemcp, glama, mcp_so,
                             skills_sh, skillsmp, clawhub, github, direct)
      source_url: string
      publisher_id: UUID (FK → Publisher)
      content_hash: string (SHA-256)
      manifest_content: text
      tags: string[]
      categories: string[]
      platform_compatibility: string[]
      trust_score: float (0-100)
      security_grade: enum (A, B, C, D, F)
      install_count: int
      avg_rating: float (0-5)
      last_scanned_at: timestamp
      last_updated_at: timestamp
      created_at: timestamp
    }

    Publisher {
      id: UUID
      name: string
      github_id: string
      verification_level: enum (unverified, verified, organization)
      reputation_score: float (0-100)
      skills_published: int
      avg_trust_score: float
      created_at: timestamp
    }

    ScanResult {
      id: UUID
      skill_id: UUID (FK → Skill)
      scanner_version: string
      risk_level: enum (critical, high, medium, low, clean)
      findings: JSONB[]
      permission_map: JSONB
      trust_score_contribution: float
      scanned_at: timestamp
    }

Search Service
Purpose: Semantic + full-text search across all indexed skills
Technology: PostgreSQL pgvector (semantic) + full-text search (keyword)
Approach:
- Skill name + description → embedded using an embedding model (e.g., text-embedding-3-small)
- Stored as vectors in pgvector
- Full-text index for keyword search (PostgreSQL tsvector)
- Hybrid search: combine vector similarity + keyword relevance + trust score
Search ranking formula:
    score = 0.35 * semantic_similarity
          + 0.20 * keyword_relevance
          + 0.25 * trust_score_normalized
          + 0.10 * install_count_log_normalized
          + 0.10 * recency_score

Note: Trust score is weighted at 25%, which is the key differentiator vs. other registries: a higher trust score means a better ranking. This creates an incentive for developers to improve their security posture.
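The ranking formula above can be sketched as a small function. This is a minimal illustration, not the production ranker: the `RankingSignals` shape, the `normalizeInstalls` helper, and the choice to log-scale installs against a catalog-wide maximum are all assumptions, and every signal other than `trustScore` and `installCount` is assumed to arrive already normalized to [0, 1].

```typescript
// Weights copied from the ranking formula; they sum to 1.0.
const WEIGHTS = {
  semantic: 0.35,
  keyword: 0.20,
  trust: 0.25,
  installs: 0.10,
  recency: 0.10,
};

// Hypothetical input shape (not part of the documented data model).
interface RankingSignals {
  semanticSimilarity: number; // assumed in [0, 1]
  keywordRelevance: number;   // assumed in [0, 1]
  trustScore: number;         // 0-100, as stored on Skill
  installCount: number;       // raw install count
  recencyScore: number;       // assumed in [0, 1]
}

// Assumption: installs are log-scaled against the largest install count
// in the catalog, then clamped to [0, 1].
function normalizeInstalls(installCount: number, maxInstalls: number): number {
  if (maxInstalls <= 0) return 0;
  const scaled = Math.log(1 + installCount) / Math.log(1 + maxInstalls);
  return Math.min(1, Math.max(0, scaled));
}

function rankingScore(s: RankingSignals, maxInstalls: number): number {
  return (
    WEIGHTS.semantic * s.semanticSimilarity +
    WEIGHTS.keyword * s.keywordRelevance +
    WEIGHTS.trust * (s.trustScore / 100) +
    WEIGHTS.installs * normalizeInstalls(s.installCount, maxInstalls) +
    WEIGHTS.recency * s.recencyScore
  );
}
```

With all other signals equal, two skills whose trust scores differ by 50 points end up 0.125 apart in ranking score, which is larger than the full contribution of either install count or recency.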
Security Services (Findable Shield)
Scanner Engine
Purpose: Automated security analysis of MCP server manifests and SKILL.md files
Scanning pipeline:
    Input (SKILL.md / MCP manifest + supporting files)
    │
    ├─→ [Stage 1] Static Analysis
    │   ├─ Regex patterns for API keys, tokens, passwords
    │   ├─ Known malware signature matching
    │   ├─ File type validation (reject suspicious binaries)
    │   └─ Hardcoded credential detection
    │
    ├─→ [Stage 2] Semantic Analysis
    │   ├─ Prompt injection detection (LLM-based)
    │   ├─ Instruction analysis (what does the skill tell the agent to do?)
    │   ├─ Data exfiltration pattern detection
    │   └─ Privilege escalation checks
    │
    ├─→ [Stage 3] Dependency Analysis
    │   ├─ Script dependency scanning (npm audit, pip safety)
    │   ├─ Known vulnerability matching (CVE database)
    │   └─ Supply chain risk assessment
    │
    ├─→ [Stage 4] Permission Mapping
    │   ├─ File/directory access patterns
    │   ├─ Network call destinations
    │   ├─ System command execution
    │   └─ Data read/write scope
    │
    └─→ [Output] Security Report
        ├─ Grade: A/B/C/D/F
        ├─ Risk level: CRITICAL / HIGH / MEDIUM / LOW / CLEAN
        ├─ Findings with severity + remediation
        ├─ Permission map
        └─ Trust score contribution

Trust Score Algorithm

    trust_score = weighted_sum(
        security_scan_score  * 0.30,  # Grade from scanner (0-100)
        publisher_reputation * 0.20,  # Verified status, history, other skills
        community_signals    * 0.15,  # Install count, ratings, reviews
        code_quality_signals * 0.15,  # Update frequency, docs, tests
        age_and_stability    * 0.10,  # Time since first publish, version count
        transparency_score   * 0.10   # Open source, clear permissions, changelog
    )

Differentiation vs. Snyk/Invariant Labs:
- Snyk scans and reports. Findable scores and ranks.
- Trust scores are visible at discovery time (before installation), not after deployment.
- Integrated with search ranking — security directly affects discoverability.
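The weighted sum in the trust score algorithm above could be computed along these lines. This is a sketch under one stated assumption: the source gives a 0-100 range only for the security scan score, so treating every component as 0-100 (and clamping to that range) is a guess, as are the camelCase field names.

```typescript
// Component weights from the trust score formula; they sum to 1.0.
const TRUST_WEIGHTS = {
  securityScan: 0.30,
  publisherReputation: 0.20,
  communitySignals: 0.15,
  codeQualitySignals: 0.15,
  ageAndStability: 0.10,
  transparency: 0.10,
};

type TrustComponent = keyof typeof TRUST_WEIGHTS;

// Assumption: every component score is on a 0-100 scale.
type TrustComponents = Record<TrustComponent, number>;

function trustScore(components: TrustComponents): number {
  const keys = Object.keys(TRUST_WEIGHTS) as TrustComponent[];
  let total = 0;
  for (const key of keys) {
    // Clamp defensively so one bad input cannot push the score out of range.
    const value = Math.min(100, Math.max(0, components[key]));
    total += TRUST_WEIGHTS[key] * value;
  }
  // Round to one decimal place for display and storage.
  return Math.round(total * 10) / 10;
}
```

For example, component scores of 90, 80, 60, 70, 50, and 100 (in the order listed) give 27 + 16 + 9 + 10.5 + 5 + 10 = 77.5.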
Continuous Monitoring
- Re-scan all indexed skills every 7 days
- Immediate re-scan on version updates
- Alert publishers on new vulnerabilities
- Revoke trust scores for skills that fail scans
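The first two monitoring rules above reduce to a simple scheduling predicate. The sketch below is illustrative: the `ScanState` shape and the `lastScannedVersion` field are hypothetical (the documented data model records `last_scanned_at` but not which version was scanned), and the real scheduler may work differently.

```typescript
// 7 days in milliseconds, matching the re-scan interval listed above.
const RESCAN_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical subset of skill state needed to decide on a re-scan.
interface ScanState {
  version: string;                   // currently published version
  lastScannedVersion: string | null; // assumed field, not in the data model
  lastScannedAt: Date | null;        // null if the skill was never scanned
}

function needsRescan(skill: ScanState, now: Date): boolean {
  // Never scanned: scan immediately.
  if (skill.lastScannedAt === null) return true;
  // Version changed since the last scan: immediate re-scan.
  if (skill.lastScannedVersion !== skill.version) return true;
  // Otherwise re-scan once the 7-day interval has elapsed.
  return now.getTime() - skill.lastScannedAt.getTime() >= RESCAN_INTERVAL_MS;
}
```

A job queue worker could run this predicate over the catalog on a timer and enqueue a scan job for each skill that returns true.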
Findable MCP Server
Purpose: Allow AI agents to discover and evaluate skills through Findable directly.
This is a critical strategic asset — it makes Findable the discovery layer INSIDE the agent, not just a website.
MCP Tools exposed:
    findable_search
      - Input: query (string), filters (object)
      - Output: List of skills with trust scores, descriptions, compatibility

    findable_get_details
      - Input: skill_id or source_url (string)
      - Output: Full skill details, security report, reviews

    findable_check_trust
      - Input: skill_id or skill_url (string)
      - Output: Trust score, security grade, findings, recommendation

    findable_get_alternatives
      - Input: skill_id (string)
      - Output: Similar skills ranked by trust score + relevance

Distribution: Published on Smithery and MCP.so, and documented for direct integration with Claude Code, OpenClaw, Codex CLI, and Gemini CLI.
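To make the tool list concrete, here is roughly what a client-side call to one of these tools looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`; the `buildToolCall` helper and the specific filter keys (`protocol`, `min_trust_score`) are hypothetical, since the source only says that `filters` is an object.

```typescript
// Tool names as listed above.
type FindableTool =
  | "findable_search"
  | "findable_get_details"
  | "findable_check_trust"
  | "findable_get_alternatives";

// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: FindableTool; arguments: { [key: string]: unknown } };
}

// Hypothetical helper that assembles the request envelope.
function buildToolCall(
  id: number,
  name: FindableTool,
  args: { [key: string]: unknown }
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Example: search for skills; the filter keys here are assumptions.
const request = buildToolCall(1, "findable_search", {
  query: "postgres backup",
  filters: { protocol: "mcp", min_trust_score: 70 },
});
```

In practice an agent's MCP client library builds this envelope itself; the sketch only shows which fields carry the tool name and its inputs.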
Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js + Tailwind | SSR for SEO/GEO, fast iteration |
| CLI | Node.js (TypeScript) | Cross-platform, npm distribution |
| API | Hono / Fastify | Performance, edge-compatible |
| Database | PostgreSQL + pgvector | Relational + vector search in one DB |
| Search | PostgreSQL tsvector (start) → Elasticsearch (scale) | Simple start, scale when needed |
| Cache | Redis / Upstash | Session, rate limiting, hot data |
| Object Storage | Cloudflare R2 | $0 egress, cost-effective |
| Queue | BullMQ (Redis) | Async scanning, ingestion jobs |
| Auth | Clerk or Auth0 | OAuth, SSO, SCIM for enterprise |
| Hosting | Cloudflare Workers + AWS (scanning workers) | Edge for API, compute for scanning |
| CI/CD | GitHub Actions | Standard |
| Monitoring | Grafana Cloud or Datadog | Observability |
| Embedding | text-embedding-3-small | Semantic search vectors |
| LLM (scanning) | Claude Haiku or GPT-4o-mini | Prompt injection detection |
Infrastructure Requirements
Phase 1 (Months 1-4): Lean
| Resource | Spec | Monthly Cost |
|---|---|---|
| API (Cloudflare Workers) | Free tier → $5/mo | $0-5 |
| PostgreSQL | Neon or Supabase free tier → Pro | $0-25 |
| Redis | Upstash free tier | $0-10 |
| R2 storage | 10GB | $0.15 |
| Scanning workers | 1x small VM (Hetzner/Fly.io) | $20-50 |
| Total | | ~$50-100/mo |
Phase 2-3 (Months 4-14): Growth
| Resource | Spec | Monthly Cost |
|---|---|---|
| API servers | CF Workers Pro + origin servers | $200-500 |
| PostgreSQL | Managed, r5.large equivalent | $200-500 |
| Redis | Upstash Pro | $50-100 |
| R2 storage | 500GB | $7.50 |
| Scanning workers | 2-4x dedicated VMs | $200-500 |
| Elasticsearch (if needed) | 2-node cluster | $400 |
| Total | | ~$1,000-2,000/mo |
Key insight: Cloudflare infrastructure (Workers, R2, KV) keeps costs dramatically lower than an AWS-first approach. R2’s $0 egress is significant for serving trust score badges and API responses globally.
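As a sanity check on the totals in the two tables above, the per-row ranges can be summed directly. The snippet below only restates the table figures (in US cents, to avoid floating-point drift); the `Range` type and helper names are illustrative.

```typescript
// [low, high] monthly cost per row in US cents, copied from the tables.
type Range = [number, number];

const phase1: Range[] = [
  [0, 500],     // API (Cloudflare Workers)
  [0, 2500],    // PostgreSQL
  [0, 1000],    // Redis
  [15, 15],     // R2 storage, 10GB
  [2000, 5000], // Scanning workers
];

const phase2to3: Range[] = [
  [20000, 50000], // API servers
  [20000, 50000], // PostgreSQL
  [5000, 10000],  // Redis
  [750, 750],     // R2 storage, 500GB
  [20000, 50000], // Scanning workers
  [40000, 40000], // Elasticsearch (if needed)
];

// Sum the low and high ends of every row independently.
function sumRanges(rows: Range[]): Range {
  let low = 0;
  let high = 0;
  for (const row of rows) {
    low += row[0];
    high += row[1];
  }
  return [low, high];
}

function toDollars(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

const phase1Total = sumRanges(phase1);       // [2015, 9015]
const phase2to3Total = sumRanges(phase2to3); // [105750, 200750]
```

The rows add up to $20.15-$90.15 for Phase 1 and $1,057.50-$2,007.50 for Phase 2-3, so the stated totals are rounded figures; the Phase 1 low end in particular assumes more than free-tier usage.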
Security Architecture
Platform Security
- All data encrypted at rest (AES-256) and in transit (TLS 1.3)
- SOC 2 Type II compliance target by Month 18
- GDPR-compliant data handling
- Regular penetration testing (quarterly from Phase 3)
- Bug bounty program (launch at Phase 2)
Skill Isolation
- Skills are stored and analyzed in sandboxed environments
- Scanner runs in ephemeral containers (no persistent state)
- No skill code is executed during scanning (static + semantic analysis only)
- User-installed skills run in the user’s own environment
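Because scanning never executes skill code, the static stage can treat every file purely as text. The sketch below shows the general shape of a regex-based secret check in that style; the three patterns, the rule names, and the `Finding` shape are illustrative assumptions, not the actual Findable Shield rule set.

```typescript
// Hypothetical finding shape for a static text scan.
interface Finding {
  rule: string;
  line: number; // 1-based line number in the scanned text
  severity: "critical" | "high" | "medium" | "low";
}

interface Rule {
  rule: string;
  severity: Finding["severity"];
  pattern: RegExp;
}

// Illustrative patterns only; a real rule set would be much larger.
const RULES: Rule[] = [
  { rule: "aws-access-key-id", severity: "critical", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { rule: "private-key-block", severity: "critical", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { rule: "hardcoded-password", severity: "high", pattern: /\bpassword\s*[:=]\s*["'][^"']{4,}["']/i },
];

// Scans text line by line; the content is only read, never evaluated.
function scanText(content: string): Finding[] {
  const findings: Finding[] = [];
  const lines = content.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    for (const r of RULES) {
      if (r.pattern.test(lines[i])) {
        findings.push({ rule: r.rule, line: i + 1, severity: r.severity });
      }
    }
  }
  return findings;
}
```

The patterns are deliberately non-global so that `RegExp.test` carries no state between lines. Findings in this form could feed the `findings` field of a `ScanResult` record.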
Supply Chain Security
- All scanner dependencies pinned and hash-verified
- SBOM generated for every release
- Transparent scanning methodology published
Build vs. Buy
| Component | Decision | Rationale |
|---|---|---|
| Security Scanner | BUILD | Core IP, competitive moat |
| Trust Score Algorithm | BUILD | Core differentiation |
| Search Engine | BUY (pgvector + tsvector) | Proven, no need to reinvent |
| Payments (Phase 4) | BUY (Stripe Connect) | Complex compliance handled |
| Auth/SSO | BUY (Clerk/Auth0) | Faster time-to-market, SCIM |
| Embedding Model | BUY (OpenAI API) | Commodity |
| MCP Server Framework | BUILD | Custom discovery logic |
| Ingestion Pipeline | BUILD | Custom per-registry logic |
Scalability Targets
| Dimension | Year 1 | Year 3 | Approach |
|---|---|---|---|
| Skills indexed | 100K | 1M+ | Horizontal scaling, read replicas |
| Searches/day | 10K | 500K | Edge caching, CDN |
| Scans/day | 1K | 50K | Worker auto-scaling |
| API requests/sec | 50 | 2,000 | CF Workers auto-scaling |