Smart Apply — High-Level Architecture

Based on PRD v2.1 and TRD v1.0 This document defines the system-level architecture, component responsibilities, and data flows. Business Requirements Document (BRD), Detailed HLD, and LLD will follow.

1. System Context

Smart Apply is a semi-automated AI assistant that helps job seekers tailor resumes for specific job descriptions, calculate ATS compatibility scores, generate PDF resumes, and auto-fill application forms — all while keeping the human in the loop.

Core Principles

Client-first processing — scraping, PDF rendering, and form fill happen in the browser
Server-side intelligence — AI reasoning, scoring, and orchestration run on the backend
Zero resume file storage — generated PDFs live in the user’s Google Drive, never on our servers
Explicit user approval — AI suggestions and final submission always require human confirmation

2. Repository Structure (Polyrepo)

smart-apply-shared/       Zod schemas + TypeScript types (npm package)
smart-apply-backend/      NestJS API — auth, scoring, LLM orchestration
smart-apply-web/          Next.js web portal — dashboard, profile editor
smart-apply-extension/    Chrome Extension (Manifest V3) — scraping, PDF, autofill
smart-apply-doc/          PRD, TRD, OpenAPI spec, SQL schema, architecture docs

Each repo is independently buildable and deployable. smart-apply-shared is consumed via file: references in development and published as an npm package for CI/CD.

3. High-Level Architecture Diagram

graph TB
    subgraph Client["Browser (Chrome Extension)"]
        CS_LP["Content Script<br/>LinkedIn Profile Parser"]
        CS_JD["Content Script<br/>JD Detector"]
        CS_AF["Content Script<br/>Form Autofill"]
        BG["Background<br/>Service Worker"]
        POPUP["Popup / Side Panel UI"]
        PDF["PDF Generator<br/>(pdf-lib)"]
    end

    subgraph WebPortal["Web Portal (Next.js)"]
        DASH["Dashboard"]
        PROFILE["Profile Editor"]
        HISTORY["Application History"]
    end

    subgraph Backend["Backend API (NestJS)"]
        AUTH["Auth Guard<br/>(Clerk JWT)"]
        PROF_SVC["Profiles Service"]
        OPT_SVC["Optimize Orchestrator"]
        SCORE["ATS Scoring Engine"]
        APP_SVC["Applications Service"]
        JD_PARSE["JD Parser /<br/>Requirements Extractor"]
    end

    subgraph External["External Services"]
        CLERK["Clerk<br/>Auth Provider"]
        SUPA["Supabase<br/>PostgreSQL + RLS"]
        LLM["LLM Provider<br/>(OpenAI / Anthropic)"]
        GDRIVE["Google Drive<br/>API"]
    end

    %% Extension → Backend
    CS_LP -->|raw profile text| BG
    CS_JD -->|JD text + metadata| BG
    BG -->|API calls| AUTH

    %% Auth flow
    CLERK -.->|JWT verification| AUTH
    POPUP -->|login / token| CLERK

    %% Backend internal
    AUTH --> PROF_SVC
    AUTH --> OPT_SVC
    AUTH --> APP_SVC
    OPT_SVC --> JD_PARSE
    OPT_SVC --> SCORE
    OPT_SVC --> LLM
    PROF_SVC --> SUPA
    APP_SVC --> SUPA
    OPT_SVC --> SUPA

    %% Web Portal → Backend
    DASH -->|fetch applications| AUTH
    PROFILE -->|read/update profile| AUTH
    HISTORY -->|status updates| AUTH

    %% Client-side outputs
    BG -->|optimized JSON| POPUP
    POPUP -->|approved resume| PDF
    PDF -->|upload PDF| GDRIVE
    PDF -->|save metadata| BG
    BG -->|save application| AUTH

    %% Autofill
    BG -->|profile data| CS_AF
    CS_AF -->|fill form fields| CS_AF

    classDef client fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
    classDef web fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef backend fill:#fef3c7,stroke:#d97706,color:#78350f
    classDef external fill:#f3e8ff,stroke:#9333ea,color:#581c87

    class CS_LP,CS_JD,CS_AF,BG,POPUP,PDF client
    class DASH,PROFILE,HISTORY web
    class AUTH,PROF_SVC,OPT_SVC,SCORE,APP_SVC,JD_PARSE backend
    class CLERK,SUPA,LLM,GDRIVE external

4. Core Data Flow Diagrams

4.1 Profile Ingestion Flow

sequenceDiagram
    actor User
    participant Ext as Chrome Extension
    participant BG as Background Worker
    participant API as Backend API
    participant LLM as LLM Provider
    participant DB as Supabase

    User->>Ext: Visit LinkedIn profile page
    Ext->>Ext: Content script extracts DOM text
    Ext->>BG: SYNC_PROFILE message
    BG->>API: POST /api/profile/ingest<br/>{source, raw_text, source_url}
    API->>API: Validate & sanitize input
    API->>LLM: Parse raw text → structured profile
    LLM-->>API: {full_name, skills, experiences, ...}
    API->>DB: Upsert master_profiles
    DB-->>API: Profile saved
    API-->>BG: {success, profile}
    BG-->>Ext: Show sync result
    Ext-->>User: ✓ Profile synced

4.2 Resume Optimization Flow (Core Loop)

sequenceDiagram
    actor User
    participant Ext as Chrome Extension
    participant BG as Background Worker
    participant API as Backend API
    participant Score as ATS Scoring Engine
    participant LLM as LLM Provider
    participant DB as Supabase
    participant Drive as Google Drive

    User->>Ext: Browse job posting page
    Ext->>Ext: Content script detects JD
    Ext->>BG: OPTIMIZE_JD message<br/>{jd_text, company, job_title}
    BG->>API: POST /api/optimize

    rect rgb(255, 249, 219)
        Note over API,LLM: Backend Optimization Pipeline
        API->>DB: Load master_profiles
        DB-->>API: User profile
        API->>API: Extract JD requirements<br/>(hard_skills, soft_skills, certs)
        API->>Score: Calculate ATS score BEFORE
        Score-->>API: ats_score_before (e.g. 48)
        API->>LLM: Optimize resume<br/>{profile, requirements, constraints}
        LLM-->>API: {summary, skills, experience_edits, warnings}
        API->>API: Validate LLM response schema
        API->>Score: Calculate ATS score AFTER
        Score-->>API: ats_score_after (e.g. 86)
    end

    API-->>BG: OptimizeResponse
    BG-->>Ext: Show results in popup

    rect rgb(219, 234, 254)
        Note over User,Drive: User Review & Generation
        Ext-->>User: Display ATS: 48% → 86%<br/>+ diff of suggested changes
        User->>Ext: Review, toggle changes, Approve
        Ext->>Ext: pdf-lib renders PDF
        Ext->>Drive: Upload PDF to<br/>Resume-Flow/{company}/
        Drive-->>Ext: shareable link
    end

    Ext->>BG: Save application metadata
    BG->>API: POST /api/applications<br/>{company, job_title, drive_link, scores, snapshot}
    API->>DB: Insert application_history
    DB-->>API: application_id
    API-->>BG: {success}

4.3 Form Autofill Flow

sequenceDiagram
    actor User
    participant Ext as Chrome Extension
    participant AF as Autofill Content Script
    participant BG as Background Worker

    User->>Ext: Open job application form
    AF->>AF: Detect form inputs<br/>(label, name, placeholder, aria-label)
    AF->>AF: Inject "Auto-fill" button
    User->>AF: Click Auto-fill
    AF->>BG: Request profile + optimized data
    BG-->>AF: {name, email, phone, summary, skills, ...}
    AF->>AF: Map fields → inputs<br/>Native value setter + event dispatch
    AF-->>User: Form populated

    alt Unsupported fields detected
        AF-->>User: Show "Copy to Clipboard" buttons
    end

    Note over User: User reviews & submits manually

5. Authentication & Authorization Flow

flowchart LR
    subgraph Clients
        EXT[Chrome Extension]
        WEB[Web Portal]
    end

    subgraph Auth
        CLERK[Clerk]
    end

    subgraph Backend
        GUARD[Auth Guard]
        API[API Routes]
    end

    subgraph Data
        SUPA[Supabase + RLS]
    end

    EXT -->|1. Login| CLERK
    WEB -->|1. Login| CLERK
    CLERK -->|2. JWT issued| EXT
    CLERK -->|2. JWT issued| WEB
    EXT -->|3. Bearer JWT| GUARD
    WEB -->|3. Bearer JWT| GUARD
    GUARD -->|4. verify token| CLERK
    GUARD -->|5. Extract clerk_user_id| API
    API -->|6. Query with user context| SUPA

Key Rules:

Extension stores token in chrome.storage.local
Web portal uses Clerk’s built-in Next.js middleware
Backend verifies JWT using @clerk/backend verifyToken()
Supabase enforces row-level security: clerk_user_id = auth.jwt()->>'sub'
Google Drive scope limited to drive.file (only files created by app)

6. Data Model Overview

erDiagram
    master_profiles {
        uuid id PK
        text clerk_user_id UK
        text full_name
        text email
        text phone
        text location
        text linkedin_url
        text summary
        jsonb base_skills
        jsonb certifications
        jsonb experiences
        jsonb education
        text raw_profile_source
        int profile_version
        timestamptz created_at
        timestamptz updated_at
    }

    application_history {
        uuid id PK
        text clerk_user_id FK
        text company_name
        text job_title
        text source_platform
        text source_url
        text job_description_hash
        text drive_link
        int ats_score_before
        int ats_score_after
        jsonb applied_resume_snapshot
        text status
        timestamptz created_at
        timestamptz applied_at
        timestamptz updated_at
    }

    user_integrations {
        uuid id PK
        text clerk_user_id FK
        text provider
        text provider_account_email
        text access_scope
        text refresh_token_encrypted
        timestamptz created_at
        timestamptz updated_at
    }

    master_profiles ||--o{ application_history : "clerk_user_id"
    master_profiles ||--o{ user_integrations : "clerk_user_id"

Status Lifecycle: draft → generated → applied → interviewing → offer / rejected / withdrawn

7. Component Responsibilities

Component	Responsibility	Tech Stack
smart-apply-extension	DOM scraping (LinkedIn/Indeed), JD detection, user review UI, PDF rendering (pdf-lib), form autofill, Google Drive upload	React, Tailwind, Manifest V3, Vite, pdf-lib
smart-apply-backend	Auth verification, profile CRUD, JD parsing, ATS scoring, LLM orchestration, application metadata	NestJS 11, Clerk, Supabase, Zod
smart-apply-web	Dashboard, application history, profile editor, settings	Next.js 15, React 19, Clerk, TanStack Query, shadcn/ui
smart-apply-shared	Shared types, Zod schemas, enums	TypeScript, Zod
smart-apply-doc	PRD, TRD, OpenAPI, SQL schema, architecture	Markdown

8. Deployment Architecture

graph TB
    subgraph Production
        VERCEL["Vercel<br/>(smart-apply-web)"]
        RENDER["Render / Railway<br/>(smart-apply-backend)"]
        SUPA_CLOUD["Supabase Cloud<br/>(PostgreSQL + RLS)"]
        CWS["Chrome Web Store<br/>(smart-apply-extension)"]
    end

    subgraph External
        CLERK_CLOUD["Clerk Cloud"]
        LLM_API["LLM API<br/>(OpenAI / Anthropic)"]
        GDRIVE_API["Google Drive API"]
    end

    CWS -->|API calls| RENDER
    VERCEL -->|API calls| RENDER
    RENDER -->|DB queries| SUPA_CLOUD
    RENDER -->|JWT verify| CLERK_CLOUD
    RENDER -->|AI prompts| LLM_API
    CWS -->|PDF upload| GDRIVE_API
    CWS -->|Auth| CLERK_CLOUD

    classDef prod fill:#dbeafe,stroke:#2563eb
    classDef ext fill:#f3e8ff,stroke:#9333ea

    class VERCEL,RENDER,SUPA_CLOUD,CWS prod
    class CLERK_CLOUD,LLM_API,GDRIVE_API ext

9. ATS Scoring Engine (Heuristic)

Total: 100 points
├── Hard Skills Match:      50 pts (exact + synonym matching)
├── Role/Domain Relevance:  20 pts (job title ↔ experience titles)
├── Seniority Alignment:    10 pts (years/level match)
├── Soft Skills & Certs:    10 pts (keyword presence)
└── Keyword Coverage:       10 pts (density across resume sections)

Synonym map supports equivalences like Node ↔ Node.js, Postgres ↔ PostgreSQL. Keyword spam is capped. Score is labeled as an internal heuristic, not a guarantee of ATS passage.

10. AI Orchestration Pipeline

graph LR
    A["Master Profile<br/>(JSON)"] --> D["Prompt Builder"]
    B["Raw JD Text"] --> C["Requirements<br/>Extractor"]
    C --> D
    D --> E["LLM Call"]
    E --> F["Schema Validator"]
    F --> G["Optimized Resume<br/>JSON"]

    style D fill:#fef3c7,stroke:#d97706
    style E fill:#f3e8ff,stroke:#9333ea
    style F fill:#dcfce7,stroke:#16a34a

LLM Constraints (from TRD §10.3):

No fabricated experience or certifications
Minimal edit over full rewrite
Infer adjacent skills only with explicit caution
Low-confidence suggestions flagged in warnings[]
Output validated against strict JSON schema

LLM Output Contract:

{
  "summary": "string",
  "skills": ["string"],
  "experience_edits": [
    {
      "company": "string",
      "original_bullet": "string",
      "revised_bullet": "string",
      "inserted_keywords": ["string"],
      "confidence": 0.91
    }
  ],
  "warnings": ["string"]
}

11. Security Architecture

Layer	Measure
Auth	Clerk JWT — verified on every API request
Data isolation	Supabase RLS — `clerk_user_id = auth.jwt()->>'sub'`
Storage	Zero resume file storage on server; PDFs in user’s Drive only
Secrets	Server-only env vars; extension bundle contains no secrets
Input sanitization	DOM-extracted text sanitized before backend processing
XSS prevention	No unsafe HTML rendering in diff UI
Account deletion	Clerk webhook → cascading hard delete in Supabase
Google Drive	`drive.file` scope only — cannot access user’s other files
PII	No PII in logs; raw JD stored temporarily only

12. Development Phases (from TRD §23)

Phase	Scope
Phase 1	Clerk auth, Supabase schema + RLS, Web Portal shell, Extension shell + auth bridge
Phase 2	LinkedIn profile parser, profile ingest API, master profile CRUD
Phase 3	JD extractor, optimize API, ATS scoring engine, LLM response validation
Phase 4	Review UI, PDF generation, Google Drive upload, application_history
Phase 5	Autofill engine, LinkedIn Easy Apply support, clipboard fallback
Phase 6	Observability, QA hardening, Chrome Web Store release

Next Documents

BRD — Business Requirements Document (stakeholder-facing)
HLD — Detailed High-Level Design per component
LLD — Low-Level Design (class diagrams, API contracts, DB migrations)
AI Prompt Spec — System prompts, few-shot examples, guardrails
Deployment Runbook — Environment setup, CI/CD, monitoring

Smart Apply architecture