Collect Everything
Ingesting Notes, Bookmarks & Docs
The Collection Problem
You have knowledge scattered everywhere. Notes in one app, bookmarks in another, articles you half-read in a third, meeting notes in documents, project updates in chat threads. Each source has a different format, different metadata, different structure.
A second brain starts by solving this: get everything into one place, in one format, with the right metadata attached.
Why Unified Ingestion Matters
Without a unified ingestion layer, you end up with:
The fix isn't a better app. It's a pipeline that normalizes everything into a common format.
The Universal Document Schema
Every piece of knowledge, regardless of source, has these core properties:
| Field | Purpose | Example |
|---|---|---|
| `id` | Unique identifier | `note-003`, `bookmark-007` |
| `content` | The actual text | Note body, article summary, meeting transcript |
| `source` | Where it came from | `notes`, `bookmarks`, `articles`, `meetings`, `projects` |
| `title` | Human-readable label | "React Server Components Deep Dive" |
| `tags` | Topic labels | `["react", "architecture", "frontend"]` |
| `createdAt` | When it was captured | `2025-03-15` |
| `metadata` | Source-specific extras | URL for bookmarks, attendees for meetings |
The key insight: metadata varies by source, but the core schema is universal. A bookmark has a URL. A meeting note has attendees. But both have content, tags, and a date.
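One way to pin this schema down is as a TypeScript type. The sketch below is an assumption about how the table could map to code, not the course's actual definition: the name `UnifiedDocument` (chosen to avoid clashing with the browser's built-in `Document` type), the `Source` union, the `summarize` helper, and the sample values are all illustrative.

```typescript
// Field names follow the schema table; the type names are illustrative.
type Source = "notes" | "bookmarks" | "articles" | "meetings" | "projects";

interface UnifiedDocument {
  id: string;                        // e.g. "note-003"
  content: string;                   // the actual text
  source: Source;                    // where it came from
  title: string;                     // human-readable label
  tags: string[];                    // topic labels
  createdAt: string;                 // capture date, e.g. "2025-03-15"
  metadata: Record<string, unknown>; // source-specific extras
}

const bookmark: UnifiedDocument = {
  id: "bookmark-007",
  content: "Summary of an article on server components.",
  source: "bookmarks",
  title: "React Server Components Deep Dive",
  tags: ["react", "architecture", "frontend"],
  createdAt: "2025-03-15",
  metadata: { url: "https://example.com/rsc" }, // placeholder URL
};

const meeting: UnifiedDocument = {
  id: "meeting-001",
  content: "Agreed to prototype search before building the UI.",
  source: "meetings",
  title: "Weekly sync",
  tags: ["planning"],
  createdAt: "2025-03-17",
  metadata: { attendees: ["Ana", "Ben"] }, // made-up attendees
};

// Downstream code can rely on the core fields without checking the source.
function summarize(doc: UnifiedDocument): string {
  return `[${doc.source}] ${doc.title} (${doc.createdAt})`;
}
```

Typing `metadata` as an open record keeps the core type stable when a new source brings its own extras. A stricter variant could use one metadata type per source.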
Building the Pipeline
A good ingestion pipeline follows three steps:
Source Files → Reader → Transformer → Deduplicator → Unified Documents
Each source gets its own reader function, but they all output the same Document type. This makes the rest of the pipeline (chunking, search, connections) source-agnostic.
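The three stages could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the raw shapes `RawNote` and `RawBookmark`, the function names, and the sample data are hypothetical, and the deduplicator compares normalized text directly as a stand-in for hashing.

```typescript
interface UnifiedDocument {
  id: string;
  content: string;
  source: string;
  title: string;
  tags: string[];
  createdAt: string;
  metadata: Record<string, unknown>;
}

// Hypothetical raw shapes, standing in for whatever each app exports.
interface RawNote { id: string; title: string; body: string; labels: string[]; created: string; }
interface RawBookmark { id: string; name: string; url: string; summary: string; topics: string[]; savedAt: string; }

// Stage 1: one reader per source, all returning the same type.
function readNote(raw: RawNote): UnifiedDocument {
  return { id: "note-" + raw.id, content: raw.body, source: "notes", title: raw.title,
           tags: raw.labels, createdAt: raw.created, metadata: {} };
}

function readBookmark(raw: RawBookmark): UnifiedDocument {
  return { id: "bookmark-" + raw.id, content: raw.summary, source: "bookmarks", title: raw.name,
           tags: raw.topics, createdAt: raw.savedAt, metadata: { url: raw.url } };
}

// Stage 2: source-agnostic cleanup (trim text, lowercase tags, drop repeated tags).
function transform(doc: UnifiedDocument): UnifiedDocument {
  const tags: string[] = [];
  for (const tag of doc.tags) {
    const normalized = tag.trim().toLowerCase();
    if (normalized !== "" && tags.indexOf(normalized) === -1) tags.push(normalized);
  }
  return { ...doc, content: doc.content.trim(), title: doc.title.trim(), tags };
}

// Stage 3: keep the first document for each distinct normalized text.
function dedupe(docs: UnifiedDocument[]): UnifiedDocument[] {
  const seen: string[] = [];
  return docs.filter((doc) => {
    const key = doc.content.trim().toLowerCase();
    if (seen.indexOf(key) !== -1) return false;
    seen.push(key);
    return true;
  });
}

function ingest(notes: RawNote[], bookmarks: RawBookmark[]): UnifiedDocument[] {
  const docs = notes.map(readNote).concat(bookmarks.map(readBookmark));
  return dedupe(docs.map(transform));
}

const documents = ingest(
  [{ id: "003", title: "RSC notes", body: "Server components render on the server. ",
     labels: ["React", "react ", "Architecture"], created: "2025-03-15" }],
  [{ id: "007", name: "React Server Components Deep Dive", url: "https://example.com/rsc",
     summary: "server components render on the server.", topics: ["react"], savedAt: "2025-03-16" }]
);
```

Here the bookmark's summary matches the note's body once case and whitespace are normalized, so `documents` holds only the note, with its tags reduced to `["react", "architecture"]`.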
Content Hashing for Deduplication
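When you ingest from multiple sources, the same content can appear more than once: a note that quotes a bookmark, a meeting summary that restates a project doc. Content hashing catches the cases where the text is identical after trimming and lowercasing; looser overlaps, such as a paraphrase, produce a different hash and are not caught by this step: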
```typescript
import { createHash } from "crypto";

// Hash the trimmed, lowercased text so case and edge whitespace don't matter.
function hashContent(content: string): string {
  return createHash("sha256").update(content.trim().toLowerCase()).digest("hex");
}
```
If two documents produce the same hash, keep the one with richer metadata.
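The text does not define "richer", so the sketch below makes an assumption: it scores a document by its number of metadata keys plus its number of tags, and keeps the higher-scoring one when hashes collide. The `richness` and `dedupeByHash` names and the sample documents are hypothetical.

```typescript
import { createHash } from "crypto";

interface UnifiedDocument {
  id: string;
  content: string;
  source: string;
  title: string;
  tags: string[];
  createdAt: string;
  metadata: Record<string, unknown>;
}

function hashContent(content: string): string {
  return createHash("sha256").update(content.trim().toLowerCase()).digest("hex");
}

// Assumption: "richer" means more metadata keys plus more tags.
function richness(doc: UnifiedDocument): number {
  return Object.keys(doc.metadata).length + doc.tags.length;
}

// Keep one document per content hash; on a collision keep the richer one.
// Ties keep whichever document was seen first.
function dedupeByHash(docs: UnifiedDocument[]): UnifiedDocument[] {
  const byHash = new Map<string, UnifiedDocument>();
  for (const doc of docs) {
    const hash = hashContent(doc.content);
    const existing = byHash.get(hash);
    if (existing === undefined || richness(doc) > richness(existing)) {
      byHash.set(hash, doc);
    }
  }
  return Array.from(byHash.values());
}

const deduped = dedupeByHash([
  { id: "note-010", content: "Ship the search MVP by Friday.", source: "notes",
    title: "Reminder", tags: ["projects"], createdAt: "2025-03-15", metadata: {} },
  { id: "meeting-002", content: "  ship the search MVP by friday.  ", source: "meetings",
    title: "Weekly sync", tags: ["projects", "search"], createdAt: "2025-03-16",
    metadata: { attendees: ["Ana", "Ben"] } },
  { id: "note-011", content: "Try a different chunk size.", source: "notes",
    title: "Idea", tags: [], createdAt: "2025-03-17", metadata: {} },
]);
```

The first two documents hash identically, and the meeting note scores higher (one metadata key plus two tags), so `deduped` contains `meeting-002` and `note-011`. Merging tags and metadata from both copies instead of discarding one is another reasonable policy.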
Making It Extensible
A well-designed ingestion layer makes adding a new source a small, local change. Each source is a function that takes raw data and returns Documents. When you want to add Slack messages or email archives later, you write one new reader function, and everything downstream works unchanged.
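One possible shape for this is a registry that maps a source name to its reader. The sketch below is illustrative only: `registerReader`, `ingestFrom`, and the `SlackMessage` shape are hypothetical, and the JSON layout is a simplified stand-in, not Slack's real export format.

```typescript
interface UnifiedDocument {
  id: string;
  content: string;
  source: string;
  title: string;
  tags: string[];
  createdAt: string;
  metadata: Record<string, unknown>;
}

// A reader turns one source's raw text into unified documents.
type Reader = (raw: string) => UnifiedDocument[];

const readers = new Map<string, Reader>();

function registerReader(source: string, reader: Reader): void {
  readers.set(source, reader);
}

function ingestFrom(source: string, raw: string): UnifiedDocument[] {
  const reader = readers.get(source);
  if (reader === undefined) {
    throw new Error(`No reader registered for source "${source}"`);
  }
  return reader(raw);
}

// Hypothetical, simplified message shape (not Slack's actual export schema).
interface SlackMessage { id: string; channel: string; text: string; postedAt: string; }

// Adding a source means registering one function; ingestFrom stays untouched.
registerReader("slack", (raw) => {
  const messages = JSON.parse(raw) as SlackMessage[];
  return messages.map((m) => ({
    id: "slack-" + m.id,
    content: m.text,
    source: "slack",
    title: "#" + m.channel + " message",
    tags: [m.channel],
    createdAt: m.postedAt,
    metadata: { channel: m.channel },
  }));
});

const slackDocs = ingestFrom(
  "slack",
  JSON.stringify([{ id: "42", channel: "search", text: "Index rebuild finished.", postedAt: "2025-03-18" }])
);
```

Because every reader returns the same type, steps such as chunking or search can consume `slackDocs` without knowing the messages came from chat.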
Key Takeaways
This is chapter 1 of AI-Powered Second Brain.