Keyword alerts miss everything that matters. "Emerging market currency contagion" won't match "Turkish lira plunges as market prices in another Fed hike." We spent six years in quant finance watching teams build elaborate keyword-based monitoring systems that consistently failed at the one thing they needed: matching meaning, not strings.
So we built Sembr — an open-source, self-hosted semantic news radar. Write any natural language intent. It continuously scans 53 news sources (Chinese + English), finds the semantically relevant articles on arrival, summarizes them via LLM, and pushes the result to your email. No keyword config. No Boolean logic. One docker compose up.
Today we're releasing Sembr v1.0 under Apache 2.0 on GitHub.
What's in the Box
53 pre-configured sources, ready to run out of the box — international wire services, financial media, Chinese-language news outlets, academic journals, tech blogs (TechCrunch, Wired, The Verge), open-source trackers (HelloGitHub, GitHub Trending), and select Twitter feeds. One intent covers both languages simultaneously.
git clone https://github.com/Peakstone-Labs/sembr.git
cd sembr
cp .env.example .env # fill in one API key
docker compose up --build
Once it's running, a web panel gives you six tabs — Feeds, Intents, Templates, Articles, Logs, Settings. Everything is manageable from the UI. No YAML editing. No command-line config.
Architecture
The pipeline is a six-step closed loop:
Collector
Custom RSS + API fetcher. Configurable intervals per source — 5 minutes for breaking news, daily for policy tracking.
Embedder — BGE-M3 via SiliconFlow
Free tier. Encodes both intents and articles into the same vector space. One embedding model, two languages.
Vector Store — Qdrant
Open-source vector database, self-hosted alongside the app. No external vector service dependency.
Matcher — Reverse RAG Engine
Each intent's vector is compared against incoming article vectors. Configurable similarity threshold (0–1) per intent: tighter for precision, looser for recall.
Summarizer — DeepSeek-V4-Flash
Only fires on matches. Each intent has its own analysis prompt — a quant macro analyst and a supply-chain analyst can read the same article through different lenses.
Notifier — SMTP Email
Digest format is configurable per intent. Telegram, Discord, and Slack channels are planned.
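The Matcher step can be illustrated with a minimal sketch. It assumes cosine similarity as the score and uses tiny hand-made three-dimensional vectors in place of real embeddings; the function names and article IDs are hypothetical and not taken from the Sembr codebase.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_articles(
    intent_vec: Sequence[float],
    article_vecs: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (article_id, score) pairs scoring at or above the threshold, best first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    hits = []
    for article_id, vec in article_vecs.items():
        score = cosine_similarity(intent_vec, vec)
        if score >= threshold:
            hits.append((article_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy vectors (assumption): real embeddings have far more dimensions.
intent = [0.9, 0.1, 0.0]  # stands in for "Emerging market currency contagion"
articles = {
    "lira-plunge": [0.8, 0.2, 0.1],    # semantically close to the intent
    "gadget-review": [0.0, 0.1, 0.9],  # unrelated topic
}

matches = match_articles(intent, articles, threshold=0.6)
```

With these toy vectors only "lira-plunge" clears the 0.6 threshold, even though the two texts would share no keywords. Lowering the threshold toward 0 admits more articles, which is the precision versus recall trade-off described above.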
Three Knobs Per Intent
Matching Tightness
Threshold 0 to 1. Higher = more precise, lower = higher recall. Set independently per intent.
Analysis Angle
Write your own prompt. Same article, different perspectives — cross-asset rotation, supply-demand margins, or regulatory signals.
Scan Cadence
Per-source, per-intent. 5-minute cycles for breaking events, daily digests for policy tracking.
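To make the three knobs concrete, here is a hypothetical sketch of how one intent's settings might be modeled. The class and field names are illustrative assumptions, not Sembr's actual schema, and in practice these values are set from the web panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentConfig:
    """Hypothetical container for the three per-intent knobs."""

    description: str            # the natural language intent
    threshold: float            # matching tightness, 0 to 1
    analysis_prompt: str        # analysis angle handed to the summarizer
    scan_interval_minutes: int  # scan cadence

    def __post_init__(self) -> None:
        # Reject values outside the ranges described in the text.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if self.scan_interval_minutes <= 0:
            raise ValueError("scan_interval_minutes must be positive")


# A tight, fast intent for breaking events.
breaking = IntentConfig(
    description="Emerging market currency contagion",
    threshold=0.7,
    analysis_prompt="Read this as a quant macro analyst; focus on cross-asset rotation.",
    scan_interval_minutes=5,
)

# A looser, daily intent for policy tracking.
policy = IntentConfig(
    description="Central bank regulatory signals",
    threshold=0.5,
    analysis_prompt="Summarize the regulatory implications.",
    scan_interval_minutes=24 * 60,
)
```

The threshold values 0.7 and 0.5 are placeholders; a sensible setting depends on the embedding model and on how noisy the sources are.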
Cost: under ¥0.10/day (~$0.014) per intent. BGE-M3 embedding runs on SiliconFlow's free tier. LLM summarization calls DeepSeek-V4-Flash, and only fires on matches. Self-hosting means no monthly subscription, and your intents and match history stay on your own server rather than in a third-party SaaS product.
Designed for AI Agents, Not Just Humans
This is the part we're most excited about. If you're running an AI-agent-heavy setup — whether it's OpenClaw, LangGraph, or plain Claude — your agents shouldn't need you to read documentation and configure things for them. Sembr treats AI agents as first-class citizens from day one.
Layer 1 (Install): Agents Deploy It Themselves
The repository includes sembr/agent/INSTALL.md, a structured guide written for agent consumption — explicit phases, parallel work markers, idempotency checks, consent prompts for privileged operations. Drop the repo URL into an agent and it installs Sembr end-to-end in about 15 minutes.
Layer 2 (Operate): Agents Manage It Themselves
sembr/agent/sembr/ is a standard agent skills package — 5 files covering auth models, 31 API endpoints, request schemas, ready-to-use code recipes, and error formats. Load this into any skills-compatible agent framework and it can create intents, tweak thresholds, modify analysis prompt templates, trigger diagnostic fires, and read matching records — all without human intervention.
Layer 3 (Orchestrate): Agents Use Sembr as a Pipeline Node
POST /api/external/intents/{id}/fire is a synchronous endpoint designed for external agent orchestration. One HTTP call returns matched articles plus LLM summaries. It writes no records, sends no notifications, and can be called repeatedly without side effects. Imagine your morning routine: your agent calls Sembr at 7 AM, reads the macro scan, checks your portfolio against the risk model, and drafts a briefing email — all before you wake up.
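A minimal sketch of calling this endpoint from an orchestrating agent, using only the Python standard library. The endpoint path comes from the text above; the base URL, the Bearer-token header, the empty JSON body, and the JSON response are assumptions, so check the agent skills package for the actual auth model and schemas.

```python
import json
import urllib.request


def build_fire_request(base_url: str, intent_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for /api/external/intents/{id}/fire."""
    url = f"{base_url.rstrip('/')}/api/external/intents/{intent_id}/fire"
    return urllib.request.Request(
        url,
        data=b"{}",  # assumption: the endpoint accepts an empty JSON body
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer-token auth
            "Content-Type": "application/json",
        },
    )


def fire_intent(base_url: str, intent_id: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the response, assumed to be a JSON object."""
    request = build_fire_request(base_url, intent_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical host, intent ID, and key; nothing is sent until fire_intent() is called.
request = build_fire_request("http://localhost:8000", "42", "my-api-key")
```

Because the call is synchronous and the text describes it as free of side effects, an agent can retry it on a timeout without worrying about duplicate notifications.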
Why Not Just Use Existing Tools?
| Tool | Semantic | Bilingual | Custom Sources | Self-Hosted | Per-Intent Analysis |
|---|---|---|---|---|---|
| Feedly Pro+ AI | ✅ | ⚠️ | ⚠️ | ❌ | ⚠️ |
| Inoreader Pro | ❌ | ⚠️ | ✅ | ❌ | ⚠️ |
| Bloomberg Terminal | ✅ | ✅ | ❌ | ❌ | ❌ |
| Perplexity Pro | ✅ | ⚠️ | ❌ | ❌ | ⚠️ |
| Sembr | ✅ | ✅ | ✅ | ✅ | ✅ |
Common question: "Can't I just script a Perplexity API call on a timer?" Three structural gaps:
- Cost. Perplexity charges per query. 10 intents × 24 rounds per day × 365 days ≈ 88,000 calls per year. Sembr's embedding layer is free; LLM calls only fire on matches.
- Matching quality. You write a search query manually each time. "Emerging market currency contagion" won't match "Turkish lira plunges" in keyword search. Semantic vectors catch the connection.
- Source boundary. Perplexity relies on search engine indices — coverage and timeliness depend on crawler frequency. Sembr connects directly to RSS feeds and APIs.
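The call-volume figure in the cost point can be checked with a few lines of arithmetic. The sketch assumes 24 rounds means one scan per hour, and it takes the ¥0.10 per intent per day figure quoted earlier as an upper bound for Sembr.

```python
intents = 10
rounds_per_day = 24  # assumption: one scan per hour
days_per_year = 365

# Polling approach: every intent issues one paid query per round.
calls_per_year = intents * rounds_per_day * days_per_year
print(calls_per_year)  # 87600, i.e. roughly 88,000

# Sembr upper bound, using the "under ¥0.10/day per intent" figure quoted earlier.
sembr_yuan_per_year = 0.10 * intents * days_per_year
```

The polling cost grows with every round whether or not anything relevant was published, while Sembr's LLM spend grows only with the number of matches.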
Roadmap
- More push channels — Telegram, Discord, Slack. Scaffold already in place.
- Local models — Ollama / mlx-lm for embedding and summarization. Data never leaves the machine.
- More source plugins — Reddit, Hacker News, Mastodon, plus community plugin discovery.
- Memory system — Daily analysis archived as structured timelines. Agents revisit "what did we think then vs. what actually happened" and calibrate from the gaps over time.
Get Started
GitHub: github.com/Peakstone-Labs/sembr — Star, fork, open issues, send PRs.
Documentation: peakstone-labs.github.io/sembr — Quickstart, configuration guide, architecture docs, plugin dev guide.
Live demo: panel.peakstone-labs.com — Switch to the Hot News tab and see what Sembr is tracking right now. No install required.
Sembr is our first open-source release, and it won't be our last. We're a quant firm that runs lean — one person, a team of AI agents, and a server. The tools we build for ourselves turn out to be things other teams struggle with too. Open-sourcing them makes the tools better, the community stronger, and the feedback loop faster.
Check back — we're building in public. There's more coming.
Sembr daily briefings are AI-generated. Content is for informational and research purposes only and does not constitute investment advice.