Sembr: An Open-Source Semantic News Radar Built for AI Agents

May 18, 2026 · Product Launch

Sembr: Open-Source Semantic Radar — Match Meaning, Not Strings

Keyword alerts miss everything that matters. "Emerging market currency contagion" won't match "Turkish lira plunges as markets price in another Fed hike." We spent six years in quant finance watching teams build elaborate keyword-based monitoring systems that consistently failed at the one thing they needed to do: match meaning, not strings.

So we built Sembr — an open-source, self-hosted semantic news radar. Write any natural language intent. It continuously scans 53 news sources (Chinese + English), finds the semantically relevant articles on arrival, summarizes them via LLM, and pushes the result to your email. No keyword config. No Boolean logic. One docker compose up.

Today we're releasing Sembr v1.0 under Apache 2.0 on GitHub.

What's in the Box

53 pre-configured sources, ready to run out of the box — international wire services, financial media, Chinese-language news outlets, academic journals, tech blogs (TechCrunch, Wired, The Verge), open-source trackers (HelloGitHub, GitHub Trending), and select Twitter feeds. One intent covers both languages simultaneously.

git clone https://github.com/Peakstone-Labs/sembr.git
cd sembr
cp .env.example .env     # fill in one API key
docker compose up --build

Once it's running, a web panel gives you six tabs — Feeds, Intents, Templates, Articles, Logs, Settings. Everything is manageable from the UI. No YAML editing. No command-line config.

Architecture

The pipeline is a six-step closed loop:

Collector

Custom RSS + API fetcher. Configurable intervals per source — 5 minutes for breaking news, daily for policy tracking.

Embedder — BGE-M3 via SiliconFlow

Free tier. Encodes both intents and articles into the same vector space. One embedding model, two languages.

Vector Store — Qdrant

Open-source vector database, self-hosted alongside the app. No external vector service dependency.

Matcher — Reverse RAG Engine

Each intent's vector is queried against incoming article vectors. The similarity threshold (0–1) is configurable per intent: tighter for precision, looser for recall.

Summarizer — DeepSeek-V4-Flash

Only fires on matches. Each intent has its own analysis prompt — a quant macro analyst and a supply-chain analyst can read the same article through different lenses.

Notifier — SMTP Email

Digest format is configurable per intent. Telegram, Discord, and Slack channels are planned.
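To make the matching step concrete, here is a minimal in-memory sketch of threshold-based matching between stored intents and an arriving article. It is an illustration under stated assumptions, not Sembr's actual code: the `Intent` class, the function names, and the toy 3-dimensional vectors are all hypothetical, and a real deployment would use BGE-M3 embeddings with the similarity search running in Qdrant.

```python
import math
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    vector: list[float]  # embedding of the intent text (toy values here)
    threshold: float     # per-intent similarity threshold in [0, 1]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_article(article_vec: list[float], intents: list[Intent]) -> list[str]:
    """Return the names of intents whose similarity meets their own threshold."""
    return [i.name for i in intents if cosine(i.vector, article_vec) >= i.threshold]


# Toy vectors standing in for real embeddings (hypothetical values).
intents = [
    Intent("em-currency-contagion", [0.9, 0.1, 0.0], threshold=0.8),
    Intent("chip-supply-chain", [0.0, 0.2, 0.9], threshold=0.8),
]
lira_article = [0.8, 0.3, 0.1]

# Only intents returned here would go on to the summarizer and notifier.
matched = match_article(lira_article, intents)
print(matched)
```

In this sketch the article is compared against every stored intent on arrival, which is the "reverse" direction relative to ordinary retrieval: the standing queries are fixed and the documents stream past them.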

Three Knobs Per Intent

Matching Tightness

Threshold 0 to 1. Higher = more precise, lower = higher recall. Set independently per intent.

Analysis Angle

Write your own prompt. Same article, different perspectives — cross-asset rotation, supply-demand margins, or regulatory signals.

Scan Cadence

Per-source, per-intent. 5-minute cycles for breaking events, daily digests for policy tracking.
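The three knobs can be pictured as one small record per intent. The sketch below is only a mental model: the class name `IntentConfig` and its field names are hypothetical and are not taken from Sembr's schema, which is documented in the project's own API reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentConfig:
    text: str                   # the natural-language intent
    threshold: float            # matching tightness, 0 to 1
    analysis_prompt: str        # analysis angle passed to the summarizer
    scan_interval_minutes: int  # scan cadence

    def __post_init__(self) -> None:
        # Reject values outside the ranges described above.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if self.scan_interval_minutes <= 0:
            raise ValueError("scan_interval_minutes must be positive")


# A tight, fast intent for breaking events (all values are illustrative).
breaking = IntentConfig(
    text="Emerging market currency contagion",
    threshold=0.75,
    analysis_prompt="Assess cross-asset rotation implications.",
    scan_interval_minutes=5,
)

# A looser, daily intent for policy tracking.
policy = IntentConfig(
    text="Changes in semiconductor export controls",
    threshold=0.6,
    analysis_prompt="Summarize regulatory signals and affected sectors.",
    scan_interval_minutes=24 * 60,
)
```

Keeping the three settings on the intent, rather than globally, is what lets two intents read the same article with different thresholds and prompts.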

Cost: under ¥0.10/day (~$0.014) per intent. BGE-M3 embedding runs on SiliconFlow's free tier, and the DeepSeek-V4-Flash summarizer is called only on matches. Self-hosting means no monthly subscription, and your intents and match history stay on your own server; article text still goes to the SiliconFlow and DeepSeek APIs for embedding and summarization.

Designed for AI Agents, Not Just Humans

This is the part we're most excited about. If you're running an AI-agent-heavy setup — whether it's OpenClaw, LangGraph, or plain Claude — your agents shouldn't need you to read documentation and configure things for them. Sembr treats AI agents as first-class citizens from day one.

Layer 1 (Install): Agents Deploy It Themselves

The repository includes sembr/agent/INSTALL.md, a structured guide written for agent consumption — explicit phases, parallel work markers, idempotency checks, consent prompts for privileged operations. Drop the repo URL into an agent and it installs Sembr end-to-end in about 15 minutes.

Layer 2 (Operate): Agents Manage It Themselves

sembr/agent/sembr/ is a standard agent skills package — 5 files covering auth models, 31 API endpoints, request schemas, ready-to-use code recipes, and error formats. Load this into any skills-compatible agent framework and it can create intents, tweak thresholds, modify analysis prompt templates, trigger diagnostic fires, and read matching records — all without human intervention.

Layer 3 (Orchestrate): Agents Use Sembr as a Pipeline Node

POST /api/external/intents/{id}/fire is a synchronous endpoint designed for external agent orchestration. One HTTP call returns matched articles plus LLM summaries. It writes no records, sends no notifications, and can be called repeatedly without side effects. Imagine your morning routine: your agent calls Sembr at 7 AM, reads the macro scan, checks your portfolio against the risk model, and drafts a briefing email — all before you wake up.
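A call to this endpoint from an agent or script might look like the sketch below, using only the Python standard library. Only the path `/api/external/intents/{id}/fire` and the POST method come from the description above. The base URL, the bearer-token header, the empty request body, and the assumption that the response is JSON are all placeholders; the skills package in the repository is the place to check the real auth model and response schema.

```python
import json
import urllib.request


def build_fire_request(base_url: str, intent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the external fire endpoint."""
    url = f"{base_url.rstrip('/')}/api/external/intents/{intent_id}/fire"
    return urllib.request.Request(
        url,
        data=b"",  # assumption: the endpoint needs no request payload
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: bearer auth
    )


def fire_intent(base_url: str, intent_id: str, api_key: str, timeout: float = 60.0):
    """Send the request and decode the body, assuming it is JSON."""
    req = build_fire_request(base_url, intent_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical host, port, intent id, and key; nothing is sent on this line.
req = build_fire_request("http://localhost:8000", "42", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Because the call is described as synchronous and free of side effects, an orchestrating agent can retry `fire_intent` on a timeout without worrying about duplicate notifications.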

Why Not Just Use Existing Tools?

Tool	Semantic	Bilingual	Custom Sources	Self-Hosted	Per-Intent Analysis
Feedly Pro+ AI	✅	⚠️	⚠️	❌	⚠️
Inoreader Pro	❌	⚠️	✅	❌	⚠️
Bloomberg Terminal	✅	✅	❌	❌	❌
Perplexity Pro	✅	⚠️	❌	❌	⚠️
Sembr	✅	✅	✅	✅	✅

Common question: "Can't I just script a Perplexity API call on a timer?" Three structural gaps:

Cost. Perplexity charges per query. 10 intents × 24 rounds per day × 365 days = 87,600 calls a year, roughly 88,000. Sembr's embedding layer is free; LLM calls only fire on matches.
Matching quality. You write a search query manually each time. "Emerging market currency contagion" won't match "Turkish lira plunges" in keyword search. Semantic vectors catch the connection.
Source boundary. Perplexity relies on search engine indices — coverage and timeliness depend on crawler frequency. Sembr connects directly to RSS feeds and APIs.
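The call count in the cost argument is simple arithmetic, spelled out here so the inputs can be swapped for your own. The function name is hypothetical, and no per-query price is included because the text does not give one.

```python
def polling_calls_per_year(intents: int, rounds_per_day: int, days: int = 365) -> int:
    """Number of paid queries if every intent is polled on every round."""
    return intents * rounds_per_day * days


calls_per_year = polling_calls_per_year(intents=10, rounds_per_day=24)
print(calls_per_year)  # 87600, which the text rounds to about 88,000
```

With a timer-driven search API, every one of those calls is billed whether or not anything relevant was published; in the design described above, the paid LLM step runs only for articles that cleared an intent's threshold.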

Blueprint

More push channels — Telegram, Discord, Slack. Scaffold already in place.
Local models — Ollama / mlx-lm for embedding and summarization. Data never leaves the machine.
More source plugins — Reddit, Hacker News, Mastodon, plus community plugin discovery.
Memory system — Daily analysis archived as structured timelines. Agents revisit "what did we think then vs. what actually happened" and calibrate from the gaps over time.

Get Started

GitHub: github.com/Peakstone-Labs/sembr — Star, fork, open issues, send PRs.

Documentation: peakstone-labs.github.io/sembr — Quickstart, configuration guide, architecture docs, plugin dev guide.

Live demo: panel.peakstone-labs.com — Switch to the Hot News tab and see what Sembr is tracking right now. No install required.

Sembr is our first open-source release, and it won't be our last. We're a quant firm that runs lean — one person, a team of AI agents, and a server. The tools we build for ourselves turn out to be things other teams struggle with too. Open-sourcing them makes the tools better, the community stronger, and the feedback loop faster.

Check back — we're building in public. There's more coming.

Sembr daily briefings are AI-generated. Content is for informational and research purposes only and does not constitute investment advice.

