Academic Research Landscape Map (academic-research-mapper)

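Description

This skill maps the research landscape of any technical or academic topic. It searches academic databases such as arXiv and Semantic Scholar, systematically collects and analyzes the relevant literature, and identifies research trends, key papers, leading researchers, and institutional collaborations. From these it builds a knowledge map of the topic that shows how its research directions branch and evolve. It is aimed at graduate students, researchers, and newcomers to academia who need a quick overview of a field. Automating literature search and analysis shortens the literature-review cycle and gives researchers a broad literature base when writing a paper, proposing a project, or choosing a research direction.

The skill comes with a step-by-step guide and best practices, organized into functional modules with usage scenarios, and covers everything from basic configuration to advanced features for users at different levels. Its content is updated to track current practice. Its modular design makes it easy to extend and customize, so users can adapt it to their own needs and spend less time on trial and error.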


Research Landscape Mapper — Understand a Field Before You Build or Write

You have access to the TinyFish CLI (tinyfish), a tool that runs browser automations from the terminal using natural language goals. This skill uses it to search arXiv, Semantic Scholar, and Google Scholar in parallel, then synthesizes results into a structured landscape report with identified gaps.

Pre-flight Check (REQUIRED)

Before making any TinyFish call, always run BOTH checks:

1. CLI installed?

bash/zsh:

which tinyfish && tinyfish --version || echo "TINYFISH_CLI_NOT_INSTALLED"

PowerShell:

Get-Command tinyfish; tinyfish --version

If not installed, stop and tell the user:

Install the TinyFish CLI: npm install -g @tiny-fish/cli

2. Authenticated?

tinyfish auth status

If not authenticated, stop and tell the user:

You need a TinyFish API key. Get one at: https://agent.tinyfish.ai/api-keys

Then authenticate:

Option 1 — CLI login (interactive):

tinyfish auth login

Option 2 — bash/zsh (Mac/Linux, current session):

export TINYFISH_API_KEY="your-api-key-here"

Option 3 — bash/zsh (persist across sessions, add to ~/.bashrc or ~/.zshrc):

echo 'export TINYFISH_API_KEY="your-api-key-here"' >> ~/.zshrc
source ~/.zshrc

Option 4 — PowerShell (current session only):

$env:TINYFISH_API_KEY="your-api-key-here"

Option 5 — Claude Code settings: Add to ~/.claude/settings.local.json:

{"env":{"TINYFISH_API_KEY":"your-api-key-here"}}

Do NOT proceed until both checks pass.


What This Skill Does

Given a research topic (e.g. “retrieval-augmented generation” or “protein structure prediction”), this skill:

  1. Searches arXiv for preprints sorted by most recent, capturing what is being worked on right now
  2. Searches Semantic Scholar for papers ranked by relevance with citation counts, identifying what the field considers important
  3. Searches Google Scholar for broad coverage, including published venues not yet on arXiv

It then deduplicates across all three sources by title similarity, clusters papers into subtopics, and synthesizes findings into a structured landscape report: what is well-studied, what is emerging, and where the gaps are.


Core Command

tinyfish agent run --url <url> "<goal>"

Flags

| Flag | Purpose |
| --- | --- |
| --url <url> | Target website URL for the agent to navigate |
| --sync | Wait for the full result before returning (required when you need output before the next step) |
| --async | Submit and return a run ID immediately; use when firing parallel agents |
| --pretty | Human-readable formatted output for debugging |

Keyword Strategy

Result quality depends heavily on your search terms. Before running anything, derive 2–3 keyword variants from the topic. Each source has its own vocabulary norms: full academic terms work best on Semantic Scholar, and shorter, compressed terms work best on arXiv.

| Topic | Primary keywords | Variant A | Variant B |
| --- | --- | --- | --- |
| Retrieval-augmented generation | retrieval augmented generation | RAG language model | dense retrieval QA |
| Protein structure prediction | protein structure prediction | AlphaFold protein folding | ab initio structure biology |
| Neural architecture search | neural architecture search | NAS automated machine learning | hyperparameter optimization deep learning |
| Federated learning privacy | federated learning | federated learning differential privacy | distributed training privacy |

Use the primary keywords for the first parallel pass. If any source returns fewer than 5 results, run a second pass with the variant keywords on that source only.


Step-by-Step Workflow

Step 1 — Derive keywords and build URLs

Before running any agents, construct all three search URLs. Do this in your head or in a scratch note; do not make TinyFish calls yet.

arXiv URL pattern:

https://arxiv.org/search/?query=<keywords>&searchtype=all&order=-announced_date_first

Semantic Scholar URL pattern:

https://www.semanticscholar.org/search?q=<keywords>&sort=Relevance

Google Scholar URL pattern:

https://scholar.google.com/scholar?q=<keywords>&as_sdt=0%2C5&hl=en

Replace <keywords> with URL-encoded primary keywords (spaces become +).
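If you prefer to build the URLs in a script rather than by hand, the substitution can be sketched as below. This is a minimal sketch: the three patterns are copied from the text above, while the names `URL_PATTERNS` and `build_search_urls` are hypothetical and not part of the TinyFish CLI.

```python
from urllib.parse import quote_plus

# URL patterns copied from the text; {q} marks where the encoded keywords go.
URL_PATTERNS = {
    "arxiv": "https://arxiv.org/search/?query={q}&searchtype=all&order=-announced_date_first",
    "semantic_scholar": "https://www.semanticscholar.org/search?q={q}&sort=Relevance",
    "google_scholar": "https://scholar.google.com/scholar?q={q}&as_sdt=0%2C5&hl=en",
}


def build_search_urls(keywords: str) -> dict[str, str]:
    """Return one search URL per source for the given keywords (hypothetical helper)."""
    # quote_plus turns spaces into "+" and percent-encodes other reserved characters.
    encoded = quote_plus(keywords.strip())
    return {source: pattern.format(q=encoded) for source, pattern in URL_PATTERNS.items()}


if __name__ == "__main__":
    for source, url in build_search_urls("retrieval augmented generation").items():
        print(f"{source}: {url}")
```

Using `quote_plus` rather than a plain space-to-plus replacement also handles keywords that contain characters such as `&` or `/`.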


Step 2 — Search all three sources in parallel

Fire all three agents simultaneously. Do NOT wait for one to finish before starting the next.

arXiv — sorted by most recent:

tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=retrieval+augmented+generation&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str (first 150 chars of abstract), \"arxiv_id\": str, \"url\": str}]. If a result has no year visible, use the submission date year."

Semantic Scholar — sorted by relevance with citation counts:

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=retrieval+augmented+generation&sort=Relevance" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str (first 150 chars), \"url\": str}]. Scroll down to load more results if fewer than 10 are visible."

Google Scholar — broad coverage:

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=retrieval+augmented+generation&as_sdt=0%2C5&hl=en" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]. Citation count appears after 'Cited by' — extract that number."

Parallel Execution

All three source searches are fully independent. Always fire them simultaneously.

Good — parallel calls (fire and wait):

tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=retrieval+augmented+generation&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]" \
  > /tmp/arxiv_results.json &

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=retrieval+augmented+generation&sort=Relevance" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str, \"url\": str}]" \
  > /tmp/s2_results.json &

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=retrieval+augmented+generation&as_sdt=0%2C5&hl=en" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]" \
  > /tmp/scholar_results.json &

wait
echo "All three sources complete."

Bad — sequential calls:

# Do NOT do this: it triples the wait time for no benefit
tinyfish agent run --url "https://arxiv.org/..." "search arxiv, then also search semantic scholar, then also search google scholar"

Each source is always its own separate call. Never combine them into one goal.
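The same fire-and-wait pattern can be driven from a script instead of shell job control. The sketch below is one possible approach, not part of the TinyFish tooling: `tinyfish_argv`, `_capture`, and `run_parallel` are hypothetical helper names, and the only CLI flags used are the ones listed in the Flags table.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tinyfish_argv(url: str, goal: str) -> list[str]:
    """Build the argv for one synchronous run, using only flags listed above."""
    return ["tinyfish", "agent", "run", "--sync", "--url", url, goal]


def _capture(argv: list[str]) -> str:
    # Blocks until the process exits and returns everything it wrote to stdout.
    return subprocess.run(argv, capture_output=True, text=True).stdout


def run_parallel(commands: dict[str, list[str]]) -> dict[str, str]:
    """Run every command concurrently and return each stdout keyed by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        # Submit everything first so no command waits for another to finish.
        futures = {name: pool.submit(_capture, argv) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}


# Example call (requires the CLI to be installed and authenticated):
# outputs = run_parallel({
#     "arxiv": tinyfish_argv("https://arxiv.org/search/?query=...", "Extract the top 15 results as JSON: ..."),
# })
```

Each source still gets its own process and its own goal, so the one-call-per-source rule holds.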


Step 3 — Handle sparse results (if needed)

After the parallel run completes, check each result set. If any source returned fewer than 5 papers, run a second pass on that source with variant keywords:

# Example: arXiv returned only 3 results for primary keywords
tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=RAG+language+model&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]"

Do not run second passes if the primary pass was already rich; this wastes steps.
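The sparse-result check is simple enough to state as code. A minimal sketch, assuming each source's results have already been loaded into a list of paper dicts; the function name and the dict layout are hypothetical.

```python
MIN_RESULTS = 5  # threshold from the text: fewer than 5 papers counts as sparse


def sources_needing_second_pass(
    results: dict[str, list[dict]], min_results: int = MIN_RESULTS
) -> list[str]:
    """Return the sources whose primary pass came back sparse (hypothetical helper)."""
    return [source for source, papers in results.items() if len(papers) < min_results]


if __name__ == "__main__":
    # Placeholder result lists; only their lengths matter for this check.
    primary = {
        "arxiv": [{}] * 3,
        "semantic_scholar": [{}] * 15,
        "google_scholar": [{}] * 5,
    }
    print(sources_needing_second_pass(primary))  # ['arxiv']
```

A source with exactly 5 results is not re-run, matching the "fewer than 5" wording.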


Step 4 — Synthesize into a Landscape Report

Once all three sources have returned results, synthesize findings into this structure. Use only data that TinyFish actually returned — do not hallucinate paper titles, citation counts, or author names.

## Research Landscape: <topic>

### Volume & Coverage
- arXiv: <N> papers found, most recent: <year>
- Semantic Scholar: <N> papers found, highest citations: <N> (paper title)
- Google Scholar: <N> papers found
- Unique papers after deduplication: <N>

### Key Papers (sorted by citation count)
1. <Title> — <Authors>, <Year>, <Venue if known> — <citation_count> citations
   <one-sentence summary from abstract snippet>
2. ...
(list top 8–10 unique papers)

### Active Subtopics
Cluster the papers by what they are actually about. Label each cluster with a short name.
- **<Subtopic A>**: <N> papers — <1-sentence description of what this cluster covers>
- **<Subtopic B>**: <N> papers — ...
- **<Subtopic C>**: <N> papers — ...

### Key Authors & Groups
- <Author name> — <N> papers in results, affiliated with <institution if visible>
- ...
(list authors appearing 2+ times across the results)

### Recency Signal
- Papers from last 12 months: <N>
- Papers from last 3 years: <N>
- Oldest paper in results: <year>
- Trend: <accelerating / stable / declining> (infer from year distribution)

### Gaps & Open Directions
Based on what the papers cover and what they do not:
- **Gap 1**: <specific thing that is missing or underexplored>
- **Gap 2**: ...
- **Gap 3**: ...

### Landscape Verdict
<2–3 sentences: is this field crowded or open, mature or nascent, dominated by a few groups or distributed, and what is the single most underexplored angle?>
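The Recency Signal fields can be filled in from the `year` values of the deduplicated papers. The sketch below is one way to do it under stated assumptions: only the year is known, so "last 12 months" is approximated by the current calendar year, and the trend rule (compare the latest three years with the three before them) is a hypothetical heuristic, since the template only says to infer the trend from the year distribution.

```python
from datetime import date
from typing import Optional


def recency_signal(papers: list[dict], current_year: Optional[int] = None) -> dict:
    """Summarize the year distribution of deduplicated papers (hypothetical helper)."""
    current_year = current_year or date.today().year
    # Results carry the year as a string; skip entries without a usable year.
    years = [int(p["year"]) for p in papers if str(p.get("year", "")).isdigit()]
    if not years:
        return {"last_12_months": 0, "last_3_years": 0, "oldest": None, "trend": "unknown"}
    # Approximation: papers dated in the current year stand in for "last 12 months".
    last_12_months = sum(1 for y in years if y >= current_year)
    last_3_years = sum(1 for y in years if y > current_year - 3)
    # Assumed heuristic: compare the latest 3 years with the 3 years before them.
    prior_3_years = sum(1 for y in years if current_year - 6 < y <= current_year - 3)
    if last_3_years > prior_3_years:
        trend = "accelerating"
    elif last_3_years < prior_3_years:
        trend = "declining"
    else:
        trend = "stable"
    return {
        "last_12_months": last_12_months,
        "last_3_years": last_3_years,
        "oldest": min(years),
        "trend": trend,
    }
```

Treat the trend as a rough signal only: the arXiv results are sorted by most recent, which skews the sample toward recent years.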

Deduplication Rules

Papers appear across multiple sources. Before synthesizing, deduplicate using these rules in order:

  1. Exact title match (case-insensitive) → keep one, prefer the Semantic Scholar entry (has citation count)
  2. Title similarity > 85% (same words, different punctuation) → treat as the same paper
  3. Same arXiv ID → always the same paper regardless of title variation
  4. If unsure, keep both and note the possible duplicate in the report
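The first three rules can be sketched in code as follows. This is an illustration under stated assumptions, not the skill's prescribed method: similarity is measured with `difflib.SequenceMatcher` on normalized titles (one possible measure among several), each paper dict carries a hypothetical `source` tag added when loading, and the helper names are invented for this example. Rule 4 still needs human judgment for borderline cases.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # "title similarity > 85%" from the rules above


def normalize(title: str) -> str:
    """Lowercase and drop punctuation so only the words are compared."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def same_paper(a: dict, b: dict) -> bool:
    # Rule 3: a shared arXiv ID always means the same paper.
    if a.get("arxiv_id") and a.get("arxiv_id") == b.get("arxiv_id"):
        return True
    ta, tb = normalize(a["title"]), normalize(b["title"])
    # Rule 1 (exact match ignoring case) and rule 2 (similarity above the threshold).
    return ta == tb or SequenceMatcher(None, ta, tb).ratio() > SIMILARITY_THRESHOLD


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep one entry per paper, preferring the Semantic Scholar entry."""
    unique: list[dict] = []
    for paper in papers:
        for i, kept in enumerate(unique):
            if same_paper(paper, kept):
                # Assumed "source" tag; merge so fields such as arxiv_id are not lost.
                if paper.get("source") == "semantic_scholar" and kept.get("source") != "semantic_scholar":
                    unique[i] = {**kept, **paper}
                break
        else:
            unique.append(paper)
    return unique
```

The normalization here keeps only ASCII letters and digits, so titles with accented characters or non-Latin scripts would need a gentler rule.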

Subtopic Clustering Guide

Group papers by reading their abstract snippets, not just their titles. Common cluster patterns:

| If papers discuss… | Cluster label |
| --- | --- |
| Benchmarks, evaluation datasets, metrics | “Evaluation & benchmarks” |
| New model architectures or training methods | “Model architecture” |
| Application to a specific domain (medical, legal, code) | “Domain adaptation: ” |
| Efficiency, speed, compression, cost | “Efficiency & scaling” |
| Safety, alignment, robustness, hallucination | “Safety & reliability” |
| Surveys, meta-analyses, overviews | “Surveys & overviews” |

A paper can belong to at most two clusters. Name the clusters after what you actually see in the results; if the topic warrants different labels, use those instead of these defaults.
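As a rough first pass before reading the snippets, the default labels can be applied by keyword matching. This is a crude stand-in for the reading-based clustering described above, not a replacement for it. The keyword lists are hypothetical, loosely derived from the table, and the per-domain "Domain adaptation" label is left out because it depends on the topic.

```python
# Hypothetical keyword lists loosely derived from the table above; adjust per topic.
CLUSTER_KEYWORDS = {
    "Evaluation & benchmarks": ["benchmark", "evaluation", "dataset", "metric"],
    "Model architecture": ["architecture", "training method", "pretraining"],
    "Efficiency & scaling": ["efficien", "speed", "compression", "cost"],
    "Safety & reliability": ["safety", "alignment", "robust", "hallucination"],
    "Surveys & overviews": ["survey", "meta-analysis", "overview"],
}
MAX_CLUSTERS_PER_PAPER = 2  # "at most two clusters" from the text


def assign_clusters(paper: dict) -> list[str]:
    """Return up to two cluster labels, best keyword match first."""
    text = f"{paper.get('title', '')} {paper.get('abstract_snippet', '')}".lower()
    # Score each label by how many keyword occurrences appear in title + snippet.
    scores = {
        label: sum(text.count(keyword) for keyword in keywords)
        for label, keywords in CLUSTER_KEYWORDS.items()
    }
    matched = [label for label, score in scores.items() if score > 0]
    # sorted() is stable, so ties keep the order of CLUSTER_KEYWORDS.
    ranked = sorted(matched, key=lambda label: -scores[label])
    return ranked[:MAX_CLUSTERS_PER_PAPER]
```

Papers that match nothing come back with an empty list; those are the ones most likely to need a topic-specific cluster name.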


Managing Runs

# List recent runs (useful if a run takes longer than expected)
tinyfish agent run list

# Get the full output of a specific run by ID
tinyfish agent run get <run_id>

# Cancel a run that is taking too long
tinyfish agent run cancel <run_id>

Output Format

The CLI streams data: {...} SSE lines by default. The final usable result is the event where type == "COMPLETE" and status == "COMPLETED". The extracted data is in the resultJson field. Read the raw output directly; no script-side parsing is required.

When saving to files with > redirection as shown in the parallel example, the full SSE stream is saved. Extract the JSON by looking for the last line containing "COMPLETED" and parsing the resultJson value from it.
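If you do want to post-process the saved files in a script, the extraction can be sketched as below. It assumes each `data:` line holds one JSON object with the `type`, `status`, and `resultJson` fields described above. Whether `resultJson` arrives as a nested value or as a JSON-encoded string is not stated in the text, so the sketch handles both. The function name is hypothetical.

```python
import json


def extract_result(sse_text: str):
    """Return resultJson from the last COMPLETE/COMPLETED event, or None."""
    result = None
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        if (
            isinstance(event, dict)
            and event.get("type") == "COMPLETE"
            and event.get("status") == "COMPLETED"
        ):
            result = event.get("resultJson")  # later events overwrite earlier ones
    # Assumption: resultJson may be a JSON-encoded string; decode it if so.
    if isinstance(result, str):
        try:
            result = json.loads(result)
        except json.JSONDecodeError:
            pass
    return result


if __name__ == "__main__":
    sample = 'data: {"type": "COMPLETE", "status": "COMPLETED", "resultJson": [{"title": "T"}]}\n'
    print(extract_result(sample))
```

For a saved file, pass its contents, for example `extract_result(open("/tmp/arxiv_results.json").read())`.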


Example: Full Run for “Mixture of Experts”

# Step 1: fire all three in parallel
tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=mixture+of+experts+transformer&searchtype=all&order=-announced_date_first" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]" \
  > /tmp/moe_arxiv.json &

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=mixture+of+experts+transformer&sort=Relevance" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str, \"url\": str}]" \
  > /tmp/moe_s2.json &

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=mixture+of+experts+LLM&as_sdt=0%2C5&hl=en" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]" \
  > /tmp/moe_scholar.json &

wait

# Step 2: synthesize
# Read /tmp/moe_arxiv.json, /tmp/moe_s2.json, /tmp/moe_scholar.json
# Deduplicate → cluster → produce landscape report