PROMPT = """

H: <context>
{context}
</context>

What is the most fun thing to do in San Francisco based on the context? Don't give information outside the document or repeat our findings

A: Here is the most relevant sentence in the context:"""

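These changes significantly improved Claude's overall retrieval accuracy, which jumped from 27% to 98%! We found this initial research so interesting that we decided to run a series of experiments using the "Needle in a Haystack" test.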

Further Experiments

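For the new round of tests, we made several changes to the original experiment. The needle was a random number that changed on every iteration, which ruled out any possibility of caching. We also used our open-source Phoenix evals library (full disclosure: I lead the team that built Phoenix) to shorten testing time, and used rails to search the output directly for the random number, so that verbose answers would not drag down the retrieval score. Finally, we accounted for the negative case in which the system fails to retrieve the result, marking such responses as unanswerable, and ran a separate test for this case to assess how well the system recognizes when it cannot retrieve the data. Together, these changes made for a more rigorous and comprehensive evaluation.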
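A minimal sketch of the needle setup and scoring described above, in plain Python. The needle sentence wording, the seven-digit number range, the character-based depth placement, and the helper names (`make_needle`, `insert_needle`, `score_response`) are all hypothetical. The regex check only illustrates the idea behind rails (snapping a free-form answer to a narrow, checkable result); it is not the Phoenix API.

```python
import random
import re
from typing import Optional, Tuple


def make_needle(rng: random.Random) -> Tuple[str, str]:
    """Return (number, needle_sentence); a fresh number per call defeats caching.

    The sentence wording and 7-digit range are hypothetical.
    """
    number = str(rng.randint(1_000_000, 9_999_999))
    return number, f" The special magic number is {number}. "


def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end).

    Cuts on a character index for simplicity; a real harness would likely
    snap to a sentence boundary.
    """
    cut = int(len(haystack) * depth)
    return haystack[:cut] + needle + haystack[cut:]


def score_response(response: str, expected: Optional[str]) -> str:
    """Rail-style scoring: look only for the number, ignoring extra wording.

    expected=None marks the negative case, where no needle was inserted and
    the only correct reply is UNANSWERABLE.
    """
    if expected is None:
        return "correct" if "UNANSWERABLE" in response.upper() else "incorrect"
    numbers_in_output = re.findall(r"\d+", response)
    return "correct" if expected in numbers_in_output else "incorrect"


rng = random.Random()
number, needle = make_needle(rng)
context = insert_needle("Lorem ipsum dolor sit amet. " * 40, needle, depth=0.5)
print(score_response(f"Sure! The number you want is {number}.", number))  # correct
```

Because the check only asks whether the expected digits appear, a long, chatty answer that contains the number still counts as a successful retrieval, which is the point of searching the output directly.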

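The updated test was run on several configurations of four different large language models: ChatGPT-4, Claude 2.1 (with and without the prompt modification Anthropic suggested), and Mistral AI's Mixtral-8X7B-v0.1 and 7B Instruct. Because small differences in prompting can produce very different results across models, we used several prompt templates in an effort to compare each model at its best. The simple template we used for ChatGPT and Mixtral was: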

SIMPLE_TEMPLATE = '''
You are a helpful AI bot that answers questions for a user. Keep your responses short and direct.
The following is a set of context and a question that will relate to the context.
#CONTEXT
{context}
#ENDCONTEXT
#QUESTION
{question}
Don't give information outside the document or repeat your findings. If the information is not available in the context respond UNANSWERABLE
'''
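One way to lay out the comparison described above as a run matrix, pairing each model with the templates it was tested on. The model labels, template names, and the sweep values for context length and needle depth are assumptions for illustration; they are not taken from the test code.

```python
from itertools import product

# Hypothetical labels: each model maps to the prompt templates it is run with.
MODEL_TEMPLATES = {
    "gpt-4": ["simple"],
    "mixtral-8x7b": ["simple"],
    "mistral-7b-instruct": ["simple"],
    "claude-2.1": ["anthropic_original", "anthropic_rev2"],
}

# Assumed sweep values (context size in tokens, needle depth as a fraction).
CONTEXT_LENGTHS = [1_000, 10_000, 100_000]
NEEDLE_DEPTHS = [0.0, 0.25, 0.5, 0.75, 1.0]


def build_runs():
    """Expand every (model, template) pair across all lengths and depths."""
    runs = []
    for model, templates in MODEL_TEMPLATES.items():
        for template, length, depth in product(templates, CONTEXT_LENGTHS, NEEDLE_DEPTHS):
            runs.append(
                {"model": model, "template": template,
                 "context_length": length, "depth": depth}
            )
    return runs


runs = build_runs()
print(len(runs))  # 5 model/template pairs x 3 lengths x 5 depths = 75
```

Keeping both Claude templates as separate entries means the with-prefill and without-prefill results land in the same table and can be compared cell by cell.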

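For Claude, we tested both templates discussed earlier: the original prompt, and a revision that adds Anthropic's suggested sentence to the start of the assistant turn.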

ANTHROPIC_TEMPLATE_ORIGINAL = '''

H: You are a close-reading bot with a great memory who answers questions for users. I'm going to give you the text of some essays. Amidst the essays ("the haystack") I've inserted a sentence ("the needle") that contains an answer to the user's question. Here's the question:
<question>{question}</question>
Here's the text of the essays. The answer appears in it somewhere.
<haystack>
{context}
</haystack>
Now that you've read the context, please answer the user's question, repeated one more time for reference:
<question>{question}</question>

To do so, first find the sentence from the haystack that contains the answer (there is such a sentence, I promise!) and put it inside <most_relevant_sentence> XML tags. Then, put your answer in <answer> tags. Base your answer strictly on the context, without reference to outside information. Thank you.
If you can't find the answer return the single word UNANSWERABLE

A:'''

ANTHROPIC_TEMPLATE_REV2 = '''

H: You are a close-reading bot with a great memory who answers questions for users. I'm going to give you the text of some essays. Amidst the essays ("the haystack") I've inserted a sentence ("the needle") that contains an answer to the user's question. Here's the question:
<question>{question}</question>
Here's the text of the essays. The answer appears in it somewhere.
<haystack>
{context}
</haystack>
Now that you've read the context, please answer the user's question, repeated one more time for reference:
<question>{question}</question>

To do so, first find the sentence from the haystack that contains the answer (there is such a sentence, I promise!) and put it inside <most_relevant_sentence> XML tags. Then, put your answer in <answer> tags. Base your answer strictly on the context, without reference to outside information. Thank you.
If you can't find the answer return the single word UNANSWERABLE

A: Here is the most relevant sentence in the context:'''
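Both Claude templates ask for the answer inside XML tags, so the completion needs a little parsing before it can be scored. A minimal sketch, assuming a regex-based extractor; the function names are hypothetical. Because the REV2 template prefills the assistant turn, the completion picks up right after "Here is the most relevant sentence in the context:" and may omit some tags, so the parser falls back to the raw text when `<answer>` is missing.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_claude_output(completion: str) -> dict:
    """Pull out the quoted sentence and answer, tolerating missing tags."""
    sentence = extract_tag(completion, "most_relevant_sentence")
    answer = extract_tag(completion, "answer")
    if answer is None:
        # Prefilled (REV2) completions may skip the tags; use the raw text.
        answer = completion.strip()
    return {
        "sentence": sentence,
        "answer": answer,
        "unanswerable": answer.upper() == "UNANSWERABLE",
    }


print(parse_claude_output(" The special magic number is 4821937.\n<answer>4821937</answer>"))
# {'sentence': None, 'answer': '4821937', 'unanswerable': False}
```

The extracted answer can then go to the same number search used for the other models, so every configuration is scored the same way.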

All of the code used to run these tests is available in this GitHub repository.