
Mixtral 8x22B is now available in Amazon SageMaker JumpStart

Key takeaways: Mixtral 8x22B is a large language model developed by Mistral AI, now available to users through Amazon SageMaker JumpStart with one-click deployment for inference. Compared with other publicly available models, it offers both strong performance and cost-efficiency. This post walks through how to discover and deploy the Mixtral 8x22B model.

Today, we are excited to announce that the Mixtral 8x22B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mixtral 8x22B model.

What is Mixtral 8x22B

Mixtral 8x22B is Mistral AI's latest open-weights model, and it sets a new standard for performance and efficiency on standard industry benchmarks. It is a sparse mixture-of-experts (SMoE) model that uses only 39 billion active parameters out of 141 billion total, offering cost-efficiency for its size. Continuing Mistral AI's belief in openly available models and broad distribution to promote innovation and collaboration, Mixtral 8x22B is released under Apache 2.0, making it available for exploration, testing, and deployment. Mixtral 8x22B is an attractive choice for customers who prioritize quality among publicly available models, and for those who want higher quality than mid-sized models such as Mixtral 8x7B and GPT-3.5 Turbo while maintaining high throughput.

Mixtral 8x22B provides the following benefits:

| Feature | Description |
| --- | --- |
| Multilingual capability | Native multilingual support for English, French, Italian, German, and Spanish |
| Math and coding | Strong mathematics and coding capabilities |
| Function calling | Supports function calling, enabling application development and tech stack modernization at scale |
| 64,000-token context window | Allows precise information recall from large documents |

About Mistral AI

Mistral AI is a Paris-based company founded by seasoned researchers from Meta and Google DeepMind. During his time at DeepMind, CEO Arthur Mensch was a lead contributor to key LLM projects such as Flamingo and Chinchilla, while Chief Scientist Guillaume Lample and CTO Timothée Lacroix led the development of the LLaMa LLM at Meta. The three founders combine deep technical expertise with operating experience from large research labs, forming a new breed of founding team. Mistral AI has championed technical progress on small foundation models, committing to model development that delivers superior performance. They continue to push the frontier of artificial intelligence (AI) and make it accessible to everyone, releasing models with unmatched cost-efficiency that deliver a superior performance-to-cost ratio. Mixtral 8x22B is a natural continuation of Mistral AI's family of models, which includes Mistral 7B and Mixtral 8x7B, both also available on SageMaker JumpStart. More recently, Mistral introduced Mistral Large, a commercial enterprise-grade model that demonstrates top-tier performance across multiple languages and outperforms other popular models.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. They can deploy foundation models to dedicated Amazon SageMaker instances and customize models within a network-isolated environment. You can now discover and deploy Mixtral 8x22B with a few clicks in Amazon SageMaker Studio, or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, with data encrypted at rest and in transit.

SageMaker also adheres to standard security frameworks such as ISO 27001 and SOC 1/2/3, and complies with various regulatory requirements, including the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI DSS), helping ensure that data handling, storage, and processing meet stringent security standards.

Availability in SageMaker JumpStart depends on the model; Mixtral 8x22B v0.1 is currently supported in the US East (N. Virginia) and US West (Oregon) AWS Regions.

Discover the model

You can access the Mixtral 8x22B foundation model through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the model in SageMaker Studio; a programmatic sketch follows.
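If you prefer to stay in code, the SageMaker Python SDK also exposes the JumpStart catalog. The following is a minimal sketch, assuming the list_jumpstart_models helper from sagemaker.jumpstart.notebook_utils; the substring filter is only an illustration:

```python
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieve all JumpStart model IDs and keep the Mixtral variants.
all_model_ids = list_jumpstart_models()
mixtral_ids = [m for m in all_model_ids if "mixtral" in m.lower()]
print(mixtral_ids)
```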

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.


On the SageMaker JumpStart landing page, you can search for "Mixtral" in the search box. The search results will include Mixtral 8x22B Instruct, various Mixtral 8x7B models, and Dolphin 2.5 and 2.7 models.

You can choose the model card to view details about the model, such as the license, the data used to train it, and how to use it. You will also find a Deploy button to deploy the model and create an endpoint.

SageMaker enables seamless logging, monitoring, and auditing for deployed models, with native integrations with services like AWS CloudTrail, which provides insight into API calls, and Amazon CloudWatch, which collects metrics, logs, and event data to provide information on the model's resource utilization.
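As an illustration of what that CloudWatch integration looks like in practice, here is a minimal sketch that pulls invocation counts for an endpoint with boto3; the endpoint name is a placeholder for whatever your deployment creates:

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical endpoint name; replace with the endpoint SageMaker created for you.
endpoint_name = "jumpstart-mixtral-8x22b-endpoint"

# Sum of invocations over the last hour, in 5-minute buckets.
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="Invocations",
    Dimensions=[
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Sum"],
)
print(stats["Datapoints"])
```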

Deploy the model

Deployment starts when you choose Deploy. After deployment finishes, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting your testing option using the SDK. When you select the option to use the SDK, example code that you can run in the notebook editor in SageMaker Studio is displayed. This requires an IAM role and policy attached to it to restrict model access. Additionally, if you choose to deploy the model endpoint within SageMaker Studio, you will be prompted to choose an instance type, an initial instance count, and a maximum instance count. The ml.p4d.24xlarge and ml.p4de.24xlarge instance types are currently the only instance types supported for Mixtral 8x22B Instruct v0.1.

To deploy using the SDK, we start by selecting the Mixtral 8x22B model, specified by the model_id with value huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1. You can deploy the selected model on SageMaker with the following code. Similarly, you can deploy Mixtral 8x22B Instruct using its own model ID.

```python
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1")
predictor = model.deploy()
```

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can test the deployed endpoint by running inference through the SageMaker predictor:

```python
payload = {"inputs": "Hello!"}
predictor.predict(payload)
```
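If the defaults don't fit your account, deploy-time settings can be overridden. The following is a minimal sketch, assuming the standard deploy keyword arguments of the SageMaker Python SDK; the values shown are illustrative:

```python
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1")

# Pin the instance type and count instead of relying on the defaults.
predictor = model.deploy(
    instance_type="ml.p4d.24xlarge",
    initial_instance_count=1,
)
```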

Example prompts

You can interact with the Mixtral 8x22B model like any standard text generation model, where the model processes an input sequence and outputs the predicted next words in the sequence. In this section, we provide example prompts.

Mixtral 8x22B Instruct

The instruction-tuned version of Mixtral 8x22B accepts formatted instructions where conversation roles must start with a user prompt and alternate between user instruction and assistant (model answer). The instruction format must be strictly respected, otherwise the model will generate suboptimal outputs. The template used to build a prompt for the Instruct model is as follows:

```plaintext
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```

<s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS), whereas [INST] and [/INST] are regular strings.

The following code shows how to format the instruction prompt:

```python
from typing import Dict, List

def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions where conversation roles must alternate user/assistant/user/assistant/..."""
    prompt: List[str] = []
    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])
    prompt.extend(["<s>", "[INST] ", (instructions[-1]["content"]).strip(), " [/INST] "])
    return "".join(prompt)

def print_instructions(prompt: str, response: str) -> None:
    bold, unbold = "\033[1m", "\033[0m"
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response[0]['generated_text']}\n")
```
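As a quick check of the formatting, a hypothetical two-turn exchange renders like this:

```python
# Hypothetical conversation to illustrate the alternating user/assistant format.
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And what is its population?"},
]
print(format_instructions(conversation))
# <s>[INST] What is the capital of France? [/INST] The capital of France is Paris.</s><s>[INST] And what is its population? [/INST]
```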

Summarization prompt

You can use the following code to get a response for a summarization task:

```python
instructions = [{"role": "user", "content": """Summarize the following information. Format your response in short paragraph.

Article:

Contextual compression: To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query's context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what's essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it's a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction."""}]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 1500},
}
response = predictor.predict(payload)
print_instructions(prompt, response)
```

The following is an example of the expected output:

```plaintext
> Input
<s>[INST] Summarize the following information. Format your response in short paragraph.

Article:

Contextual compression: To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query's context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what's essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it's a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction. [/INST]

> Output
```

Multilingual translation prompt

You can use the following code to get a response for a multilingual translation task:

```python
instructions = [{"role": "user", "content": """<You are a multilingual assistant. Translate the following sentences in the order in which they are presented into French, German, and Spanish. Make sure to label each section as French, German, and Spanish. [/INST]

Customer: I recently ordered a set of wireless headphones, but I received a different model. What steps should I take to receive the correct product I ordered?
Customer: I purchased a customizable laptop last month and opted for specific upgrades. However, the laptop's performance isn't as expected. Can I have a technician look into it, or should I consider returning it?
Customer: My order for a designer handbag was supposed to include a matching wallet as part of a promotional deal, but the wallet was not in the package. How can this issue be resolved?
Customer: I see that the tracking information for my order of ceramic cookware shows it was delivered, but I haven't received it. Could you assist in determining where my package might be?
Customer: I'm trying to buy an antique mirror from your vintage collection, but the website keeps giving me an error when I try to check out. Is there another way to complete my purchase?"""}]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 2000, "do_sample": True},
}
response = predictor.predict(payload)
print_instructions(prompt, response)
```

The following is an example of the expected output:

```plaintext
> Input
<s>[INST] <You are a multilingual assistant. Translate the following sentences in the order in which they are presented into French, German, and Spanish. Make sure to label each section as French, German, and Spanish. [/INST]
```

Code generation prompt

You can use the following code to get a response for a code generation task:

```python
instructions = [{"role": "user", "content": "Write a program on Dijkstra's algorithm using a priority queue in python"}]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 1000},
}
response = predictor.predict(payload)
print_instructions(prompt, response)
```

The following is an output you might get:

```plaintext
> Input
<s>[INST] Write a program on Dijkstra's algorithm using a priority queue in python [/INST]

> Output
<s>[INST] Write a program on Dijkstra's algorithm using a priority queue in python [/INST] Here is a Python implementation of Dijkstra's algorithm using a priority queue (heapq module):
```
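The model's generated code is truncated above; as an illustration of the kind of program the prompt asks for (not the model's verbatim output), a Dijkstra implementation built on heapq might look like this:

```python
import heapq

def dijkstra(graph, start):
    """Compute shortest distances from start to every node.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    Returns a dict mapping node -> shortest distance from start.
    """
    distances = {node: float("inf") for node in graph}
    distances[start] = 0
    queue = [(0, start)]  # priority queue of (distance, node)

    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale entry; a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))

    return distances

# Example usage on a small weighted graph.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```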

Reasoning and math prompt

You can use the following code to get a response for a reasoning and math task:

```python
instructions = [{"role": "user", "content": """Sarah went to a car shop to buy a car from Fred for $20,000 in 2024. She plans to sell the car, but it depreciates by 2% each year after she has bought it. She went to a dealer in which that dealer told her that the car has only depreciated by 1.4% each year. After 7 years of using the car, Sarah decides to sell it directly to another person.

How much did Sarah sell the car for and what year is it? Explain the steps before answering. It's ok to make some assumptions as you come to your answer."""}]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 2000, "do_sample": True},
}
response = predictor.predict(payload)
print_instructions(prompt, response)
```

You will get an output similar to the following:

```plaintext
[INST] Sarah went to a car shop to buy a car from Fred for $20,000 in 2024. She plans to sell the car, but it depreciates by 2% each year after she has bought it. She went to a dealer in which that dealer told her that the car has only depreciated by 1.4% each year. After 7 years of using
```
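For reference, here is a minimal sketch of the arithmetic the prompt invites, under the assumption that the dealer's 1.4% annual depreciation applies for the full 7 years:

```python
# Assumed inputs from the prompt: $20,000 purchase price in 2024,
# 1.4% annual depreciation (the dealer's figure), held for 7 years.
price = 20_000
rate = 0.014
years = 7

sale_price = price * (1 - rate) ** years
sale_year = 2024 + years
print(f"Estimated sale price in {sale_year}: ${sale_price:,.2f}")
# Estimated sale price in 2031: about $18,120
```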