
MetaLlamaChatGenerator

This component enables chat completion with any model available through the Meta Llama API.

Most common position in a pipeline: After a ChatPromptBuilder
Mandatory init variables: "api_key": A Meta Llama API key. It can be set with the LLAMA_API_KEY environment variable or passed to init().
Mandatory run variables: "messages": A list of ChatMessage objects
Output variables: "replies": A list of ChatMessage objects
API reference: Meta Llama API
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama

Overview

The MetaLlamaChatGenerator enables you to use multiple Meta Llama models by making chat completion calls to the Meta Llama API. The default model is Llama-4-Scout-17B-16E-Instruct-FP8.

Currently available models are:

| Model ID | Input context length | Output context length | Input Modalities | Output Modalities |
| --- | --- | --- | --- | --- |
| Llama-4-Scout-17B-16E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| Llama-4-Maverick-17B-128E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| Llama-3.3-70B-Instruct | 128k | 4028 | Text | Text |
| Llama-3.3-8B-Instruct | 128k | 4028 | Text | Text |

This component uses the same ChatMessage format as other Haystack Chat Generators for structured input and output. For more information, see the ChatMessage documentation.
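For instance, a typical exchange pairs a system message with a user message; this minimal sketch uses Haystack's standard ChatMessage factory methods:

python
from haystack.dataclasses import ChatMessage

# Build structured chat input with the standard factory methods
messages = [
    ChatMessage.from_system("You are a concise assistant."),
    ChatMessage.from_user("Explain what a chat generator does in one sentence."),
]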

Tool Support

MetaLlamaChatGenerator supports function calling through the tools parameter, which accepts flexible tool configurations:

  • A list of Tool objects: Pass individual tools as a list
  • A single Toolset: Pass an entire Toolset directly
  • Mixed Tools and Toolsets: Combine multiple Toolsets with standalone tools in a single list

This allows you to organize related tools into logical groups while also including standalone tools as needed.

python
from haystack.tools import Tool, Toolset
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Create individual tools
weather_tool = Tool(name="weather", description="Get weather info", ...)
news_tool = Tool(name="news", description="Get latest news", ...)

# Group related tools into a toolset
# (add_tool, subtract_tool, and multiply_tool are Tool objects defined elsewhere)
math_toolset = Toolset([add_tool, subtract_tool, multiply_tool])

# Pass mixed tools and toolsets to the generator
generator = MetaLlamaChatGenerator(
    tools=[math_toolset, weather_tool, news_tool]  # Mix of Toolset and Tool objects
)

For more details on working with tools, see the Tool and Toolset documentation.
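As a minimal, self-contained sketch of tool calling in practice: the get_weather function and its JSON schema below are purely illustrative, and the example assumes a valid LLAMA_API_KEY environment variable. If the model decides to use the tool, the reply exposes the call through the standard ChatMessage tool_calls property.

python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Hypothetical function the tool wraps
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

weather_tool = Tool(
    name="weather",
    description="Get weather info for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string", "description": "The city name"}},
        "required": ["city"],
    },
    function=get_weather,
)

generator = MetaLlamaChatGenerator(tools=[weather_tool])
reply = generator.run([ChatMessage.from_user("What's the weather in Paris?")])["replies"][0]

# If the model chose to call the tool, inspect the tool call; otherwise print the text reply
if reply.tool_calls:
    print(reply.tool_calls[0].tool_name, reply.tool_calls[0].arguments)
else:
    print(reply.text)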

Initialization

To use this integration, you must have a Meta Llama API key. You can provide it with the LLAMA_API_KEY environment variable or by using a Secret.

Then, install the meta-llama-haystack integration:

shell
pip install meta-llama-haystack
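Once installed, you can initialize the generator by relying on the LLAMA_API_KEY environment variable (the default) or by passing a Secret explicitly. A minimal sketch of both options:

python
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Reads the key from the LLAMA_API_KEY environment variable (the default)
llm = MetaLlamaChatGenerator()

# Or pass the key explicitly as a Secret
llm = MetaLlamaChatGenerator(api_key=Secret.from_env_var("LLAMA_API_KEY"))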

Streaming

MetaLlamaChatGenerator supports streaming responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the streaming_callback parameter during initialization.
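For example, a minimal setup that prints tokens to stdout as they arrive, using Haystack's built-in print_streaming_chunk utility:

python
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Tokens are printed as the model generates them
llm = MetaLlamaChatGenerator(streaming_callback=print_streaming_chunk)
llm.run([ChatMessage.from_user("Summarize what a chat generator does in one sentence.")])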

Usage

On its own

python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator()
response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"][0].text)

With streaming and model routing:

python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)

# Check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])

In a pipeline

python
# To run this example, you will need to set a `LLAMA_API_KEY` environment variable.

from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)

# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]

# Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})

retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")

# Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)