# Text Embeddings Inference (TEI) Integration Guide

This document describes how to use the Text Embeddings Inference (TEI) service with the Milvus Helm Chart, and how to integrate TEI with Milvus. TEI is an open-source project developed by Hugging Face, available at [https://github.com/huggingface/text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference).

## Overview

Text Embeddings Inference (TEI) is a high-performance text embedding model inference service that converts text into vector representations. Milvus is a vector database that can store and retrieve these vectors. By combining the two, you can build powerful semantic search and retrieval systems.

## Deployment Methods

This guide covers two ways to use TEI:

1. Deploy the TEI service directly through the Milvus Helm Chart
2. Use an external TEI service with Milvus integration

## Deploy TEI through the Milvus Helm Chart

### Basic Configuration

Enable the TEI service in `values.yaml`:

```yaml
tei:
  enabled: true
  modelId: "BAAI/bge-large-en-v1.5"  # Specify the model to use
```

This is the simplest configuration: just set `enabled: true` and the desired `modelId`.

### Complete Configuration Options

```yaml
tei:
  enabled: true                      # Enable TEI service
  name: text-embeddings-inference    # Service name
  replicas: 1                        # Number of TEI replicas
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference  # Image repository
    tag: cpu-1.6                     # Image tag (CPU version)
    pullPolicy: IfNotPresent         # Image pull policy
  service:
    type: ClusterIP                  # Service type
    port: 8080                       # Service port
    annotations: {}                  # Service annotations
    labels: {}                       # Service labels
  resources:                         # Resource configuration
    requests:
      cpu: "4"                       # CPU request
      memory: "8Gi"                  # Memory request
    limits:
      cpu: "8"                       # CPU limit
      memory: "16Gi"                 # Memory limit
  persistence:                       # Persistent storage configuration
    enabled: true                    # Enable persistent storage
    mountPath: "/data"               # Mount path
    annotations: {}                  # Storage annotations
    persistentVolumeClaim:           # PVC configuration
      existingClaim: ""              # Use an existing PVC
      storageClass:                  # Storage class
      accessModes: ReadWriteOnce     # Access modes
      size: 50Gi                     # Storage size
      subPath: ""                    # Sub path
  modelId: "BAAI/bge-large-en-v1.5"  # Model ID
  extraArgs: []                      # Additional command-line arguments for TEI, such as "--max-batch-tokens=16384", "--max-client-batch-size=32", "--max-concurrent-requests=128"
  nodeSelector: {}                   # Node selector
  affinity: {}                       # Affinity configuration
  tolerations: []                    # Tolerations
  topologySpreadConstraints: []      # Topology spread constraints
  extraEnv: []                       # Additional environment variables
```

### Using GPU Acceleration

If you have GPU resources, you can use the GPU version of the TEI image to accelerate inference:

```yaml
tei:
  enabled: true
  modelId: "BAAI/bge-large-en-v1.5"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: 1.6                         # GPU version
  resources:
    limits:
      nvidia.com/gpu: 1              # Allocate 1 GPU
```

## Frequently Asked Questions

### How to determine the embedding dimension of a model?

Different models produce embeddings of different dimensions. Here are the dimensions of some commonly used models:

- BAAI/bge-large-en-v1.5: 1024
- BAAI/bge-base-en-v1.5: 768
- nomic-ai/nomic-embed-text-v1: 768
- sentence-transformers/all-mpnet-base-v2: 768

You can find this information in the model's documentation or get it through the TEI service's API.
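If the model card does not state the dimension, you can also measure it empirically by embedding a probe text and checking the vector length. A minimal Python sketch, assuming the TEI service has been made reachable at `http://localhost:8080` (a placeholder URL, for example after a `kubectl port-forward`):

```python
import requests

# Placeholder URL: adjust to wherever your TEI service is reachable.
TEI_ENDPOINT = "http://localhost:8080"

# TEI's /embed endpoint returns one vector (a list of floats) per input text.
resp = requests.post(
    f"{TEI_ENDPOINT}/embed",
    json={"inputs": "dimension probe"},
    timeout=30,
)
resp.raise_for_status()

embedding = resp.json()[0]
print(f"Embedding dimension: {len(embedding)}")  # e.g. 1024 for BAAI/bge-large-en-v1.5
```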
### How to test if the TEI service is working properly?

After deploying the TEI service, you can use the following commands to test whether it is working properly:

```bash
# Get the TEI service name
export TEI_SERVICE=$(kubectl get svc -l component=text-embeddings-inference -o jsonpath='{.items[0].metadata.name}')

# Test the embedding functionality
kubectl run -it --rm curl --image=curlimages/curl -- curl -X POST "http://${TEI_SERVICE}:8080/embed" \
  -H "Content-Type: application/json" \
  -d '{"inputs":"This is a test text"}'
```

### How to use TEI-generated embeddings in Milvus?

In Milvus, you can use TEI-generated embeddings in the following workflow (sketched below):

1. When creating a collection, specify a vector dimension that matches the TEI model's output dimension
2. Before inserting data, use the TEI service to convert text to vectors
3. When searching, likewise use the TEI service to convert the query text to a vector
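The following Python sketch walks through these three steps end to end. The service URLs, the `manual_demo` collection name, and the 768 dimension are illustrative assumptions, not fixed values:

```python
import requests
from pymilvus import MilvusClient, DataType

TEI_ENDPOINT = "http://localhost:8080"  # placeholder TEI service URL

client = MilvusClient(uri="http://localhost:19530")  # placeholder Milvus URL

def embed(texts):
    """Convert a batch of texts to vectors via TEI's /embed endpoint."""
    resp = requests.post(f"{TEI_ENDPOINT}/embed", json={"inputs": texts}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # one vector per input text

# Step 1: create a collection whose vector dimension matches the TEI model
# (768 assumed here; see the dimension list above for common models).
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
schema.add_field("text", DataType.VARCHAR, max_length=65535)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)

index_params = client.prepare_index_params()
index_params.add_index(field_name="embedding", index_type="HNSW", metric_type="COSINE")

client.create_collection(collection_name="manual_demo", schema=schema, index_params=index_params)
client.load_collection(collection_name="manual_demo")

# Step 2: embed the texts client-side, then insert vectors along with the raw text.
docs = ["Milvus stores and retrieves vectors.", "TEI converts text into vectors."]
vectors = embed(docs)
client.insert(
    collection_name="manual_demo",
    data=[{"id": i, "text": t, "embedding": v} for i, (t, v) in enumerate(zip(docs, vectors))],
)

# Step 3: embed the query text the same way, then search by vector.
query_vector = embed(["How are embeddings stored?"])[0]
results = client.search(
    collection_name="manual_demo",
    data=[query_vector],
    anns_field="embedding",
    limit=2,
    output_fields=["text"],
)
```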
## Using Milvus Text Embedding Function

Milvus provides a text embedding function feature that allows you to generate vector embeddings directly within Milvus. You can configure Milvus to use TEI as the backend for this function.

### Using the Text Embedding Function in Milvus

1. Specify the embedding function when creating a collection:

```python
from pymilvus import MilvusClient, DataType, Function, FunctionType

# Connect to Milvus
client = MilvusClient(uri="http://localhost:19530")

# 1. Create schema
schema = client.create_schema()

# 2. Add fields
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)  # Primary key
schema.add_field("text", DataType.VARCHAR, max_length=65535)            # Text field
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)           # Vector field; dimension must match the TEI model output

# 3. Define the TEI embedding function
tei_embedding_function = Function(
    name="tei_func",                           # Unique identifier for this embedding function
    function_type=FunctionType.TEXTEMBEDDING,  # Indicates a text embedding function
    input_field_names=["text"],                # Scalar field(s) containing text data to embed
    output_field_names=["embedding"],          # Vector field(s) for storing embeddings
    params={                                   # TEI-specific parameters
        "provider": "TEI",                     # Must be set to "TEI"
        "endpoint": "http://tei-service:8080", # TEI service address
        # Optional: "api_key": "your_secure_api_key",
        # Optional: "truncate": "true",
        # Optional: "truncation_direction": "right",
        # Optional: "max_client_batch_size": 64,
        # Optional: "ingestion_prompt": "passage: ",
        # Optional: "search_prompt": "query: "
    }
)
schema.add_function(tei_embedding_function)

# 4. Create the collection with the schema and index params
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="embedding_index",
    index_type="HNSW",
    metric_type="COSINE",
)

client.create_collection(
    collection_name="test_collection",
    schema=schema,
    index_params=index_params
)

client.load_collection(
    collection_name="test_collection"
)
```

2. Automatically generate embeddings when inserting data:

```python
# Insert data; Milvus will automatically call the TEI service to generate embedding vectors
client.insert(
    collection_name="test_collection",
    data=[
        {"id": 1, "text": "This is a sample document about artificial intelligence."},
        {"id": 2, "text": "Vector databases are designed to handle embeddings efficiently."}
    ]
)
```

3. Automatically generate query embeddings when searching:

```python
# Search directly using text; Milvus will automatically call the TEI service to generate query vectors
results = client.search(
    collection_name="test_collection",
    data=["Tell me about AI technology"],
    anns_field="embedding",
    search_params={
        "metric_type": "COSINE",
        "params": {}
    },
    limit=3
)
```
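To inspect what a search returns, you can ask Milvus to return stored scalar fields alongside each hit. A short sketch continuing the example above; `output_fields` is a standard `search` parameter, though the exact shape of the hit objects may vary slightly across pymilvus versions:

```python
# Request the original text back with each hit via output_fields.
results = client.search(
    collection_name="test_collection",
    data=["Tell me about AI technology"],
    anns_field="embedding",
    search_params={"metric_type": "COSINE", "params": {}},
    limit=3,
    output_fields=["text"],
)

# results holds one ranked list of hits per query string.
for hits in results:
    for hit in hits:
        print(hit["id"], hit["distance"], hit["entity"]["text"])
```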