
Google Generative AI Embeddings

Connect to Google’s generative AI embeddings service using the GoogleGenerativeAIEmbeddings class, found in the langchain-google-genai package.

Installation

%pip install -U langchain-google-genai

Credentials

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API key here")

Usage

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]
[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]

Batch

You can also embed multiple strings in a single call with embed_documents, which is faster than embedding them one at a time:

vectors = embeddings.embed_documents(
    [
        "Today is Monday",
        "Today is Tuesday",
        "Today is April Fools day",
    ]
)
len(vectors), len(vectors[0])
(3, 768)

Task type

GoogleGenerativeAIEmbeddings optionally supports a task_type parameter, which currently must be one of:

  • task_type_unspecified
  • retrieval_query
  • retrieval_document
  • semantic_similarity
  • classification
  • clustering

By default, we use retrieval_document in the embed_documents method and retrieval_query in the embed_query method. If you provide a task type, we will use that for all methods.
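The default rule above can be modeled in a few lines. The following is a minimal sketch, not the library's implementation: resolve_task_type, DEFAULT_TASK_TYPES, and VALID_TASK_TYPES are hypothetical names, and the ValueError check is this sketch's own choice. It shows that an explicit task type overrides the per-method defaults.

```python
from typing import Optional

# Task types listed above. This is a hypothetical model of the documented
# rule, not the langchain-google-genai implementation.
VALID_TASK_TYPES = {
    "task_type_unspecified",
    "retrieval_query",
    "retrieval_document",
    "semantic_similarity",
    "classification",
    "clustering",
}

# Per-method defaults used when no task_type is given.
DEFAULT_TASK_TYPES = {
    "embed_documents": "retrieval_document",
    "embed_query": "retrieval_query",
}


def resolve_task_type(method: str, task_type: Optional[str] = None) -> str:
    """Return the task type that a call to `method` would use."""
    if task_type is None:
        # No explicit task type: fall back to the per-method default.
        return DEFAULT_TASK_TYPES[method]
    if task_type not in VALID_TASK_TYPES:
        # This sketch rejects values outside the list above.
        raise ValueError(f"unsupported task_type: {task_type!r}")
    # An explicit task type applies to every method.
    return task_type


print(resolve_task_type("embed_query"))
print(resolve_task_type("embed_documents"))
print(resolve_task_type("embed_query", "retrieval_document"))
```

The last call mirrors the case where an embeddings object is created with task_type="retrieval_document" and then embed_query is called: the explicit setting wins.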

%pip install --quiet matplotlib scikit-learn
query_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

All of these are embedded with the ‘retrieval_query’ task type:

query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]

All of these are embedded with the ‘retrieval_document’ task type. embed_query is called here, but the task type set on doc_embeddings applies to all methods:

doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]
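One common way to compare the two sets of vectors is cosine similarity. The following minimal sketch computes a similarity matrix in pure Python. The helper names and the 3-D toy vectors are hypothetical; they stand in for the 768-dimensional query_vecs and doc_vecs.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(rows: List[Vector], cols: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity of rows[i] and cols[j]."""
    return [[cosine_similarity(r, c) for c in cols] for r in rows]


# Hypothetical toy vectors standing in for query_vecs and doc_vecs.
toy_query_vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
toy_doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]

for row in similarity_matrix(toy_query_vecs, toy_doc_vecs):
    print([round(v, 2) for v in row])
```

With real embeddings you would call similarity_matrix(query_vecs, doc_vecs). If scikit-learn is installed (see the pip command above), sklearn.metrics.pairwise.cosine_similarity(query_vecs, doc_vecs) returns the same values as a NumPy array.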

In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the “relevant doc” and the “similar doc”, and there is a stronger delta between the similar query and the relevant doc in the latter case.
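To see why relative distance matters, consider what decides the ranking. It is not either similarity score on its own but the gap between a query's similarity to the relevant doc and its similarity to a merely similar doc. The following minimal sketch illustrates this. retrieval_margin and the 2-D toy vectors are hypothetical and are not real embeddings.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieval_margin(query: List[float], relevant_doc: List[float], similar_doc: List[float]) -> float:
    """Similarity to the relevant doc minus similarity to the merely similar doc.

    A larger positive margin means the relevant doc is ranked first more robustly.
    """
    return cosine_similarity(query, relevant_doc) - cosine_similarity(query, similar_doc)


# Hypothetical 2-D unit vectors, not real embeddings.
query_vec = [1.0, 0.0]
relevant_doc_vec = [0.96, 0.28]  # cosine with query_vec: 0.96
similar_doc_vec = [0.6, 0.8]     # cosine with query_vec: 0.6

margin = retrieval_margin(query_vec, relevant_doc_vec, similar_doc_vec)
print(round(margin, 2))  # 0.36
```

Comparing this margin across embedding configurations, for example with and without a retrieval task type, is one way to quantify the “stronger delta” described above.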