Using an OpenAI-compatible API with Pipelines

To use an OpenAI-compatible API, you can use either the embeddings or the completions model provider. Because a retriever needs to encode text before it can search, only the embeddings model provider can be used with a retriever.

Why use an OpenAI-compatible API?

Some examples of why you might want to use an OpenAI-compatible API include:

  • If you have a local system running Ollama and you want that local system to handle embeddings. This assumes you've configured Ollama as a server.

  • If you have access to a service that provides different or specially tuned models, you can use it instead of the default models.

Creating the model

The starting point for this process is creating a model. When you create a model, you can pass options and credentials to the registration. The defaults point to the OpenAI service endpoint. By overriding the defaults, you can point to any service.

This example creates a model that uses a local Ollama server:

select aidb.create_model(
    'my_local_ollama',
    'embeddings',
    '{"model":"llama3.2", "url":"http://llama.local:11434/v1/embeddings", "dimensions":3072}'::JSONB);

Model name and model provider

The model name is the first parameter. For the example, it's set to my_local_ollama.

Specify the model provider as embeddings, the provider that defaults to using OpenAI's servers. You can use the configuration parameter to override the endpoint so the provider talks to any compliant server.

Configuration

The next parameter is the configuration. This is a JSON string that, in this example, contains three settings: the model, the url, and the dimensions.

'{"model":"llama3.2", "url":"http://llama.local:11434/v1/embeddings", "dimensions":3072}'::JSONB

In this case, we are setting the model to “llama3.2”, a relatively new and powerful model. Remember to run ollama run llama3.2 to pull and start the model on the server.

The next JSON setting is the important one, overriding the endpoint that the aidb model will use. In this example:

  • The server is running on a machine called llama.local.
  • It has port 11434 (the default port for Ollama) open to service requests over HTTP (not HTTPS in this case).
  • The path to the endpoint on the server is /v1/embeddings, which is the same path that OpenAI uses.

Putting those components together, we get http://llama.local:11434/v1/embeddings as our endpoint.

The last JSON parameter in this example is "dimensions", which is a hint to the system about how many vector values to expect from the model. If you look up llama3.2's properties, you can see that its llama.embedding_length value is 3072. The provider defaults to 1536 (with some hard-wired exceptions for particular models), but it doesn't know about llama3.2's embedding length. So in this case, we need to pass "dimensions":3072 to configure aidb accordingly.
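If you want to confirm the value you need, one quick sanity check is to encode a short string and measure the result. This is a sketch that assumes the pgvector extension is installed and that the output of aidb.encode_text can be cast to pgvector's vector type:

-- Returns the number of dimensions the model actually produces;
-- it should match the "dimensions" setting in the model configuration.
select vector_dims(aidb.encode_text('my_local_ollama', 'test')::vector);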

That completes the configuration parameter.

If the endpoint requires an API key, that would be passed in the credentials parameter. As this is a local model, we don’t need to pass any credentials.
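For comparison, here's a sketch of registering a model against a hosted OpenAI-compatible service that does require a key. It assumes the credentials argument can be passed by name as a JSONB object containing an api_key; the model name and key shown are placeholders:

-- Hypothetical registration: the named "credentials" argument and the api_key
-- field are assumptions; substitute your service's real key.
-- With no "url" in the configuration, the provider uses the default OpenAI endpoint.
select aidb.create_model(
    'my_hosted_embeddings',
    'embeddings',
    '{"model":"text-embedding-3-small"}'::JSONB,
    credentials => '{"api_key":"sk-placeholder"}'::JSONB);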

Note

When using indexes in pgvector, consider the pgvector indexing limitations. aidb doesn't automatically configure an index today, but if you add one manually, make sure it supports the number of dimensions your model uses.
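As an illustration, pgvector's HNSW and IVFFlat index types currently support at most 2,000 dimensions for the vector type, so a 3072-dimensional model such as llama3.2 can't be indexed that way directly. One workaround, sketched here assuming pgvector 0.7.0 or later and a hypothetical my_documents table with a vector column named embedding, is an expression index over half-precision vectors:

-- Hypothetical index: casts the embedding to halfvec(3072) so HNSW can index it.
create index on my_documents
    using hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);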

Using the model

Once created, you can refer to the model by name and use it just like any other Pipelines model. This example shows how to use the model to get an embedding:

select aidb.encode_text('my_local_ollama','I like it');

Pipelines takes care of all the connection management, freeing you to focus on your data and the model results.
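For example, the embedding can feed straight into a similarity search. This sketch assumes a hypothetical my_documents table with a vector(3072) column named embedding, populated with embeddings from the same model, and that the result of aidb.encode_text casts to pgvector's vector type:

-- Find the five stored documents closest (by cosine distance) to the query text.
select id
from my_documents
order by embedding <=> aidb.encode_text('my_local_ollama', 'I like it')::vector
limit 5;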

