Capabilities

Pipeline lifecycle

This high-level overview describes the lifecycle of a pipeline in AI Accelerator Pipelines.

Optionally, create a storage location

This step is optional and needed only for accessing data external storage.

Data for processing can be stored in the database in a table or in an external storage location. If you want to use an external storage location to access data, you must create a storage location. This storage location can be an S3 bucket or a local file system.

The storage locations can be used by AI Accelerator to create a volume. This volume can then be used by a retriever to access its data.

Create a model

Create a model with AI Accelerator Pipelines. This model can be a machine learning model, a deep learning model, or any other type of model that can be used for AI tasks.

Create a retriever

Create a retriever with AI Accelerator Pipelines. A retriever is a function that retrieves data from a table or volume and returns it in a format that the model can use.

By default, a retriever only needs:

  • A name
  • The name of a model to use

If the retriever is for a table, it also needs:

  • The name of the source table
  • The name of the column in the source table that contains the data
  • The data type of the column

If the retriever is for a volume, it needs:

  • The name of the volume
  • The name of the column in the volume that contains the data

When you create a retriever, by default a vector table is created to store the embeddings of the data that's retrieved. This table has a column to store the embeddings and a column to store the key of the data.

When you create the retriever, you can specify the name of the vector table and the name of the vector column and the key column. This ability is useful if you're migrating to aidb and want to use an existing vector table.

Create embeddings

Embedding sees the data being retrieved from the source table or volume and encoded into a vector datatype. That vector data is then stored in the vector table.

If the source table already has data/rows when the retriever is created, then you need to make a manual bulk embedding call. This call generates the embeddings for all the existing data in the source table.

You can then activate auto-embedding to keep the embeddings in sync going forward. Auto-embedding uses Postgres triggers to detect insertions and updates to the source table and generates embeddings for the new data.

Query data

You can query the embedded data using the retriever. The retriever can return the key to the data or the data itself, depending on the query. You can query the data using a text query or an image query, depending on the type of data that's being retrieved.

Next steps

While auto-embedding is enabled, the embeddings are always up to date, and applications can use the retriever to query the data as needed.

Cleanup

If the embeddings are no longer required, you can delete the retriever, drop the vector table, and delete the model.


Could this page be better? Report a problem or suggest an addition!