Limitations

Blocking operations

Pipelines doesn't currently support asynchronous operations. All embedding operations are synchronous and run in the foreground, so how much they block depends on the type of embedding:

  • Bulk embedding: the operation doesn't return until embeddings have been computed for every item currently in the source database table or storage location.
  • Auto-embedding: each insert or update doesn't complete until the embedding for the affected item has been computed.

Observability

Observability is currently limited to the logs and metrics provided by the underlying components. There's no single, unified view for monitoring the entire system.

Progress reporting for the initial embedding also isn't available yet. To track its progress, use the system logs.

Large documents

Pipelines doesn't currently split (chunk) large documents into smaller pieces for embedding.

Data filtering

For table data, Pipelines can restrict which rows are embedded by using SQL filters and views on row content. For data in S3 storage, content-based filtering isn't supported: you can limit the data only by subpath or prefix.
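As a minimal sketch of row-based filtering, the following plain Postgres SQL defines a view that exposes only the rows you want embedded. The products table, its columns, and the products_to_embed view are hypothetical names used for illustration only, and the aidb call that points a pipeline at the view depends on your version, so it isn't shown here.

```sql
-- Hypothetical source table, for illustration only.
CREATE TABLE products (
    id          integer PRIMARY KEY,
    description text,
    status      text NOT NULL
);

INSERT INTO products (id, description, status) VALUES
    (1, 'Stainless steel water bottle', 'active'),
    (2, 'Discontinued camping stove', 'archived'),
    (3, NULL, 'active');

-- Hypothetical filter view: only active products that have text to embed
-- pass the WHERE clause, so only row 1 is visible through the view.
CREATE VIEW products_to_embed AS
SELECT id, description
FROM products
WHERE status = 'active'
  AND description IS NOT NULL;
```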

Load balancing for models

There's currently no load-balancing mechanism for model access.

Data formats

Pipelines currently supports only text and image formats. Other formats, including structured data, video, and audio, aren't supported.

Upgrading

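The aidb and pgfs extensions don't currently support the standard Postgres extension upgrade mechanism (ALTER EXTENSION ... UPDATE). To upgrade to a new version, you must drop and re-create both extensions. Because CASCADE also drops any objects that depend on the extensions, plan to re-create those objects afterward: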

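-- Remove the installed versions (CASCADE also drops dependent objects).
DROP EXTENSION IF EXISTS aidb CASCADE;
DROP EXTENSION IF EXISTS pgfs CASCADE;
-- Install the new versions.
CREATE EXTENSION IF NOT EXISTS aidb CASCADE;
CREATE EXTENSION IF NOT EXISTS pgfs CASCADE;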
