Timescale recently expanded its PostgreSQL AI offerings with pgai Vectorizer. This update enables developers to create, store, and manage vector embeddings alongside relational data without the need for external tools or additional infrastructure.
TimescaleDB, an open-source extension for PostgreSQL tailored for time-series data, first augmented PostgreSQL with real-time analytics features. Now, Timescale is enhancing AI integration with the pgai suite and the introduction of pgai Vectorizer, enabling developers to conduct AI development seamlessly within PostgreSQL.
Contributors have noted some challenges during the development process. One contributor, Tostino, highlighted issues with OpenAI API compliance, noting that the current implementation lacks several arguments needed to use proxy solutions or custom samplers on open-source inference servers. Tostino also suggested that functions offering a "simple" wrapper should be built on top of raw functions that return JSON rather than strict data types, to improve flexibility.
Building AI systems like search engines and AI agents often requires complex workflows. The pgai Vectorizer streamlines this by integrating the entire AI workflow into PostgreSQL, allowing developers to create advanced AI applications quickly and efficiently using familiar SQL commands.
Timescale argues that the standard approach of treating vector embeddings as standalone data leads to synchronization issues and stale data. The Institute for Ethical AI & Machine Learning comments:
TimescaleDB proposes treating embeddings as derived data similar to database indexes, which is interesting given recent extensions from DBs like planetscale to integrate embeddings natively into indexes, similarly through a "native vectorizer" abstraction. In this case however they still leverage the OSS pgai Vectorizer for PostgreSQL which helps automating the synchronization of embeddings with their source data within the database.
The pgvector and pgvectorscale extensions allow you to store vector embeddings in your database and perform fast and efficient vector searches. The pgai Vectorizer builds on top of these extensions to automatically create and synchronize embeddings for any text data in your database.
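As a rough illustration of the kind of search pgvector provides on its own, the sketch below stores a few three-dimensional vectors and ranks them by cosine distance. The `items` table and its values are hypothetical; real embeddings have hundreds or thousands of dimensions and would normally come from an embedding model rather than being typed in by hand.

```sql
-- Minimal pgvector sketch; the items table and its values are hypothetical.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)  -- toy dimension, for illustration only
);

INSERT INTO items (content, embedding) VALUES
    ('first document',  '[1,0,0]'),
    ('second document', '[0,1,0]'),
    ('third document',  '[0.9,0.1,0]');

-- <=> is pgvector's cosine distance operator; smaller means more similar.
SELECT id, content, embedding <=> '[1,0,0]' AS distance
FROM items
ORDER BY distance
LIMIT 2;
```

With these toy values, the query should rank the first and third rows ahead of the second, since their vectors point in nearly the same direction as the query vector.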
With one line of code, you can define a vectorizer that creates embeddings for data in a table. Suvarna Kadam, a machine learning consultant, comments:
pgai Vectorizer makes it possible to use one SQL command that will manage your vector embeddings "without" the usual engineering challenges to keep it in sync with your source data!
SELECT ai.create_vectorizer(
    <table_name>::regclass,
    destination => <embedding_table_name>,
    embedding => ai.embedding_openai(<model_name>, <dimensions>),
    chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);
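For illustration, the template might be filled in as follows. The `blog` table, its `contents` column, the destination name, and the model and dimension choices are assumptions made for this sketch, not values prescribed by Timescale. The call shape follows the template as published at launch and may differ in later pgai releases. The sketch also assumes the pgai extension is installed and an OpenAI API key is configured.

```sql
-- Hypothetical source table. The primary key is assumed to be needed
-- so that each embedding can be tied back to its source row.
CREATE TABLE blog (
    id serial PRIMARY KEY,
    title text,
    contents text
);

-- Fill the template's placeholders with concrete (assumed) values.
SELECT ai.create_vectorizer(
    'blog'::regclass,
    destination => 'blog_contents_embeddings',
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_recursive_character_text_splitter('contents')
);
```

Once the vectorizer has processed the rows, the resulting embeddings can be searched with the same pgvector distance operators as any other vector column.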
In the same week, Neon Database Labs introduced pgrag, an experimental PostgreSQL extension aimed at supporting end-to-end retrieval-augmented generation (RAG) pipelines, further expanding its own AI capabilities.
Since the launch of the pgai Vectorizer, there has been community interest in expanding the range of supported embedding models beyond OpenAI. Contributor claudeomusic asked whether the choice of embedding model could be made configurable, highlighting the importance of flexibility for users. In response, alejandrodnm from Timescale confirmed that the current Vectorizer feature supports only OpenAI models but that there are plans to include other providers in the future. The team is open to contributions from the community to help achieve this goal. Another contributor, wang, shared a workaround titled "How to use with Openrouter".
To quickly try out embeddings using a pre-built Docker developer environment, see the Vectorizer quick start. For more detailed technical specifications, see the Vectorizer API reference.