DataStax, a leading data platform vendor, has made its entry into the vector database space with the announcement of vector search availability in its flagship Astra DB cloud database.
Known for its contributions to the open-source Apache Cassandra database, DataStax offers Astra DB as a commercially supported Database-as-a-Service (DBaaS). Cassandra, a NoSQL database at its core, has expanded its capabilities over the years to accommodate a growing range of data types and use cases, including AI/ML.
Throughout 2023, DataStax has been focusing on advancing its platform for AI/ML, evident in its acquisition of AI feature engineering vendor Kaskada in January. The integration of Kaskada’s technology into DataStax’s Luna ML service, launched in May, demonstrated the company’s commitment to enhancing its AI/ML capabilities.
The addition of vector support to Astra DB further strengthens DataStax’s AI/ML offerings, providing organizations with a trusted and widely deployed database platform suitable for both traditional workloads and newer AI workloads.
The vector capability was initially showcased on Google Cloud Platform in June and is now generally available on Amazon Web Services (AWS) and Microsoft Azure as well.
Ed Anuff, Chief Product Officer at DataStax, stated that Astra DB is now as much a native vector database as any other in the field.
Vector databases play a fundamental role in AI/ML operations by storing content as vector embeddings, which are numerical representations of data. Vectors are an effective way to represent the semantic meaning of content and find broad applications in large language models (LLMs) and content retrieval tasks.
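To make the idea concrete, here is a minimal Python sketch (not part of the announcement) showing how semantic similarity is typically measured between embedding vectors; the tiny four-dimensional vectors are illustrative stand-ins for the hundreds or thousands of dimensions real embedding models produce.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings; real models emit much higher-dimensional vectors.
doc_cats = np.array([0.9, 0.1, 0.0, 0.2])
doc_kittens = np.array([0.8, 0.2, 0.1, 0.3])
doc_finance = np.array([0.0, 0.9, 0.7, 0.1])

query = np.array([0.85, 0.15, 0.05, 0.25])  # embedding of a cat-related query

for name, vec in [("cats", doc_cats), ("kittens", doc_kittens), ("finance", doc_finance)]:
    print(name, round(cosine_similarity(query, vec), 3))
# Semantically related documents score closer to 1.0 than unrelated ones,
# which is the property vector databases exploit for content retrieval.
```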
The vector database space encompasses various approaches and vendors. Purpose-built vendors like Pinecone and the open-source Milvus vector database are popular options. Additionally, some existing database platforms, such as MongoDB and PostgreSQL, have incorporated vector search as an overlay or extension.
In DataStax's implementation, vectors are a native column data type in Astra DB, so users can query and search vector data just like any other data type.
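The sketch below, written against the Python cassandra-driver, illustrates what that native column type looks like in practice. It is an assumption-laden example rather than DataStax's documented quickstart: the keyspace, table, index name, five-dimensional vectors, and connection credentials are placeholders, and the exact CQL and driver support for vector binding can vary by release.

```python
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect to an Astra DB instance; bundle path and token are placeholders.
cluster = Cluster(
    cloud={"secure_connect_bundle": "/path/to/secure-connect-bundle.zip"},
    auth_provider=PlainTextAuthProvider("token", "<application-token>"),
)
session = cluster.connect("demo_keyspace")

# A vector column is declared alongside ordinary column types.
session.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id int PRIMARY KEY,
        description text,
        embedding vector<float, 5>
    )
""")

# A storage-attached index enables approximate-nearest-neighbor (ANN) search.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS products_embedding_idx
    ON products (embedding) USING 'StorageAttachedIndex'
""")

session.execute(
    "INSERT INTO products (id, description, embedding) VALUES (%s, %s, %s)",
    (1, "wireless headphones", [0.12, 0.85, 0.07, 0.33, 0.91]),
)

# Vectors are queried like any other data, with ANN ordering for similarity.
rows = session.execute(
    "SELECT id, description FROM products "
    "ORDER BY embedding ANN OF [0.10, 0.80, 0.05, 0.30, 0.90] LIMIT 3"
)
for row in rows:
    print(row.id, row.description)
```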
While the vector database capabilities are being introduced to Astra DB before they are available in the open-source Cassandra project, Anuff clarified that the feature will be included in the upcoming Cassandra 5.0 release later in the year. As a commercial vendor, DataStax can integrate the feature into its platform earlier, providing Astra DB with vector capabilities ahead of the open-source release.
Cassandra’s architecture supports extensible data types, enabling the database to incorporate additional native data types over time. As a result, vectors integrate with Cassandra’s distributed index system just like any other data, and vectorized data can scale out without a practical constraint on the number of vectorized rows.
DataStax’s Astra DB now also supports native integration with the open-source LangChain framework, a common approach for building AI-powered applications that orchestrate one or more LLMs. This integration lets developers generate responses by feeding Astra DB’s vector search results into LLMs through LangChain. It simplifies the process of building real-time agents that not only make predictions but also provide recommendations grounded in vector search results from Astra DB combined with LLM output via LangChain.
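A minimal sketch of that retrieval pattern is shown below. It reuses the `session` from the earlier example and assumes an OpenAI API key is configured; the LangChain class and parameter names (`Cassandra`, `keyspace`, `table_name`) follow a 2023-era release and may differ in newer versions, and the sample texts and table name are invented for illustration.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Cassandra
from langchain.chains import RetrievalQA

# `session` is an already-open Astra DB connection, as in the earlier sketch.
embeddings = OpenAIEmbeddings()

# Store and index documents as vector embeddings in an Astra DB table.
vstore = Cassandra(
    embedding=embeddings,
    session=session,
    keyspace="demo_keyspace",
    table_name="support_articles",
)
vstore.add_texts([
    "Resetting your password requires the account recovery email.",
    "Refunds are processed within five business days.",
])

# Feed Astra DB vector search results into an LLM as retrieval context.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vstore.as_retriever(search_kwargs={"k": 2}),
)
print(qa.run("How long do refunds take?"))
```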
Anuff emphasized that the availability of vector capabilities on the platform marks a significant advancement toward realizing generative AI for enterprise users. With customers expressing interest in deploying generative AI in production, DataStax is ready to support their requirements and is excited about the possibilities it presents.