Building a Personalized Link Search Engine with Supabase pgvector, Transformers.js, and LangChain
Navigating the vast digital world requires efficient tools, especially for link discovery. In our project, we've combined the strengths of Supabase pgvector, Transformers.js, and LangChain. Together, these tools power our Fastify-based backend. We also designed a user-friendly frontend using React and Vite. In this article, we'll walk you through our project's inner workings, both backend and frontend. Let's dive in!
Backend Installation and Setup
Before diving into the core functionality of our project, we'll need to set up the necessary tools and libraries. This ensures a smooth development process and a seamless integration of Supabase pgvector, Transformers.js, and LangChain within our Fastify framework.
1. Setting up Fastify-CLI
Fastify-CLI simplifies the process of bootstrapping and running Fastify applications. Install it globally using the following command:
```bash
npm install fastify-cli --global
```
2. Building the Boilerplate
Once Fastify-CLI is installed, we'll generate a project boilerplate tailored to our needs. We'll be utilizing ESM (ECMAScript Modules) and TypeScript for a more robust development experience.
Run the following command to generate the boilerplate:
```bash
fastify generate --esm --lang=ts server
```
3. Installing Dependencies
With the basic structure in place, let's install the necessary dependencies to power our backend:
```bash
npm i @xenova/transformers @supabase/supabase-js langchain html-to-text
```
These packages bring in the essential tools for our link search functionality, connecting our backend with the capabilities of Transformers.js, Supabase, and LangChain, while also allowing us to convert HTML content to plain text.
Database Setup for Hybrid Search in Supabase
To implement a powerful and efficient search functionality in our backend, we will utilize the hybrid search capabilities provided by the LangChain library. This method integrates vector similarity search with keyword-based search, ensuring both precision and flexibility. Below is a step-by-step breakdown of our database setup:
1. Enabling pgvector Extension
First, we need to enable the `pgvector` extension, which is pivotal for working with embedding vectors:

```sql
create extension vector;
```
2. Creating the Documents Table
The `documents` table will store the information and content of the links we wish to index:

- `id`: A unique identifier for each document.
- `content`: The main content of the document (the HTML converted to plain text).
- `url`: The link we saved.
- `metadata`: Supplementary data about the URL, corresponding to `Document.metadata`.
- `embedding`: A vector representation of the document's content. The dimension 384 matches the output size of the `Supabase/gte-small` embedding model we'll use.
```sql
create table documents (
  id bigserial primary key,
  content text,
  url text,
  metadata jsonb,
  embedding vector(384)
);
```
3. Creating a Similarity Search Function
The `match_documents` function performs a similarity search for documents based on embeddings. It returns a list of documents ranked by their similarity to a provided query embedding:
```sql
create function match_documents (
  query_embedding vector(384),
  match_count int DEFAULT null,
  filter jsonb DEFAULT '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
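The `<=>` operator computes cosine distance, so `1 - distance` yields cosine similarity. For intuition, here is the same computation in plain TypeScript (not part of the backend, purely illustrative):

```typescript
// Cosine similarity between two equal-length vectors, matching what
// 1 - (documents.embedding <=> query_embedding) computes inside Postgres.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why the SQL orders ascending by distance (equivalently, descending by similarity).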
4. Creating a Keyword Search Function
The `kw_match_documents` function executes a keyword-based search on the `documents` table. This is especially useful when you want to find documents containing specific terms or phrases. The results are ranked by their relevance to the provided query:
```sql
create function kw_match_documents(query_text text, match_count int)
returns table (id bigint, content text, metadata jsonb, similarity real)
as $$
begin
  return query execute
    format('select id, content, metadata, ts_rank(to_tsvector(content), plainto_tsquery($1)) as similarity
    from documents
    where to_tsvector(content) @@ plainto_tsquery($1)
    order by similarity desc
    limit $2')
    using query_text, match_count;
end;
$$ language plpgsql;
```
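To build intuition for what `ts_rank` with `plainto_tsquery` is doing, here is a deliberately simplified TypeScript scoring sketch. Postgres additionally applies stemming, term proximity, and weighting, all of which this toy version omits:

```typescript
// Naive keyword relevance: fraction of document words that match a query
// term. Postgres's ts_rank is far more sophisticated, but the ranking
// idea — more matching terms means a higher score — is the same.
function keywordScore(content: string, query: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const words = content.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length === 0) return 0;
  let hits = 0;
  for (const term of terms) {
    hits += words.filter((word) => word === term).length;
  }
  return hits / words.length;
}
```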
Integrating Supabase with Fastify
To make our backend efficient and well-integrated with Supabase, we need to set up a plugin within Fastify. This approach not only encapsulates the Supabase connection logic but also ensures that our Supabase client can be conveniently accessed throughout our application. Let's break down the steps and the code:
1. Supabase Credentials
Before diving into the code, ensure you've copied the Supabase URL and the anonymous key from your Supabase dashboard. These will be required to establish a connection to your Supabase project.
2. Creating the Fastify Plugin for Supabase
Navigate to the `plugins` directory and create a new file named `supabase.ts`. This file will contain the code for our Fastify plugin. Now, let's dissect the code:
```typescript
import fp from "fastify-plugin";
import { FastifyPluginAsync } from "fastify";
import { createClient, SupabaseClient } from "@supabase/supabase-js";
```
Here, we are importing the necessary dependencies:

- `fastify-plugin`: This module helps in creating Fastify plugins.
- `FastifyPluginAsync`: A type from Fastify for creating asynchronous plugins.
- `createClient` & `SupabaseClient`: The client factory function and client type from the `@supabase/supabase-js` library.
```typescript
declare module "fastify" {
  interface FastifyInstance {
    supabase: SupabaseClient;
  }
}
```
We extend Fastify's main instance interface to declare that it will also have a `supabase` property. This augmentation lets us attach the Supabase client to any Fastify instance, making it available throughout the application via `request.server.supabase`.
```typescript
const supabasePlugin: FastifyPluginAsync = fp(async (server, options) => {
  console.log("Connecting to Supabase");
  const supabaseUrl = process.env.SUPABASE_PUBLIC_URL!;
  const supabaseKey = process.env.SUPABASE_ANON_KEY!;
  const supabase = createClient(supabaseUrl, supabaseKey);
  server.decorate("supabase", supabase);
  server.addHook("onClose", async (server) => {
    server.log.info("Supabase connection closed.");
  });
});

export default supabasePlugin;
```
Here:

- We define the Fastify plugin `supabasePlugin`.
- We fetch the Supabase URL and anonymous key from environment variables.
- We initialize the Supabase client using `createClient`.
- We attach the Supabase client to the Fastify server instance using the `decorate` method.
- An `onClose` hook logs a message when the Fastify server shuts down, indicating that the Supabase connection is also closed.
- Finally, we export the plugin, making it available for inclusion in our Fastify server setup.
With this plugin, our backend is now effectively integrated with Supabase, enabling smooth data operations and ensuring that Supabase's functionalities are readily accessible throughout the application.
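One caveat: the non-null assertions (`!`) on the environment variables mean that a missing `SUPABASE_PUBLIC_URL` or `SUPABASE_ANON_KEY` only surfaces later as a confusing client error. An optional hardening sketch (the `requireEnv` helper is our own, not part of Fastify or Supabase) fails fast with a clear message instead:

```typescript
// Hypothetical helper: read a required environment variable or fail loudly
// at startup rather than at first use.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Inside the plugin, this would replace the non-null assertions:
// const supabaseUrl = requireEnv("SUPABASE_PUBLIC_URL");
// const supabaseKey = requireEnv("SUPABASE_ANON_KEY");
```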
Routes: Handling and Searching Links
As the backbone of our application, managing and querying saved links is crucial. This functionality lives in the `link.ts` file, located inside the `src/routes/` directory. Acting as the gateway at `http://localhost:3000/link`, this file dictates how our app handles link-related operations. Let's walk through `link.ts` to understand its core responsibilities and the magic it brings to our backend.

1. Required Modules & Types
First, we import the necessary modules. These span from Fastify core modules to specialized libraries from LangChain that will be instrumental in our link processing and retrieval.
We also define two request types:

- `SaveLinkRequest`: Accepts a body with `url` and `content` for saving a link.
- `SearchLinkRequest`: Accepts a body with a `query` (search term) and an optional `count` to limit the number of search results.
2. Saving a Link (`/save` Endpoint)

When a POST request is made to `/link/save`, the application saves the provided link to the Supabase database:
```typescript
fastify.post(
  "/save",
  async function (request: FastifyRequest<SaveLinkRequest>, reply) {
    const supabase = request.server.supabase;
    const { url, content } = request.body;
    console.log("Saving", url);
    const docs = [
      new Document({
        pageContent: content,
        metadata: {
          url: url,
        },
      }),
    ];
    const splitter = RecursiveCharacterTextSplitter.fromLanguage("html");
    const transformer = new HtmlToTextTransformer();
    const sequence = splitter.pipe(transformer);
    const newDocuments = await sequence.invoke(docs);
    const model = new HuggingFaceTransformersEmbeddings({
      modelName: "Supabase/gte-small",
    });
    for (const doc of newDocuments) {
      if (doc.pageContent) {
        const embeddings = await model.embedDocuments([doc.pageContent]);
        const { error } = await supabase.from("documents").insert([{
          url,
          content: doc.pageContent,
          embedding: JSON.stringify(embeddings[0]),
          metadata: JSON.stringify(doc.metadata),
        }]);
        if (error) {
          return reply.status(500).send(error);
        }
      }
    }
    return {
      message: "success",
    };
  },
);
```
Here's the process:

- Extract the Supabase client and request body.
- Create a `Document` instance containing the link's content and URL.
- Use LangChain's `RecursiveCharacterTextSplitter` to split the HTML content and `HtmlToTextTransformer` to transform the HTML into plain text.
- Generate embeddings for the transformed content using the Hugging Face transformer model (`Supabase/gte-small` in this case).
- Save the link's details (including its embedding) to the Supabase `documents` table.
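The insert step above boils down to shaping each chunk into a row. Here is that mapping as a standalone sketch (the `toInsertRow` name is ours, for illustration); note that both the embedding and the metadata are stringified, matching the route code:

```typescript
// Shape a processed chunk plus its embedding into the payload the /save
// route inserts into the documents table.
function toInsertRow(
  url: string,
  pageContent: string,
  metadata: Record<string, unknown>,
  embedding: number[],
) {
  return {
    url,
    content: pageContent,
    embedding: JSON.stringify(embedding), // pgvector accepts a JSON-style array literal
    metadata: JSON.stringify(metadata),
  };
}
```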
3. Searching for Links (`/search` Endpoint)

When a POST request is made to `/link/search`, the application retrieves relevant links from the Supabase database based on the provided search term:
```typescript
fastify.post(
  "/search",
  async function (request: FastifyRequest<SearchLinkRequest>, reply) {
    const limit = request.body.count || 3;
    const supabase = request.server.supabase;
    const embeddings = new HuggingFaceTransformersEmbeddings({
      modelName: "Supabase/gte-small",
    });
    const retriever = new SupabaseHybridSearch(embeddings, {
      similarityK: limit,
      keywordK: limit,
      tableName: "documents",
      similarityQueryName: "match_documents",
      keywordQueryName: "kw_match_documents",
      client: supabase,
    });
    const results = await retriever.getRelevantDocuments(request.body.query);
    return {
      results,
    };
  },
);
```
Here's the search workflow:

- Define a limit on the number of results based on the `count` in the request body, defaulting to 3.
- Initialize the Hugging Face transformer model for embeddings.
- Use LangChain's `SupabaseHybridSearch` to combine vector similarity search with keyword-based search.
- Retrieve relevant documents (links) based on the provided query.
`SupabaseHybridSearch` is particularly powerful, offering a flexible search experience by harnessing both embeddings and keyword search.

With these routes in place, our backend can save links efficiently, embedding their content for quick retrieval. Furthermore, our hybrid search ensures that when users search for content, they get the most relevant links, whether by content similarity or by keyword match.
Full Code
```typescript
import { FastifyPluginAsync, FastifyRequest } from "fastify";
import { Document } from "langchain/document";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HtmlToTextTransformer } from "langchain/document_transformers/html_to_text";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { SupabaseHybridSearch } from "langchain/retrievers/supabase";

export type SaveLinkRequest = {
  Body: {
    url: string;
    content: string;
  };
};

export type SearchLinkRequest = {
  Body: {
    query: string;
    count?: number;
  };
};

const root: FastifyPluginAsync = async (fastify, opts): Promise<void> => {
  fastify.post(
    "/save",
    async function (request: FastifyRequest<SaveLinkRequest>, reply) {
      const supabase = request.server.supabase;
      const { url, content } = request.body;
      console.log("Saving", url);
      const docs = [
        new Document({
          pageContent: content,
          metadata: {
            url: url,
          },
        }),
      ];
      const splitter = RecursiveCharacterTextSplitter.fromLanguage("html");
      const transformer = new HtmlToTextTransformer();
      const sequence = splitter.pipe(transformer);
      const newDocuments = await sequence.invoke(docs);
      const model = new HuggingFaceTransformersEmbeddings({
        modelName: "Supabase/gte-small",
      });
      for (const doc of newDocuments) {
        if (doc.pageContent) {
          const embeddings = await model.embedDocuments([doc.pageContent]);
          const { error } = await supabase.from("documents").insert([{
            url,
            content: doc.pageContent,
            embedding: JSON.stringify(embeddings[0]),
            metadata: JSON.stringify(doc.metadata),
          }]);
          if (error) {
            return reply.status(500).send(error);
          }
        }
      }
      return {
        message: "success",
      };
    },
  );

  fastify.post(
    "/search",
    async function (request: FastifyRequest<SearchLinkRequest>, reply) {
      const limit = request.body.count || 3;
      const supabase = request.server.supabase;
      const embeddings = new HuggingFaceTransformersEmbeddings({
        modelName: "Supabase/gte-small",
      });
      const retriever = new SupabaseHybridSearch(embeddings, {
        similarityK: limit,
        keywordK: limit,
        tableName: "documents",
        similarityQueryName: "match_documents",
        keywordQueryName: "kw_match_documents",
        client: supabase,
      });
      const results = await retriever.getRelevantDocuments(request.body.query);
      return {
        results,
      };
    },
  );
};

export default root;
```
This setup, leveraging Fastify's performance, LangChain's sophisticated tools, and Supabase's database management, ensures a seamless and effective link management system.
Setting Up the Frontend: Integrating Chrome Extension & Search Application
Building a search engine backend without a frontend interface is like constructing a library with no entrance. In this section, we will bring our backend to life by setting up a frontend in the form of a Chrome extension. The extension will serve as an easy mechanism for users to save links and will complement our search application seamlessly. Here's a step-by-step guide to establish the "VexaSearch" Chrome extension:
1. Crafting the Chrome Extension
To begin, let's set up the necessary files for our extension:
Folder Structure:

```
/VexaSearch
|-- manifest.json
|-- background.js
```
manifest.json:
The `manifest.json` is the metadata file for Chrome extensions. It provides essential details to the Chrome browser about how the extension should function and what permissions it requires:
```json
{
  "manifest_version": 3,
  "name": "VexaSearch",
  "version": "1.0",
  "description": "Search links made easy",
  "permissions": [
    "activeTab",
    "tabs",
    "scripting",
    "contextMenus"
  ],
  "host_permissions": [
    "http://*/*",
    "https://*/*"
  ],
  "background": {
    "service_worker": "background.js"
  }
}
```
- `manifest_version`: Specifies which version of the manifest specification the package requires.
- `name`: The name of our extension, "VexaSearch".
- `version`: The version of our extension.
- `description`: A brief description.
- `permissions`: A list of permissions the extension needs.
- `host_permissions`: Defines which websites our extension can access.
- `background`: Specifies that `background.js` will serve as a service worker, acting as the backbone of our extension.
background.js:
This is the brain of our extension, housing the logic that will run in the background.
```javascript
chrome.contextMenus.create({
  title: "Save url to Vexxa Search",
  id: "vexxa",
  contexts: ["page"],
});

const fetchCurrentSite = () => {
  const url = new URL(window?.location?.href);
  const entirePage = document.documentElement.outerHTML;
  return {
    url: url.href,
    content: entirePage,
  };
};

chrome.contextMenus.onClicked.addListener(async (info, tab) => {
  if (info.menuItemId == "vexxa") {
    const tabId = tab.id;
    const response = await chrome.scripting.executeScript({
      target: {
        tabId: tabId,
      },
      func: fetchCurrentSite,
    });
    console.log(response);
    if (response.length !== 0) {
      const data = response[0]["result"];
      if (data) {
        console.log(data);
        const response = await fetch("http://localhost:3000/link/save", {
          method: "POST",
          body: JSON.stringify(data),
          headers: {
            "Content-Type": "application/json",
          },
        });
        if (response.status === 200) {
          const data = await response.json();
          console.log(data);
        }
      }
    }
  }
});
```
- Creating the Context Menu: We add a context menu item named "Save url to Vexxa Search". When users right-click on a page, they will see this option, allowing them to save the URL directly.
- Fetching the Current Site: The function `fetchCurrentSite` captures the current URL and the entire HTML content of the page.
- Event Listener for Menu Clicks: When the user selects "Save url to Vexxa Search" from the context menu, the event listener fires. It first fetches the current page's URL and content, and then sends a POST request to our backend at `http://localhost:3000/link/save`, saving the link's data.
Loading the Extension:
Before you can use the extension, you need to load it into Chrome:
- Open the Chrome Browser: Navigate to the Chrome extensions page by entering `chrome://extensions/` in the address bar or selecting "Extensions" from the Chrome menu.
- Enable Developer Mode: In the top-right corner of the extensions page, toggle on "Developer mode".
- Load Unpacked: You will see three options appear: `Load unpacked`, `Pack extension`, and `Update`. Click on the `Load unpacked` button.
- Select Your Extension Folder: Navigate to the directory where you saved the `VexaSearch` folder and select it. Your extension should now be loaded into Chrome and appear in the list of installed extensions.
Testing the Extension:
Once loaded, you can test the extension to ensure it's working as expected:
- Open a Website: Navigate to any website in your Chrome browser.
- Right-click on the Page: In the context menu that appears, you should see the option "Save url to Vexxa Search".
- Select the Option: Clicking on this will trigger the `background.js` script, fetching the current site's data and sending it to our backend for saving.
2. Setting Up the Frontend for VexxaSearch Application
For this frontend setup, we'll be leveraging React as our primary framework, while employing Vite for faster and leaner builds. We'll be using Mantine for UI components, making our application look neat, and TanStack's React-Query to efficiently handle our application's asynchronous data.
Note: This tutorial will not delve into setting up Mantine or React-Query as their respective documentation is quite comprehensive. To set them up:
- Mantine setup with Vite: Follow this guide.
- React-Query setup: Follow this guide.
Frontend Code Explanation
Let's break down the provided `App.tsx` code:
```tsx
import { useDisclosure } from "@mantine/hooks";
import {
  AppShell,
  Burger,
  Text,
  Container,
  TextInput,
  Card,
} from "@mantine/core";
import React from "react";
import { useQuery } from "@tanstack/react-query";
```
- Imports: We're importing essential hooks, UI components, and the main React package. `useQuery` from React-Query helps manage and fetch asynchronous data.
```tsx
export default function App() {
  const [opened, { toggle }] = useDisclosure();
  const [search, setSearch] = React.useState<string | undefined>();
```
- State Initialization: We have two state variables: one for toggling the navbar and another to manage the search query.
```tsx
  const { data, status } = useQuery(
    ["searchLinks", search],
    async () => {
      if (!search) return { results: [] };
      const response = await fetch("http://localhost:3000/link/search", {
        body: JSON.stringify({ query: search }),
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
      });
      const data = await response.json();
      return data as {
        results: {
          pageContent: string;
          metadata: string;
        }[];
      };
    },
    {
      enabled: !!search,
    }
  );
```
- Data Fetching: With React-Query's `useQuery`, we fetch data based on the search term. Notice the `enabled` option; it ensures that the fetch only occurs when there's a search term.
```tsx
  return (
    <AppShell header={{ height: 60 }} navbar={{ ... }}>
      ...
      <TextInput
        value={search}
        onChange={(e) => setSearch(e.currentTarget.value)}
        required
        placeholder="Search your links"
      />
      ...
      <Container>
        {status === "success" && data.results.map((result) => (
          <Card
            shadow="xs"
            padding="sm"
            radius="sm"
            style={{ marginBottom: 10 }}
          >
            <Text>{result.pageContent}</Text>
            <Text size="xs" color="gray">
              {JSON.parse(result.metadata).url}
            </Text>
          </Card>
        ))}
      </Container>
    </AppShell>
  );
```
- UI Rendering: This is where Mantine shines, providing us with a clean and modern user interface. Users can search for their saved links, and the results are displayed in neat cards, with the page content and its URL.
Full code:
```tsx
import { useDisclosure } from "@mantine/hooks";
import {
  AppShell,
  Burger,
  Text,
  Container,
  TextInput,
  Card,
} from "@mantine/core";
import React from "react";
import { useQuery } from "@tanstack/react-query";

export default function App() {
  const [opened, { toggle }] = useDisclosure();
  const [search, setSearch] = React.useState<string | undefined>();
  const { data, status } = useQuery(
    ["searchLinks", search],
    async () => {
      if (!search) return { results: [] };
      const response = await fetch("http://localhost:3000/link/search", {
        body: JSON.stringify({ query: search }),
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
      });
      const data = await response.json();
      return data as {
        results: {
          pageContent: string;
          metadata: string;
        }[];
      };
    },
    {
      enabled: !!search,
    }
  );
  return (
    <AppShell
      header={{ height: 60 }}
      navbar={{ width: 300, breakpoint: "sm", collapsed: { mobile: !opened } }}
      padding="md"
    >
      <AppShell.Header>
        <Burger opened={opened} onClick={toggle} hiddenFrom="sm" size="sm" />
        <Text m="md" size="xl" fw={700}>
          VexxaSearch
        </Text>
      </AppShell.Header>
      <Container style={{ paddingTop: 100 }}>
        <TextInput
          value={search}
          onChange={(e) => setSearch(e.currentTarget.value)}
          required
          placeholder="Search your links"
        />
      </Container>
      <Container>
        {status === "success" &&
          data.results.map((result) => (
            <Card
              shadow="xs"
              padding="sm"
              radius="sm"
              style={{ marginBottom: 10 }}
            >
              <Text>{result.pageContent}</Text>
              <Text size="xs" color="gray">
                {JSON.parse(result.metadata).url}
              </Text>
            </Card>
          ))}
      </Container>
    </AppShell>
  );
}
```
Testing the Application
To test the application, we'll need to start the backend server and the frontend application:
- Start the Backend: Navigate to the `server` directory and run `npm run dev`. This will start the backend server at `http://localhost:3000`.
- Start the Frontend: Navigate to the `client` directory and run `npm run dev`. This will start the frontend application at `http://localhost:5173`.
- Use the Extension: Navigate to any website and right-click on the page. Select "Save url to Vexxa Search" from the context menu. This will save the link to the backend.
- Open the Application: Navigate to `http://localhost:5173` and search for the link you saved. You should see the link's content and URL displayed in a neat card.
Demo
This snapshot showcases the Chrome extension in action. With a simple right-click, users can easily save their current webpage.
Interface of the VexxaSearch application
You can find the full code for the frontend application here.
Conclusion
Building VexxaSearch has been quite a journey! We started with the backend, utilizing Fastify as our server framework and Supabase as our data storage solution. We then set up a convenient Chrome extension that allows users to save URLs with just a simple right-click. And lastly, we created an efficient frontend with React, enhanced with Mantine for UI components and React-Query for state management. This holistic solution ensures a seamless process from saving the links to retrieving and searching them. It's proof of the power that comes from combining different modern technologies to create an efficient, user-friendly application.