Building a Personalized Link Search Engine with Supabase pgvector, Transformers.js, and LangChain
Navigating the vast digital world requires efficient tools, especially for link discovery. In this project, we've combined the strengths of Supabase pgvector, Transformers.js, and LangChain to power our Fastify-based backend, paired with a user-friendly frontend built with React and Vite. In this article, we'll walk you through the project's inner workings, both backend and frontend. Let's dive in!
Backend Installation and Setup
Before diving into the core functionality of our project, we'll need to set up the necessary tools and libraries. This ensures a smooth development process and a seamless integration of Supabase pgvector, Transformers.js, and LangChain within our Fastify framework.
1. Setting up Fastify-CLI
Fastify-CLI simplifies the process of bootstrapping and running Fastify applications. Install it globally using the following command:
```bash
npm install fastify-cli --global
```
2. Building the Boilerplate
Once Fastify-CLI is installed, we'll generate the project boilerplate tailored for our needs. We'll be utilizing ESM (ECMAScript Modules) and TypeScript for a more robust development experience.
Run the following command to generate the boilerplate:
```bash
fastify generate --esm --lang=ts server
```
3. Installing Dependencies
With the basic structure in place, let's install the necessary dependencies to power our backend:
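The exact dependency list depends on your project, but based on the tools named in this article it likely includes the Supabase client, LangChain, Transformers.js, and an HTML-to-text converter:

```bash
# Assumed package names; adjust to your project's needs.
npm install @supabase/supabase-js langchain @xenova/transformers html-to-text
```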
These packages bring in the essential tools for our link search functionality, connecting our backend with the capabilities of Transformers.js, Supabase, and LangChain, while also allowing us to convert HTML content to plain text.
Database Setup for Hybrid Search in Supabase
To implement a powerful and efficient search functionality in our backend, we will utilize the hybrid search capabilities provided by the LangChain library. This method integrates vector similarity search with keyword-based search, ensuring both precision and flexibility. Below is a step-by-step breakdown of our database setup:
1. Enabling pgvector Extension
Firstly, we need to enable the pgvector extension, which is pivotal for working with embedding vectors.
```sql
create extension vector;
```
2. Creating the Documents Table
The documents table will store the information and content of the links we wish to index.
id: A unique identifier for each document.
content: The main content of the document, i.e. the page's HTML converted to plain text.
url: The link we saved.
metadata: Supplementary data about the url, corresponding to Document.metadata.
embedding: A vector representation of the document's content. We use dimension 384, which matches the output size of our embedding model.
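A minimal schema matching the columns above might look like the following sketch, modeled on LangChain's Supabase integration examples (the 384 dimension assumes a MiniLM-style embedding model):

```sql
create table documents (
  id bigserial primary key,
  content text,          -- page text (HTML converted to plain text)
  url text,              -- the saved link
  metadata jsonb,        -- corresponds to Document.metadata
  embedding vector(384)  -- embedding of the content
);
```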
The match_documents function is designed to perform a similarity search for documents based on embeddings. This function returns a list of documents ranked by their similarity to a provided query embedding.
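One possible match_documents implementation, adapted from LangChain's Supabase examples to 384-dimensional vectors; treat this as a sketch rather than the article's exact function:

```sql
create function match_documents (
  query_embedding vector(384),
  match_count int
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    -- cosine distance (<=>) turned into a similarity score
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```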
The kw_match_documents function executes a keyword-based search on the documents table. This is especially useful when you want to search documents based on specific terms or phrases. The results are ranked by their relevance to the provided query.
```sql
create function kw_match_documents(query_text text, match_count int)
returns table (id bigint, content text, metadata jsonb, similarity real)
as $$
begin
  return query execute
    format('select id, content, metadata,
            ts_rank(to_tsvector(content), plainto_tsquery($1)) as similarity
            from documents
            where to_tsvector(content) @@ plainto_tsquery($1)
            order by similarity desc
            limit $2')
    using query_text, match_count;
end;
$$ language plpgsql;
```
Integrating Supabase with Fastify
To make our backend efficient and well-integrated with Supabase, we need to set up a plugin within Fastify. This approach not only encapsulates the Supabase connection logic but also ensures that our Supabase client can be conveniently accessed throughout our application. Let's break down the steps and the code:
1. Supabase Credentials
Before diving into the code, ensure you've copied the Supabase URL and the anonymous key from your Supabase dashboard. These will be required to establish a connection to your Supabase project.
2. Creating the Fastify Plugin for Supabase
Navigate to the plugins directory and create a new file named supabase.ts. This file will contain the code for our Fastify plugin.
We extend Fastify's main instance interface to declare that it will also have a supabase property. This augmentation lets us attach the Supabase client to any Fastify instance, making it available throughout the application via request.server.supabase.
We fetch the Supabase URL and anonymous key from environment variables.
We initialize the Supabase client using createClient.
We attach the Supabase client to the Fastify server instance using the decorate method.
An onClose hook logs a message when the Fastify server is closed, indicating that the Supabase connection is also closed.
Finally, we export the plugin, making it available for inclusion in our Fastify server setup.
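Put together, supabase.ts might look like the sketch below; the environment variable names SUPABASE_URL and SUPABASE_ANON_KEY are assumptions, so match them to your own .env:

```typescript
import fp from "fastify-plugin";
import { createClient, SupabaseClient } from "@supabase/supabase-js";

// Make fastify.supabase (and request.server.supabase) type-safe.
declare module "fastify" {
  interface FastifyInstance {
    supabase: SupabaseClient;
  }
}

export default fp(async (fastify) => {
  // Assumed environment variable names; set these in your .env.
  const supabaseUrl = process.env.SUPABASE_URL as string;
  const supabaseAnonKey = process.env.SUPABASE_ANON_KEY as string;

  const supabase = createClient(supabaseUrl, supabaseAnonKey);

  // Expose the client on the server instance.
  fastify.decorate("supabase", supabase);

  // Log when the server (and with it the Supabase connection) shuts down.
  fastify.addHook("onClose", async (instance) => {
    instance.log.info("Supabase connection closed");
  });
});
```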
With this plugin, our backend is now effectively integrated with Supabase, enabling smooth data operations and ensuring that Supabase's functionalities are readily accessible throughout the application.
Routes: Handling and Searching Links
As the backbone of our application, managing and querying saved links is crucial. This functionality lives in the link.ts file, located inside the src/routes/ directory. Acting as the gateway at http://localhost:3000/links, this file dictates how our app handles link-related operations. Let's walk through link.ts to understand its core responsibilities.
1. Required Modules & Types
First, we import the necessary modules. These span from Fastify core modules to specialized libraries from LangChain that will be instrumental in our link processing and retrieval.
We also define two request types:
SaveLinkRequest: Accepts a body with url and content for saving a link.
SearchLinkRequest: Accepts a body with a query (search term) and an optional count to limit the number of search results.
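These typings might look like the following sketch, written as Fastify route generics with the shapes taken from the description above:

```typescript
// Request typings for the two endpoints, expressed as Fastify route generics.
interface SaveLinkRequest {
  Body: {
    url: string;     // the link to save
    content: string; // raw HTML of the page
  };
}

interface SearchLinkRequest {
  Body: {
    query: string;  // the search term
    count?: number; // optional cap on the number of results
  };
}
```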
2. Saving a Link (/save Endpoint)
When a POST request is made to /links/save, the application saves the provided link to the Supabase database.
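A sketch of what the /save handler could look like, using html-to-text plus LangChain's SupabaseVectorStore. The import paths and the model name are assumptions; note also that the vector store writes the URL into metadata, while the schema above has a dedicated url column you could populate with an extra update:

```typescript
import { FastifyPluginAsync } from "fastify";
import { convert } from "html-to-text";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { SupabaseVectorStore } from "langchain/vectorstores/supabase";
import { Document } from "langchain/document";

// Request typing repeated here so the snippet is self-contained.
interface SaveLinkRequest {
  Body: { url: string; content: string };
}

const saveRoute: FastifyPluginAsync = async (fastify) => {
  fastify.post<SaveLinkRequest>("/save", async (request) => {
    const { url, content } = request.body;

    // Strip markup so we embed readable text, not raw HTML.
    const text = convert(content);

    // Local embedding model via Transformers.js (assumed model name).
    const embeddings = new HuggingFaceTransformersEmbeddings({
      modelName: "Xenova/all-MiniLM-L6-v2",
    });

    // Embed the text and insert it into the `documents` table.
    await SupabaseVectorStore.fromDocuments(
      [new Document({ pageContent: text, metadata: { url } })],
      embeddings,
      {
        client: fastify.supabase,
        tableName: "documents",
        queryName: "match_documents",
      }
    );

    return { saved: url };
  });
};

export default saveRoute;
```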
3. Searching for Links (/search Endpoint)
When a POST request is made to /links/search, the application:
Defines a limit on the number of results based on the count in the request body, defaulting to 3.
Initializes a Hugging Face transformer model (run locally via Transformers.js) for embeddings.
Uses LangChain's SupabaseHybridSearch to combine vector similarity search with keyword-based search.
Retrieves relevant documents (links) based on the provided query.
The SupabaseHybridSearch is particularly powerful, offering a flexible search experience by harnessing both embeddings and keyword search.
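The search flow described above could be implemented roughly as follows; the LangChain import paths, the Xenova/all-MiniLM-L6-v2 model name, and the route wiring are assumptions:

```typescript
import { FastifyPluginAsync } from "fastify";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { SupabaseHybridSearch } from "langchain/retrievers/supabase";

// Request typing repeated here so the snippet is self-contained.
interface SearchLinkRequest {
  Body: { query: string; count?: number };
}

const searchRoute: FastifyPluginAsync = async (fastify) => {
  fastify.post<SearchLinkRequest>("/search", async (request) => {
    const { query, count } = request.body;
    const limit = count ?? 3; // default to 3 results

    // Same local embedding model used when saving links.
    const embeddings = new HuggingFaceTransformersEmbeddings({
      modelName: "Xenova/all-MiniLM-L6-v2",
    });

    // Combine vector similarity with keyword search over `documents`,
    // using the two SQL functions defined earlier.
    const retriever = new SupabaseHybridSearch(embeddings, {
      client: fastify.supabase,
      similarityK: limit,
      keywordK: limit,
      tableName: "documents",
      similarityQueryName: "match_documents",
      keywordQueryName: "kw_match_documents",
    });

    return retriever.getRelevantDocuments(query);
  });
};

export default searchRoute;
```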
With these routes in place, our backend now possesses the capability to save links efficiently, embedding their content for quick retrievals. Furthermore, our hybrid search ensures that when users search for content, they get the most relevant links, be it by content similarity or keyword match.
This setup, leveraging Fastify's performance, LangChain's sophisticated tools, and Supabase's database management, ensures a seamless and effective link management system.
Setting Up the Frontend: Integrating Chrome Extension & Search Application
Building a search engine backend without a frontend interface is like constructing a library with no entrance. In this section, we will bring our backend to life by setting up a frontend in the form of a Chrome extension. The extension will serve as an easy mechanism for users to save links and will complement our search application seamlessly. Here's a step-by-step guide to establish the "VexaSearch" Chrome extension:
1. Crafting the Chrome Extension
To begin, let's set up the necessary files for our extension:
Folder Structure:
```
/VexaSearch
|-- manifest.json
|-- background.js
```
manifest.json:
The manifest.json is the metadata file for Chrome extensions. It provides essential details to the Chrome browser about how the extension should function and what permissions it requires.
manifest_version: Specifies which version of the manifest specification the package requires.
name: The name of our extension, "VexaSearch".
version: The version of our extension.
description: A brief description.
permissions: A list of permissions the extension needs.
host_permissions: Defines which websites our extension can access.
background: Specifies that background.js will serve as a service worker, acting as the backbone of our extension.
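One plausible manifest.json for Manifest V3; the permissions here are inferred from the APIs background.js uses, so adjust them to your needs:

```json
{
  "manifest_version": 3,
  "name": "VexaSearch",
  "version": "1.0",
  "description": "Save the current page to your personal link search engine.",
  "permissions": ["contextMenus", "scripting", "activeTab"],
  "host_permissions": ["<all_urls>"],
  "background": {
    "service_worker": "background.js"
  }
}
```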
background.js:
This is the brain of our extension, housing the logic that will run in the background.
```javascript
// Add a right-click menu entry on every page.
chrome.contextMenus.create({
  title: "Save url to Vexxa Search",
  id: "vexxa",
  contexts: ["page"],
});

// Runs inside the page: grab the URL and the full HTML.
const fetchCurrentSite = () => {
  const url = new URL(window?.location?.href);
  const entirePage = document.documentElement.outerHTML;
  return {
    url: url.href,
    content: entirePage,
  };
};

chrome.contextMenus.onClicked.addListener(async (info, tab) => {
  if (info.menuItemId === "vexxa") {
    const tabId = tab.id;
    // Inject fetchCurrentSite into the active tab and collect its result.
    const injectionResults = await chrome.scripting.executeScript({
      target: { tabId: tabId },
      func: fetchCurrentSite,
    });
    if (injectionResults.length !== 0) {
      const data = injectionResults[0].result;
      if (data) {
        // Send the captured page to the backend for indexing.
        const response = await fetch("http://localhost:3000/link/save", {
          method: "POST",
          body: JSON.stringify(data),
          headers: { "Content-Type": "application/json" },
        });
        if (response.status === 200) {
          console.log(await response.json());
        }
      }
    }
  }
});
```
Creating Context Menu : We add a context menu item named "Save url to Vexxa Search". When users right-click on a page, they will see this option, allowing them to save the URL directly.
Fetching Current Site: The function fetchCurrentSite captures the current URL and the entire HTML content of the page.
Event Listener for Menu Click: When the user selects "Save url to Vexxa Search" from the context menu, the event listener fires. It first fetches the current page's URL and content, and then sends a POST request to our backend at http://localhost:3000/link/save, saving the link's data.
Loading the Extension:
Before you can use the extension, you need to load it into Chrome:
Open Chrome Browser: Navigate to the Chrome extensions page by entering chrome://extensions/ in the address bar or selecting "Extensions" from the Chrome menu.
Enable Developer Mode: On the top-right corner of the extensions page, toggle on the "Developer mode".
Load Unpacked: You will see three options appear: Load unpacked, Pack extension, and Update. Click on the Load unpacked button.
Select Your Extension Folder: Navigate to the directory where you saved the VexaSearch folder and select it. Your extension should now be loaded into Chrome and appear in the list of installed extensions.
Testing the Extension:
Once loaded, you can test the extension to ensure it's working as expected:
Open a Website: Navigate to any website in your Chrome browser.
Right-click on the Page: In the context menu that appears, you should see the option "Save url to Vexxa Search".
Select the Option: Clicking on this will trigger the background.js script, fetching the current site's data and sending it to our backend for saving.
2. Setting Up the Frontend for VexxaSearch Application
For this frontend setup, we'll be leveraging React as our primary framework, while employing Vite for faster and leaner builds. We'll be using Mantine for UI components, making our application look neat, and TanStack's React-Query to efficiently handle our application's asynchronous data.
Note: This tutorial will not delve into setting up Mantine or React-Query, as their respective documentation is quite comprehensive; follow their official installation guides to get them in place.
Imports: We're importing essential hooks, UI components, and the main React package. useQuery from React-Query helps manage and fetch asynchronous data.
Data Fetching: With React-Query's useQuery, we fetch data based on the search term. Notice the enabled option; this ensures that the fetch only occurs when there's a search term.
UI Rendering: This is where Mantine shines, providing us with a clean and modern user interface. Users can search for their saved links, and the results are displayed in neat cards, with the page content and its URL.
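A sketch of the search screen along those lines, assuming a /links/search endpoint and LangChain-style result objects (pageContent plus metadata.url); the component names come from Mantine and the data fetching from React-Query:

```typescript
import { useState } from "react";
import { useQuery } from "@tanstack/react-query";
import { Card, Container, Text, TextInput } from "@mantine/core";

// Assumed response shape from the search endpoint.
interface SearchResult {
  pageContent: string;
  metadata: { url?: string };
}

export default function App() {
  const [query, setQuery] = useState("");

  const { data } = useQuery<SearchResult[]>({
    queryKey: ["links", query],
    queryFn: async () => {
      const res = await fetch("http://localhost:3000/links/search", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query }),
      });
      return res.json();
    },
    // Only fire the request once the user has typed something.
    enabled: query.length > 0,
  });

  return (
    <Container>
      <TextInput
        placeholder="Search your saved links"
        value={query}
        onChange={(e) => setQuery(e.currentTarget.value)}
      />
      {data?.map((result, i) => (
        <Card key={i} withBorder mt="sm">
          <Text lineClamp={3}>{result.pageContent}</Text>
          <Text c="dimmed">{result.metadata.url}</Text>
        </Card>
      ))}
    </Container>
  );
}
```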
To test the application, we'll need to start the backend server and the frontend application:
Start the Backend: Navigate to the server directory and run npm run dev. This will start the backend server at http://localhost:3000.
Start the Frontend: Navigate to the client directory and run npm run dev. This will start the frontend application at http://localhost:5173.
Open the Extension: Navigate to any website and right-click on the page. Select "Save url to Vexxa Search" from the context menu. This will save the link to the backend.
Open the Application: Navigate to http://localhost:5173 and search for the link you saved. You should see the link's content and URL displayed in a neat card.
Demo
This snapshot showcases the Chrome extension in action. With a simple right-click, users can easily save their current webpage.
Interface of the VexxaSearch application
You can find the full code for the frontend application here
Conclusion
Building VexxaSearch has been quite a journey! We started with the backend, utilizing Fastify as our server framework and Supabase as our data storage solution. We then set up a convenient Chrome extension that allows users to save URLs with just a simple right-click. And lastly, we created an efficient frontend with React, enhanced with Mantine for UI components and React-Query for state management. This holistic solution ensures a seamless process from saving the links to retrieving and searching them. It's proof of the power that comes from combining different modern technologies to create an efficient, user-friendly application.