The Future of E-Commerce: Shopping Online with your AI Assistant

[Image: The online shopping jungle (AI-generated image)]

Let’s face it, online shopping can be an absolute rollercoaster ride of frustration. Trust me, I’ve been there. I tried buying a sofa online a few days ago, and what a nightmare! I was on a mission: find the perfect convertible corner sofa without breaking the bank. Juggling multiple tabs with different websites, each with its own confusing menus and filters, led me to choices that were not even close to what I was looking for. Comparing all the options, keeping track of prices, and weighing delivery times quickly gave me a headache. Whenever I came across a corner sofa that I liked, it turned out to be non-convertible, and vice versa. And so-called intelligent search services like Google Shopping were useless to me. Not fun!

At Joko, we are convinced that this cannot be the future of online shopping, and we are putting a lot of effort into revolutionizing it. Our custom mobile browser and intelligent browser extensions help you save time and money, and shop more responsibly – no matter what e-commerce site you’re on. Today, we provide features like price tracking, price comparison, carbon footprint calculation, auto-applied promo codes, and cashback; our goal tomorrow is to become your ultimate trusted guide through the online shopping jungle.

Toward the universal shopping assistant

Wouldn’t it have been amazing to have my own dedicated personal shopper during my search for the perfect sofa? Someone who would know exactly what I wanted and could offer me a hand-picked selection matching both my taste and my budget? The dream, of course, would be to extend this luxury to all online purchases: custom assistance for anything you can shop online, whether it’s running shoes or headphones. An ultimate companion that guides you through the overwhelming sea of online shopping options, making the whole experience a lot less stressful.

The good news is that this dream is now achievable thanks to the progress of Machine Learning. Recent breakthroughs in Generative Artificial Intelligence (GenAI), and in Large Language Models (LLMs) in particular, have paved the way for incredible applications that were completely out of reach a few years ago. Building a universal shopping assistant able to help users with virtually any shopping request is one such application, and one in which Joko is investing a lot of resources today.

Despite the latest advancements in the field, crafting the perfect shopping assistant remains a highly complex challenge from both a product and an engineering perspective. In this blog post, we intend to sketch out the fundamental steps to develop a prototype assistant, without diving deep into the intricate details of the task. For the sake of clarity, we’ll simplify various aspects, while still touching upon all the crucial elements required to build a simple yet powerful shopping assistant.

Let’s get started

What could our assistant prototype look like? Picture an advanced algorithm that chats with you just like a store clerk, but with the added benefit of instantly pulling up the most relevant products from a database. Imagine a user interface where you chat with this virtual assistant, similar to ChatGPT, alongside a display featuring top product recommendations. Keep this vision in mind; it will be useful for understanding what we’ll discuss next.

[Image: Illustrative demonstration of our universal shopping assistant]

Our problem naturally splits into two blocks: running the conversation itself (the chatbot) and retrieving relevant products (the search). We will describe how each of these blocks can be built before addressing how to assemble them.

Creating an effective chatbot

The go-to way to create a chatbot today is to use a Large Language Model (LLM). Recent LLMs are extremely powerful for handling any query and providing human-like interactions. If you’re not up to speed on this ongoing revolution, you should read about it here for an overview (or here for more technical details).

Choosing the ideal LLM that fits the needs and constraints of our chatbot is easier said than done. With a constantly evolving landscape of proprietary and open-source solutions of all kinds, it is easy to get lost. Moreover, today’s insights could be tomorrow’s old news. That being said, the landscape generally narrows once you match models to your specific needs. For our shopping assistant, we want responses that can hold their own against human conversation, and we need them fast – ideally within a few seconds – to keep the user experience smooth. Not all models can offer this, far from it.

One thing is certain: if the goal is to quickly develop an educational prototype (which is our objective in this article), open-source models typically introduce additional complexity. Setting them up is not a straightforward process, and accessing GPUs for computation can present challenges. Proprietary models, like GPT-3 and GPT-4 by OpenAI, generally provide user-friendly APIs, come with good documentation, and deliver exceptional quality out of the box, which is difficult to surpass. That is why we decided to use gpt-3.5-turbo to build the small demo that we will present below.
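To make this concrete, here is a minimal sketch of what a single chatbot turn could look like with OpenAI’s Python client. It assumes the openai package is installed and an OPENAI_API_KEY environment variable is set; the system prompt is our own illustrative choice, not the one used in our demo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        # Illustrative system prompt: in practice, this is where most of the
        # assistant's persona and guardrails would be defined
        {"role": "system", "content": "You are a friendly and helpful shopping assistant."},
        {"role": "user", "content": "I'm looking for a convertible corner sofa."},
    ],
)
print(response.choices[0].message.content)
```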

Note that while we’re mindful of cost, it’s not a deal-breaker for the prototype we’re building together here. The equation can change drastically if cost becomes a priority, which is generally the case for production use. Of course, in our R&D work at Joko, we closely study open-source models to address these types of questions (we will come back to this at the end).

Now that we have put the thorny question of chatbot construction behind us, let’s turn to our second block: the search.

Building an effective search

To reproduce the level of assistance that a store clerk can provide in a physical store, we have to go beyond traditional search approaches like keyword search. Instead of relying only on keywords, we want to take into account the whole context and intent behind our users’ queries, an approach usually referred to as semantic search.

How does textual semantic search work in practice? Using Natural Language Processing (NLP) models, we can represent any piece of text as a vector, often referred to as an embedding. This vector encodes the meaning of the text into numbers. This is particularly interesting because it allows mathematical operations on text and makes it possible to quantify the concept of similarity. Typically, we compute the cosine similarity of two embeddings: a number between -1 and 1 that indicates how close the two pieces of text are in semantic meaning (a value close to 1 indicates high semantic similarity).

[Image: Example of embeddings in a vector space]
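As an illustration, here is a minimal sketch of how two pieces of text could be embedded and compared, using OpenAI’s text-embedding-ada-002 model (the same one we use in the demo below). Since ada-002 embeddings are normalized to unit length, a simple dot product gives the cosine similarity.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts with OpenAI's embedding endpoint."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data])

a, b = embed(["convertible corner sofa", "sofa bed with chaise longue"])
# ada-002 embeddings have unit norm, so the dot product equals the cosine similarity
print(float(np.dot(a, b)))  # close to 1 for semantically similar texts
```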

Using these embeddings, we can build a semantic search workflow to retrieve products from our catalog, essentially a database in which each element is a set of product attributes: name, price, description, etc. The first step is to represent each product as a single piece of text and to embed it, so that each product in our catalog is represented by a vector. Then, when a user submits a search query, we embed it as well, resulting in a vector that encapsulates the user’s search intent. Finally, we extract the most similar products, i.e. those whose embeddings have the highest similarity with the embedding of the query (a step known as k-nearest-neighbors, or KNN, search).

[Image: Semantic search algorithm]
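For a prototype, the retrieval step itself can be a brute-force KNN search in plain numpy: stack the product embeddings in a matrix, compute similarities against the query embedding, and keep the top k. This is a sketch; at scale, a dedicated vector database would handle this step (more on that at the end).

```python
import numpy as np

def knn_search(query_embedding: np.ndarray, product_embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k catalog products most similar to the query.

    Assumes all embeddings are unit-normalized, so a dot product is the cosine similarity.
    """
    similarities = product_embeddings @ query_embedding  # one similarity score per product
    return np.argsort(similarities)[::-1][:k]  # indices of the k highest scores
```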

To implement this in practice, we need to choose an embedding model. As with the choice of an LLM for the chatbot, there is a huge variety of options out there. For the small demo that we present below, we chose a cheap, easy-to-use, low-latency, yet powerful model developed by OpenAI, called text-embedding-ada-002.

We also need to find a way to represent each product as a single piece of text. A natural option consists of concatenating the different properties that we have for each product. For instance, the text representation of the Nike Air Max shoes would be the following:

"""
Category: shoes
Name of the product: Nike Air Max
Description: The iconic Air Max shoe from Nike
"""

We can do the same for every product and then rely on the embedding algorithm to convert this information into vectors in a meaningful way.
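In code, building this text representation could be as simple as formatting the product attributes with a template. This is a sketch; the attribute names are illustrative and depend on your catalog schema.

```python
def product_to_text(product: dict) -> str:
    """Concatenate the attributes of a product into a single piece of text to embed."""
    lines = [
        ("Category", product.get("category")),
        ("Name of the product", product.get("name")),
        ("Description", product.get("description")),
    ]
    return "\n".join(f"{label}: {value}" for label, value in lines if value)

product = {
    "category": "shoes",
    "name": "Nike Air Max",
    "description": "The iconic Air Max shoe from Nike",
}
print(product_to_text(product))  # matches the text representation shown above
```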

Now that we have the chatbot and the semantic search, the last part of the project is to combine them!

Integrating chatbot and search: birth of a shopping assistant

The pipeline that we built for our small prototype works as follows. It takes as input the user’s latest query and the previous messages of the conversation. Based on this information, it decides whether retrieving products is necessary. This is important because we don’t want to return products for every query, like “Hey Joko! Can you help me?”. Finally, when providing a final answer to the user, we can help them refine their query, typically by asking follow-up questions. This flow is repeated each time the user submits a new query.

[Image: Shopping assistant pipeline]

Such a pipeline can be implemented with the agent paradigm, which consists of providing the LLM with a list of tools, along with a description of when and how to use them. Based on the user’s query, the LLM can decide which tool to use and determine the arguments with which it should be called. The chosen tool is then invoked with the information extracted by the model. The landscape around LLM agents is evolving quickly, and there are many ways to implement this in practice. For a quick introduction to this topic, we recommend OpenAI’s approach to the paradigm, called function calling.
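Here is a minimal sketch of this mechanism with OpenAI’s function calling. The search_products tool name, its parameters, and their descriptions are our own illustrative choices; the model returns either a plain answer or a request to call the tool with extracted arguments.

```python
import json
from openai import OpenAI

client = OpenAI()

# Description of our product-retrieval tool, so the LLM knows when and how to use it
tools = [{
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Retrieve relevant products from the catalog when the user expresses a shopping intent.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string", "description": "Product category, e.g. 'smartphone'"},
                "name": {"type": "string", "description": "Product name or keywords, e.g. 'iPhone'"},
            },
            "required": ["category"],
        },
    },
}]

messages = [{"role": "user", "content": "I want to buy a new iPhone"}]
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages, tools=tools)

message = response.choices[0].message
if message.tool_calls:  # the model decided that retrieving products is necessary
    arguments = json.loads(message.tool_calls[0].function.arguments)
    print(arguments)  # e.g. {"category": "smartphone", "name": "iPhone"}
else:
    print(message.content)  # purely conversational turn, no retrieval needed
```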

In our scenario, we need to define a product-retrieval tool and provide its description to the LLM, so that the tool can be invoked whenever answering the user’s query requires searching the catalog. This tool consolidates its arguments into a single piece of text, executes the semantic search logic discussed in the previous section, and forwards the information about the retrieved products to an LLM to generate a final answer.

Let’s take a concrete example and break it down. Suppose a user inputs “I want to buy a new iPhone”. The LLM will recognize the need to use our product-retrieval tool with the arguments {category: 'smartphone', name: 'iPhone'}. The tool will then use our semantic search with the following input text:

"""
Category: smartphone
Name of the product: iPhone
"""

This will enable us to display some iPhone options to the user. Simultaneously, the LLM will generate a response to the initial query, and help the user refine their needs.
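Gluing everything together, the tool body itself can reuse the embed, knn_search, and product_to_text helpers sketched earlier. Here, catalog is assumed to be the list of product dictionaries whose embeddings were stacked, row by row, into product_embeddings.

```python
def search_products(category: str, name: str = "") -> list[dict]:
    """Consolidate the tool arguments into one text, then run the semantic search."""
    query_text = f"Category: {category}\nName of the product: {name}"
    query_embedding = embed([query_text])[0]
    top_indices = knn_search(query_embedding, product_embeddings, k=5)
    return [catalog[i] for i in top_indices]  # products to display alongside the answer
```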

By following the steps outlined above, we’ve created a small demo using Python and Streamlit. You can see it in action in the video below. It’s surprising how good the results can be with such a straightforward setup!

[Video: Demo shopping assistant web application]
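For the curious, the skeleton of such a Streamlit app fits in a few lines. This is a sketch: run_assistant is a hypothetical entry point wrapping the pipeline described above, and the display of retrieved products is omitted for brevity.

```python
import streamlit as st

st.title("Shopping assistant demo")

if "messages" not in st.session_state:
    st.session_state.messages = []  # full conversation history

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if query := st.chat_input("What are you shopping for?"):
    st.session_state.messages.append({"role": "user", "content": query})
    with st.chat_message("user"):
        st.write(query)
    answer = run_assistant(st.session_state.messages)  # hypothetical pipeline entry point
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)
```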

And… that’s it! If you’ve made it this far, you now have all the ingredients to build your own assistant prototype.

Scaling up, from proof of concept to production

The prototype presented above is useful for demonstrating the feasibility of such a project and illustrating the basic concepts around LLMs, but building an assistant that can truly help millions of users find the right products in a catalog containing millions of items is a whole different challenge.

Firstly, as briefly mentioned above, choosing the right models is a challenge in itself. Excellent performance can be achieved with proprietary models like GPT-4, but they present many issues, especially in terms of cost and privacy. These problems can be mitigated by using open-source models, but other questions then arise, such as the quality of results and access to GPUs. The field is evolving very quickly, and the recent releases of the LLaMA 2, Falcon, and Mistral models have paved the way toward open-source models that can truly compete with proprietary ones.

Another technical challenge is performing efficient vector search on a product catalog containing tens or even hundreds of millions of products. Specialized vector databases, such as Pinecone, Weaviate, or Milvus, exist for this purpose, but the landscape in this domain is also constantly evolving, and many implementation challenges arise in practice, such as combining semantic search with structured search.

Note that we have entirely bypassed the construction of a substantial product catalog in this blog post, a complex and fascinating problem that Joko invests considerable effort in.

Crafting tomorrow’s shopping experience: a product challenge of unprecedented complexity

A completely different type of challenge concerns the product design itself. Being able to interact with a shopping assistant solely through natural language, as we do today with ChatGPT, is clearly not the best UX. Allowing users to naturally navigate through a sea of products, filter results with a few clicks, and specify their intent as smoothly as possible is a real puzzle that requires inventing a whole new experience that simply doesn’t exist today.

This challenge can only be met with an obsession with product quality and a relentless pursuit of the right user experience. This goal is incredibly difficult to achieve and will only be within reach of companies that place product excellence at the center of their culture, just as we strive to do at Joko.

Although LLM applications are currently flourishing, the question remains as to which applications will survive the hype and bring real value to businesses and users. One thing is sure: e-commerce is clearly one of the fields where GenAI can completely shake things up. We believe that the current online shopping experience is broken, and that LLMs can play a key role in reinventing this experience, provided that they are applied with a user-centric mindset.

If you are interested in helping us make a real difference in the online world, do reach out, we are hiring!