top of page


  • Writer's pictureRoman Guoussev-Donskoi

Leveraging OpenAI and Azure Services to chat with your own data.

Updated: Jul 22

In this blog post we explore an approach to data search, leveraging the power of OpenAI and Azure services. We use RAG (Retrieval-Augmented Generation) pattern to create a chatbot that can converse with users using domain-specific data that you provide.

The code sample employs the use of vector databases, a technique that it seems is becoming the standard for grounding large language models (LLMs) in large datasets. The amount of code required is minimal. All of the steps can be efficiently executed within a single Jupyter notebook. The example code is provided in GitHub and is a combination of examples provided by Microsoft:

A High level description of steps is below:

  1. Extract text from PDF: Extract text from PDF files with Azure Form Recognizer.

  2. Create a Vector-Based Searchable Database: We generate vector embeddings for each document using OpenAI's text-embedding-ada-002 model. By creating Azure Cognitive Search vector-based database we provide a more accurate and contextually aware search.

  3. Generate embeddings for user input: To enable search vector database we now generate embeddings for user input using OpenAI text-embedding-ada-002 model.

  4. Search Vector Database and pass results to ChatGPT: The results obtained from our vector-based search are integrated into the prompt of OpenAI's gpt-3.5-turbo, effectively grounding ChatGPT with our own data. This allows the model to generate responses that are specifically tailored to the information within our dataset.

Visual representation of this is in diagram below:

Can work with this example in Microsoft VSCode assuming installed "Jupyter" extension:

In conclusion, by combining OpenAI models with Azure services, we've created a powerful, intelligent search system. This approach allows us to unlock the potential of our data, transforming static PDFs into a dynamic, interactive resource.

76 views0 comments

Recent Posts

See All

LLMs (such as OpenAI) are good for reasoning but they lack capability interface with outside world. This is where Langchain agents step in: agents provide LLMs with tools to perform actions (for examp

Summary Langchain framework makes building of LLM applications much easier, extends capabilities of LLM applications, and introduces structured approach, which facilitates supporting and managing app

Azure Active Directory (Azure AD) Conditional Access is a policy-based system that provides automated access control decisions for accessing your cloud apps. For example Conditional Access policy sess

Home: Blog2


Home: GetSubscribers_Widget


Your details were sent successfully!

Home: Contact
bottom of page