Thursday 30 November 2023

Simple chat app using the openai client.chat.completions endpoint (requires openai version >= 1.2.0)

import streamlit as st
import openai
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv
# Load environment variables
load_dotenv(find_dotenv())

st.title('OpenAI Chat App')

client = OpenAI()

user_input = st.text_input("You: ", "")
if st.button('Send'):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_input},
        ],
        temperature=0,
    )
    st.write('Assistant: ', response.choices[0].message.content)

This is a Python script that uses the Streamlit library to create a web application, and the OpenAI API to generate responses from the gpt-3.5-turbo model.

Here’s a breakdown of what each part of the code does:

Importing necessary libraries:

import streamlit as st
import openai
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv

These lines import the necessary libraries. streamlit is a library for building web applications, openai is the OpenAI Python package (only its OpenAI client class is used here, so the bare import openai is redundant), and dotenv is used to load environment variables from a .env file.

Loading environment variables:

load_dotenv(find_dotenv())

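This line loads environment variables from a .env file. find_dotenv() searches for the file, walking up through parent directories until it finds one, and load_dotenv() reads its contents into the process environment. This is typically used to keep sensitive information like API keys out of the source code.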
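A missing key otherwise only shows up as an error when the client is created or the first request is sent, so it can help to check for it right after loading the .env file. The following is a minimal sketch using only the standard library; the helper name require_env is hypothetical and not part of dotenv or openai.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of dotenv or openai.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Typical use, placed after load_dotenv(find_dotenv()):
# api_key = require_env("OPENAI_API_KEY")
```

Failing early with a readable message is a design choice, not a requirement; the app works without it as long as the variable is set.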

Setting up the Streamlit app:

st.title('OpenAI Chat App')

This line sets the title of the web application to ‘OpenAI Chat App’.

Creating an OpenAI client:

client = OpenAI()

This line creates an instance of the OpenAI client, which is used to interact with the OpenAI API. Called without arguments, it reads the API key from the OPENAI_API_KEY environment variable.

Getting user input:

user_input = st.text_input("You: ", "")

This line creates a text input field in the web application where the user can enter their message.

Sending the user’s message to the OpenAI API and displaying the response:

if st.button('Send'):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_input},
        ],
        temperature=0,
    )
    st.write('Assistant: ', response.choices[0].message.content)

When the ‘Send’ button is clicked, this block of code sends the user’s message to the OpenAI API, along with a system message that sets the behavior of the assistant. The response from the API is then displayed in the web application.
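The messages argument is a plain list of dictionaries, each with a role and a content field. The sketch below shows how such a list could be assembled. The script above sends only a single user turn, so the history parameter and the build_messages helper are assumptions added for illustration.

```python
from typing import Dict, List


def build_messages(
    system_prompt: str,
    history: List[Dict[str, str]],
    user_input: str,
) -> List[Dict[str, str]]:
    """Assemble a messages list: system message, earlier turns, new user message."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # earlier user/assistant turns, oldest first
    messages.append({"role": "user", "content": user_input})
    return messages


# Made-up earlier turns, only to show the structure
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
messages = build_messages("You are a helpful assistant.", history, "What is Streamlit?")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

With an empty history, the result has the same two-message shape as the list passed to client.chat.completions.create in the script.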

The temperature parameter controls the randomness of the AI’s output. A value of 0 makes the model pick the most likely next token at each step, so repeated runs with the same input usually give the same response, although the API does not guarantee fully identical output.
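To make the effect of temperature concrete, here is an illustrative sketch of temperature-scaled softmax over a few made-up token scores. It is a textbook illustration, not OpenAI’s actual sampling code; the function name and the logits are assumptions.

```python
import math
from typing import List


def token_probabilities(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature; temperature 0 is treated as greedy (argmax)."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

Lower temperatures concentrate the probability on the highest-scoring token, while higher temperatures flatten the distribution and make less likely tokens more probable.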

Simple Chat App with PDF documents using OpenAIEmbeddings, FAISS index, langchain and streamlit

import streamlit as st
from openai import OpenAI  # note: this name is shadowed by the langchain import below
from dotenv import load_dotenv, find_dotenv
from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI  # from here on, OpenAI is LangChain's LLM wrapper
import pandas as pd  # not used in this script

# Load environment variables
load_dotenv(find_dotenv())

# Create an instance of OpenAI (LangChain's LLM wrapper, because of the import order above)
client = OpenAI()

# Create an instance of the OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Create a text splitter
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=200, length_function=len)

# Load the question answering chain
chain = load_qa_chain(OpenAI(), chain_type="stuff")

# Streamlit app
st.title('Question Answering App')

# Upload the PDF file
uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file is not None:
    # Read the PDF file
    doc_reader = PdfReader(uploaded_file)

    # Extract the text from every page
    raw_text = ''
    for page in doc_reader.pages:
        text = page.extract_text()
        if text:
            raw_text += text

    # Split the text into chunks
    texts = text_splitter.split_text(raw_text)

    # Create a FAISS index from the texts
    docsearch = FAISS.from_texts(texts, embeddings)

    # Input the question
    query = st.text_input('Enter your question:')

    if st.button('Answer'):
        # Perform a similarity search
        docs = docsearch.similarity_search(query)

        # Run the chain and get the answer
        answer = chain.run(input_documents=docs, question=query)

        # Display the answer
        st.write(answer)

This Python script is a Streamlit application that uses OpenAI’s language model to answer questions about the content of a PDF file. Here’s a breakdown of what each part of the code does:

Import necessary libraries: The script starts by importing necessary libraries such as streamlit, openai, dotenv, PyPDF2, and several modules from langchain.

Load environment variables: The load_dotenv(find_dotenv()) line locates a .env file (searching upward through parent directories) and loads its variables into the environment. This is typically used to keep sensitive information like API keys out of the source code.

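Create instances of OpenAI and OpenAIEmbeddings: These instances are used to send prompts to the OpenAI API and to create embeddings (vector representations of text), respectively. Note that the script imports the name OpenAI twice; the later import from langchain.llms takes precedence, so OpenAI() here is LangChain’s LLM wrapper rather than the client class from the openai package.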

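Create a text splitter: This is used to split the text from the PDF into chunks of about 1000 characters, split at newlines, with up to 200 characters of overlap between consecutive chunks so that context is not lost at chunk boundaries.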

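Load the question answering chain: This is a sequence of operations that takes a question and a set of documents as input and returns an answer. With chain_type="stuff", all the documents are inserted into a single prompt that is sent to the model in one call.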

Streamlit app setup: The st.title('Question Answering App') line sets the title of the Streamlit app.

Upload the PDF file: The st.file_uploader("Choose a PDF file", type="pdf") line creates a file uploader in the Streamlit app that accepts PDF files.

Read the PDF file and split the text into chunks: If a PDF file is uploaded, the script reads the file, extracts the text from each page, and splits the text into chunks using the text splitter created earlier.

Create a FAISS index from the texts: FAISS is a library for efficient similarity search and clustering of dense vectors. The script embeds each chunk of text with OpenAIEmbeddings and stores the resulting vectors in a FAISS index, which allows it to quickly find the chunks that are most similar to a given query.

Input the question: The st.text_input('Enter your question:') line creates a text input in the Streamlit app where you can enter your question.

Answer the question: If the ‘Answer’ button is clicked, the script performs a similarity search to find the chunks of text most similar to the question (LangChain returns the top 4 by default), runs the question answering chain on these chunks, and displays the answer in the Streamlit app.
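The similarity search step can be pictured as a nearest-neighbour lookup over the chunk vectors. The sketch below does this by brute force with cosine similarity in plain Python. It is a conceptual illustration only: FAISS uses optimized index structures and its distance metric may differ, and the vectors and function names here are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embedding vectors have far more dimensions
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query_vec = [1.0, 0.0]
for index, score in top_k(query_vec, chunk_vecs, k=2):
    print(index, round(score, 3))
```

In the app, the indices returned by such a lookup correspond to the text chunks that are then passed to the question answering chain.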

Simple Chat with PDF documents using OpenAIEmbeddings, FAISS index and langchain

07 chat with pdf document using langchain

In [ ]:

from openai import OpenAI  # Works only with openai version >= 1.2.0
from dotenv import load_dotenv, find_dotenv
import pandas as pd
from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

# Load environment variables (including the OpenAI API key) before creating the client
load_dotenv(find_dotenv())
client = OpenAI()

Reading in the PDF

In [ ]:

doc_reader = PdfReader('impromptu-rh.pdf')

In [ ]:

doc_reader

Out[ ]:

<PyPDF2._reader.PdfReader at 0x26b1132ab30>

In [ ]:

# Read the text of every page and collect it in a variable called raw_text
raw_text = ''
for page in doc_reader.pages:
    text = page.extract_text()
    if text:
        raw_text += text

In [ ]:

len(raw_text)

Out[ ]:

In [ ]:

raw_text[:100]

Out[ ]:

'Impromptu\nAmplifying Our Humanity \nThrough AI\nBy Reid Hoffman  \nwith GPT-4Impromptu: AmplIfyIng our '

Text Splitter

This takes the text and splits it into chunks. The chunk size is measured in characters, not tokens.
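To show what chunk size and overlap mean in characters, here is a simplified sliding-window sketch. It is not how CharacterTextSplitter works internally: the real splitter first splits on the separator and then merges the pieces up to the chunk size, whereas this hypothetical helper ignores separators and cuts at fixed offsets.

```python
from typing import List


def chunk_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_by_characters(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.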

In [ ]:

# Splitting up the text into smaller chunks for indexing
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,  # consecutive chunks share up to 200 characters
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
len(texts)

Out[ ]:

In [ ]:

texts[20]

Out[ ]:

'Because, really, an AI book? When things are moving so \nquickly? Even with a helpful AI on hand to speed the process, \nany such book would be obsolete before we started to write it—\nthat’s how fast the industry is moving.\nSo I hemmed and hawed for a bit. And then I thought of a frame \nthat pushed me into action.\nThis didn’t have to be a comprehensive “book” book so much as \na travelog, an informal exercise in exploration and discovery, \nme (with GPT-4) choosing one path among many. A snapshot \nmemorializing—in a subjective and decidedly not definitive \nway—the AI future we were about to experience.\nWhat would we see? What would impress us most? What would \nwe learn about ourselves in the process? Well aware of the brief \nhalf-life of this travelog’s relevance, I decided to press ahead.\nA month later, at the end of November 2022, OpenAI released \nChatGPT, a “conversational agent,” aka chatbot, a modified \nversion of GPT-3.5 that they had fine-tuned through a process'

In [ ]:

texts[10]

Out[ ]:

'one. They’re just going to stare at you with that smug \nlook on their face and say, “You know this is a serious \nsafety hazard. You could have a fire, you could have a \nlawsuit, you could have a bad Yelp review. You better get \non that, pal.” And then they leave, and you’re left with a \ndark corner of your dining room, where no one wants to \nsit, where the food looks unappetizing, where the mood \nis gloomy. And you think to yourself, how many restau-\nrant inspectors does it take to change a lightbulb? None. \nThey don’t change lightbulbs. They just make you feel \nbad about yours.\nPretty good, right? Here’s another:\nReid: How many restaurant inspectors does it take to \nchange a lightbulb? Answer in the style of the philoso -\npher Ludwig Wittgenstein. \nGPT-4:  The question of how many restaurant inspectors \nit takes to change a lightbulb cannot be answered with \na simple number. In fact, the answer to this question lies \nmore in the realm of language than it does in the realm'

Making the embeddings

In [ ]:

# OpenAIEmbeddings() creates a client for OpenAI's embedding endpoint; nothing is embedded yet.
# An embedding is a vector representation of a piece of text that captures its meaning.
embeddings = OpenAIEmbeddings()
# FAISS.from_texts(texts, embeddings) embeds every chunk through that client and stores the
# vectors in a FAISS (Facebook AI Similarity Search) index. This index can be used to perform
# efficient similarity search among the chunks.
docsearch = FAISS.from_texts(texts, embeddings)
docsearch.embedding_function

Out[ ]:

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x0000026B34CE2C80>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x0000026B34D04F40>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base=None, openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='sk-xxxxxxxx', openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, http_client=None)

In [ ]:

# docsearch.similarity_search(query) embeds the query and searches the FAISS index for the
# chunks closest to it. It returns a list of the most similar documents (top 4 by default).
query = "how does GPT-4 change social media?"
docs = docsearch.similarity_search(query)
len(docs)

Out[ ]:

In [ ]:

docs[0]

Out[ ]:

Document(page_content='cian, GPT-4 and ChatGPT are not only able but also incredi-\nbly willing to focus on whatever you want to talk about.4 This \nsimple dynamic creates a highly personalized user experience. \nAs an exchange with GPT-4 progresses, you are continuously \nfine-tuning it to your specific preferences in that moment. \nWhile this high degree of personalization informs whatever \nyou’re using GPT-4 for, I believe it has special salience for the \nnews media industry.\nImagine a future where you go to a news website and use \nqueries like these to define your experience there:\n4  Provided it doesn’t violate the safety restrictions OpenAI has put on \nthem.93Journalism\n● Hey, Wall Street Journal, give me hundred-word summa-\nries of your three most-read tech stories today.\n● Hey, CNN, show me any climate change stories that hap-\npened today involving policy-making.\n● Hey, New York Times, can you create a counter-argument \nto today’s Paul Krugman op-ed, using only news articles')

Plain QA Chain

In [ ]:

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [ ]:

# load_qa_chain(OpenAI(), chain_type="stuff") builds a question answering chain:
# a prompt plus an LLM call that answers a question from a set of documents.
# chain_type="stuff" means all the documents are "stuffed" into a single prompt
# and sent to the model in one call.
chain = load_qa_chain(OpenAI(),
                      chain_type="stuff")  # we are going to stuff all the docs in at once
chain

Out[ ]:

StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=OpenAI(client=<openai.resources.completions.Completions object at 0x0000026B3B81D330>, async_client=<openai.resources.completions.AsyncCompletions object at 0x0000026B3B839540>, openai_api_key='sk-xxxxxx', openai_proxy='')), document_variable_name='context')
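The prompt template in the output above has two placeholders, context and question. As a rough sketch of what “stuffing” amounts to, the retrieved chunks are joined into one context string and substituted into the template. The helper name build_stuff_prompt and the blank-line separator are assumptions for illustration; this is not LangChain’s implementation.

```python
from typing import List

# Template text copied from the chain output above
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def build_stuff_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join all chunks into one context block and fill in the template.

    The separator is an assumption made for this sketch.
    """
    context = separator.join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "who are the authors of the book?",
)
print(prompt)
```

Because every chunk goes into a single prompt, this approach only works while the combined text fits within the model’s context window.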

In [ ]:

# check the prompt
chain.llm_chain.prompt.template

Out[ ]:

"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"

In [ ]:

# chain.run(input_documents=docs, question=query) runs the chain on the input documents
# and the question, generating an answer to the question based on those documents.
query = "who are the authors of the book?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)

Out[ ]:

' Reid Hoffman and Sam Altman.'

In [ ]:

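# The chunks are retrieved with query_02, but the chain is asked `query`.
# The retrieved context is therefore likely unrelated to the question, and the prompt
# tells the model to say it doesn't know rather than make up an answer.
query = "who is the author of the book?"
query_02 = "has it rained this week?"
docs = docsearch.similarity_search(query_02)
chain.run(input_documents=docs, question=query)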

Out[ ]:

" I don't know."