Knowledge Graphs & LLMs with a Supply Chain Use Case
The first wave of hype for Large Language Models (LLMs) came from ChatGPT and similar web-based chatbots, whose models were so good at understanding and generating text that they shocked people, myself included.
Following the rise in LLMs' popularity, many people began exploring how to integrate them into their applications. However, merely wrapping an LLM API proved insufficient: a thin wrapper adds little value of its own, so such applications risk failing.
The major drawbacks of LLMs are hallucinations and limited knowledge. One way to overcome these issues is a retrieval-augmented approach. The idea is to reference external data at question time and feed it to an LLM to enhance its ability to generate accurate and relevant answers. Let's explore how we can use a knowledge graph as that external data.
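To make the retrieval-augmented idea concrete, here is a minimal sketch in Python. It is only an illustration: the word-overlap retriever, the prompt wording, and the function names (retrieve, build_prompt, answer) are assumptions made for this example, and the llm argument is a stand-in for whatever model call you actually use.

```python
from typing import Callable, List


def retrieve(question: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    scored = [
        (len(question_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # Keep only documents with at least one overlapping word, best match first.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the retrieved context in front of the question."""
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


def answer(question: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve at question time, then hand the enriched prompt to the model."""
    return llm(build_prompt(question, retrieve(question, documents)))
```

A real system would replace the word-overlap scoring with a proper retriever (for example a vector index, or the knowledge graph discussed next), but the shape stays the same: retrieve first, then prompt.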
Knowledge Graph as Condensed Information Storage
If you are paying close attention to the LLM space, you might have come across the idea of using various techniques to condense information for it to be more easily accessible during query time.
For example, you could use an LLM to provide a summary of documents and then embed and store the summaries instead of the actual documents. Using this approach, you could remove a lot of noise, get better results, and worry less about prompt token space.
The process of extracting structured information in the form of entities and relationships from unstructured text has been around for some time and is generally called an information extraction pipeline (IEP).
The art of combining an information extraction pipeline with knowledge graphs is that you can process each document individually, and the information from different records gets connected when the knowledge graph is constructed or enriched.
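A small sketch can show why this matters. Below, each document is processed on its own and yields a few (subject, relationship, object) triples; merging them into one graph connects facts that never appeared together in a single document. The triples, the entity names other than XYZ Tech, and the helper names are made up for illustration, and a plain dictionary stands in for a real graph database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)
Graph = Dict[str, Set[Tuple[str, str]]]


def build_graph(extractions: Iterable[List[Triple]]) -> Graph:
    """Merge per-document triples into one adjacency map keyed by entity name."""
    graph: Graph = defaultdict(set)
    for triples in extractions:
        for subject, relation, obj in triples:
            graph[subject].add((relation, obj))
            graph[obj]  # touching the key makes sure the object exists as a node
    return dict(graph)


def reachable(graph: Graph, start: str) -> Set[str]:
    """Return every entity that can be reached by following relationships from start."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _, neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


# Hypothetical output of an extraction step, one list per document.
doc_1 = [("XYZ Tech", "SUPPLIES", "Laptop")]
doc_2 = [("Laptop", "SOLD_TO", "Acme Retail")]

graph = build_graph([doc_1, doc_2])
connected = reachable(graph, "XYZ Tech")
```

Neither document mentions both XYZ Tech and Acme Retail, yet after the merge the customer is reachable from the supplier through the shared Laptop node.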
To retrieve information from the knowledge graph at query time, we have to construct an appropriate Cypher statement. Luckily, LLMs are pretty good at translating natural language into Cypher, Neo4j's graph query language.
In this setup, the application uses an LLM to generate a Cypher statement that retrieves the relevant information from the knowledge graph. That information is then passed to another LLM call, which uses the original question and the retrieved information to generate an answer. In practice, you could use different LLMs for generating Cypher statements and answers, or use different prompts on a single LLM.
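The flow described here can be sketched as three steps wired together. This is an illustrative outline, not the code of any particular library: the function name answer_from_graph and the three fake_* stand-ins (including the product IDs they return) are assumptions so that the example runs without an LLM or a database.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def answer_from_graph(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], List[Record]],
    generate_answer: Callable[[str, List[Record]], str],
) -> str:
    # Step 1: the first LLM call turns the question into a Cypher statement.
    cypher = generate_cypher(question)
    # Step 2: the statement is executed against the knowledge graph.
    records = run_query(cypher)
    # Step 3: a second LLM call answers from the question plus the records.
    return generate_answer(question, records)


# Stand-ins for the two LLM calls and the database (all values are made up).
def fake_generate_cypher(question: str) -> str:
    return (
        "MATCH (p:Product)-[:SUPPLIER]->(:Supplier {SupplierID: 102}) "
        "RETURN p.ProductID AS ProductID"
    )


def fake_run_query(cypher: str) -> List[Record]:
    return [{"ProductID": 1}, {"ProductID": 2}]


def fake_generate_answer(question: str, records: List[Record]) -> str:
    ids = ", ".join(str(record["ProductID"]) for record in records)
    return f"Products: {ids}"


result = answer_from_graph(
    "Which products are supplied by supplier 102?",
    fake_generate_cypher,
    fake_run_query,
    fake_generate_answer,
)
```

Keeping the three steps as separate callables makes it easy to swap in a different model for Cypher generation than for answering, as mentioned above.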
Use Case in Supply Chain
Now that we have seen how a knowledge graph can help overcome the limitations of LLMs, let's get our hands dirty with a small hands-on example.
Incorporating supply chain data into knowledge graphs can significantly enhance the capabilities of large language model applications. This approach allows us to structure complex supply chain information into nodes and relationships, giving a holistic picture of how materials, components, and products flow from suppliers to customers.
An AI model like ChatGPT can leverage this data structure to produce more accurate and insightful responses about supply chain scenarios, disruptions, or management strategies.
Let's do a small POC using Python.
First, make sure you have the necessary dependencies installed. We will use the pandas library for data processing and the py2neo library to interact with the Neo4j graph database. We will use langchain to generate the Cypher query and interact with the graph. You can install them using pip:
pip install pandas py2neo langchain neo4j openai
Now, let's write the Python module (create_knowledge_graph.py) that creates the supply chain knowledge graph.
import pandas as pd
from py2neo import Graph, Node, Relationship

# Replace the following with your Neo4j database credentials
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USERNAME = "your_username"
NEO4J_PASSWORD = "your_password"


def create_knowledge_graph(data_file):
    # Read the supply chain data from CSV
    df = pd.read_csv(data_file)

    # Connect to the Neo4j database
    graph = Graph(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

    # Create unique constraints to ensure no duplicates in the graph.
    # This is the Neo4j 4.4+ syntax; IF NOT EXISTS keeps the script safe to re-run.
    graph.run("CREATE CONSTRAINT IF NOT EXISTS FOR (p:Product) REQUIRE p.ProductID IS UNIQUE")
    graph.run("CREATE CONSTRAINT IF NOT EXISTS FOR (s:Supplier) REQUIRE s.SupplierID IS UNIQUE")
    graph.run("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Customer) REQUIRE c.CustomerID IS UNIQUE")

    # Iterate through each row in the DataFrame and create nodes and relationships
    for _, row in df.iterrows():
        product_id = row["ProductID"]
        supplier_id = row["SupplierID"]
        customer_id = row["CustomerID"]

        # Create or retrieve nodes for products, suppliers, and customers
        product_node = Node("Product", ProductID=product_id)
        supplier_node = Node("Supplier", SupplierID=supplier_id)
        customer_node = Node("Customer", CustomerID=customer_id)
        graph.merge(product_node, "Product", "ProductID")
        graph.merge(supplier_node, "Supplier", "SupplierID")
        graph.merge(customer_node, "Customer", "CustomerID")

        # Create relationships between product and supplier, and product and customer
        supplier_relationship = Relationship(product_node, "SUPPLIER", supplier_node)
        customer_relationship = Relationship(product_node, "CUSTOMER", customer_node)
        graph.create(supplier_relationship)
        graph.create(customer_relationship)

    print("Knowledge graph created successfully!")


if __name__ == "__main__":
    data_file = "supply_chain_data.csv"
    create_knowledge_graph(data_file)
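The loader relies on the CSV having ProductID, SupplierID, and CustomerID columns. Before writing to Neo4j it can help to check the file first. The sketch below is one possible pre-check, not part of the original module: the function name read_valid_rows and the sample values are hypothetical, and it uses the standard-library csv module rather than pandas so it runs without extra dependencies.

```python
import csv
import io
from typing import Dict, Iterable, List

# Column names taken from the loader; everything else here is illustrative.
REQUIRED_COLUMNS = ("ProductID", "SupplierID", "CustomerID")


def read_valid_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return rows that have a value in every required ID column."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # A row with an empty ID cannot become a node, so it is skipped.
    return [
        row
        for row in reader
        if all((row[column] or "").strip() for column in REQUIRED_COLUMNS)
    ]


# Made-up sample data: the second row lacks a SupplierID and is dropped.
sample = io.StringIO("ProductID,SupplierID,CustomerID\n1,102,501\n2,,502\n")
rows = read_valid_rows(sample)
```

With a real file you would pass an open file handle instead of the in-memory sample, for example `with open("supply_chain_data.csv", newline="") as f: rows = read_valid_rows(f)`.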
You can download the supply_chain_data.csv from here. Now create a Python file (query_graph.py) with the content below.
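import os

# These import paths match the langchain releases current when this was written;
# newer releases move some of these classes into separate packages.
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
from langchain.graphs import Neo4jGraph

# ChatOpenAI reads the key from this environment variable (needs the openai package).
# Swap in ChatVertexAI if you authenticate with Google Cloud credentials instead.
os.environ["OPENAI_API_KEY"] = "replace with your OpenAI API key"

# Connect to the same Neo4j database that holds the knowledge graph
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="your_username",
    password="your_password",
)

# The chain generates a Cypher statement, runs it, and answers from the results
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

print(chain.run("Which products are supplied by XYZ Tech (SupplierID: 102)?"))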
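For comparison, it can be useful to know roughly what Cypher you would write by hand for the same question, so you can sanity-check what the chain generates. The sketch below builds such a query from the labels and relationship type used in create_knowledge_graph.py. The function name is hypothetical, and the statement the LLM actually produces may differ.

```python
from typing import Dict, Tuple


def products_by_supplier_query(supplier_id: int) -> Tuple[str, Dict[str, int]]:
    """Build a parameterized Cypher query for products linked to one supplier."""
    # The graph stores the edge as (Product)-[:SUPPLIER]->(Supplier).
    query = (
        "MATCH (p:Product)-[:SUPPLIER]->(s:Supplier {SupplierID: $supplier_id}) "
        "RETURN DISTINCT p.ProductID AS ProductID"
    )
    # Passing the ID as a parameter avoids string formatting inside the query.
    return query, {"supplier_id": supplier_id}


query, params = products_by_supplier_query(102)
# With a py2neo Graph object you could run it as: graph.run(query, params).data()
```

Note that the parameter type has to match how the ID was loaded: if pandas read SupplierID as an integer, pass 102, not "102". DISTINCT is used because the loader creates a new relationship for every CSV row, so the same product can be linked to a supplier more than once.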
The above code prints the products supplied by XYZ Tech. In this blog, we saw how a knowledge graph can help overcome the limitations of LLMs, and we walked through a small supply chain use case. In the next blog, I will elaborate on the use case with different scenarios. Stay tuned!