KNOWLEDGE GRAPHS IN BUSINESS

The relevance of artificial intelligence has been increasing over time. Most AI applications use data to draw insights and make predictions that support decision-making. They are far more useful when context can be incorporated into them. Without context, an application may be constrained to a single domain, which limits its usage. This is where knowledge graphs are extremely useful. Knowledge graphs can incorporate context in the form of ontologies, which can be used in a company’s day-to-day applications. They can also combine datasets from different domains, giving companies the overall picture they need to make decisions. This helps automate many tasks that require inputs from different parts of a company, which are generally tedious and consume a lot of resources. This is one of the reasons why many tech companies like Google, Amazon, and Microsoft are investing in knowledge graph research [2]. This article focuses on the use cases of knowledge graphs in business scenarios, describes the construction of a knowledge graph for balance sheets and its use for answering questions in a Pearson exam, and describes the Google Knowledge Graph.

Introduction

Knowledge graphs are collections of entities and relations. Relations and semantic data add context to the knowledge graph. Entities have a description that allows both humans and machines to process the data easily. Knowledge graphs are built by combining existing databases, irrespective of domain, and adding context to them. The datasets can be structured or unstructured. Structured data has well-defined columns, while unstructured data does not, which makes it more difficult to process. “Context enables the knowledge graph to access closely connected objects and create a connection between them” [2]. Knowledge graphs also provide reasoning capabilities to draw new inferences and connections, which is difficult with traditional machine learning approaches alone. While companies have started to incorporate knowledge graphs into their business, there is still ongoing research on connecting knowledge graphs with business requirements. Knowledge graph systems in the business domain are classified into two types in [2]:

System 1- This is a system based on the traditional machine learning approach. It is intuition-based and thinks fast. These systems are generally used to make predictions from historical data. Typical black-box machine learning models fall into this class. For example, graph neural networks can be used with knowledge graphs to make predictions.

System 2- This is a logic-based system that thinks slow. It uses logic with symbolic AI approaches. These systems generally draw inferences and show the logic behind a prediction, i.e., they show the reasoning by which predictions are made.

Generally, symbolic AI systems work well when there is predefined logic, the rules are clear, and the input can be easily transformed into symbols. They don’t perform well when dealing with messy inputs. One such example is computer vision. It is difficult to recognize the object in an image by comparing every pixel with the pixels of a reference image, as a purely logic-based system would suggest. This only works when the two images are identical. One modification is to take pictures from different angles and create rules for each angle, then compare the input image against all of them. But even that won’t cover all cases, which makes symbolic AI approaches extremely difficult to use for such problems. To overcome this deficiency, people use complicated symbolic AI systems called expert systems, which have hardcoded rules and knowledge to tackle complicated problems. But as the problem is generalized, it becomes extremely difficult to add all the rules.

Some of the use cases of knowledge graphs in business are discussed in the next section.

Use cases of the knowledge graph in Business

Integrating data silos from different domains and applying machine learning algorithms provides insights that help a business function more efficiently. Knowledge graphs make this integration easier. Some of the use cases of knowledge graphs in business [2] are:

1) Risk and Value

2) Managing Compliance

3) Data lineage

4) Fraud Detection

5) Recommender system and conversational AI

6) Investing

All these use cases are described one by one in the next sections.

Risk and Value

To assess risk and value, it is very important to have the overall picture [Fig 1] across different domains and sectors of a company [2]. Knowledge graphs help provide that overall picture because of their ability to integrate various forms of data. Their ability to store semantic knowledge is further useful for integrating datasets from different domains and making inferences. Given the amount and size of data available these days, companies use big data technologies to store the data and knowledge graphs to integrate it.

Fig 1: overall picture connecting different domains [2]

Let’s take the example of a bank. Many organizations take loans from the bank. Assume that a third-party company is facing financial hardship. If we can obtain data on that company’s suppliers and vendors from a third-party provider, we can combine it with the company’s internal data to understand which vendors and suppliers are going to be affected, and classify those companies by assigning a risk value to each. The integration of data is done by mapping the entities and the schema. Deeper analysis can also be performed on the resulting knowledge graph.
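The bank example can be sketched as a small graph traversal. This is only an illustration under stated assumptions: the company names, the supply edges, and the distance-based risk labels are all invented, and a real system would work on an integrated knowledge graph rather than a hardcoded list.

```python
from collections import deque

# Hypothetical supply relationships: (supplier_or_vendor, customer).
# All company names and edges are made up for illustration.
SUPPLIES = [
    ("SteelCo", "DistressedCorp"),
    ("PartsInc", "DistressedCorp"),
    ("OreMine", "SteelCo"),
    ("CloudHost", "HealthyCorp"),
]


def exposure_levels(edges, distressed):
    """Breadth-first search from the distressed company to its suppliers.

    Returns {company: hops}, where fewer hops means more direct exposure.
    """
    suppliers_of = {}
    for supplier, customer in edges:
        suppliers_of.setdefault(customer, []).append(supplier)

    levels = {distressed: 0}
    queue = deque([distressed])
    while queue:
        company = queue.popleft()
        for supplier in suppliers_of.get(company, []):
            if supplier not in levels:
                levels[supplier] = levels[company] + 1
                queue.append(supplier)
    return levels


def risk_label(hops):
    """Toy risk classification by distance; the thresholds are assumptions."""
    return {0: "distressed", 1: "high", 2: "medium"}.get(hops, "low")


levels = exposure_levels(SUPPLIES, "DistressedCorp")
risk = {company: risk_label(hops) for company, hops in levels.items()}
print(risk)
```

Companies that are not connected to the distressed company, such as the hypothetical CloudHost, never enter the result, which is the point of traversing relationships instead of scanning every row.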

Managing Compliance

Knowledge graphs can be used by corporations to integrate company data with third-party compliance data sources to make decision-making and querying easier [2]. Regulations, policies, and contracts keep changing over time and are difficult to track. This makes it even more difficult to see how they affect the company. The semantic metadata incorporated by the knowledge graph allows companies to integrate various types of compliance data. It also helps to automate compliance monitoring by applying rules to it. Natural language processing techniques can be used in combination with domain knowledge to process compliance rules and perform the compliance check.

Recently, regulatory agencies have implemented many rules and measures to correct issues that may cause an economic crisis, but most organizations did not comply with these rules. The organizations not only failed to meet the standards but also made no effort to meet them. The reason they cited was that they had issues with data architecture, management, and infrastructure. Knowledge graphs can help address these problems and make compliance semi-automated.

For example, consider a company seeking to expand into other countries and markets. To trade abroad and localize its products and services, the company needs to take into account the legal barriers applicable to the target market. Dealing with legal and regulatory compliance data is a cumbersome task that requires accessing huge volumes of digital compliance documents and integrating them with internal company data. Utilizing a knowledge graph allows this company to efficiently identify relevant regulations, link its data to those regulations, and define patterns for automatic compliance checks.
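A pattern for an automatic compliance check could look roughly like the sketch below. The triples, the predicate names (soldIn, appliesTo, requiresCertification, hasCertification), and the products are hypothetical; they only show how linking company data to regulations makes a missing requirement queryable.

```python
# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = {
    ("WidgetA", "soldIn", "MarketX"),
    ("WidgetB", "soldIn", "MarketX"),
    ("WidgetA", "hasCertification", "SafetyCert"),
    ("RegulationR1", "appliesTo", "MarketX"),
    ("RegulationR1", "requiresCertification", "SafetyCert"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def compliance_gaps(triples):
    """Return sorted (product, regulation, missing_certification) tuples."""
    gaps = []
    for product, predicate, market in triples:
        if predicate != "soldIn":
            continue
        held = objects(triples, product, "hasCertification")
        for regulation in subjects(triples, "appliesTo", market):
            required = objects(triples, regulation, "requiresCertification")
            for cert in required - held:
                gaps.append((product, regulation, cert))
    return sorted(gaps)


gaps = compliance_gaps(TRIPLES)
print(gaps)
```

When a regulation changes, only the triples describing it need to be updated; the check itself stays the same.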

Semantic Arts developed a knowledge graph-based system for a legal and regulatory information provider. The firm ingests millions of documents a day covering court proceedings and changes to laws and regulations. Previously these documents were tagged by humans, which required a lot of resources, and the accuracy of tagging suffered because of human errors. Semantic Arts developed an ontology and knowledge graph, while another firm called NetOwl worked on extracting the linguistic data from the documents. This helped the firm boost its work speed and reduce errors, and it is now working on developing an inference model with the knowledge graph.

Data lineage

Data lineage is the lifecycle of data: where it originated and how it moved over time. It helps in identifying errors or quality issues with the data. Data is transformed in different ways for different use cases in a company. Knowledge graphs are useful for tracking these changes or transformations with context. A semantic data layer in the knowledge graph, containing the metadata and relationships of data at different stages, enables users to use the data to which they have access at any stage [2]. A representation of the data flow is shown in Fig 3.

Fig 3: Tracking data in different forms [2]

Suppose one team in a company requires data created by another team for its application. It is important for the receiving team to know where the data is coming from and what transformations or operations have been applied to it. This ensures the quality of the data and lets the team check whether the previous operations affect the use case for which they are using the data. If they need the data as it was before the transformations were applied, they can retrieve it by tracking back using the semantic data.
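Tracking back through the transformations can be sketched as a walk over "derived from" links. The dataset names and operations below are assumptions made up for illustration, not part of [2].

```python
# Hypothetical lineage edges: derived dataset -> (source dataset, operation).
DERIVED_FROM = {
    "monthly_report": ("clean_transactions", "aggregate by month"),
    "clean_transactions": ("raw_transactions", "drop duplicates"),
}


def trace_lineage(dataset, derived_from):
    """Walk back from a dataset to its origin, listing each step."""
    steps = []
    seen = {dataset}
    while dataset in derived_from:
        source, operation = derived_from[dataset]
        if source in seen:  # guard against accidental cycles
            break
        steps.append((dataset, operation, source))
        seen.add(source)
        dataset = source
    return steps


lineage = trace_lineage("monthly_report", DERIVED_FROM)
for derived, operation, source in lineage:
    print(f"{derived} <- [{operation}] <- {source}")

# The last step's source is the original, untransformed dataset.
origin = lineage[-1][2] if lineage else "monthly_report"
```

A receiving team can read the operations off the returned steps, or jump straight to the origin when it needs the data before any transformation.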

Fraud Detection

Financial fraud leads to a poor customer experience and damages trust in the company. Types of fraud include credit card fraud, insurance fraud, e-commerce fraud, etc. Fraud can be committed by a single person or sometimes by a group of people [2]. It is more important to prevent fraud than to detect it after it has happened. When fraud is committed by a group, the members use overlapping identity documents. In traditional systems, we have millions of rows, which makes it very difficult to query and find patterns with overlapping identities. Knowledge graphs with machine learning techniques can help companies identify fraudulent patterns by traversing links and finding closed loops. A representation of potential points of threat is shown in Fig 4.

Fig 4: Model for monitoring, categorizing, and predicting potential threats

In the case of credit card fraud, people apply for credit cards using false identities. Identifying the group of people who commit this fraud is normally very difficult because of the large volume of data and the queries that are required. Fraudsters often recycle forged identities to create many more of them. If a knowledge graph is built from the data and entity resolution is applied, these fraudsters can be tracked more easily. When they reuse the same information, we can connect it with the original data, and a closed loop is formed that can be used to detect the fraud.
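The idea of connecting reused identity information can be illustrated with a small sketch. It groups applications that share any attribute value using connected components, which is a simpler stand-in for the loop detection described above, and all applications and values are invented.

```python
from collections import defaultdict

# Hypothetical credit card applications; every value is invented.
APPLICATIONS = {
    "app1": {"phone": "555-0101", "address": "1 Elm St"},
    "app2": {"phone": "555-0101", "address": "9 Oak Ave"},
    "app3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "app4": {"phone": "555-0142", "address": "3 Pine Rd"},
}


def suspicious_groups(applications):
    """Group applications connected through shared identity attributes."""
    # Index: (attribute, value) -> applications that use it.
    by_value = defaultdict(set)
    for app_id, attrs in applications.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(app_id)

    # Adjacency: two applications are linked if they share any value.
    neighbors = defaultdict(set)
    for apps in by_value.values():
        for app_id in apps:
            neighbors[app_id] |= apps - {app_id}

    # Connected components via depth-first traversal.
    seen, groups = set(), []
    for start in applications:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > 1:  # isolated applications are not flagged
            groups.append(sorted(component))
    return groups


groups = suspicious_groups(APPLICATIONS)
print(groups)
```

Here app1 and app3 share nothing directly, but both are linked to app2, so the traversal places all three in one group. Finding that chain with row-by-row queries on a flat table would require repeated self-joins.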

Recommender System and Conversational AI

Knowledge graphs store semantic data, which enables them to incorporate domain-specific knowledge [2]. This makes knowledge graph-based recommendations more personalized for a particular domain. In conversations, the knowledge graph can extract an entity and match it with closely related entities based on semantic similarity to provide recommendations. For conversational AI to provide an appropriate response, the knowledge graph should filter out unnecessary words and identify the focus of the conversation based on the context. The response should also be based on words from the same domain. The graph can also capture the user’s intent, preferences, and interests. Because the same word might have different meanings in different domains, the knowledge graph looks at semantic similarity rather than structural similarity, which makes the conversation more dynamic and accurate. A typical knowledge graph-based recommendation for insurance, based on people who know each other, is represented in Fig 5.

In large companies that have millions of customers and products, recommendations should take attributes like the user’s interests, product category, etc. into consideration. With traditional approaches such as rule-based systems, it is cumbersome to derive recommendations from all those rows and perform matching. Representing the data as a knowledge graph allows it to store semantic knowledge and also allows applying machine learning techniques that are compatible with graph data.

Fig 5: Model describing knowledge graph for context-driven recommendation [2]
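A recommendation based on people who know each other can be sketched as follows. The people, the products, and the simple counting score are assumptions for illustration; [2] does not specify a scoring method.

```python
from collections import Counter

# Hypothetical graph: who knows whom, and who holds which insurance product.
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}
HOLDS = {
    "alice": {"car"},
    "bob": {"car", "home"},
    "carol": {"home", "travel"},
}


def recommend(person, knows, holds):
    """Rank products held by acquaintances that the person lacks."""
    owned = holds.get(person, set())
    counts = Counter()
    for friend in knows.get(person, ()):
        for product in holds.get(friend, ()):
            if product not in owned:
                counts[product] += 1
    # Sort by count descending, then by name, for a deterministic order.
    return sorted(counts, key=lambda p: (-counts[p], p))


recommendations = recommend("alice", KNOWS, HOLDS)
print(recommendations)
```

In this toy data, two of the hypothetical alice’s acquaintances hold home insurance and one holds travel insurance, so home is ranked first.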

Investing

The semantic capability of knowledge graphs, when combined with computational linguistics or natural language processing techniques, can be used to check whether a company is exposed to areas like AI or robotics. This can be done by processing documents such as patents and filings, which provide a view of the company’s business, products, services, and intellectual property.

One such example is the growing investment in ETFs that are exposed to particular technologies. Many of these ETFs use knowledge graphs developed by Yewno [8].

Some of them are:

  1. Nasdaq Yewno Future Mobility Index
  2. Nasdaq Yewno AI & Big Data Index
  3. DWS’s AI & Big Data ETF
  4. iSTOXX Yewno Developed Markets Blockchain Index

The Yewno knowledge graph [8] is used by many of these ETFs, and it can draw inferences from various data points across different domains. Yewno currently offers a portfolio of data feeds that is licensed to many hedge funds and asset management firms.

Knowledge Graph creation from balance sheets

In this section, we go through the construction of a financial knowledge graph described in [1]. This knowledge graph integrates different balance sheets from different companies belonging to different sectors.

“A balance sheet is a summary of the financial situation of a given company at a given time which bears witness to specificities and particularities of this later one regarding its financial situation” [1]

It is the task of domain experts such as chartered accountants to first create a normalized balance sheet that fits any sector. This is the most important and tedious task in creating the knowledge graph. Normalizing means combining the balance sheets from different sectors and companies into a single template, as shown in Fig 6. Having a template that is common across all companies makes it easier to process the information and make predictions or forecasts. In addition, all the information coming from the balance sheets or income statements of many companies and sectors is kept in an easily processable format.

Fig 6: Demonstration of flow to obtain normalized balance sheets [1]

Fig 7: Our objective of achieving an automated reasoning system using knowledge graphs [1]

Instead of relying on human resources, all the balance sheets are combined using knowledge graphs in this work (Fig 7). Machine learning algorithms can be applied to the knowledge graph, which makes it easier to achieve an automatic reasoning system. Knowledge graphs suit this purpose well because they can incorporate context in the form of an ontology and can hold various types of raw data. They can also derive new knowledge by applying reasoning. Unlike traditional black-box algorithms, knowledge graphs are also useful for interpreting the reasons behind a prediction. The knowledge graph obtained is used for hacking the Pearson exam.

Hacking Pearson is a project described by Adrien Seig in [1]. The knowledge graph that was built was used to answer questions from the Pearson Education Test series on the balance sheet, income statement, or cash flow statement.

Fig 8: steps involved in Hacking Pearson exam [1]

The steps involved are shown in Fig 8:

  1. Multiple choice questions: The question contains a balance sheet and a multiple choice question with 4 different answers.
  2. Knowledge Graph initialization: The knowledge graph is created from the balance sheet provided in the question by mapping entities to their attributes and to other related entities. For example, equity is a hyper entity of liability, so equity is linked with liability to make query processing easier.
  3. Query Mapping: The query is mapped to the appropriate entities. For example, a query regarding liabilities is connected with equity, since equity is its hyper entity.
  4. Fuzzy Matching: This is done when the exact word is not found among the knowledge graph entities. The words in the question are converted into similar words that appear in the balance sheet by using fuzzy matching; here these are assets and liabilities. This way the request is translated into a query that is executed to get the answer. The red dots represent the liabilities while the violet dots represent the assets. Words that are close together have similar meanings.
  5. Answering: The answers are computed from the attribute values of the entity and are marked among the available options.

They used the Selenium web testing framework to answer the questions and they got 13 out of 15 questions right.
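The fuzzy matching and answering steps can be sketched with Python’s standard difflib module. This is a swapped-in technique: [1] does not say which matching method was used, and the balance sheet values, the misspelled query term, and the answer options below are all invented.

```python
import difflib

# Hypothetical balance sheet entities and values, invented for illustration.
BALANCE_SHEET = {
    "total assets": 500,
    "total liabilities": 300,
    "total equity": 200,
}


def match_entity(term, entities, cutoff=0.6):
    """Return the closest entity name for a query term, or None."""
    matches = difflib.get_close_matches(term.lower(), entities, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def answer(term, options, sheet):
    """Map the term to an entity and return the index of the matching option."""
    entity = match_entity(term, list(sheet))
    if entity is None:
        return None
    value = sheet[entity]
    return options.index(value) if value in options else None


# A misspelled term still resolves to the intended entity.
entity = match_entity("totl liabilites", list(BALANCE_SHEET))
choice = answer("totl liabilites", [200, 300, 500, 800], BALANCE_SHEET)
print(entity, choice)
```

String similarity only handles spelling variations. Matching words by meaning, as the description of nearby words suggests, would need a semantic representation such as word embeddings.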

The knowledge graph obtained is shown in Fig 9.

Fig 9: Knowledge graph obtained for Pearson test to a given query [2]

The elements of the knowledge graph [1] obtained are:

Entities and Relationships:

The knowledge graph contains entities, relationships, and attributes. The balance sheet is modeled using the Entity-Relationship model.

Entities include Assets, Current Assets, Liabilities, etc.

Attributes include cash and bank, trade receivables, trade payables, etc.

Types:

The knowledge model has type inheritance. For example, intangible assets and goodwill come under noncurrent assets, while trade and other receivables and cash and cash equivalents come under current assets; both current and noncurrent assets come under total assets.
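Type inheritance of this kind can be sketched with parent links. The hierarchy follows the example in the text, while the numeric values are invented.

```python
# Type hierarchy from the text (child -> parent); the values are hypothetical.
PARENT = {
    "intangible assets": "noncurrent assets",
    "goodwill": "noncurrent assets",
    "trade and other receivables": "current assets",
    "cash and cash equivalents": "current assets",
    "noncurrent assets": "total assets",
    "current assets": "total assets",
}
VALUES = {
    "intangible assets": 40,
    "goodwill": 60,
    "trade and other receivables": 30,
    "cash and cash equivalents": 70,
}


def is_a(entity, ancestor, parent):
    """True if `ancestor` is reachable from `entity` via parent links."""
    while entity in parent:
        entity = parent[entity]
        if entity == ancestor:
            return True
    return False


def rollup(type_name, parent, values):
    """Sum the leaf values of every entity that inherits from type_name."""
    return sum(v for leaf, v in values.items() if is_a(leaf, type_name, parent))


current = rollup("current assets", PARENT, VALUES)
total = rollup("total assets", PARENT, VALUES)
print(current, total)
```

A question about total assets can therefore be answered even when the balance sheet lists only the leaf items, because goodwill is known to be a noncurrent asset and, through inheritance, an asset.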

Rules:

The accounting rules are transformed into a knowledge schema. When the logical form of a rule is satisfied, new conclusions are obtained by applying it.

Inference:

A single query is transformed into different queries, and the most suitable answer is taken as the final option. The system has rule-based (logical) and type-based inference. Type-based inference automatically detects the data type, while rule-based inference applies rules to the knowledge base to derive new information.
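Rule-based inference can be illustrated with a minimal forward-chaining loop. The rule encoding is an assumption made for this sketch, not the representation used in [1]; the second rule is the standard accounting equation, assets = liabilities + equity, rearranged, and the numbers are invented.

```python
# Each hypothetical rule: (derived fact, how to compute it, required facts).
RULES = [
    ("total assets",
     lambda f: f["current assets"] + f["noncurrent assets"],
     ("current assets", "noncurrent assets")),
    ("total equity",
     lambda f: f["total assets"] - f["total liabilities"],
     ("total assets", "total liabilities")),
]


def infer(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    facts = dict(facts)  # work on a copy so the input is not modified
    changed = True
    while changed:
        changed = False
        for target, compute, inputs in rules:
            if target not in facts and all(name in facts for name in inputs):
                facts[target] = compute(facts)
                changed = True
    return facts


known = {"current assets": 100, "noncurrent assets": 200, "total liabilities": 180}
derived = infer(known, RULES)
print(derived["total assets"], derived["total equity"])
```

Total equity is not in the input and cannot be computed directly from it. It becomes derivable only after the first rule has produced total assets, which is why the rules are applied until nothing changes.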

In the next section, we discuss the knowledge graph created by Google to display information in its search results.

Google Knowledge Graph

The Google Knowledge Graph was launched by Google in May 2012. It holds facts about people, places, and things, and connects them to search results so that they are presented as useful answers rather than just links.

Google displays this information in two ways [Fig 10]: in a boxed section called the Knowledge Graph Card, which appears on the right-hand side of the search results page, and in a carousel that appears at the top of the results page [2].

Google’s criteria for including a topic in the knowledge graph are [6]:

User behavior- Google tracks the amount of time a user spends on a website. It connects information about what the user is searching for and what they click on to create results. It is mainly user queries that determine what information is displayed in the form of a knowledge graph.

Semantic Search- Relevant results are produced by a semantic search, which matches on the meaning of the query rather than on its exact keywords.

Indexing of Entities- Google maps all the entities by using entity recognition and disambiguation to get factual information about each entity.

Knowledge graphs also help to define hierarchies, which is the underlying idea of the semantic web [4], i.e., connecting all the information on the internet and making it machine-readable. Standards like RDF and schema.org have also been developed for this [1].

An example of a knowledge graph obtained when the query is done for “Balance sheet” is described in the figure below.

Fig 10: Example of knowledge graph built from the “balance sheet” query [1]

The creation of a knowledge graph for investment is described in [7] by Xu Liang. That article focuses on solving practical problems in building enterprise knowledge graphs [3][5]. It discusses building knowledge graphs and the business and technology challenges, along with deployment scenarios. The steps followed to build a knowledge graph in that article are:

  1. Schema Design- Decide which data values and fields are to be included in the design.
  2. D2R Transformation- Atomic relation tables are converted into RDF format.
  3. Information Extraction- The required data is extracted from its present format by methods like extraction, candidate detection, and disambiguation.
  4. Data fusion with instance matching- They used simple matching, checking whether the name and company name matched in both knowledge graphs.
  5. Storage Design and Query optimization- They used MongoDB for storage.
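Steps 2 and 4 can be sketched in a few lines. This is not the implementation from [7]: the rows, the example.org namespace, and the normalization rule are assumptions, and the triples are printed in a simplified N-Triples-like form without literal escaping.

```python
# Hypothetical relational rows and namespace; nothing here comes from [7].
BASE = "http://example.org/"
ROWS = [
    {"id": "p1", "name": "Jane Doe", "company": "Acme Corp"},
    {"id": "p2", "name": "John Roe", "company": "Globex"},
]
OTHER = [{"id": "x9", "name": "jane  doe", "company": "ACME Corp"}]


def rows_to_triples(rows, base=BASE):
    """Turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"<{base}person/{row['id']}>"
        for column, value in row.items():
            if column == "id":  # the id becomes the subject, not a property
                continue
            triples.append((subject, f"<{base}{column}>", f'"{value}"'))
    return triples


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def match_instances(rows_a, rows_b):
    """Pair records whose name and company both match after normalization."""
    index = {(normalize(r["name"]), normalize(r["company"])): r["id"] for r in rows_b}
    pairs = []
    for r in rows_a:
        key = (normalize(r["name"]), normalize(r["company"]))
        if key in index:
            pairs.append((r["id"], index[key]))
    return pairs


triples = rows_to_triples(ROWS)
pairs = match_instances(ROWS, OTHER)
for s, p, o in triples:
    print(s, p, o, ".")
print(pairs)
```

Matching on name and company name alone is simple but brittle. It cannot tell apart two different people with the same name at the same company, which is one reason production systems add further attributes to the match.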

For more in-depth knowledge of enterprise knowledge graphs for investment, it is highly recommended to read [7].

Conclusion

In summary, knowledge graphs combine database and artificial intelligence techniques in a graph-like structure. Domain knowledge can be stored in the knowledge graph, and machine learning techniques can be used to query this information. Data is generally scattered across different domains and systems, which makes it challenging for organizations to use it in the best possible way. Knowledge graphs are well suited to integrating data across different domains because they incorporate semantic knowledge. This can be used to create a real-world model of the knowledge domain. The key to integrating the data is to represent knowledge in a machine-readable form. When data belongs to different domains, knowledge graphs offer more value than traditional approaches that require human experts to draw insights. Many companies have realized this value and have started investing in research.

References

[1] Adrien Seig, “Hacking financial s with knowledge Graphs and ML,” Medium, 2019.

[2] Ali Khali, sui obeoi, “Knowledge Graphs for Financial Services,” Deloitte report, 2020.

[3] Bellomarini, Luigi, Daniele Fakhoury, Georg Gottlob, and Emanuel Sallinger. “Knowledge graphs and enterprise AI: the promise of an enabling technology.” In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 26–37. IEEE, 2019.

[4] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang, “Knowledge vault: A web-scale approach to probabilistic knowledge fusion,” in SIGKDD. ACM, 2014, pp. 601–610.

[5] Galkin, Mikhail, Sören Auer, Maria-Esther Vidal, and Simon Scerri. “Enterprise Knowledge Graphs: A Semantic Approach for Knowledge Management in the Next Generation of Enterprise Information Systems.” In ICEIS (2), pp. 88–98. 2017.

[6] Shane Barker, “Everything You need to know about google knowledge graph,” Hackernoon, 2018.

[7] Xu Liang, “A practical guide to build enterprise knowledge graph for investment analysis,” Towards Data Science.

[8] Gramatica, R., & Pickering, R. (2017). Start-up story: Yewno: an AI-driven path to a knowledge-based future. Insights, 30(2), 107–111. DOI: http://doi.org/10.1629/uksg.369