AdamTechLouis's talk: Deep Learning with Knowledge Graphs
Recently, I gave a talk at Connected Data London introducing the approach we have developed at Octavian. The talk focused on using neural networks to perform tasks on knowledge graphs.
Here’s the recording of the talk, from Connected Data London:
In this post, I'll condense the majority of the slides and link to the papers that have most influenced our thinking.
To explore our vision for a new way of building the next generation of database query engines, see our recent article.
What is a graph?

Two functionally identical graph models
We use the property graph (or attributed graph) model as our foundation: nodes (also called vertices) and relationships (also called edges), each of which can carry its own attributes. In addition, our neural network maintains a global state external to the graph. As illustrated on the slide, two representations of this model are shown, one from Neo4j and one from DeepMind, and the two are nearly indistinguishable.
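To make this model concrete, here is a minimal Python sketch of a property graph with an external global state. The class and field names are hypothetical illustrations, not Neo4j's or DeepMind's actual APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    id: str
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Edge:
    source: str  # id of the source node
    target: str  # id of the target node
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    # Global state lives outside any individual node or edge.
    global_state: Dict[str, str] = field(default_factory=dict)
```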
Why are we interested in graphs?

Graphs have a long history, beginning with Leonhard Euler's pioneering work in the 18th century, and have evolved into the wide variety of types we see today. In computer science, graphs are applied everywhere: graph databases, knowledge graphs, semantic graphs, computation graphs, social networks, transport networks, and many more.
Graphs played a central role in the rise of Google (whose first breakthrough was using PageRank for search, and whose Knowledge Graph has become increasingly important) and Facebook. From political analysis to budget-friendly international air travel, graph algorithms have shaped many aspects of our world.
What is Deep Learning?

Deep learning is a branch of machine learning concerned with training multi-layer ("deep") neural networks using gradient descent. One of the most fundamental building blocks of a neural network is the dense (or fully connected) layer.

A deep neural network using dense layers
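As a rough illustration (a sketch, not the exact network on the slide), a dense layer is a matrix multiplication followed by a nonlinearity, and stacking several of them gives a deep network:

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    """One fully connected layer: every input feeds every output."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                               # input vector
h = dense(x, rng.normal(size=(8, 4)), np.zeros(8))   # hidden layer, 8 units
y = dense(h, rng.normal(size=(1, 8)), np.zeros(1))   # output layer
```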
Deep learning has enabled computers to tackle a broad range of tasks once considered out of reach, from playing Go to superhuman image recognition.

MacNets, among other examples of superhuman image-analysis neural networks.
Machine Learning
Generally speaking, machine learning is a simple idea: we build models of how things work, such as the equation y = mx + c.
house_price = m • number_of_bedrooms + c

We train (or optimize) the model's parameters (m and c in this example) using the data available to us. Once trained, we have learned values for the parameters and can use the model to make predictions.
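As a minimal sketch (with made-up toy data), fitting m and c by gradient descent looks like this:

```python
import numpy as np

# Hypothetical toy data: price (in $1000s) against number of bedrooms.
bedrooms = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
price    = np.array([150.0, 200.0, 260.0, 300.0, 360.0])

m, c, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (m * bedrooms + c) - price
    # Gradient of the mean squared error with respect to m and c.
    m -= lr * 2 * np.mean(error * bedrooms)
    c -= lr * 2 * np.mean(error)

print(f"house_price = {m:.1f} * number_of_bedrooms + {c:.1f}")
```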
The learned parameters are often useful in their own right (for instance, when a neural network is trained to produce a word embedding such as Word2Vec).

Deep Learning on Graphs
At Octavian, a pivotal question for us was: viewed from 20,000 feet, what do we want from machine learning on graphs?
To help answer this question, we compared traditional forms of deep learning to the world of graph learning:

Comparing graph machine learning with other setups
We identified three broad tasks that call for dedicated graph-based approaches: regression, classification, and embedding.
Aside: there are also graph-specific tasks, such as link prediction, that don't fit neatly into the three categories above.
We noticed that many existing machine learning techniques designed for graphs exhibit significant limitations.
Many graph-based methods cannot work with unseen graphs because they require a graph embedding to be trained first. Others convert the graph into a tabular form and discard its structure in the process, for example methods that rely on random-walk sampling to capture graph properties (sketched below).
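As a sketch of what that sampling looks like (in the spirit of DeepWalk; the names here are illustrative), each walk flattens the graph into a plain sequence of node ids:

```python
import random

def random_walk(adjacency, start, length, seed=None):
    """Sample one walk over an adjacency dict {node: [neighbours]}."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbours = adjacency[walk[-1]]
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

# Each walk is a "sentence" of node ids to feed a word-embedding model,
# so the graph's structure is only sampled, never kept.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
print(random_walk(adjacency, "a", length=5, seed=42))
```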
Existing Work

Performance of DL models on graph problems is not superhuman
Much of the existing research on graph-based deep learning focuses on two particular domains:
- Making predictions about molecules, such as proteins, their properties, and their reactions.
- Node classification in large, fixed graphs.
Graphs, Neural Networks and Structural Priors
It is widely agreed that deep learning works well on unstructured data, such as images, text, and audio.
But our superhuman neural networks are in fact working with very specifically structured information, and their architectures are designed to match the structure of the data they process.

Data structures that work with neural networks
Images have structure: a rigid two- or three-dimensional grid in which adjacent pixels are more strongly related than distant ones. Sequences, such as those unfolding over time, have structure too: an ordering in which nearby elements are more strongly related than far-apart ones.

Dense layers are applicable in Go, where spatially distant regions can exert equal influence on each other.
When working with images and sequences, fully connected layers (where every input is connected to every output) are not effective. Neural network layers that incorporate and exploit the structure of the data give the best results.
For sequences, recurrent neural networks are typically employed; for images, convolutional neural networks.

Convolutional neural networks are organized hierarchically, so that nearby pixels matter more to each unit than pixels that are far apart.
In a convolutional neural network, each hidden-layer pixel depends on a set of adjacent input pixels (in contrast to a fully connected layer, where each hidden unit depends on all input units).
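A naive sketch of that locality (illustrative, not an efficient implementation): each output value is computed from one small patch of the input, never from the whole image.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D convolution: each output pixel sees only a local patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0             # a simple 3x3 averaging filter
print(conv2d_valid(image, kernel).shape)   # (3, 3)
```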

Neither dense nor convolutional networks are suitable for a transit graph
Unlike adjacent pixels in an image or adjacent items in a sequence, the relationships between nodes in a graph are not fixed. For deep learning to succeed on graph structures, it is not enough to convert the graph into a matrix and feed it into an existing neural network model. We need neural network architectures designed for graph data.

This view is not mine alone.
Researchers at DeepMind, Google Brain, MIT, and the University of Edinburgh have taken a comparable stance. Their paper on Relational Inductive Biases offers valuable insights for readers exploring this area.
The paper proposes an algorithm for propagating information across graphs. It argues that by using neural networks to learn six functions for aggregation and transformation within the graph's structure, models can achieve state-of-the-art performance on a variety of graph tasks.

One algorithm to rule them all?
By diffusing information between nodes along the graph's edges, the authors argue, they preserve the relational inductive biases inherent in the graph structure.
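Here is a rough numpy sketch of one such propagation step, in the spirit of the Graph Networks paper. The phi functions stand in for learned neural networks, and the three aggregations are fixed to summation for brevity, so this is an illustration rather than the paper's exact block:

```python
import numpy as np

def gn_block(nodes, edges, senders, receivers, globals_, phi_e, phi_v, phi_u):
    """One graph-network style step.

    nodes: (N, F) node features; edges: (E, F) edge features;
    senders/receivers: (E,) node indices for each edge; globals_: (G,).
    """
    # 1. Update each edge from its endpoints and the global state.
    new_edges = phi_e(np.concatenate(
        [edges, nodes[senders], nodes[receivers],
         np.broadcast_to(globals_, (len(edges), len(globals_)))], axis=1))

    # 2. Sum incoming edge messages per node, then update each node.
    agg = np.zeros((len(nodes), new_edges.shape[1]))
    np.add.at(agg, receivers, new_edges)
    new_nodes = phi_v(np.concatenate([nodes, agg], axis=1))

    # 3. Aggregate nodes and edges to update the global state.
    new_globals = phi_u(np.concatenate(
        [globals_, new_nodes.sum(axis=0), new_edges.sum(axis=0)]))
    return new_nodes, new_edges, new_globals

# Shape check with identity functions standing in for the networks:
identity = lambda x: x
n, e, u = gn_block(np.ones((3, 2)), np.ones((2, 2)),
                   np.array([0, 1]), np.array([1, 2]),
                   np.ones(1), identity, identity, identity)
```

Counting the three update functions and the three aggregations gives the six learned functions mentioned above.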
MacGraph, the architecture under development at Octavian, is similar to this relational inductive biases approach. It maintains a global state outside the graph and propagates information between the graph's nodes.
Octavian’s experimental results
Before sharing Octavian's experimental results, I need to introduce the task we used to test our neural graph architecture.

our synthetic benchmark dataset
You're invited to explore CLEVR-Graph, which is accessible at this link. It is a synthetically generated dataset of 10,000 fictional transit networks inspired by the structure of the London Underground. Each randomly generated transit network graph comes with one question and its correct answer.

A collection of typical questions from the CLEVR-Graph question bank, along with an example graph.
The crucial aspect of this task is that the test graphs have never been seen by the network. It cannot simply memorize question answers; it must learn how to derive them from new graphs.
At the time of writing, MacGraph delivers nearly flawless results on tasks spanning six distinct skills, setting a new benchmark for multi-skill performance.

MacGraph’s latest results on CLEVR-graph
One of MacGraph's most impressive capabilities is answering questions like "How many stations are between {station} and {station}?". Answering it requires finding the shortest path between the two stations, the problem classically solved by Dijkstra's algorithm, which is both non-trivial and inherently graph-based.
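For reference, this is the classical (non-learned) computation such a question reduces to. On an unweighted transit graph, Dijkstra's algorithm degenerates into breadth-first search; the sketch below is a baseline illustration, not how MacGraph arrives at its answers:

```python
from collections import deque

def stations_between(adjacency, start, goal):
    """Count the stations strictly between two stations.
    On an unweighted graph, shortest path = breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        station, dist = queue.popleft()
        if station == goal:
            return max(dist - 1, 0)
        for nxt in adjacency[station]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two stations are not connected

line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(stations_between(line, "A", "D"))  # 2 (stations B and C)
```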
How does MacGraph work?
Simply propagating information between the nodes of a graph using transformation and aggregation functions is not sufficient for answering natural-language questions about graphs with natural-language answers. We also need to transform the input question into a graph state, and to extract the relevant information from that state to produce an accurate answer.

Our approach to translating between natural language and graph state uses attention. You can read more about how this works here.

an alternative to dense layers
Attention cells are fundamentally different from dense layers: they operate on sets of items, extracting individual elements based on their content or position.
These properties let attention cells perform selection over the lists of nodes and edges that make up a graph.
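A minimal sketch of content-based attention over a set of node states (shapes and names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_attention(query, node_states):
    """Select from a set by content: score each node against the
    query, then return the softmax-weighted mixture of states."""
    scores = node_states @ query        # (N,) dot-product scores
    weights = softmax(scores)           # how strongly each node is read
    return weights @ node_states, weights

rng = np.random.default_rng(1)
nodes = rng.normal(size=(6, 4))   # 6 nodes, 4 features each
query = rng.normal(size=4)        # e.g. derived from the question text
read_vector, weights = content_attention(query, nodes)
```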
In MacGraph, write attention is used to write a signal to the graph's nodes based on the query and the nodes' properties. This signal then primes the graph's message passing to engage the nodes most relevant to the query.

Prime the graph with the query using write attention
Once information has been diffused between the nodes of the graph, read attention is used to extract the answer from the graph.

Read from the graph using attention
Combining write and read attention with message passing between the graph's nodes, driven by its structure, gives us the core of MacGraph.
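To show the shape of that loop, here is a deliberately simplified sketch that reuses the content_attention function above. It illustrates the write, propagate, read pattern only; it is not Octavian's actual MacGraph implementation:

```python
import numpy as np

def macgraph_like_step(node_states, adjacency, write_q, read_q, attend):
    """Write a query signal into the graph, diffuse it, read an answer.

    adjacency: (N, N) 0/1 matrix, assumed to include self-loops.
    attend(query, states) -> (read_vector, weights), e.g. content_attention.
    """
    # 1. Write: prime each node with a query-dependent signal.
    _, w = attend(write_q, node_states)
    node_states = node_states + w[:, None] * write_q

    # 2. Propagate: each node averages in its neighbours' states.
    node_states = adjacency @ node_states / adjacency.sum(axis=1, keepdims=True)

    # 3. Read: extract an answer vector from the primed, diffused graph.
    answer, _ = attend(read_q, node_states)
    return answer
```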

MacGraph architecture
Conclusion
There is convincing evidence that achieving superhuman performance on graph-based tasks depends on using graph-specific neural network architectures.
Our experiments with MacGraph demonstrate that neural networks can learn to extract node properties from graphs in response to questions, and that they can perform graph algorithms, such as finding the shortest path, on graphs they have never seen before.

