IJCAI 2021 Tutorial

Neural Machine Reasoning

Time: Aug 19, 20:00 – 02:00 (next day), Montreal time (UTC-4)

This tutorial reviews recent advances in dynamic neural networks that aim to reach a deliberative reasoning capability, going beyond the associative pattern matching at which current deep learning excels.

Tutorial Lecturers

Applied AI Institute, Deakin University, Australia


Overview

Current machine learning, powered by deep neural networks, excels at extracting predictive patterns from large amounts of data and training signals. Over the past seven years, there has been steady growth in extending this capability to reasoning – the capacity to deliberately deduce new knowledge from an existing knowledge base. This tutorial presents an organized body of knowledge covering recent developments at this conjunction of machine learning and reasoning, with a focus on differentiable neural network architectures. The main question we want to answer is whether we can learn to reason from data in the same way that we learn to predict using neural networks. In this tutorial, we will show how this is achievable using dynamic neural networks whose computational graphs are composed on-the-fly given the data and the query. Here the query is arbitrary, e.g., in linguistic form. The data and the domain have structures spanning both space and time, i.e., data elements are interlinked by relations, either implicitly or explicitly. The covered topics are organized into two parts, theory and applications. The theory part consists of the dual-system account of reasoning, neural memories, reasoning over unstructured sets and over structured graphs, and neuro-symbolic integration. The applications part covers neural reasoning in machine reading comprehension, visual question answering and combinatorial inference.


Materials

Lecture 1 - Slides

Lecture 2 - Slides - Video

Lectures 3 and 10 - Slides - Video

Lectures 4, 5, 6 - Slides - Video

Lectures 7, 8, 9 - Slides - Video

Schedule

Part 1: Theory (180 mins)

This part is further divided into six sub-topics: concepts, dual process theories, neural memories, reasoning over sets, reasoning over graphs, and neuro-symbolic integration.

Lecture 1: Concepts in neural machine reasoning (30 mins)

In this part, we will review the key concepts of learning and reasoning and how these two intelligence faculties interact. In particular, we will start with the formal framework of learning to reason, where the task is to determine whether the data entails a conclusion [23]. We then show how question answering and most supervised machine learning tasks can be reformulated under this framework, and explain how modern neural networks can serve as an underlying mechanism for learning and reasoning within it. One of the key components is attention, which is found in most recent works. We also discuss how reasoning can be considered an instance of conditional computation, where the computational graphs are dynamically co-determined by the query in conjunction with the available data. An extreme form of this is program synthesis, where a predicate-chaining program is automatically generated from the query in the context of the data, and executing the program delivers the answer.
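
To make the role of attention concrete, below is a minimal sketch (in PyTorch, with purely illustrative names and sizes) of query-conditioned attention over a set of knowledge items; this soft selection operator is the basic building block behind most of the models discussed in this tutorial.

```python
import torch
import torch.nn.functional as F

def attend(query, items):
    """query: (d,) vector; items: (n, d) matrix of knowledge items.
    Returns a weighted summary of the items most relevant to the query."""
    scores = items @ query / query.shape[-1] ** 0.5  # (n,) relevance scores
    weights = F.softmax(scores, dim=-1)              # normalized attention weights
    return weights @ items                           # (d,) attended evidence

# Toy usage: five 16-dimensional knowledge items and one query.
items = torch.randn(5, 16)
query = torch.randn(16)
summary = attend(query, items)
print(summary.shape)  # torch.Size([16])
```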

Lecture 2: Dual system of reasoning (30 mins)

We will briefly review a well-established framework in human reasoning known as dual-process theories [11], also known informally as fast and slow thinking [22]. This topic has attracted great interest within the AI community in recent years, e.g., in the 2019 AAAI panel attended by Nobel laureate Daniel Kahneman and Turing Award winner Yoshua Bengio. In particular, the fast thinking process, also known as System 1, is typically parallel, reactive and domain-specific, and corresponds to most current deep learning models. On the other hand, the slow thinking process, also known as System 2, is sequential, deliberative and domain-agnostic. We will explain the role System 2 plays in core forms of reasoning, including compositional, relational, temporal and causal reasoning. Finally, we will discuss how System 1 and System 2 interact.

Lecture 3: Neural memories (30 mins)

In this part, we cover one of the most essential faculties underlying the reasoning process: memory [12] – the mental faculty that allows us to remember, retrieve and manipulate information, and to simulate unseen scenarios. We will cover three distinct concepts that are essential for high-order reasoning: memory of entities, memory of relations, and memory of programs. Neural memory for entities has been widely studied under the umbrella of Memory-Augmented Neural Networks [18, 47, 49]. Less studied, but essential to high-order reasoning, is memory of relations, which allows us to explicitly store, retrieve and manipulate both known and freshly formed relations in a long predicate-chaining process [24]. We will describe how relational memory is implemented, using either tensors [28, 42, 45] or graphs [37]; in these models, attention is the common operator for relationship modeling. Finally, we will explain how a powerful recent concept known as program memory is essential for conditional computation and automatic neural program synthesis – the underlying computational processes behind reasoning. Two approaches to program memory are covered: modular neural networks [2] and stored-program memory [27].
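
As a concrete illustration, the sketch below shows content-based reading from a slot memory, in the spirit of memory-augmented neural networks [18, 47]; it is a simplified, assumed interface rather than a faithful re-implementation of any particular model.

```python
import torch
import torch.nn.functional as F

def read_memory(memory, key, sharpness=10.0):
    """memory: (slots, d) matrix of stored entity vectors; key: (d,) read key.
    Addressing is by cosine similarity; read-out is a soft weighted sum."""
    sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (slots,)
    weights = F.softmax(sharpness * sim, dim=-1)                 # soft address
    return weights @ memory                                      # (d,) read vector

memory = torch.randn(8, 32)  # 8 slots holding entity representations
key = torch.randn(32)        # read key emitted by a controller network
read_vector = read_memory(memory, key)
```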

Lecture 4: Reasoning over unstructured sets (30 mins)

Much recent work on neural reasoning can be formulated as reasoning over unstructured sets. In these settings we have a set of query words and a set of items in the knowledge base (which can be words in a text or visual features extracted from an image). The task of reasoning is to construct a sequential process in which items from the two sets are iteratively attended to and interact in a compositional way. This can be an iterative conditioning process [39] or a recurrent model of composition and attention [17, 21].
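
The sketch below illustrates one iteration of query conditioning in the style of FiLM [39]: the query produces a per-channel scale and shift that modulate the knowledge-base features. Layer sizes and names are placeholders chosen for the example only.

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    def __init__(self, query_dim, feat_dim):
        super().__init__()
        self.to_gamma = nn.Linear(query_dim, feat_dim)  # scale predicted from the query
        self.to_beta = nn.Linear(query_dim, feat_dim)   # shift predicted from the query

    def forward(self, feats, query):
        # feats: (n_items, feat_dim); query: (query_dim,)
        gamma = self.to_gamma(query)
        beta = self.to_beta(query)
        return torch.relu(gamma * feats + beta)  # query-modulated features

block = FiLMBlock(query_dim=64, feat_dim=128)
feats = torch.randn(10, 128)  # e.g., 10 text tokens or visual regions
query = torch.randn(64)       # encoded question
conditioned = block(feats, query)
```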

Lecture 5: Reasoning over graphs (30 mins)

Relational structures have been demonstrated to be crucial for reasoning [15, 50], and these are conveniently and expressively represented as graphs [5]. This leads to graph reasoning, which occurs when reasoning is structured as or supported by operations on graphs. In this part, we will explain how graph neural networks serve as an underlying backbone for relational reasoning, both in space and time [6]. We will cover basic concepts including node embedding, relational networks [43] and message passing; and advanced topics such as query-conditioned graph construction [29] and graph dynamics [38].
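
To ground the basic concepts, here is a hedged sketch of a Relation Network [43]: a shared MLP g scores every pair of objects, conditioned on the query, and the pairwise outputs are summed before a read-out f. Dimensions and the scalar read-out are illustrative choices, not those of the original paper.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    def __init__(self, obj_dim, query_dim, hidden=128):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim + query_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, hidden))
        self.f = nn.Sequential(nn.Linear(hidden, hidden),
                               nn.ReLU(), nn.Linear(hidden, 1))  # e.g., an answer score

    def forward(self, objects, query):
        # objects: (n, obj_dim); query: (query_dim,)
        n = objects.size(0)
        oi = objects.unsqueeze(1).expand(n, n, -1)  # object i of each pair
        oj = objects.unsqueeze(0).expand(n, n, -1)  # object j of each pair
        q = query.expand(n, n, -1)                  # broadcast the query to all pairs
        pairs = torch.cat([oi, oj, q], dim=-1)
        relations = self.g(pairs).sum(dim=(0, 1))   # aggregate over all pairs
        return self.f(relations)

rn = RelationNetwork(obj_dim=32, query_dim=16)
score = rn(torch.randn(6, 32), torch.randn(16))
```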

Lecture 6: Hybrid neuro-symbolic reasoning (30 mins)

A theory of neural reasoning cannot be complete without a link to the symbolic approach [14]. This is because the symbolic approach lends itself easily to high-level logical inference, which is important in many NLP and mathematical reasoning problems. In addition, the symbolic approach seems more natural for dealing with important issues such as systematic generalization, where pure neural networks are not yet very effective [3, 13]. In this sub-topic of hybrid neuro-symbolic reasoning, we will cover recent works including neural module networks [19, 53] and the integration of logical models and neural networks [14].

Part 2: Application (180 mins)

A powerful way to demonstrate the capability of learning to reason is to answer natural questions about data that are unstructured (such as text) or come directly from sensing devices (such as images and videos). In recent years, textual QA and visual QA have been investigated intensively. However, much of the existing work exploits the pattern-matching capability of generic techniques such as RNNs and attention to capture statistics in the data, rather than constructing generic, explicit reasoning machines [21]. The current trend set by these works pushes the sophistication of the reasoning process toward finding correlations between data patterns and the query. Pattern recognition and reasoning become tightly entangled, and reasoning tends to be specific to visual/textual patterns; generalization to novel patterns suffers as a result – an argument put forward over 30 years ago [13] that has not been proven wrong [7]. This part is organized into four topics: machine reading comprehension, dual architectures in visual question answering, specific aspects of visual question answering, and reasoning over combinatorial domains.

Lecture 7: Neural machine reading comprehension (Text-based QA) (45 mins)

This section will review the most important works of the past five years, starting from RNN-based methods with attention mechanisms [46] and moving to modern transformer-based architectures [51, 55]. We will also discuss the growing trend of leveraging pre-trained language models such as BERT [10] and GPT [41] in machine reading comprehension. Task-agnostic models can be pre-trained on a large corpus, yielding holistic, context-aware representations of words. Those representations can then be transferred to various downstream tasks, including question answering, through a fine-tuning process [56]. In doing so, the field is moving from conventional approaches with two separate processes, feature encoding and reasoning, toward an integrated architecture in which a single module carries out both at once.
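
As a small, hedged illustration of the fine-tuning setup, the snippet below adds the standard extractive-QA span head on top of contextual token representations; the pre-trained encoder is replaced by random features so the example stays self-contained, and the hidden size is only an assumption typical of BERT-base.

```python
import torch
import torch.nn as nn

hidden = 768                      # assumed encoder hidden size (BERT-base style)
span_head = nn.Linear(hidden, 2)  # two logits per token: answer-span start and end

# Stand-in for the output of a pre-trained encoder: (batch, seq_len, hidden).
tokens = torch.randn(1, 128, hidden)

start_logits, end_logits = span_head(tokens).split(1, dim=-1)
start = start_logits.squeeze(-1).argmax(dim=-1)  # predicted start position
end = end_logits.squeeze(-1).argmax(dim=-1)      # predicted end position
print(start.item(), end.item())
```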

Lecture 8: Visual question answering (45 mins)

VQA presents one of the most exciting venues for neural networks because it sits at the intersection of four distinct domains: machine learning, reasoning, computer vision and NLP. The reasoning problem in VQA can be organized according to the dual-process framework [31]; that is, there are two interacting components: System 1 (modality-specific pattern extraction) and System 2 (a generic reasoning process with symbol binding). System 1 for visual data can be built on top of 2D/3D CNNs, but needs special consideration to support compositional/relational reasoning [26]. We then advance to object-centric representation, in which a scene is represented as a set of interacting objects in space-time instead of an array of pixels [9]. Recent improvements in object detection (e.g., Faster R-CNN, YOLO, Deformable DETR) make this approach possible. Not only does an object-centric representation make relational reasoning more straightforward, it also provides a seamless data input for System 2, bringing the joint dual system closer to a neuro-symbolic hybrid. System 2, in addition to being generic, also requires the ability of symbol binding [29], dynamic construction of visual graphs [20, 34], and chaining of predicates [21, 29].
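
The sketch below illustrates the object-centric idea with question words attending over detected object features (e.g., region features from an off-the-shelf detector); all projections, dimensions and names are placeholders rather than a specific published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordObjectAttention(nn.Module):
    def __init__(self, word_dim, obj_dim, dim=256):
        super().__init__()
        self.q = nn.Linear(word_dim, dim)  # question words -> queries
        self.k = nn.Linear(obj_dim, dim)   # objects -> keys
        self.v = nn.Linear(obj_dim, dim)   # objects -> values
        self.scale = dim ** 0.5

    def forward(self, words, objects):
        # words: (n_words, word_dim); objects: (n_objects, obj_dim)
        scores = self.q(words) @ self.k(objects).t() / self.scale
        attn = F.softmax(scores, dim=-1)   # each word attends over the objects
        return attn @ self.v(objects)      # (n_words, dim) visually grounded words

layer = WordObjectAttention(word_dim=300, obj_dim=2048)
grounded = layer(torch.randn(12, 300), torch.randn(36, 2048))
```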

Lecture 9: Video question answering (45 mins)

Video QA opens up new challenges with multimodality-grounded temporal and causal reasoning. Activities and actions are much more lively in videos than in static images. Temporal structures of activity, such as hierarchy [36], continuity and non-local referencing, require complex reasoning [30]. Furthermore, temporal reasoning is distinctive because temporal structure, unlike spatial structure, has a universal direction (it only moves forward). This opens up a playground for causal, counterfactual and hypothetical reasoning, which are extremely attractive yet under-explored [52]. Cross-modality reasoning: visual information and the linguistic query naturally lie in different domains, so cross-domain interaction is crucial in this problem. How cross-domain objects interact (cross-reference, binding, combination) is key for human reasoning, and also for machine reasoning to succeed [29]. Reasoning on long-form videos containing interleaved multimodal data (visual, caption, audio) calls for even more advanced multi-modality interaction capabilities. This provides a playground for complex reasoning systems in which pieces of information from all modalities can be picked, combined, and reasoned over [32].
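
As one concrete (and deliberately simplified) ingredient, the sketch below shows question-conditioned temporal attention over clip-level features; hierarchical models such as [30] stack this kind of operation over frames, clips and the whole video, whereas here only a single level with illustrative sizes is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

clip_dim, q_dim = 512, 256
relevance = nn.Bilinear(q_dim, clip_dim, 1)  # how relevant each clip is to the question

clips = torch.randn(20, clip_dim)  # clip-level features extracted from a video
question = torch.randn(q_dim)      # encoded question

weights = F.softmax(relevance(question.expand(20, -1), clips).squeeze(-1), dim=0)
video_summary = weights @ clips    # question-aware temporal summary of the video
```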

Lecture 10: Combinatorial reasoning (45 mins)

Combinatorial problems represent an important class of reasoning problems, which are usually difficult to solve [16, 35, 44]. We will present how neural networks such as pointer networks [48] and graph neural networks [6], coupled with reinforcement learning, are effective in learning to solve these problems [57]. Examples include inference in probabilistic graphical models [4, 25, 54], finding shortest paths [18], traveling salesman problems [28, 40], graph coloring [33] and substructure counting [1, 8].
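
For intuition, here is a minimal sketch of one pointer-network decoding step [48]: the decoder state attends over encoder states and "points" to an input position, which is how such models emit tours or orderings for combinatorial problems. The additive scoring form and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
W_enc = nn.Linear(d, d, bias=False)  # projects encoder states
W_dec = nn.Linear(d, d, bias=False)  # projects the current decoder state
v = nn.Linear(d, 1, bias=False)      # maps combined features to a pointer logit

enc = torch.randn(10, d)  # encoder states for 10 input nodes (e.g., cities)
dec = torch.randn(d)      # current decoder state

scores = v(torch.tanh(W_enc(enc) + W_dec(dec))).squeeze(-1)  # (10,) pointer logits
probs = F.softmax(scores, dim=-1)
next_node = probs.argmax().item()  # index of the node selected at this step
```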

Panel Discussion

References

[1]Ralph Abboud, Ismail Ilkan Ceylan, and Thomas Lukasiewicz. Learning to reason: Leveraging neural networks for approximate dnf counting. AAAI, 2020.
[2]Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In CVPR, pages 39–48, 2016.
[3]Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, and Aaron Courville. Systematic generalization: what is required and can it be learned? ICLR, 2019.
[4]Yunsheng Bai, Derek Xu, Alex Wang, Ken Gu, Xueqing Wu, Agustin Marinovic, Christopher Ro, Yizhou Sun, and Wei Wang. Fast detection of maximum common subgraph via deep q-learning. arXiv preprint arXiv:2002.03129, 2020.
[5]Pablo Barceló, Egor V Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In International Conference on Learning Representations, 2020.
[6]Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
[7]Cameron Buckner and James Garson. Connectionism. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, fall 2019 edition, 2019.
[8]Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. Can graph neural networks count substructures? arXiv preprint arXiv:2002.04025, 2020.
[9]Long Hoang Dang, Thao Minh Le, Vuong Le, and Truyen Tran. Object-centric relational reasoning for video question answering, 2020.
[10]Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1), 2019.
[11]Jonathan St BT Evans and Keith E Stanovich. Dual-process theories of higher cognition: Advancing the debate. Perspectives on psychological science, 8(3):223–241, 2013.
[12]Aidan Feeney and Valerie A Thompson. Reasoning as memory. Psychology Press, 2014.
[13]Jerry A Fodor and Zenon W Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71, 1988.
[14]Artur d’Avila Garcez, Marco Gori, Luis C Lamb, Luciano Serafini, Michael Spranger, and Son N Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv preprint arXiv:1905.06088, 2019.
[15]Marta Garnelo and Murray Shanahan. Reconciling deep learning with symbolic artificial intelligence: representing objects and relations. Current Opinion in Behavioral Sciences, 29:17–23, 2019.
[16]Maxime Gasse, Didier Chételat, Nicola Ferroni, Laurent Charlin, and Andrea Lodi. Exact combinatorial optimization with graph convolutional neural networks. NeurIPS, 2019.
[17]Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. Recurrent independent mechanisms. arXiv preprint arXiv:1909.10893, 2019.
[18]Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016.
[19]Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to reason: End-to-end module networks for visual question answering. In ICCV, pages 804–813. IEEE, 2017.
[20]Ronghang Hu, Anna Rohrbach, Trevor Darrell, and Kate Saenko. Language-conditioned graph networks for relational reasoning. ICCV, 2019.
[21]Drew A Hudson and Christopher D Manning. Compositional attention networks for machine reasoning. ICLR, 2018.
[22]Daniel Kahneman. Thinking, fast and slow. Farrar, Straus and Giroux New York, 2011.
[23]Roni Khardon and Dan Roth. Learning to reason. Journal of the ACM (JACM), 44(5):697–725, 1997.
[24]Alex Konkel and Neal J Cohen. Relational memory and the hippocampus: representations and methods. Frontiers in neuroscience, 3:23, 2009.
[25]Volodymyr Kuleshov and Stefano Ermon. Neural variational inference and learning in undirected graphical models. In Advances in Neural Information Processing Systems, pages 6734–6743, 2017.
[26]Hung Le, Truyen Tran, and Svetha Venkatesh. Learning to remember more with less memorization. In ICLR, 2019.
[27]Hung Le, Truyen Tran, and Svetha Venkatesh. Neural stored-program memory. In ICLR, 2020.
[28]Hung Le, Truyen Tran, and Svetha Venkatesh. Self-attentive associative memory. In ICML, 2020.
[29]Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran. Dynamic language binding in relational visual reasoning. In IJCAI, 2020.
[30]Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran. Hierarchical conditional relation networks for video question answering. In CVPR, 2020.
[31]Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran. Neural reasoning, fast and slow, for video question answering. In IJCNN, 2020.
[32]Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran. Hierarchical conditional relation networks for multimodal video question answering. In submission, 2021.
[33]Henrique Lemos, Marcelo Prates, Pedro Avelar, and Luis Lamb. Graph colouring meets deep learning: Effective graph neural network models for combinatorial problems. arXiv preprint arXiv:1903.04598, 2019.
[34]Yongfei Liu, Bo Wan, Xiaodan Zhu, and Xuming He. Learning cross-modal context graph for visual grounding. AAAI, 2020.
[35]Qiang Ma, Suwen Ge, Danyang He, Darshan Thaker, and Iddo Drori. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. arXiv preprint arXiv:1911.04936, 2019.
[36]Romero Morais, Vuong Le, Truyen Tran, and Svetha Venkatesh. Learning to abstract and predict human actions. In British Machine Vision Conference (BMVC), 2020.
[37]Rasmus Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In NeurIPS, pages 3368–3378, 2018.
[38]Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, and Charles E Leiserson. EvolveGCN: Evolving graph convolutional networks for dynamic graphs. AAAI, 2020.
[39]Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In AAAI, 2018.
[40]Marcelo Prates, Pedro HC Avelar, Henrique Lemos, Luis C Lamb, and Moshe Y Vardi. Learning to solve NP-complete problems: A graph neural network for decision TSP. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4731–4738, 2019.
[41]Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training, 2018.
[42]Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. Relational recurrent neural networks. NIPS, 2018.
[43]Adam Santoro, David Raposo, David G Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Tim Lillicrap. A simple neural network module for relational reasoning. In NIPS, pages 4974–4983, 2017.
[44]Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Approximation ratios of graph neural networks for combinatorial problems. arXiv preprint arXiv:1905.10261, 2019.
[45]Imanol Schlag and Jürgen Schmidhuber. Learning to reason with third order tensor products. In Advances in Neural Information Processing Systems, pages 9981–9993, 2018.
[46]Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. ICLR, 2017.
[47]Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. End-to-end memory networks. NIPS, 2015.
[48]Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700, 2015.
[49]Caiming Xiong, Stephen Merity, and Richard Socher. Dynamic memory networks for visual and textual question answering. In International Conference on Machine Learning, pages 2397–2406, 2016.
[50]Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S Du, Ken-ichi Kawarabayashi, and Stefanie Jegelka. What can neural networks reason about? arXiv preprint arXiv:1905.13211, 2019.
[51]Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pages 5753–5763, 2019.
[52]Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B Tenenbaum. Clevrer: Collision events for video representation and reasoning. arXiv preprint arXiv:1910.01442, 2019.
[53]Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum. Neural- symbolic VQA: Disentangling reasoning from vision and language understanding. In NeurIPS, pages 1039–1050, 2018.
[54]KiJung Yoon, Renjie Liao, Yuwen Xiong, Lisa Zhang, Ethan Fetaya, Raquel Urtasun, Richard Zemel, and Xaq Pitkow. Inference in probabilistic graphical models by graph neural networks. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pages 868–875. IEEE, 2019.
[55]Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V Le. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. ICLR, 2018.
[56]Chengchang Zeng, Shaobo Li, Qin Li, Jie Hu, and Jianjun Hu. A survey on machine reading comprehension: Tasks, evaluation metrics, and benchmark datasets. arXiv preprint arXiv:2006.11880, 2020.
[57]Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, and Le Song. Can graph neural networks help logic reasoning? arXiv preprint arXiv:1906.02111, 2019.

Please contact Truyen Tran if you have any questions.