Given the success of the graduate seminar on Neural Network Architectures I taught last year in Argentina, I'm offering it again, this time through the Towards AI Discord server.
Format
There will be nine introductory lectures, each consisting of 30 to 40 minutes of presentation followed by discussion with the attendees. After the nine lectures, we will turn to paper presentations by attendees.
The lectures take place on Tuesdays at 8 pm California time; that is 11 pm New York time and 9:30 am (the following day) India time. Recordings of the lectures are available as a YouTube playlist and are linked below for each lecture.
Target audience
Deep learning researchers and practitioners interested in improving their understanding of neural network architectures across the board. This is not an introductory course, nor does it deal with implementation details. The architectures will be discussed only at the level of architecture, not at the level of detail necessary to implement them (the papers discussed contain those additional details).
Registration
Just head to https://ws.towardsai.net/discord and join the Discord server. There is no need to register; simply come to the voice channel in the server on Tuesdays at 8 pm California time.
Program
Zip file containing most of the papers.
- 2023-01-10 Introduction
- Neural Networks review. Deep Learning. Types of neurons. Feed-forward networks. Backpropagation. DL frameworks. Theano. AutoDiff. TensorFlow. Torch. JAX. (A minimal autodiff sketch follows the reading list below.)
- Howard B. Demuth et al. (2014). Neural Network Design. 2nd ed. USA: Martin Hagan. ISBN: 9780971732117
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). Deep Learning. Cambridge, MA: MIT Press
- Xavier Glorot and Yoshua Bengio (2010). “Understanding the difficulty of training deep feedforward neural networks.” In: AISTATS. Ed. by Yee Whye Teh and D. Mike Titterington. Vol. 9. JMLR Proceedings. JMLR.org, pp. 249–256
- Davan Harrison (2021). “A Brief Introduction to Automatic Differentiation for Machine Learning”. In: CoRR abs/2110.06209. arXiv: 2110.06209
- James Bergstra et al. (2010). “Theano: a CPU and GPU math expression compiler”. In: Proceedings of the Python for scientific computing conference (SciPy). Vol. 4. 3. Austin, TX, pp. 1–7
- Martín Abadi et al. (2016). “TensorFlow: a system for Large-Scale machine learning”. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp. 265–283
- Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet (2011). “Torch7: A matlab-like environment for machine learning”. In: BigLearn, NIPS workshop
- Roy Frostig, Matthew James Johnson, and Chris Leary (2018). “Compiling machine learning programs via high-level tracing”. In: Systems for Machine Learning 4.9
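To give a flavour of what backpropagation and automatic differentiation look like in one of the frameworks mentioned above, here is a minimal PyTorch sketch (purely illustrative, not from the readings; the two-layer network and all sizes are arbitrary):

```python
import torch

# A tiny feed-forward network: one hidden layer of tanh units.
x = torch.randn(4, 3)                      # batch of 4 inputs with 3 features
y = torch.randn(4, 1)                      # regression targets
W1 = torch.randn(3, 5, requires_grad=True) # input -> hidden weights
W2 = torch.randn(5, 1, requires_grad=True) # hidden -> output weights

hidden = torch.tanh(x @ W1)                # forward pass
pred = hidden @ W2
loss = ((pred - y) ** 2).mean()            # mean squared error

loss.backward()                            # reverse-mode autodiff (backpropagation)
print(W1.grad.shape, W2.grad.shape)        # gradients of the loss w.r.t. each weight
```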
- 2023-01-17 Popular Network Architectures
- Multi-task learning. Siamese Networks. Generative Adversarial Networks (GAN). Style Transfer. Disentangled Representation Learning. (A Siamese-network sketch follows the reading list below.)
- Rich Caruana (1997). “Multitask learning”. In: Machine learning 28.1, pp. 41–75
- Ting Gong et al. (Sept. 2019). “A Comparison of Loss Weighting Strategies for Multi task Learning in Deep Neural Networks”. In: IEEE Access PP, pp. 1–1. DOI: 10.1109/ACCESS.2019.2943604
- Jane Bromley et al. (1993). “Signature verification using a "siamese" time delay neural network”. In: Advances in neural information processing systems 6
- Ian Goodfellow, Jean Pouget-Abadie, et al. (2014). “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems. Ed. by Z. Ghahramani et al. Vol. 27. Curran Associates, Inc.
- Xi Chen et al. (2016). “Infogan: Interpretable representation learning by information maximizing generative adversarial nets”. In: Advances in neural information processing systems 29
- Leon A Gatys, Alexander S Ecker, and Matthias Bethge (2016). “Image style transfer using convolutional neural networks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2414–2423
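To make the weight-sharing idea behind Siamese networks concrete, here is a minimal PyTorch sketch with a contrastive-style loss (an illustrative toy, not the setup of Bromley et al.; all dimensions and the margin are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """One encoder whose weights are shared across both branches."""
    def __init__(self, in_dim=32, emb_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, a, b):
        # The same parameters embed both inputs.
        return self.net(a), self.net(b)

encoder = SiameseEncoder()
a, b = torch.randn(16, 32), torch.randn(16, 32)
same = torch.randint(0, 2, (16,)).float()        # 1 if the pair matches, 0 otherwise
za, zb = encoder(a, b)
dist = F.pairwise_distance(za, zb)
margin = 1.0
# Contrastive loss: pull matching pairs together, push the rest beyond the margin.
loss = (same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)).mean()
loss.backward()
```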
- 2023-01-24 Structure Learning Networks
- Hopfield Networks. RBMs. SOMs. Autoencoders. VAE. (An autoencoder sketch follows the reading list below.)
- John J. Hopfield (Apr. 1982). “Neural Networks and Physical Systems with Emergent Collective Computational Abilities”. In: Proceedings of the National Academy of Science 79.8, pp. 2554–2558. DOI: 10.1073/pnas.79.8.2554
- Paul Smolensky (1986). “Information processing in dynamical systems: Foundations of harmony theory”. In: Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press, pp. 194–281
- Teuvo Kohonen (1982). “Self-Organized Formation of Topologically Correct Feature Maps”. In: Biological Cybernetics 43, pp. 59–69
- G E Hinton and R R Salakhutdinov (July 2006). “Reducing the dimensionality of data with neural networks”. In: Science 313.5786, pp. 504–507. DOI: 10.1126/science.1127647
- Diederik P. Kingma and Max Welling (Nov. 2019). “An Introduction to Variational Autoencoders”. In: Found. Trends Mach. Learn. 12.4, pp. 307–392. ISSN: 1935-8237. DOI: 10.1561/2200000056
- Vineet John et al. (2019). “Disentangled Representation Learning for Non-Parallel Text Style Transfer”. In: ACL, pp. 424–434. DOI: 10.18653/v1/P19-1041
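To fix the idea of learning a bottleneck representation, here is a minimal (non-variational) autoencoder sketch in PyTorch (illustrative only; layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Bottleneck autoencoder: compress to a small code, then reconstruct."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AutoEncoder()
x = torch.rand(64, 784)                  # e.g. flattened 28x28 images
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction error drives the learned code
loss.backward()
```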
- 2023-01-31 Convolution Networks
- CNNs. DL image processing. SuperVision (AlexNet). Inception/GoogLeNet. ResNet. R-CNN. YOLO. RetinaNet. FCNs. U-Net. (A residual-block sketch follows the reading list below.)
- Li Liu et al. (Feb. 2020). “Deep Learning for Generic Object Detection: A Survey”. In: International Journal of Computer Vision 128.2, pp. 261–318. ISSN: 1573-1405. DOI: 10.1007/s11263-019-01247-4
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton (2017). “Imagenet classification with deep convolutional neural networks”. In: Communications of the ACM 60.6, pp. 84–90
- Christian Szegedy et al. (2015). “Going deeper with convolutions”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9
- Kaiming He et al. (2016). “Deep residual learning for image recognition”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
- Ross Girshick et al. (2014). “Rich feature hierarchies for accurate object detection and semantic segmentation”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587
- Joseph Redmon et al. (2016). “You only look once: Unified, real-time object detection”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788
- Tsung-Yi Lin et al. (2020). “Focal Loss for Dense Object Detection”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42.2, pp. 318–327. DOI: 10.1109/TPAMI.2018.2858826
- Jonathan Long, Evan Shelhamer, and Trevor Darrell (2015). “Fully convolutional networks for semantic segmentation”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440
- Olaf Ronneberger, Philipp Fischer, and Thomas Brox (2015). “U-net: Convolutional networks for biomedical image segmentation”. In: International Conference on Medical image computing and computer-assisted intervention. Springer, pp. 234–241
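The skip connection at the heart of ResNet fits in a few lines; the PyTorch sketch below is illustrative only (a simplified version of He et al.'s basic block, with an arbitrary channel count and no downsampling):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, in the spirit of ResNet's basic block."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the skip connection: the block learns a residual

block = ResidualBlock()
features = block(torch.randn(1, 64, 32, 32))   # output keeps the same shape as the input
```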
- 2023-02-07 Recurrent Networks
- RNNs. Training by unrolling. Internal memory, access, and update. LSTMs. GRUs. Encoder/Decoder. Attention in Encoder/Decoder systems. (An LSTM unrolling sketch follows the reading list below.)
- Zachary Chase Lipton, John Berkowitz, and Charles Elkan (2015). “A Critical Review of Recurrent Neural Networks for Sequence Learning”. In: CoRR abs/1506.00019. arXiv: 1506.00019
- Sepp Hochreiter and Jürgen Schmidhuber (1997). “Long short-term memory”. In: Neural computation 9.8, pp. 1735–1780
- Yong Yu et al. (2019). “A review of recurrent neural networks: LSTM cells and network architectures”. In: Neural computation 31.7, pp. 1235–1270
- Kyunghyun Cho et al. (Oct. 2014). “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, pp. 1724–1734. DOI: 10.3115/v1/D14-1179
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014). “Sequence to Sequence Learning with Neural Networks”. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS’14. Montreal, Canada: MIT Press, pp. 3104–3112
- Alex Graves, Greg Wayne, and Ivo Danihelka (2014). Neural Turing Machines. arXiv: 1410.5401
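Here is a minimal PyTorch sketch of unrolling a recurrent network (an off-the-shelf LSTM) over a sequence; sizes are arbitrary and purely illustrative:

```python
import torch
import torch.nn as nn

# A single-layer LSTM processing a batch of sequences.
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(8, 20, 16)        # batch of 8 sequences, 20 time steps, 16 features

outputs, (h_n, c_n) = lstm(x)     # the network is unrolled over the 20 steps
print(outputs.shape)              # (8, 20, 32): hidden state at every step
print(h_n.shape, c_n.shape)       # (1, 8, 32): final hidden and cell states

# In an encoder/decoder system, (h_n, c_n) would initialize the decoder,
# optionally combined with attention over `outputs`.
```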
- 2023-02-14 Transformer Networks
- Attention is all you need. Transformers. BERT. GPT. T5. Pretraining. Transfer learning. Zero-shot and few-shot learning. (A scaled dot-product attention sketch follows the reading list below.)
- Tianyang Lin et al. (2022). “A Survey of Transformers”. In: AI Open 3, pp. 111–132. ISSN: 2666-6510. DOI: 10.1016/j.aiopen.2022.10.001
- Ashish Vaswani et al. (2017). “Attention is all you need”. In: Advances in neural information processing systems, pp. 5998–6008
- Jacob Devlin et al. (June 2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, pp. 4171–4186. DOI: 10.18653/v1/N19-1423
- Alec Radford et al. (2018). Improving language understanding by generative pretraining. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Colin Raffel et al. (2020). “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”. In: J. Mach. Learn. Res. 21, 140:1–140:67
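The core operation of the Transformer is scaled dot-product attention; below is a minimal sketch (single head, no masking or learned projections; sizes are arbitrary):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # similarity of queries to keys
    weights = torch.softmax(scores, dim=-1)              # one distribution per query
    return weights @ v, weights

q = torch.randn(2, 5, 64)      # batch of 2, 5 query positions, model width 64
k = torch.randn(2, 7, 64)      # 7 key/value positions
v = torch.randn(2, 7, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)   # (2, 5, 64) and (2, 5, 7)
```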
- 2023-02-21 Graph Processing Networks
- Graph processing architectures. Local vs. global. GNNs. DGCNNs. GCN. MPNN. (A graph-convolution sketch follows the reading list below.)
- Benjamin Sanchez-Lengeling et al. (Aug. 2021). “A Gentle Introduction to Graph Neural Networks”. In: Distill 6.8. DOI: 10.23915/distill.00033
- Franco Scarselli et al. (2009). “The graph neural network model”. In: IEEE transactions on neural networks 20.1, pp. 61–80
- Muhan Zhang et al. (2018). “An End-to-End Deep Learning Architecture for Graph Classification”. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. AAAI’18/IAAI’18/EAAI’18. New Orleans, Louisiana, USA: AAAI Press. ISBN: 978-1-57735-800-8
- Thomas N. Kipf and Max Welling (2017). “Semi-Supervised Classification with Graph Convolutional Networks”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net
- Ameya Daigavane, Balaraman Ravindran, and Gaurav Aggarwal (2021). “Understanding Convolutions on Graphs”. In: Distill. DOI: 10.23915/distill.00032
- Justin Gilmer et al. (2017). “Neural Message Passing for Quantum Chemistry”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Ed. by Doina Precup and Yee Whye Teh. Vol. 70. Proceedings of Machine Learning Research. PMLR, pp. 1263–1272.
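Here is a minimal sketch of one graph-convolution layer in the spirit of Kipf and Welling, using a dense adjacency matrix for clarity (real implementations use sparse operations; the feature sizes and toy graph are arbitrary):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: normalize the adjacency, transform, aggregate."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Add self-loops and symmetrically normalize the adjacency matrix.
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        # Each node aggregates its neighbours' transformed features.
        return torch.relu(a_norm @ self.linear(x))

adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])   # 3-node path graph
x = torch.randn(3, 4)                                             # 4 features per node
layer = GCNLayer(4, 8)
print(layer(x, adj).shape)                                        # (3, 8)
```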
- 2023-02-28 Multimedia Processing Networks
- Multimedia processing architectures. VQA. NMNs. Hierarchical co-attention. DALL-E. Imagen. Stable Diffusion. (A VQA fusion sketch follows the reading list below.)
- Yang Liu et al. (2021). “A survey of visual transformers”. In: arXiv preprint arXiv:2111.06091
- Stanislaw Antol et al. (2015). “VQA: Visual question answering”. In: Proceedings of the IEEE international conference on computer vision, pp. 2425–2433
- Jacob Andreas et al. (2016). “Neural Module Networks”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, pp. 39–48. DOI: 10.1109/CVPR.2016.12
- Jiasen Lu et al. (2016). “Hierarchical question-image co-attention for visual question answering”. In: Advances in neural information processing systems 29
- Aditya Ramesh et al. (2021). “Zero-Shot Text-to-Image Generation”. In: Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, pp. 8821–8831
- Chitwan Saharia et al. (2022). “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”. DOI: 10.48550/ARXIV.2205.11487
- Robin Rombach et al. (2022). “High-Resolution Image Synthesis with Latent Diffusion Models”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695
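As a toy illustration of fusing modalities for VQA, here is a minimal two-stream baseline sketch (late fusion of image and question features). It is not any of the cited architectures, and all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class TinyVQA(nn.Module):
    """Minimal two-stream baseline: encode image and question, fuse, classify an answer."""
    def __init__(self, vocab=1000, n_answers=10):
        super().__init__()
        self.image_enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 16 dims
        self.embed = nn.Embedding(vocab, 32)
        self.question_enc = nn.LSTM(32, 32, batch_first=True)
        self.classifier = nn.Linear(16 + 32, n_answers)

    def forward(self, image, question_tokens):
        img = self.image_enc(image)
        _, (h, _) = self.question_enc(self.embed(question_tokens))
        fused = torch.cat([img, h[-1]], dim=-1)     # late fusion of the two modalities
        return self.classifier(fused)

model = TinyVQA()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 12)))
print(logits.shape)   # (2, 10): one score per candidate answer
```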
- 2023-03-07 Neural Architecture Search
- NAS. NEAT. Knowledge/Network Distillation. GDAS. (A knowledge-distillation sketch follows the reading list below.)
- Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter (2019). “Neural architecture search: A survey”. In: The Journal of Machine Learning Research 20.1, pp. 1997–2017
- Kenneth O. Stanley and Risto Miikkulainen (2002). “Evolving Neural Networks Through Augmenting Topologies”. In: Evolutionary Computation 10.2, pp. 99–127
- Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean (2015). “Distilling the Knowledge in a Neural Network”. In: NIPS Deep Learning and Representation Learning Workshop
- Xuanyi Dong and Yi Yang (2019). “Searching for a Robust Neural Architecture in Four GPU Hours.” In: CVPR. Computer Vision Foundation / IEEE, pp. 1761–1770
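Knowledge distillation reduces to a simple loss; here is a minimal sketch along the lines of Hinton et al. (the temperature and mixing weight shown are arbitrary choices):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Match the teacher's softened outputs, mixed with the usual hard-label loss."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                         soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, targets)
    return alpha * soft_loss + (1 - alpha) * hard_loss

student = torch.randn(8, 10, requires_grad=True)   # logits from a small student model
teacher = torch.randn(8, 10)                        # logits from a large, frozen teacher
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```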
Seminar presentations
- emptyshore on https://www.sciencedirect.com/science/article/pii/S0920410521012845 (3D-PMRNN), March 2023. slides
- AdriBen on A Scalable, Interpretable, Verifiable & Differentiable Logic Gate Convolutional Neural Network Architecture From Truth Tables, April 2023.
Some questions people might ask
Is there any cost associated with this? No, but if you enjoy the lectures, please present a paper yourself. The goal is to learn together.
Will there be certificates awarded to attendees? No.
Will the lectures cover such-and-such architecture? Most probably not, but take a look at the program above. Your best bet is to prepare a paper on that architecture for the seminar, and we can discuss it all together.
Will coming to this seminar help me with "extremely precise needs arising from requirements at work or university"? Most certainly not. And on the off chance it does, you'll be sitting through tons of material that is not relevant. The Discord server has very good question forums; your best bet is to post there.
For any other questions, ping me on the server; I'm DrDub in there.