nlp

PaLM

I presented this paper at Bang Liu’s research group meeting on 2022-04-11. You can view the slides I used here.

It's not just size that matters: small language models are also few-shot learners

We presented this paper as a mini-lecture in Bang Liu’s IFT6289 course in winter 2022. You can view the slides we used here.

A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification

This post was created as an assignment in Bang Liu’s IFT6289 course in winter 2022. The structure of the post follows the structure of the assignment: summarization followed by my own comments. Paper summarization: Word embeddings have gotten so good that state-of-the-art sentence classification can often be achieved with just a one-layer convolutional network on top of those embeddings. This paper dials in on the specifics of training that convolutional layer for this downstream sentence classification task (a minimal sketch of the architecture follows this entry).
Read more
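The kind of one-layer convolutional classifier the paper above studies is simple enough to sketch. Below is a minimal PyTorch version, assuming word embeddings, a few parallel filter widths, max-over-time pooling, dropout, and a linear classifier; the hyperparameters are illustrative defaults, not the paper’s tuned settings.

```python
# A minimal sketch of a one-layer CNN for sentence classification.
# Hyperparameters here are illustrative, not the paper's tuned settings.
import torch
import torch.nn as nn


class SentenceCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 filter_widths=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # In practice this table would be initialized from pretrained
        # vectors (e.g. word2vec) and optionally fine-tuned.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=w) for w in filter_widths]
        )
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(num_filters * len(filter_widths), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, embed_dim, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        # One convolution per filter width, ReLU, then max-over-time pooling.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.classifier(features)


# Example: logits for a batch of two padded sentences of length 20.
logits = SentenceCNN(vocab_size=10_000)(torch.randint(0, 10_000, (2, 20)))
```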

Learning transferable visual models from natural language supervision (CLIP)

This post was created as an assignment in Irina Rish’s neural scaling laws course (IFT6167) in winter 2022. The post contains no summarization, only questions and thoughts. This concept of wide vs. narrow supervision (rather than binary “supervised” and “unsupervised”) is an interesting and flexible way to think about how these training schemes leverage data. Zero-shot CLIP matches the performance of 4-shot CLIP, which is a surprising result (a sketch of how zero-shot classification works follows this entry). What do the authors mean when they make this guess about zero-shot’s advantage:
Read more
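For context on the zero-shot result mentioned above, here is a rough sketch of how zero-shot CLIP classification works, using OpenAI’s open-source clip package; the label set and image path are placeholders, and the prompt template is just one common choice.

```python
# A sketch of zero-shot classification with CLIP: class names are turned
# into text prompts, embedded, and matched to the image embedding by
# cosine similarity. Label set and image path are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["dog", "cat", "car"]  # placeholder label set
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(prompts)
    # Cosine similarity = dot product of L2-normalized embeddings.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs.squeeze(0).tolist())))
```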

Distributed representations of words and phrases and their compositionality

This post was created as an assignment in Bang Liu’s IFT6289 course in winter 2022. The structure of the post follows the structure of the assignment: summarization followed by my own comments. Paper summarization: This paper describes multiple improvements made to the original Skip-gram model. Subsampling frequent words (decreasing the rate of exposure to common words) improves training speed and increases the model’s accuracy on infrequent words. A new training objective they call “negative sampling” improves training speed and the model’s accuracy on frequent words. A small sketch of both tricks follows this entry.
Read more
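The two Skip-gram improvements summarized above can be sketched in a few lines of NumPy; the numbers below are illustrative, not the paper’s experimental settings.

```python
# A sketch of the two tricks the post summarizes: subsampling of frequent
# words and the negative-sampling objective for Skip-gram.
import numpy as np

rng = np.random.default_rng(0)

# Subsampling: each occurrence of word w is discarded with probability
# 1 - sqrt(t / f(w)), where f(w) is the word's corpus frequency and t is
# a small threshold (the paper uses around 1e-5).
def keep_probability(word_freq, t=1e-5):
    return np.minimum(1.0, np.sqrt(t / word_freq))

print(keep_probability(np.array([1e-2, 1e-4, 1e-6])))  # frequent words are kept least often

# Negative sampling: maximize log sigma(v_ctx . v_in) for the observed
# context word plus log sigma(-v_neg . v_in) for k "noise" words drawn
# from the unigram distribution raised to the 3/4 power.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_in, v_ctx, v_negs):
    positive = np.log(sigmoid(v_ctx @ v_in))
    negative = np.sum(np.log(sigmoid(-v_negs @ v_in)))
    return -(positive + negative)  # minimize the negated objective

dim = 50
v_in, v_ctx = rng.normal(size=dim), rng.normal(size=dim)
v_negs = rng.normal(size=(5, dim))  # k = 5 noise words
print(negative_sampling_loss(v_in, v_ctx, v_negs))
```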