deep-learning
This was a paper I presented at Bang Liu’s research group meeting on 2022-08-05. You can view the slides I used here.
I think it’s valuable to be working in the open whenever possible, so I’m going to keep my research notes here. These notes will hopefully be full of good (and bad) ideas, so if someone borrows a good idea and publishes on it, that’s great!
This post contains my research notes as I try to understand how model scaling affects worst-group performance. This started as a group project in the neural scaling laws course at Mila in winter 2022. We presented an existing paper along with our preliminary results in class. The repository for this project is here.
This was a paper I presented at Bang Liu’s research group meeting on 2022-06-06. You can view the slides I used here.
Continual-T0 (CT0) extends T0 by progressively training it on 8 unseen language generation tasks, while retaining a replay buffer of 1% of the original training data to preserve performance. The result is a model that maintains nearly all of its performance on previous tasks while learning the new tasks. In addition, CT0 maintains the original T0’s performance on unseen tasks (which is a big deal because those tasks could not appear in the replay buffer) and it extends the compositionality of T0 to even more unseen tasks.
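To make the replay setup concrete, here is a minimal sketch of rehearsal-based continual fine-tuning as I understand it: sample roughly 1% of the original training mixture into a fixed replay buffer, then mix that buffer into every new task’s training stream. The dataset names, sizes, and the `train_step` stub are hypothetical placeholders, not CT0’s actual data or code.

```python
import random

random.seed(0)

REPLAY_FRACTION = 0.01  # CT0 retains ~1% of the original training mixture

# Hypothetical stand-ins for the original multitask mixture and new generation tasks.
original_t0_data = [f"t0_example_{i}" for i in range(100_000)]
new_tasks = {
    "task_1_headline_generation": [f"task1_example_{i}" for i in range(5_000)],
    "task_2_haiku_generation": [f"task2_example_{i}" for i in range(5_000)],
}

# Build the replay buffer once, before any continual training.
replay_buffer = random.sample(
    original_t0_data, int(REPLAY_FRACTION * len(original_t0_data))
)


def train_step(example: str) -> None:
    """Placeholder for one gradient update on a single example."""
    pass


# Train on each new task sequentially, interleaving replayed examples so the
# model keeps seeing a small amount of the original distribution.
for task_name, task_data in new_tasks.items():
    mixed_stream = task_data + replay_buffer
    random.shuffle(mixed_stream)
    for example in mixed_stream:
        train_step(example)
    print(f"finished {task_name}: {len(task_data)} new + {len(replay_buffer)} replayed examples")
```

The point of the sketch is just the data mixing: the replay buffer is tiny relative to the new-task data, but it is revisited during every stage of continual training.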
T0 builds on T5 by fine-tuning on more natural prompts and testing the model’s generalization to held-out tasks.
Compare the training format diagrams for T5 (top) and T0 (bottom):
Intuitively, the T0 prompts are closer to the implicit/explicit prompting already present in the pretraining data. The authors created several prompts for each dataset.
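To illustrate what “several prompts for each dataset” looks like, here is a toy example of applying a few hand-written templates to a single NLI-style record. The templates and the record below are made up for illustration; the actual prompts come from the public prompt collection the T0 authors used.

```python
# One hypothetical dataset record and several prompt templates for it.
example = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": "entailment",
}

templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    '{premise} Based on the previous sentence, is it true that "{hypothesis}"?',
    "Suppose {premise} Can we infer that {hypothesis}?",
]

# Each template turns the same record into a different natural-language input/target pair.
for template in templates:
    print(template.format(**example))
    print("target:", "yes" if example["label"] == "entailment" else "no")
    print("---")
```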
This was a paper I presented at Bang Liu’s research group meeting on 2022-04-11. You can view the slides I used here.