Kyle Roth

I'm a speech scientist at Cobalt Speech & Language, a company that designs custom speech recognition, text-to-speech, and dialogue models. While at Cobalt I've worked on some neat projects, including language modeling for recognizing air traffic control speech and creating an online training system for ASR models. I am currently in the process of applying to PhD programs in computer science, to start in 2021. See the Research section for some of the areas of machine learning that I find fascinating.

I love using math and code to solve complex problems. I wrote my first lines of Python in 2012, and that is when I developed a love for software. I graduated from BYU with a BS in Applied and Computational Mathematics (ACME) with an emphasis in linguistics and a minor in computer science. (ACME coursework focuses on the rigorous math behind data science, modeling, optimal control, and machine learning.)

During my undergrad I interned with Cobalt Speech (my current employer), as well as Emergent Trading, an automated trading firm that made the news for reporting a problem in a Eurodollar exchange rule that unfairly favored larger competitors. (I developed the analysis tools that were used to track the issue down and determine how our opponent was taking advantage of the rule.)

Contact (he/him)

Around the web I'm known by the username kylrth. I prefer to be contacted through the Matrix protocol (@kyle:kylrth.com). If you'd like an account on my Matrix server, follow the instructions here and ask me for the token.

Matrix  /  email  /  Twitter  /  Github  /  LinkedIn  /  resume

Research

My first foray into ML research came when I received a grant to apply a variable-order CRF model to a morphological parsing task in Basque, achieving 71.3% accuracy. A few months later I started working with the computational photonics group at CamachoLab, where I used DNNs to simulate photonic components, replacing the expensive FDTD simulations normally needed when designing chip components.

Here are some of my current interests:

  • I'm curious about the relationship between representation (how intelligence thinks) and language (how intelligence communicates). I think if we can improve our understanding of how humans learn and use language, we will come closer to understanding the tools necessary to build the "system 2 cognitive abilities" that Yoshua Bengio has talked about.
  • My most important moral principle is helping others, especially those who are marginalized or impoverished. I think there are many social problems that can be solved with machine learning. I get most passionate about helping refugees and migrants, reducing language barriers, and addressing racial and cultural inequality.
  • attention mechanisms (see this literature review I wrote on the subject) and other architectural components that allow models to create natural internal representations
  • developing better theories about how networks learn (maybe because deep learning is biased toward simple functions?)
  • end-to-end models for speech recognition

Projects

In my spare time I'm working on an app that language learners can use to develop their vocabulary. Users can add words to their active list, and the app will recommend additional words that they might also want to learn. I hope this app will make it easier for displaced people to adapt to the culture where they find themselves, even when their language skills are intermediate or advanced.

Here are some projects I've worked on:

attention graph for a sample sentence in NMT

an attention review

I felt I had a poor understanding of attention mechanisms, so for a writing assignment I created a literature review of the important work being done with neural attention. I wrote about the various formulations of attention found in the literature, and discussed important applications. I learned a lot as I studied this. You can check out my write-up here.

the trapezoid used to picture vowels in phonology

speech2phone

speech2phone was a class project I worked on with two other ACME students during our senior year. We tried to tackle the problem of phoneme recognition on the TIMIT corpus by splitting it into the tasks of segmentation and classification. We had some fun exploring various algorithms and architectures. We didn't yet have a good sense of the state-of-the-art research, so we sort of threw everything at the problem and watched what happened. You can see the repo here and the final report here.

SLURM logo

SLURM_gen

SLURM_gen makes it easy to generate and handle arbitrarily sized datasets in a SLURM HPC environment. I used this tool during my undergrad while working on computational photonics research. (I used the BYU supercomputer to generate FDTD simulations, which were then used as training data for the neural network model.)

Twitter image created by Shawn Campbell. See attribution below.

ngramtweets

An n-gram text generator that handles punctuation and end-of-tweet along with the words. It tries to use hashtags, @-mentions, and URLs correctly.
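
To give a feel for the idea, here is a minimal sketch of an n-gram generator that treats punctuation and end-of-tweet as ordinary tokens. It is not the ngramtweets code itself: the names `tokenize`, `train`, and `generate`, the `<s>`/`</s>` boundary markers, and the tokenizing regex are all assumptions made up for this illustration.

```python
import random
import re
from collections import defaultdict

# Hypothetical boundary markers; "</s>" plays the role of end-of-tweet.
START, END = "<s>", "</s>"

# URLs, @-mentions, and hashtags are tried first so they stay in one piece,
# then words (with inner apostrophes), then single punctuation marks.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)*|[^\w\s]")


def tokenize(tweet):
    """Split a tweet into word, punctuation, hashtag, mention, and URL tokens."""
    return TOKEN_RE.findall(tweet)


def train(tweets, n=2):
    """Map each (n-1)-token context to the list of tokens observed after it."""
    model = defaultdict(list)
    for tweet in tweets:
        tokens = [START] * (n - 1) + tokenize(tweet) + [END]
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context].append(tokens[i + n - 1])
    return model


def generate(model, n=2, max_tokens=50, rng=random):
    """Sample tokens until the end-of-tweet marker or max_tokens is reached."""
    context = (START,) * (n - 1)
    out = []
    for _ in range(max_tokens):
        choices = model.get(context)
        if not choices:  # untrained model or unseen context
            break
        token = rng.choice(choices)
        if token == END:
            break
        out.append(token)
        context = (context + (token,))[1:]  # slide the context window
    return out
```

Calling `generate(train(tweets, n=3), n=3)` returns a list of tokens. Because the end-of-tweet marker is sampled like any other token, the model decides for itself when a tweet is finished. Joining the tokens back into text (for example, dropping the space before punctuation) is left out of this sketch.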


Site created from Jon Barron's template with some suspicious similarities to Thomas Wolf's site.

Twitter image created by Shawn Campbell and used according to the terms of the CC BY 2.0 license.