Senior Research Scientist - Kensho

Cambridge, MA

About the Job

About Kensho:

Kensho is a 120-person Machine Learning (ML) and Natural Language Processing (NLP) company focused on providing cutting-edge solutions to the challenges faced by some of the largest and most successful businesses and institutions. We are owned by S&P Global and operate independently. Our toolkit surfaces insights by helping the world better understand, process, and leverage messy data. Specifically, Kensho’s solutions include automatic speech recognition (ASR), entity linking, structured document extraction, automated database linking, text classification, and more. We are continuously expanding our portfolio and are looking for passionate researchers to help us create state-of-the-art models across a variety of domains! Are you looking to solve hard problems, and do you enjoy working with teammates with diverse perspectives? If so, we would love to help you excel here at Kensho. We are a collaborative group of experienced Research Scientists and Machine Learning Engineers whose academic backgrounds include doctorates in NLP, theoretical physics, statistics, and other fields. We take pride in our team-based, tight-knit, startup-like Kenshin community, which fosters continuous learning and open communication.

At Kensho, we hire talented people and give them the freedom, support, and resources needed to accomplish our shared goals. We take a flexibility-first approach and give our employees the opportunity to work from wherever they feel most productive and engaged (must be in the United States). We also value in-person collaboration, so travel to one of our Kensho hubs (e.g., Cambridge, MA or NYC) will occasionally be required for team meetings or company events.

About the R&D Lab:

Since 2022, we have been building a world-class R&D lab composed of NLP Research Scientists, and we heavily prioritize publishing at top-tier conferences. Our small team has demonstrated compelling results and is fueling innovation throughout Kensho and S&P Global at large. Specifically, we are continuously developing Large Language Models (LLMs) and are actively working on long-context question-answering (QA), complex reasoning, tokenization, alignment (e.g., factuality), multi-document QA, and more!

Our small team has reserved access to hundreds of fast GPUs (A100s), spanning cloud and on-prem machines.

Our current projects include:

- Long-context document QA, where the answer is contained within documents that are hundreds of pages in length [1]

- Complex reasoning, including understanding and improving models’ ability to approximate numbers (related to commonsense reasoning)

- Creating rigorous evaluation benchmarks, spanning domain knowledge, quantity extraction, and program synthesis [2]

- Improving existing alignment techniques for domain-specific needs, while also addressing factuality

- Dissecting tokenizers to better understand how each of the sub-components impacts intrinsic and extrinsic performance [3][4]

- Multi-document QA, where the answer requires combining information from dozens of sources

- Retrieval-augmented generation (RAG) methods

- Creating high-quality data filters for LLM development

Additionally, we maintain strong relationships with academia, including collaborating on several ongoing projects, providing industry grants, sponsoring conferences, and jointly holding faculty positions.

[1] DocFinQA: A Long-Context Financial Reasoning Dataset (Reddy et al., 2024)

[2] BizBench: A quantitative reasoning benchmark for business and finance (Koncel-Kedziorski et al., 2024)

[3] Tokenization Is More Than Compression (Schmidt et al., 2024)

[4] Greed is All You Need: An Evaluation of Tokenizer Inference Methods (Uzan et al., 2024)

Source: Kensho