The Journal of Financial Data Science

Context, Language Modeling, and Multimodal Data in Finance

Sanjiv Das, Connor Goggins, John He, George Karypis, Sandeep Krishnamurthy, Mitali Mahajan, Nagpurnanand Prabhala, Dylan Slack, Rob van Dusen, Shenghua Yue, Sheng Zha and Shuai Zheng
The Journal of Financial Data Science Summer 2021, jfds.2021.1.063; DOI: https://doi.org/10.3905/jfds.2021.1.063
Sanjiv Das
is a professor of finance at Santa Clara University and an Amazon scholar at Amazon Web Services in Santa Clara, CA and Palo Alto, CA
Connor Goggins
is a software development engineer at Amazon Web Services in Palo Alto, CA
John He
is a software development engineer at Amazon Web Services in Palo Alto, CA
George Karypis
is a professor of computer science at the University of Minnesota and senior principal scientist at Amazon Web Services in Palo Alto, CA
Sandeep Krishnamurthy
is a software development manager at Amazon Web Services in Palo Alto, CA
Mitali Mahajan
is a graduate student at Santa Clara University in Santa Clara, CA
Nagpurnanand Prabhala
is a professor of finance at Johns Hopkins University in Baltimore, MD
Dylan Slack
is a graduate student at the University of California, Irvine, in Irvine, CA
Rob van Dusen
is a graduate student at the University of Chicago in Chicago, IL
Shenghua Yue
is a software development engineer at Amazon Web Services in Palo Alto, CA
Sheng Zha
is a senior applied scientist at Amazon Web Services in New York, NY
Shuai Zheng
is an applied scientist at Amazon Web Services in Palo Alto, CA

Abstract

The authors enhance pretrained language models with Securities and Exchange Commission filings data to create better language representations for features used in a predictive model. Specifically, they train RoBERTa class models with additional financial regulatory text, which they denote as a class of RoBERTa-Fin models. Using different datasets, the authors assess whether there is material improvement over models that use only text-based numerical features (e.g., sentiment, readability, polarity), which is the traditional approach adopted in academia and practice. The RoBERTa-Fin models also outperform generic bidirectional encoder representations from transformers (BERT) class models that are not trained with financial text. The improvement in classification accuracy is material, suggesting that full text and context are important in classifying financial documents and that the use of mixed data (i.e., enhancing numerical tabular data with text) is feasible and fruitful in machine learning models in finance.

TOPICS: Quantitative methods, big data/machine learning, legal/regulatory/public policy, information providers/credit ratings

Key Findings

  • Machine learning based on multimodal data provides meaningful improvement over models based on numerical data alone.

  • Context-rich models perform better than context-free models.

  • Pretrained language models that mix common text and financial text do better than those pretrained on financial text alone.
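
The idea of mixed data can be illustrated with a minimal sketch, which is not the authors' implementation. It builds the kind of context-free, text-based numerical features the abstract calls the traditional approach (a polarity score and a crude readability proxy) and appends them to a row of numerical tabular data. The word lists, the function names text_features and mixed_features, and the sample values are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical mini word lists; real studies use much larger financial lexicons.
POSITIVE = {"growth", "profit", "improved", "strong"}
NEGATIVE = {"loss", "decline", "impairment", "litigation"}


def text_features(text):
    """Context-free summaries of a document: polarity and average sentence length."""
    words = re.findall(r"[a-z]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # Polarity in [-1, 1]; zero when no lexicon word appears.
    polarity = (pos - neg) / (pos + neg) if pos + neg else 0.0
    # Words per sentence as a crude readability proxy.
    avg_sentence_len = len(words) / len(sentences) if sentences else 0.0
    return [polarity, avg_sentence_len]


def mixed_features(tabular_row, text):
    """Concatenate numerical tabular features with text-derived features."""
    return list(tabular_row) + text_features(text)


# Hypothetical example: two made-up tabular values plus a two-sentence text.
filing = "Revenue growth was strong. The company recorded an impairment loss."
row = mixed_features([0.45, 1.8], filing)
print(row)
```

In this example the positive and negative words cancel, so the polarity feature carries no signal even though the two sentences say quite different things. A context-rich approach of the kind the abstract describes would presumably replace text_features with a representation produced by a pretrained language model; that step is omitted here because it depends on model and library choices the page does not specify.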

© 2021 Pageant Media Ltd

© 2022 Pageant Media Ltd | All Rights Reserved | ISSN: 2640-3943 | E-ISSN: 2640-3951
