Leveraging Machine Readable Filings to Uncover Valuable Investment Insights


As sovereign wealth funds (SWFs) around the world grow in size and number, they are becoming increasingly powerful players on the global financial scene.[1] The investments they make are effective ways for their countries to diversify income streams and become less reliant on a few sources of revenue.

The large SWF featured in this case study invests in all asset classes across international markets, including equities, fixed income, real estate, private equity and alternatives. The fund had recently assembled a new group of quantitative analysts and strategists responsible for extracting important insights from data. The group wanted to expand its available information to include additional sources of machine readable annual and interim financial reports, improving efficiency and the speed of discovery.

Pain Points

Members of the investment team needed to support both tactical and strategic decision-making at the SWF. They saw the many benefits of using natural language processing (NLP) with large quantities of textual data and had developed an internal solution for obtaining machine readable information on filings for U.S. and global companies. Gathering and maintaining this information took an extensive amount of time, however, and team members were concerned about coverage and quality. They wanted to outsource the work to a reputable provider and find a solution that offered:

  • A comprehensive set of high-quality global information.
  • Easy access through a data feed option.
  • Hands-on technical and product support to address any issues as they arose.

The team had heard that S&P Global Market Intelligence (“Market Intelligence”) was applying artificial intelligence extensively to connect the dots across its many datasets, and contacted the firm to learn more about its offering for machine readable filings.

The Solution

Market Intelligence discussed its U.S. and Global Machine Readable Filings dataset that provides parsed text of annual and interim reports, broken into the various sections identified by companies, with extraneous information (such as page numbers, images and tables) removed. The data is delivered in a structured format enabling users to perform NLP against it without having to do the document cleanup and structuring themselves. This would enable the team to easily support an analysis of strategic initiatives, earnings, M&A plans, new product possibilities and much more.

Easily screen and evaluate U.S. filings

Machine Readable U.S. Filings provide full coverage of 10-K, 10-Q, 8-K, 6-K, 20-F and 40-F reports dating back to 2006. The documents are structured by major sections, such as Management's Discussion and Analysis, Risks, Competition and Intellectual Property. The document hierarchy is retained, creating a historical baseline for backtesting.
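
As a rough illustration of why section-level structure matters, the sketch below compares the same section across two reporting years and flags material rewrites. It is a minimal example under stated assumptions: the record fields (company, year, section, text), the sample text and the 0.9 threshold are all hypothetical and do not reflect the dataset's actual schema.

```python
from difflib import SequenceMatcher

# Hypothetical records; field names and text are illustrative only,
# not the actual schema or content of the dataset.
sample_filings = [
    {"company": "ExampleCo", "year": 2021, "section": "Risks",
     "text": "We face competition from larger firms. Supply costs may rise."},
    {"company": "ExampleCo", "year": 2022, "section": "Risks",
     "text": "We face competition from larger firms. Cyber attacks may disrupt operations."},
]


def section_similarity(old_text, new_text):
    """Return a word-level similarity ratio between 0 and 1."""
    return SequenceMatcher(None, old_text.split(), new_text.split()).ratio()


def year_over_year_changes(records, threshold=0.9):
    """List (company, section, year, similarity) for sections whose
    similarity to the prior year's text falls below the threshold."""
    grouped = {}
    for rec in records:
        grouped.setdefault((rec["company"], rec["section"]), []).append(rec)

    flagged = []
    for (company, section), recs in grouped.items():
        recs = sorted(recs, key=lambda r: r["year"])
        # Compare each filing with the one immediately before it.
        for prev, curr in zip(recs, recs[1:]):
            score = section_similarity(prev["text"], curr["text"])
            if score < threshold:
                flagged.append((company, section, curr["year"], score))
    return flagged


changes = year_over_year_changes(sample_filings)
```

Because sections are already identified and standardized, a comparison like this can run on like-for-like text without first locating the section boundaries in each raw document.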



Access data extracted from global filings

Machine Readable Global Filings cover 89% of the MSCI World market cap. Data is gathered from 400+ sources, including company websites, stock exchanges and regulatory websites.



Store data in a centralized warehouse

Xpressfeed™ automates the download and management of filings data to a centralized company warehouse, delivering updates every two hours. This supports easy linking to other Market Intelligence datasets, including financials, estimates and events data.

Key Benefits

Members of the investment team saw many benefits to the offering and subscribed to both the U.S. and Global Machine Readable Filings. In particular, they thought this would help them:

  • Save enormous amounts of time currently being spent on creating an internal solution for the collection and processing of U.S. and global data.
  • Improve the overall quality of the data with a tested and scalable solution for text-cleansing, including maintaining consistency across reporting periods regardless of structure changes, reclassifying heading sections for standardization purposes and removing irrelevant elements, such as table headers and page numbers.
  • Access extensive textual information for companies around the world, which is pre-tagged, structured and organized.  
  • Replicate fundamental analyst workflows across millions of documents to increase the breadth of analysis and identify documents and sections of highest importance.
  • Leverage NLP and data mining techniques to systematically identify themes, trends and major changes within a company’s reporting of material qualitative information. For example, identify mentions of “ESG”, while filtering out mentions that do not actually apply to this concept.
  • Easily combine the information with other relevant financial data and market participant actions to establish patterns that could warrant further inquiry.
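
The “ESG” example in the list above can be sketched in a few lines. This is a simplified, rule-based illustration and not the method used by the team or by Market Intelligence: it keeps a sentence only if “ESG” appears as a whole word and the sentence also contains at least one related context term. The context terms and function names are assumptions chosen for the example.

```python
import re

# Whole-word, case-sensitive match, so "ESG" embedded in a longer token is ignored.
ESG_PATTERN = re.compile(r"\bESG\b")

# Illustrative context terms (an assumption); a production system would
# tune this list or replace the rule with a trained classifier.
CONTEXT_TERMS = {
    "environmental", "social", "governance",
    "sustainability", "emissions", "climate",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def relevant_esg_mentions(text):
    """Return sentences that mention ESG and contain a related context term."""
    hits = []
    for sentence in split_sentences(text):
        if not ESG_PATTERN.search(sentence):
            continue
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & CONTEXT_TERMS:
            hits.append(sentence)
    return hits
```

Under this rule, a sentence such as “Our ESG strategy targets lower emissions by 2030.” is kept, while “ESG Holdings supplies our packaging.” is filtered out because nothing in it ties the acronym to the concept.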

The investment team is now looking at other machine readable textual datasets provided by Market Intelligence. These include Transcripts, which covers earnings calls, company conference calls and special calls.


