How we helped a public equity firm build an automated dataset for active securities using Cap IQ screens

March 17, 2023

The Context

The client, a Public Equity firm, wanted the TresVista team to create a dataset of active securities per period for the last 40 years for a set of 14-15 geographies using Cap IQ screens. The client wanted the team to determine the primary exchanges for each geography, create a dataset free of survivor bias, and remove data inconsistencies using Python. Further, the team was asked to automate the process of fetching fundamental and market-related data for each security using CapIQ mnemonics for excel. Additionally, the team was told to automate the data ingestion into multiple custom-built excel VBA-based backtesting templates.

The Objective

To find the primary exchanges, automate the process of making a survivor bias-free database, fetch KPIs using CapIQ mnemonics and finally automate the data pipeline so that data is available for slicing and dicing, and running multiple backtesting strategies.

The Approach

The TresVista Team followed the following process:

  • Gathered appropriate region-wise primary exchanges
  • Removed inconsistencies in the data using collaborative filtering
  • Automated the process using the trade dates to build initial screens
  • Fetched fundamentals and market data for the relevant securities
  • Leveraged the databases in multiple backtesting templates

The Challenges We Overcame

The major hurdle faced by the TresVista Team were:

  • For each geography, there were different data challenges, and the team had to analyze and build different filtering algorithms
  • Multiple screens had to be analyzed as S&P stores data of inactive tickers/securities separately
  • Due to heavy CapIQ formulas, the CapIQ refresh took at least an hour, causing other excel functions to stop

The team overcame the hurdles by building algorithms to clean the data, identifying how and which variables impacted the whole database pull, and then attempting optimizations to reduce the time database takes to refresh the list of active securities by geographies. This was done, ensuring the strategy followed was not altered. The team also understood how the database stores the data and the multiple ways it links to different data points.

Final Product (Sanitized)

The Value Add – Catalyzing the Client’s Impact

The TresVista Team automated ~95.0% of the whole database creation process. The automation now takes one click to get all the trading periods for a security. The team also provided 100.0% security coverage across geographies and exchanges that the client had fetched. Further, the team tweaked the CapIQ data pull that reduced the data fetching time by ~40.0 minutes. Additionally, the team created databases of multiple periodicities for the client to test the hypothesis at any level of data granularity.