
Introduction and Automation Framework

By Brian Pisaneschi, CFA

Since the dawn of algorithmic trading, automation has created major efficiency gains in the investment process. Over the last 20 years, these gains have come from rules-based algorithms, often written in Visual Basic for Applications (VBA) in Excel and, more recently, in Python, to automate repetitive processes. Typical efficiency gains came from automating activities such as data importation — for example, pulling stock prices into an Excel template built to model portfolios or perform valuations, or running VBA macros to clean and reformat data. These processes had to be performed manually many times before the rules could be fine-tuned into a reliably working algorithm.
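To make the contrast with GenAI concrete, below is a minimal sketch of this kind of rules-based automation in Python. It assumes the third-party yfinance and pandas packages (plus openpyxl for the Excel export); the tickers and dates are purely illustrative.

```python
# Rules-based automation: pull daily prices, clean them, and hand them off
# to an Excel portfolio model. Every step is a fixed, repeatable rule.
import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOG"]  # hypothetical portfolio holdings

# Download daily closing prices for the chosen window.
prices = yf.download(tickers, start="2023-01-01", end="2023-12-31")["Close"]

# Rule-based cleanup: forward-fill missing quotes, then compute daily returns.
returns = prices.ffill().pct_change().dropna()

# Export for an Excel-based model (requires openpyxl).
returns.to_excel("portfolio_returns.xlsx")
```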

With the advent of generative AI (GenAI), automation is no longer limited to rules-based, repetitive tasks. It can now take on work that was previously performed only by humans. GenAI has introduced a plethora of new possibilities for automating the investment process, from changing the way a manager screens for new investment ideas to creatively generating new trade ideas from news articles.

The potential for efficiency gains has caught the investment industry’s attention. A recent survey by AI chipmaker Nvidia revealed that 55% of financial services companies are actively exploring GenAI workflows (Levitt 2024). The question remains, however, whether the industry can capture the value that this new technology promises.

Financial services companies have highlighted the vast possibilities and current use cases, but these descriptions are often vague, lacking the technical details needed to establish the viability and value of GenAI. This series will explore these details. Each installment will cover a tool or group of tools being used for automation and showcase their application to the investment process through a Python code-based example. Throughout the series, we will explore the tangible value this technology offers and where it still needs improvement.

The code for these installments will be housed on the Research and Policy Center GitHub page, where it can evolve and be discussed as we all learn how best to use these new tools. We invite readers to explore and learn the capabilities of these tools firsthand by joining the Research and Policy Center Discord server and sharing their implementations.

In this introductory installment of the series, we provide an overview of the current state of GenAI and its applicability to the investment industry, highlighting the tools that professionals are actively using to automate key processes. We conclude this installment by introducing a framework for evaluating which projects are most suitable for automation, helping you to determine where the greatest efficiency gains may be realized.

Investment Industry Use Case Timeline

Since the launch of ChatGPT, we have witnessed a remarkable evolution in the application of GenAI. A CFA Institute survey published on 23 July 2024 revealed that most investment professionals use GenAI primarily for basic company and industry research, often as a substitute for Google (CFA Institute 2024). The use of more advanced applications that require programming, however, was notably absent. This finding highlights a critical point: The vast majority of use cases rely heavily on the capabilities offered by the ChatGPT platform.

To understand what investment-related tasks are now within reach thanks to ChatGPT, we examine the platform’s evolution, illustrated in Exhibit 1. When ChatGPT first launched in November 2022, its primary use cases were relatively straightforward — administrative tasks such as managing emails, providing writing assistance, and aiding in basic investment research. These early applications included gaining a better understanding of market-specific characteristics and making sense of unfamiliar investment concepts. Although useful, these tasks mostly offered a substitute for Google.

Exhibit 1. GenAI Timeline: Evolution and Applications


The introduction of advanced data analytics marked a significant step forward. With the ability to connect to a Python interpreter, users could now run basic calculations directly within the ChatGPT environment, enhancing its analytical capabilities by enabling basic data profiling and data visualizations.

Next came retrieval-augmented generation (RAG), which further expanded the platform’s functionality by allowing users to upload documents and pull in web data. This technology made summarizing financial reports easier and more insightful, particularly in the context of extracting risks and opportunities more efficiently (Kim, Muhn, and Nikolaev 2024a, 2024b).

The introduction of custom GPTs allowed for even more specialized applications. These tailored GPTs can connect to application programming interfaces (APIs), enabling the creation of powerful stock analyzers. These custom models can retrieve the latest information from news articles, access real-time stock price data, and perform complex calculations using ChatGPT’s Python terminal.

The most recent advancement is the integration of multimodality, particularly through the GPT-4 Vision model. This innovation allows users to analyze graphs, extract key information, and even recreate visual data within Python. The ability to interpret and generate visual data alongside textual analysis has opened up new ways to apply AI in the investment industry.

Challenges and Solutions

The practical application of GenAI has been significantly challenged by a persistent issue: the tendency of large language models (LLMs) to hallucinate, generating “facts” not present in their training data. LLMs generate text autoregressively, selecting the next word based on the highest probability within a vast vocabulary. This process means that an LLM will always produce a word, even when that word lacks relevance or accuracy in response to a question. Because of its autoregressive nature, the model can continue this incorrect line of reasoning, resulting in a string of confident nonsense. 

To mitigate this issue, the machine learning community uses several techniques: 

  1. Prompt engineering involves carefully crafting prompts to guide the model toward more-accurate responses. 
  2. Retrieval-augmented generation (RAG) supplements the model’s responses with more-reliable information by retrieving relevant external data and providing it as context. 
  3. Fine-tuning adjusts the model’s parameters by training it on specific datasets to improve accuracy in particular domains. 

The first two techniques fall under the category of in-context learning. They guide the already-trained model by providing it with context, helping it to generate more-accurate answers without altering the model’s underlying parameters. Fine-tuning, on the other hand, is a deep learning technique that involves using additional data to modify the neural network’s parameters, similar to the original training process of the foundation model. Exhibit 2 illustrates the differences between the two approaches.

Exhibit 2. In-Context Learning vs. Fine-Tuning


Each of these techniques has its strengths and limitations, with certain applications being more effective in specific contexts. The broader question remains, however: Can the estimated $1 trillion in capital expenditure projected over the coming years truly unlock the potential of these technologies despite their current flaws (Goldman Sachs 2024)? And, are these workarounds merely temporary solutions until we achieve a fully realized artificial general intelligence (AGI) system (Morris, Sohl-Dickstein, Fiedel, Warkentin, Dafoe, Faust, Farabet, and Legg 2024)?

The Need for Reasoning

The concept of System 1 and System 2, introduced by Daniel Kahneman in his book Thinking, Fast and Slow (2011), provides a useful framework for understanding how LLMs currently operate. System 1 is the fast, automatic part of our brain that helps us with routine tasks, such as putting on pants or brushing our teeth, that we have committed to memory and can perform without conscious thought. In contrast, System 2 is slower, involving deliberate reasoning and deduction, such as solving math problems or planning a vacation.

Much like our brain’s System 1, LLMs generate text automatically by predicting one word after another based on patterns learned during training. This process is purely stochastic, meaning that LLMs rely on a probability distribution over learned patterns rather than any inherent reasoning. In other words, they memorize low-level patterns, such as grammar and word co-occurrence, but struggle to identify higher-order patterns that require deduction based on sets of latent rules (Wang, Yue, Su, and Sun 2024).

By enabling the acquisition of reasoning and deduction during training, we would be giving LLMs a System 2 implicitly. This differs from explicit reasoning — a major discovery in prompt engineering, whereby a user prompts the LLM to think through the question step by step before arriving at its final answer. Just as an algebra student would be asked to show her work when solving a problem, each step allows the LLM to build upon its work, increasing the probability that the next words will be accurate. This approach is known as chain-of-thought prompting (CoT; Wei, Wang, Schuurmans, Bosma, Ichter, Xia, Chi, Le, and Zhou 2023).
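As a concrete illustration, here is a minimal chain-of-thought prompting sketch using the OpenAI Python SDK. It assumes an OPENAI_API_KEY environment variable, and the model name and question are illustrative.

```python
# Chain-of-thought prompting: ask the model to show its work before answering.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A portfolio returned 8% with a 12% standard deviation, and the "
    "risk-free rate is 3%. What is the Sharpe ratio?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # The CoT instruction that elicits step-by-step reasoning.
        {"role": "system", "content": "Think through the problem step by step, "
                                      "then state your final answer."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```

Each intermediate step the model writes becomes part of the context for the next token, which is precisely why showing its work raises the probability of an accurate final answer.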

Although LLMs are getting closer to implicit reasoning, they still lack this crucial attribute, which a fully autonomous system would require. However, the success of CoT has led to a new paradigm of prompt chaining to mitigate the flaws in current LLMs’ implicit System 1 thinking. To address these reasoning limitations and enhance LLMs’ practical utility, a variety of tools and techniques have been developed that incorporate CoT and prompt chaining.

Tools for Automation

Throughout this series, we will focus primarily on tools and techniques that fall under the category of in-context learning. Among them are prompt chaining, retrieval-augmented generation (RAG), function calling, and agents. Each of these methods plays a crucial role in enhancing the capabilities of LLMs without requiring extensive retraining.

  • Prompt chaining involves breaking down complex tasks into smaller, manageable prompts that build upon each other. Investment professionals could use prompt chaining to conduct detailed financial analysis by guiding the LLM through a series of logical steps, such as first summarizing a financial statement, then identifying key metrics, and finally assessing potential risks and opportunities (a minimal sketch follows this list).
  • RAG combines the generative capabilities of LLMs with external data retrieval. For instance, an investment professional might use RAG to pull the latest market data or news articles, allowing the LLM to generate reports or insights that are informed by the most current information. This capability makes RAG particularly useful for tasks such as generating market analysis or updating portfolio strategies based on recent events.
  • Function calling enables LLMs to interact with APIs or other systems to perform specific tasks. For example, an investment professional could use function calling to automatically retrieve real-time stock prices, execute trades, or even update a portfolio management system.
  • Agents are autonomous tools that can perform a sequence of tasks on the user’s behalf. In the context of investment management, an agent might be set up to monitor market conditions continuously, generate alerts when certain thresholds are met, and propose actionable insights. For instance, an agent could automatically analyze economic indicators and suggest adjustments to an investment portfolio based on predefined strategies.
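The following minimal sketch shows the prompt-chaining pattern from the first bullet: each call’s output is fed into the next prompt, mirroring the summarize, extract, assess sequence. It assumes the OpenAI Python SDK; the prompts and placeholder filing text are illustrative.

```python
# Prompt chaining: decompose an analysis into sequential prompts, where each
# step builds on the previous step's output.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

filing_text = "..."  # e.g., an excerpt loaded from a 10-K filing

# Step 1: summarize the financial statement excerpt.
summary = ask(f"Summarize this financial statement excerpt:\n{filing_text}")

# Step 2: identify key metrics, building on the summary.
metrics = ask(f"From this summary, list the key financial metrics:\n{summary}")

# Step 3: assess risks and opportunities using both earlier outputs.
assessment = ask(
    f"Given the summary:\n{summary}\nand these metrics:\n{metrics}\n"
    "Assess the main risks and opportunities."
)
print(assessment)
```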

Although this list of techniques is not exhaustive, the series will inevitably delve into additional prompt engineering methods as we explore the intricacies of each tool’s application.

Framework for Automation

This automation framework is intended not as an enterprise solution but rather as a guide to help individuals develop an intuitive sense of what techniques could work and what may not be worth pursuing before diving into experimentation. In a field as rapidly evolving as AI, the best approach is often to experiment quickly and learn by doing. 

Although many companies are developing enterprise automation frameworks for production-ready products and processes, this framework is designed for individuals. It encourages exploration and hands-on learning, allowing you to satisfy your curiosity and build practical skills in the process. Ultimately, answering the question of automation’s return on investment at an enterprise level begins with a firm’s staff gaining a deep understanding of the technology. 

Exhibit 3 compares the characteristics of tasks suitable for traditional automation versus human tasks. We then explore how GenAI differs from both types of tasks.

Exhibit 3. Attributes of Traditional Automation vs. Human Tasks

| Attribute | Traditional Automation | Human Tasks |
|-----------|------------------------|-------------|
| Data type | Structured data (e.g., tabular financial statements) | Unstructured data (e.g., emails, market news) |
| Task variability | Repetitive, routine tasks (e.g., performance reports) | Variable, context-driven tasks (e.g., strategy formulation) |
| Input objectivity | Objective, clearly formatted inputs | Subjective, based on intuition and expertise |
| Output objectivity | Objective, rule-based outputs | Subjective, tailored solutions |
| Scalability | Highly scalable across repeated tasks | Difficult to scale because of their personalized nature |

Although most of these attributes are self-explanatory, input and output objectivity require a bit of clarification. Input objectivity refers to the degree to which the information fed into the task is objective and structured. Output objectivity, on the other hand, relates to how easily the output can be verified. When an output needs to be highly accurate and verifiable, it must be objective, precise, and not based on interpretation.

Traditional automation for investment processes functions primarily as software. It handles structured data and automates repetitive tasks such as data cleansing, formatting, and performance reporting. These tasks have clear, objective inputs and outputs, driven by rule-based algorithms, making them highly scalable. 

On the other hand, human tasks can be viewed as services, often dealing with ambiguous, unstructured data that is highly variable — no two tasks are ever exactly the same. The inputs are subjective, relying on interpretation and intuition developed through years of experience, with outputs tailored to each specific situation. Because of this variability and complexity, the processes in which humans excel are inherently difficult to scale.

GenAI Automation

We first explore the advantages and challenges inherent in GenAI automation today. Despite its vast opportunities and potential for scalability, investment professionals need to keep in mind the limitations when they begin using GenAI.

Data Type: Unstructured Data

One of the most transformative changes that GenAI brings to investment professionals is the democratization of unstructured data in the investment process. Previously, only firms with large data teams and specialized data science talent could unlock the potential of automating processes using unstructured data. For example, models such as BERT (Bidirectional Encoder Representations from Transformers) allowed these firms to automate tasks such as sentiment analysis, but doing so required significant manual effort to label large datasets.

With the advent of frontier LLMs such as GPT-4, the need for extensive labeled datasets has diminished. These models are trained using massive amounts of data, allowing them to generalize to new tasks with minimal additional examples. For instance, investment analysts can now use GenAI to extract insights from earnings call transcripts, market news, or client emails with just a few prompts or examples — without requiring the exhaustive labeling that earlier models needed.

Additionally, the barrier to entry in terms of technical knowledge has significantly lowered. With GenAI, professionals with basic data and Python skills can use powerful models to automate tasks such as synthesizing research reports, analyzing sentiment, or even generating tailored portfolio recommendations based on market trends. This newfound accessibility makes GenAI well-suited for tackling tasks involving unstructured data.
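As a sketch of this labeling-free workflow, the few-shot prompt below classifies earnings-call sentiment with just two in-context examples instead of a labeled training set. The excerpts are invented, and the OpenAI Python SDK is assumed.

```python
# Few-shot sentiment classification: in-context examples replace the large
# labeled datasets that BERT-era models required.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Classify the sentiment of each earnings-call excerpt as positive, negative, or neutral.

Excerpt: "We delivered record margins and raised full-year guidance."
Sentiment: positive

Excerpt: "Demand softened late in the quarter, and we are cutting capex."
Sentiment: negative

Excerpt: "Supply chain costs stabilized, and we expect flat revenue next quarter."
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,  # keep a classification task as deterministic as possible
)
print(response.choices[0].message.content)  # expected: neutral
```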

Task Variability: Repetitive and Variable

Generative AI redefines traditional automation by enabling automation of tasks that fall between repetitive, structured processes and the highly variable, context-driven tasks traditionally handled by humans. Using advanced pattern recognition and contextual understanding to process unstructured data, GenAI makes automation possible for tasks that were previously difficult to automate, such as customizing investment strategies to fit unique client goals or drafting personalized client communications. Although these tasks require regular execution and significant contextual awareness, they can be partially automated when properly grounded in client data, with human oversight maintained to ensure accuracy and alignment with strategic objectives.

Input Objectivity: Subjective, Ambiguous

GenAI has the unique ability to handle both objective and subjective inputs. GenAI excels in processing unstructured and ambiguous data, such as free-form text, images, and other complex inputs. Instead of “understanding” the input as a human would, GenAI identifies patterns across vast datasets, allowing it to generate meaningful outputs even from highly subjective or unclear information. This capability opens the door for automating tasks that require interpretation and contextual flexibility.

Output Objectivity: Subjective and Objective but Stochastic

LLMs produce stochastic outputs, which are non-deterministic and can vary even when given the same input. These outputs are based on probabilities derived from the patterns the models have learned during training. As a result, GenAI outputs are flexible and creative but also come with inherent variability and uncertainty, which can affect both consistency and accuracy.

Achieving fully objective, deterministic outputs is a challenge for GenAI because of its probabilistic nature. Although GenAI can approximate objectivity, tasks that require precise, consistent results are harder to automate. The reliance on pattern recognition introduces an element of uncertainty in every output, making it difficult to guarantee absolute accuracy.

The stochastic nature of GenAI can be mitigated, however. Fine-tuning the model, adjusting parameters such as the temperature setting to control the degree of randomness, or using post-processing techniques can help make outputs more objective. Even so, 100% certainty remains unattainable — the best GenAI can offer is a high probability of achieving the desired outcome.
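A minimal sketch of the temperature lever, assuming the OpenAI Python SDK: with temperature set to 0, sampling approaches greedy decoding, so repeated calls usually (though not always) return identical text.

```python
# Lowering temperature trades creativity for consistency.
from openai import OpenAI

client = OpenAI()

for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user",
                   "content": "Define value at risk in one sentence."}],
        temperature=0,  # 0 = most deterministic; higher values add variety
    )
    # With temperature=0, these three replies will usually match exactly.
    print(response.choices[0].message.content)
```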

In fact, OpenAI has dedicated significant effort to enhancing the objectivity of model outputs, particularly through structured outputs such as JavaScript Object Notation (JSON). JSON provides a semi-structured format that can be easily integrated into APIs or other systems, making it ideal for production-ready applications. OpenAI has been fine-tuning models to consistently output in JSON format to ensure that AI-generated data can be effectively used in real-world applications, such as function calling through APIs.

In a recent update, OpenAI introduced a model capable of producing structured outputs 100% of the time. Because of the stochastic nature of LLMs, however, there remains a non-zero chance that outputs could still contain inaccuracies or errors (Pokrass 2024). Therefore, tasks that carry significant risks if implemented incorrectly should always involve a human in the loop to verify and validate the outputs before deploying those outputs in production.
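Below is a minimal structured-output sketch using the SDK’s JSON mode (response_format={"type": "json_object"}), which constrains the reply to valid JSON. The keys in the prompt are illustrative, and, per the caution above, the values should still be validated by a human or downstream checks before production use.

```python
# JSON mode guarantees parseable JSON, though not factually correct values.
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system",
         "content": 'Reply in JSON with keys "ticker", "sentiment", and "summary".'},
        {"role": "user",
         "content": "Acme Corp beat earnings estimates but lowered guidance."},
    ],
    response_format={"type": "json_object"},  # constrain output to valid JSON
)

data = json.loads(response.choices[0].message.content)  # safe to parse
print(data)
```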

Scalability: High, but Potentially Costly

The key to GenAI’s scalability lies in its ability to generalize across tasks. Narrow AI models, such as BERT, allowed scaling for specific tasks requiring ambiguous interpretation (such as sentiment analysis). In contrast, GenAI models such as GPT can handle a wide range of tasks without extensive retraining. This ability enables GenAI to automate both repetitive tasks, such as summarizing documents, and dynamic, creative tasks, such as generating new content, at scale.

Although GenAI can scale across diverse tasks, it can also be costly to implement at scale. The computational resources required to run large models, particularly in real-time or for high-volume applications, can be substantial.

Exhibit 4 explores the characteristics and some potential applications for GenAI in finance. For instance, LLMs can analyze thousands of financial reports, summarize earnings calls, and provide insights — tasks that would typically require large teams of analysts. Although this capability creates immense opportunities for scaling, especially in research and analysis, the cost of computing power can be high. Thus, despite GenAI’s vast scalability, organizations must carefully weigh the potential costs and infrastructure demands when scaling GenAI applications.

Exhibit 4. Attributes of GenAI Automation

| Attribute | GenAI Automation |
|-----------|------------------|
| Data type | Unstructured data (e.g., earnings transcripts, market news) |
| Task variability | Repetitive and variable tasks (e.g., customizing strategies, client communications) |
| Input objectivity | Handles subjective, ambiguous inputs (e.g., free-form text) |
| Output objectivity | Stochastic, probabilistic outputs (e.g., personalized reports) |
| Scalability | Scalable, but with potential high computational cost |

Many of the tasks mapped in Exhibit 4 closely mirror the kinds of activities traditionally performed by humans. The key to understanding potential tasks for LLM automation, therefore, lies in understanding which tasks can be fully automated and which require a human in the loop. Although AI and automation can handle a wide array of processes, especially repetitive or rules-driven ones, certain tasks — particularly those involving subjective interpretation, high variability, or significant risk — still demand human oversight.

Exhibit 5 illustrates how tasks across a portfolio manager’s workflow can be mapped based on their output objectivity and task variability. It highlights which areas are best suited for traditional automation, LLM-driven automation, and human intervention.

Exhibit 5. Tasks by Category: Traditional Automation, LLM Automation, and Human Intervention


A Hybrid Approach

Many of the tasks listed in Exhibit 5 are showcased in isolation, focusing on traditional automation, LLM-driven tasks, or human intervention. When we combine these approaches, however, we can draw on the strengths of each method to create more powerful automations. This hybrid approach is often more effective than using LLMs or automation in isolation. It does require programming skills, however, in order to seamlessly integrate the various components.

The following examples illustrate how the hybrid approach can enhance key tasks. We explore the possibilities for portfolio risk assessment and client communications.

The task of portfolio risk assessment involves calculating risk metrics such as value at risk (VaR) or the Sharpe ratio. Both measures are highly objective but depend on variable data from the market.

  • Traditional automation can handle the repetitive, objective aspects, such as gathering and processing structured financial data (historical performance, asset allocations) and running predefined risk calculations.
  • LLM automation can analyze unstructured data sources, such as market sentiment, news articles, or earnings call transcripts, providing additional context that traditional metrics may miss.
  • Human intervention becomes essential when interpreting the results and making high-stakes decisions based on the risk assessment. Using the automated insights, portfolio managers can adjust their strategies to incorporate nuanced client goals or unforeseen market shifts.

In this hybrid approach, traditional automation manages the quantitative side, LLMs provide context from unstructured data, and human intervention ensures appropriate oversight for critical decisions.
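A minimal sketch of this division of labor, assuming numpy and the OpenAI Python SDK; the returns and headline are invented placeholders.

```python
# Hybrid workflow: deterministic risk math + LLM context + human review.
import numpy as np
from openai import OpenAI

# Traditional automation: objective metrics from structured return data.
daily_returns = np.random.default_rng(0).normal(0.0005, 0.01, 252)  # placeholder
var_95 = np.percentile(daily_returns, 5)  # 95% historical VaR (5th-percentile return)
sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(252)  # annualized, rf omitted

# LLM automation: qualitative context from unstructured text.
client = OpenAI()
news = "Central bank signals faster rate hikes; tech valuations under pressure."
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": (
        f"A portfolio's 95% daily VaR is {var_95:.2%} and its Sharpe ratio is "
        f"{sharpe:.2f}. Given this headline, note risks the metrics may miss: {news}"
    )}],
)

# Human intervention: a portfolio manager reviews this draft before acting.
print(response.choices[0].message.content)
```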

For ongoing communication with clients, portfolio managers need to send regular updates, offer strategic advice, and provide personalized insights based on the client’s portfolio. A hybrid approach offers several advantages.

  • Traditional automation can handle the repetitive generation of performance summaries and standard metrics, which involve pulling structured data from financial systems.
  • LLM automation can draft personalized commentary, interpreting market trends or performance in a way that aligns with a specific client’s investment objectives. An AI model can also summarize earnings calls and offer insights into how the latest market developments affect the client’s portfolio.
  • Human intervention remains crucial for the final review and customization of client communications. The portfolio manager can adjust tone, highlight key opportunities or risks specific to the client’s profile, and ensure that the report aligns with long-term strategies.

In this example, traditional automation ensures consistency, LLMs add personalized insights, and human oversight fine-tunes the result, ensuring tailored and accurate communication for high-value clients.
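A minimal sketch of this workflow, with invented performance figures and client details; in practice the structured data would come from a portfolio system, and the manager edits the draft before it is sent.

```python
# Hybrid client communication: structured inputs, LLM draft, human final review.
from openai import OpenAI

client = OpenAI()

performance = {  # in practice, pulled by traditional automation
    "quarter_return": "4.2%",
    "benchmark_return": "3.1%",
    "top_contributor": "AAPL",
}
client_profile = "Retiree, income focused, low risk tolerance."

draft = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": (
        f"Draft a short quarterly update for this client: {client_profile}\n"
        f"Performance data: {performance}\n"
        "Keep the tone reassuring and avoid jargon."
    )}],
).choices[0].message.content

print(draft)  # the portfolio manager reviews and edits before sending
```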

An Evolving Framework

We have developed the following framework to help prioritize projects for automation and to quickly assess new automation ideas. So far, we have provided an initial assessment of tasks suitable for LLM automation, along with insights into how to think about automation in general. As this series progresses, we will continue to expand the framework, adding more nuance around the costs and risks associated with automation and the various tools and techniques.

In addition, we have created a GPT prompt that can be used to assist in making an initial assessment of a hybrid approach to automation based on this scorecard. This prompt will evolve over time and will be available for contributions and updates on the CFA Institute Research and Policy Center GitHub page. We welcome your contributions and invite you to build upon this framework.

The scorecard shown in Exhibit 6 allows users to rank tasks based on different attributes. The completed scorecard can then be passed to GPT along with a prompt that describes the framework we have developed, allowing the model to recommend a hybrid approach for automation.

Exhibit 6. Scorecard for Initial Automation Assessment

| Attribute | Rating (1-5) | Notes |
|-----------|--------------|-------|
| Task Complexity | | Is the task repetitive (1) or highly variable (5)? |
| Output Objectivity | | Are the outputs objective (1) or subjective (5)? |
| Data Structure | | Is the data structured (1) or unstructured (5)? |
| Risk Level | | What is the potential risk of automation failure? Low (1), high (5) |
| Human Oversight Requirement | | Does the task need human validation or sign-off? No (0), yes (1) |
| Impact on Efficiency | | How much time or effort can automation save? Little (1), a lot (5) |

GPT Prompt

Using this framework developed for GenAI task automation:
| Attribute | GenAI Automation |
|-----------|------------------|
| Data Type | Unstructured data (e.g., earnings transcripts, market news) |
| Task Variability | Repetitive and variable tasks (e.g., customizing strategies, client communications) |
| Input Objectivity | Handles subjective, ambiguous inputs (e.g., free-form text) |
| Output Objectivity | Stochastic, probabilistic outputs (e.g., personalized reports) |
| Scalability | Scalable but with potential high computational cost |
I would like to assess the following task:
<DESCRIBE THE TASK>
I have rated the task across various attributes using the following scorecard:
<INPUT SCORECARD>
Based on this scorecard, please provide a score for the overall GenAI automation suitability, a hybrid approach fit score, and the recommended hybrid approach, if appropriate, for automating this task, specifying where traditional automation, GenAI, and human intervention would be most effective. Consider the task’s variability, output objectivity, and potential risks in your recommendation.
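For readers who prefer to run the assessment programmatically rather than in the ChatGPT interface, here is a minimal sketch that sends the prompt above through the OpenAI Python SDK. The task description and scorecard placeholders must be filled in, and the model name is illustrative.

```python
# Send the automation-assessment prompt to the API instead of the chat UI.
from openai import OpenAI

client = OpenAI()

framework = """| Attribute | GenAI Automation |
|-----------|------------------|
| Data Type | Unstructured data (e.g., earnings transcripts, market news) |
| Task Variability | Repetitive and variable tasks (e.g., customizing strategies, client communications) |
| Input Objectivity | Handles subjective, ambiguous inputs (e.g., free-form text) |
| Output Objectivity | Stochastic, probabilistic outputs (e.g., personalized reports) |
| Scalability | Scalable but with potential high computational cost |"""

task = "<DESCRIBE THE TASK>"        # fill in your task description
scorecard = "<INPUT SCORECARD>"     # fill in your completed scorecard

prompt = (
    f"Using this framework developed for GenAI task automation:\n{framework}\n"
    f"I would like to assess the following task:\n{task}\n"
    f"I have rated the task across various attributes using the following "
    f"scorecard:\n{scorecard}\n"
    "Based on this scorecard, please provide a score for the overall GenAI "
    "automation suitability, a hybrid approach fit score, and the recommended "
    "hybrid approach, if appropriate, for automating this task, specifying where "
    "traditional automation, GenAI, and human intervention would be most "
    "effective. Consider the task's variability, output objectivity, and "
    "potential risks in your recommendation."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```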

Example

Description: The task is to conduct performance attribution for a client’s overall portfolio, which includes multiple mutual funds. This involves breaking down the portfolio's performance to identify the contributions from stock selection and asset allocation. Additionally, it provides context by analyzing macroeconomic and sector events that may have influenced performance over the quarter.

Scorecard:

| Attribute | Rating (1-5) | Notes |
|-----------|--------------|-------|
| Task Complexity | 4 | Is the task repetitive (1) or highly variable (5)? |
| Output Objectivity | 4 | Are the outputs objective (1) or subjective (5)? |
| Data Structure | 2 | Is the data structured (1) or unstructured (5)? |
| Risk Level | 2 | What is the potential risk of automation failure? Low (1), high (5) |
| Human Oversight Requirement | 1 | Does the task need human validation or sign-off? No (0), yes (1) |
| Impact on Efficiency | 4 | How much time or effort can automation save? Little (1), a lot (5) |
Output (GPT-4o):

# GenAI Automation Suitability Analysis: Performance Attribution Task
## Scorecard Recap:
- **Task Complexity**: 4/5
- **Output Objectivity**: 4/5
- **Data Structure**: 2/5
- **Risk Level**: 2/5
- **Human Oversight Requirement**: 1
- **Impact on Efficiency**: 4/5
---
## Overall GenAI Automation Suitability Score: 3.6/5
This task has a reasonably high suitability for GenAI automation given its mix of structured and unstructured data, the ability to handle both objective and subjective insights, and the low risk of failure. GenAI can excel at breaking down portfolio performance by stock selection, asset allocation, and layering in macro/sector-level color. However, the variability of task complexity (requiring adaptability for different portfolio compositions) prevents full automation.
## Hybrid Approach Fit Score: 4.25/5
A hybrid approach is highly suitable because this task involves combining both structured data (returns, asset allocations) and unstructured data (macro and sector insights). GenAI is well-suited for extracting insights from unstructured data, while traditional automation can efficiently handle structured data analysis. Human oversight will be essential for final validation and client communication, especially due to the high degree of objectivity required in the output.
---
## Recommended Hybrid Approach
1. **Traditional Automation**:
   - **Segmenting performance by stock selection or asset allocation**: This can be efficiently automated using rule-based software that handles structured data from portfolio reports and calculates contributions.
   - **Generating structured data insights (e.g., sector performance)**: Traditional automation tools can process data from performance databases or attribution models to segment returns with high precision.
2. **GenAI**:
   - **Adding qualitative analysis based on macro and sector events**: GenAI can handle unstructured market news, earnings transcripts, and reports to contextualize performance attribution. It can provide insights on how external factors like economic conditions, geopolitical events, or sector trends may have influenced returns.
   - **Customizing client communications**: GenAI can generate personalized, high-level reports that explain complex performance attribution in a client-friendly manner, adapting to specific client preferences or concerns.
3. **Human Intervention**:
   - **Final review and validation**: Given the importance of objective, high-stakes financial reporting, human oversight is needed to ensure accuracy, confirm GenAI's macroeconomic assessments, and finalize communications to clients.
   - **Client interaction**: Handling subjective client feedback and clarifying complex performance questions should remain human-led.
---
## Why This Approach Works:
- **GenAI** is effective for handling unstructured data (e.g., interpreting market news) and generating reports with natural language, ensuring scalability and context-rich insights.
- **Traditional automation** excels in tasks like number-crunching (e.g., portfolio performance) and minimizes computation time on repetitive processes.
- **Human intervention** ensures accuracy and manages the risk of misinterpretation, especially where outputs are highly objective.
By combining these elements, the task can achieve both efficiency and the necessary rigor in financial reporting.

Conclusion

We have explored the key attributes for the new age of automation powered by large generative AI models, aiming to develop a more intuitive understanding of what tasks are best suited for this advanced form of automation. Along the way, we introduced a scorecard to assist in the initial assessment of automation opportunities.

As this field rapidly evolves, the criteria for automating tasks may shift dramatically — especially if LLMs continue to advance and develop implicit reasoning abilities. Nonetheless, it is crucial that we begin experimenting with this technology now to fully understand its potential and limitations. This series serves as a starting point to build that intuition, and we invite others to join us in this journey, contributing their insights and discoveries as we collectively explore the future of automation.

References