In this tutorial, I’ll show you how to build an AI agent workflow that creates custom, company-level financial datasets — all without using stock data APIs or traditional scraping.
We’ll use agentic AI to gather unique, internet-based data points for public companies, using only a stock ticker as input. These signals can include:
- Executive departures
- Board composition
- Employee sentiment
- Number of open job postings
And you can define your own variables — whatever you believe moves stock prices.
Why This Matters
Web-search-enabled agents can generate custom, timely data that traditional datasets may not yet include. You can:
- Track CEO tenure trends across your watchlist
- Measure employee sentiment using Glassdoor
- Get real-time job openings as a proxy for growth
- Compare institutional ownership over time
Customize Based on Your Market Theory
You decide what data matters:
- Leadership changes?
- Hiring patterns?
- Board composition?
Use your hypothesis to define the fields you want the AI to extract from the web.
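For example, if your theory is that R&D intensity and insider activity drive returns, your schema might look like this (the field names here are illustrative, not part of the tutorial's dataset):

```python
from pydantic import BaseModel

# Hypothetical schema: swap in whatever fields match your market theory
class CustomSignals(BaseModel):
    ticker: str
    rd_spend_pct_of_revenue: float    # R&D spend as a % of revenue
    insider_buys_last_quarter: int    # count of insider buy filings
    patent_filings_last_year: int
```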
Tech Stack Used
- Python 3.11 (via Conda)
- OpenAI Agents SDK
- Perplexity Sonar (via OpenAI client wrapper)
- Pandas for tabular output
- WebSearchTool (included in `openai-agents`)
Environment Setup
Set up a Conda environment and install the dependencies from your terminal:
```bash
# Create the Conda environment
conda create -n ai-agent-finance python=3.11 -y

# Activate it
conda activate ai-agent-finance

# Install dependencies
pip install openai-agents==0.0.11 pydantic==2.11.3 pandas==2.2.3
```
Create a Jupyter notebook in VS Code.
Add your API keys via environment variables (these are bash/zsh commands; on Windows Command Prompt, use `set` instead of `export`):

```bash
# For the OpenAI agent
export OPENAI_API_KEY=your_openai_key

# For the Perplexity Sonar agent
export PPLX_API_KEY=your_perplexity_key
```
1. OpenAI Models
OpenAI Agent: Create a dataset with GPT-4o-mini + web search
Step 1: Import necessary libraries and set up OpenAI API key
```python
# The SDK reads OPENAI_API_KEY from the environment; alternatively,
# set it inside the notebook by uncommenting these two lines:
# import os
# os.environ["OPENAI_API_KEY"] = "OPEN AI API KEY GOES HERE"
from agents import Agent, Runner, WebSearchTool
from pydantic import BaseModel
import pandas as pd
```
Step 2: Define CompanyInfo schema
```python
class CompanyInfo(BaseModel):
    company_name: str
    ticker: str
    sector: str
    founding_year: int
    number_of_employees: int
    ceo_tenure_years: float
    ceo_count_since_2010: int
    average_glassdoor_rating: float
    institutional_ownership_pct: float
    board_member_count: int
    job_positions_open: int
```
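To sanity-check the schema before wiring it to an agent, you can instantiate it with dummy values (these are placeholders for a fictional company, not real data):

```python
# Dummy values purely to exercise the schema
sample = CompanyInfo(
    company_name="Example Corp",
    ticker="XMPL",
    sector="Technology",
    founding_year=1999,
    number_of_employees=1200,
    ceo_tenure_years=4.5,
    ceo_count_since_2010=2,
    average_glassdoor_rating=4.1,
    institutional_ownership_pct=63.2,
    board_member_count=9,
    job_positions_open=87,
)
print(sample.model_dump())  # dict form, same as what we'll collect later
```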
Step 3: Instantiate WebSearchTool
```python
# 1) Instantiate the search tool
web_search = WebSearchTool()
```
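WebSearchTool also takes optional tuning parameters; if I recall the SDK correctly, you can trade cost for recall via `search_context_size`, but treat this as an assumption and check the `openai-agents` docs for your installed version:

```python
# Assumed signature; verify against your openai-agents version
web_search = WebSearchTool(search_context_size="high")  # "low" | "medium" | "high"
```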
Step 4: Create the OpenAI Agent
```python
# 2) Create the Agent
agent = Agent(
    name="CompanyInfoAgent",
    instructions="""
    For a given U.S.-listed company ticker, use the WebSearchTool to find:
    - Full company name
    - Ticker symbol
    - Sector/industry
    - Year the company was founded
    - Current total number of employees
    - Current CEO's tenure in years
    - Number of different CEOs the company has had since January 1, 2010
    - Average employee rating on Glassdoor
    - Percentage of shares held by institutional investors
    - Total number of board members
    - Current number of open job positions (globally)
    Then return exactly the JSON matching the CompanyInfo schema.
    """,
    tools=[web_search],
    output_type=CompanyInfo,
    model="gpt-4o-mini",
)
```
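Before looping over the full watchlist, it is worth a single-ticker smoke test (NVDA here is just an example):

```python
# One-off smoke test; Jupyter supports top-level await
result = await Runner.run(agent, "NVDA")
print(result.final_output)        # validated CompanyInfo instance
print(type(result.final_output))  # -> CompanyInfo
```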
Step 5: Loop over a list of company tickers
```python
# 3) Loop over a list of tickers
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]

all_company_data = []
for ticker in tickers:
    info = await Runner.run(agent, ticker)
    print(info.final_output)
    all_company_data.append(info.final_output.model_dump())
```
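The loop above awaits each ticker sequentially. Since Runner.run is async, you can also fire all five at once with asyncio.gather; a sketch, assuming five concurrent web searches are acceptable for your rate limits:

```python
import asyncio

# Run all tickers concurrently instead of one at a time (sketch)
results = await asyncio.gather(*(Runner.run(agent, t) for t in tickers))
all_company_data = [r.final_output.model_dump() for r in results]
```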
Step 6: Create a Pandas DataFrame from the collected data
```python
# 4) Create a Pandas DataFrame from the collected data
df = pd.DataFrame(all_company_data)
df
```
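To keep a snapshot for later comparison, it may be worth persisting the table (the filename is arbitrary):

```python
# Persist this run so it can be diffed against future runs
df.to_csv("company_data_openai.csv", index=False)
```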
2. Perplexity Sonar Models
Perplexity Agent: Repeat the process using Sonar Pro
Step 1: Import necessary libraries and set up the Perplexity Sonar API key
```python
import os

# Read the key exported earlier; or hard-code it by uncommenting this line:
# PPLX_API_KEY = "PERPLEXITY SONAR API KEY GOES HERE"
PPLX_API_KEY = os.environ["PPLX_API_KEY"]

from agents import Agent, Runner, AsyncOpenAI, OpenAIChatCompletionsModel
from pydantic import BaseModel
import pandas as pd
```
Step 2: Define CompanyInfo schema
```python
class CompanyInfo(BaseModel):
    company_name: str
    ticker: str
    sector: str
    founding_year: int
    number_of_employees: int
    ceo_tenure_years: float
    ceo_count_since_2010: int
    average_glassdoor_rating: float
    institutional_ownership_pct: float
    board_member_count: int
    job_positions_open: int
```
Step 3: Set up the Perplexity client
```python
# 1) Set up the Perplexity client
perplexity_client = AsyncOpenAI(
    base_url="https://api.perplexity.ai",
    api_key=PPLX_API_KEY,
)
```
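One caveat: even with a Perplexity model, the Agents SDK will still try to upload run traces to OpenAI. If you would rather switch that off (or you have not set OPENAI_API_KEY), the SDK exposes a toggle; I believe it is set_tracing_disabled, but verify against your installed version:

```python
from agents import set_tracing_disabled

# Skip uploading run traces to OpenAI while using a third-party model
set_tracing_disabled(True)
```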
Step 4: Create the Perplexity Sonar Agent
```python
# 2) Create the Agent (Sonar does its own web search, so no tools are attached)
perplexity_agent = Agent(
    name="CompanyInfoAgent_pplx",
    instructions="""
    For a given U.S.-listed company ticker, search the web to find:
    - Full company name
    - Ticker symbol
    - Sector/industry
    - Year the company was founded
    - Current total number of employees
    - Current CEO's tenure in years
    - Number of different CEOs the company has had since January 1, 2010
    - Average employee rating on Glassdoor
    - Percentage of shares held by institutional investors
    - Total number of board members
    - Current number of open job positions (globally)
    Then return exactly the JSON matching the CompanyInfo schema.
    """,
    output_type=CompanyInfo,
    model=OpenAIChatCompletionsModel(
        model="sonar-pro",
        openai_client=perplexity_client,  # the Perplexity client goes here
    ),
)
```
Step 5: Loop over a list of company tickers
```python
# 3) Loop over the same list of tickers
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]

all_company_data = []
for ticker in tickers:
    info = await Runner.run(perplexity_agent, ticker)
    print(info.final_output)
    all_company_data.append(info.final_output.model_dump())
```
Step 6: Create a Pandas DataFrame from the collected data
```python
# 4) Create a Pandas DataFrame from the collected data
df_pplx = pd.DataFrame(all_company_data)
print("Perplexity Sonar Pro")
df_pplx
```
Comparison: OpenAI vs. Perplexity
Which agent produced more accurate or complete data?
- OpenAI GPT-4o-mini had better company summaries
- Perplexity Sonar found fresher job opening stats
- Glassdoor ratings varied slightly — worth cross-checking
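You can line the two runs up per ticker to check the disagreements yourself (the column suffixes are arbitrary):

```python
# Merge both sources on ticker to spot where the models disagree
comparison = df.merge(df_pplx, on="ticker", suffixes=("_openai", "_pplx"))
comparison[["ticker", "number_of_employees_openai", "number_of_employees_pplx"]]
```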
How to Improve These Results
- Tune your agent instructions; clearer prompts yield cleaner parsing
- Add fallback prompts for fields that come back empty
- Add retry logic for failed or partial runs (see the sketch below)
- Merge with structured APIs (like Yahoo Finance) to cross-validate the numbers
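A minimal sketch of the retry idea (the broad except and fixed backoff are simplifications):

```python
import asyncio

async def run_with_retries(agent, ticker, max_attempts=3):
    """Retry a flaky agent run a few times before giving up (sketch)."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = await Runner.run(agent, ticker)
            return result.final_output
        except Exception as exc:
            print(f"{ticker}: attempt {attempt} failed ({exc})")
            await asyncio.sleep(2 * attempt)  # simple linear backoff
    return None
```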
🧠 Final Thoughts
The market hasn’t priced in these kinds of custom AI-built datasets yet.
If you’re early, you can build smarter signals from public data using cheap inference—before everyone else is doing the same.
This workflow is reproducible and extensible, and once set up it runs end to end from a single notebook.