Agentic Data Workflows

Row-by-Row Dataset Processing

How to process datasets one record at a time when each row needs judgment.

When This Pattern Helps

Suppose you have 300 article abstracts and need to decide, one by one, which ones belong in a literature review. Or suppose you have a spreadsheet of interlibrary loan requests with missing identifiers, partial titles, and conflicting editions, so each record calls for a different search strategy. In both cases the difficulty is at the row level. Each record needs its own judgment, and no single rule covers them all.

A short script can send the rows to an AI model one at a time, carrying your criteria along and writing each decision back out as it goes. If a task can be captured as a clean if/then rule, an ordinary script will usually do the job faster. Use this pattern when each row needs a small judgment call and you want that call recorded as the script runs.

Your Data (CSV or spreadsheet) → Agent (interprets each row) → Output (enriched data + reasoning)

How It Works

In practice, the workflow usually takes the form of a small script that loops through the rows and sends each one, along with your rules, to a language model (the underlying AI that interprets text and produces responses). That is all the word script means in this chapter: a short program the coding tool writes for you that reads one row, sends it with your instructions, saves the answer, and moves on. You do not need to hand-write that loop yourself. What you do need is a clear description of the judgment each row requires, what the rules are, and what columns the output should contain. Give the tool that, and it can build the script and record the reasoning as it runs.
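For readers who want to see the shape of such a loop, here is a minimal Python sketch. It is an illustration under assumptions, not the script a coding tool would necessarily write: `judge_row` is a hypothetical stand-in for the model call, and the `ISBN`, `Decision`, and `Reasoning` column names are chosen only for the example.

```python
import csv
import io


def judge_row(row, rules):
    """Hypothetical stand-in for the model call; returns (decision, reasoning)."""
    # A real script would send `rules` and `row` to a language model here.
    if row.get("ISBN"):
        return "search_by_isbn", "ISBN present, so it is the most precise key."
    return "search_by_title_author", "No ISBN; falling back to title and author."


def process_rows(infile, outfile, rules):
    """Read one row at a time, judge it, and write it back with two new columns."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or []) + ["Decision", "Reasoning"]
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        decision, reasoning = judge_row(row, rules)
        row["Decision"] = decision
        row["Reasoning"] = reasoning
        writer.writerow(row)  # saved immediately, so partial runs keep their work
        count += 1
    return count


# Usage with in-memory files; a real run would open CSV files on disk instead.
source = io.StringIO("Title,ISBN\nAire frío,\nAnother title,9780000000000\n")
target = io.StringIO()
processed = process_rows(source, target, rules="(your rules text)")
print(processed, "rows written")
```

Writing each row as soon as it is judged means an interrupted run still leaves a usable partial output file.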

Describing the Workflow

Describe your task in language close to the work itself. Someone unfamiliar with the project should be able to pick up your prompt and apply it consistently. Name the criteria, specify the output columns, and say what counts as an edge case worth flagging.

Example: Library metadata

"I have a CSV (comma-separated values) file of interlibrary loan requests. For each row, look up the item in WorldCat, choose the best search strategy based on the fields available, check which library consortia hold it, and write the result plus a short reasoning note back to the output file. Process the rows one at a time."

Example: Research screening

"I have a CSV of article records exported from a database search. Each row has a title, authors, year, journal, and abstract. Screen each abstract against my inclusion and exclusion criteria, decide whether to include or exclude it, tag the included ones by sub-topic, and write a short reasoning note for each decision. Process the rows one at a time."

The more specific you are about your input columns, the reasoning each row requires, and the shape of the output you want (new columns, separate file, particular format), the fewer rounds of revision you will need.

Tip: Model choice and cost

For large row-by-row jobs, a faster general-purpose model is usually the right starting point. Each row is a separate call, so processing hundreds of them adds up in cost or uses a significant share of a subscription's daily capacity. If the reasoning turns out to be unusually subtle, you can always test a small batch with a more capable model and compare.
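A rough estimate before the full run can make the tradeoff concrete. The sketch below is plain arithmetic; the tokens-per-row figure and the price are placeholder assumptions, not real rates for any model, so substitute the numbers from your own provider and test batch.

```python
def estimate_job(rows, tokens_per_row, price_per_million_tokens):
    """Return (total_tokens, estimated_cost) assuming one model call per row."""
    total_tokens = rows * tokens_per_row
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    return total_tokens, cost


# Placeholder inputs: 312 rows, roughly 600 tokens per call,
# and an assumed price of 1.00 per million tokens.
tokens, cost = estimate_job(312, 600, 1.00)
print(f"{tokens} tokens, estimated cost {cost:.2f}")
```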

Designing the Decision Rules

The main work is defining how the model should reason about each row. What counts as a match? What matters most? What should happen when evidence conflicts, and what needs to be written down for review?

Strategy or criteria: What approach to use for each row. This might be a search strategy ("If ISBN is present, use that; otherwise search by title and author") or a classification scheme ("Include if the study measures academic outcomes among college students").
Verification and definitions: How to confirm a decision is correct, and what your terms mean in practice. "Verify the author name matches, accounting for variations." "Code as 'stigma' if the response mentions embarrassment, fear of judgment, or reluctance to seek help."
Priority or sequencing: The order in which to check things. "Check Open Access first, then local holdings, then partner libraries." Without a stopping rule, the model may keep searching long after it has found a usable result.
Edge cases: What to do when things are ambiguous. "If the title appears in multiple editions, prefer the most recent." "If a response could fit multiple categories, assign all that apply."
Finally, specify what reasoning to record so you can audit the results. "Always explain which search strategy you used and why." "Quote the specific phrase that led to your classification." This column is also useful at the test-batch stage, since it shows you where the rules need more detail.
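One way to keep these pieces together is to hold them in a small structure and assemble the prompt from it for every row. The following is a sketch under assumptions: `RuleSet` and `build_prompt` are hypothetical names, and the prompt layout is only one possible arrangement. The rule text is taken from the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    strategy: str
    verification: str
    priority: str
    edge_cases: list = field(default_factory=list)
    reasoning: str = "Explain which rule you applied and why."


def build_prompt(rules, row):
    """Combine the rule set and one record into a single prompt string."""
    parts = [
        "Apply the rules below to the single record that follows.",
        f"Strategy: {rules.strategy}",
        f"Verification: {rules.verification}",
        f"Priority: {rules.priority}",
    ]
    parts += [f"Edge case: {case}" for case in rules.edge_cases]
    parts.append(f"Reasoning to record: {rules.reasoning}")
    parts.append("Record:")
    parts += [f"  {name}: {value}" for name, value in row.items()]
    return "\n".join(parts)


rules = RuleSet(
    strategy="If ISBN is present, use that; otherwise search by title and author.",
    verification="Verify the author name matches, accounting for variations.",
    priority="Check Open Access first, then local holdings, then partner libraries.",
    edge_cases=["If the title appears in multiple editions, prefer the most recent."],
    reasoning="Always explain which search strategy you used and why.",
)
prompt = build_prompt(rules, {"Loan Title": "Aire frío", "Loan Author": "Piñera"})
print(prompt)
```

Because every row receives the same assembled rules, a change to one rule applies uniformly on the next run.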

Start with a narrow rule set

Start with basic rules and run a small test batch of 5-10 rows. Read the reasoning column and look for places where the model's answer is vague or hedging. Those are the spots where the rules need more detail.
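Reading the reasoning column by hand is the main check, but a quick scan can point you to the rows worth reading first. This sketch assumes the test-batch output has a `Reasoning` column; the list of hedge phrases is an illustrative assumption and should be adjusted to the wording your model actually produces.

```python
# Illustrative hedge phrases; not an exhaustive or authoritative list.
HEDGE_MARKERS = ("unclear", "possibly", "may be", "might", "not sure", "ambiguous")


def flag_vague_rows(rows, column="Reasoning"):
    """Return (row_index, matched_phrases) for rows whose reasoning hedges."""
    flagged = []
    for index, row in enumerate(rows):
        note = row.get(column, "").lower()
        hits = [marker for marker in HEDGE_MARKERS if marker in note]
        if hits:
            flagged.append((index, hits))
    return flagged


batch = [
    {"Reasoning": "Exact title match; author and publisher verified."},
    {"Reasoning": "Possibly the same edition, but the publisher is unclear."},
]
for index, hits in flag_vague_rows(batch):
    print(f"row {index}: review rules related to {hits}")
```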

Tip: Reproducibility

If you save both the final version of your prompt and the reasoning output, you have enough to reproduce or dispute any individual decision later. Criteria that feel clear when you write them often need revision once you see how the model applies them. With a paper trail, you can trace where and why a decision went wrong.
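A simple way to tie each decision to the prompt version that produced it is to store a short fingerprint of the prompt next to the reasoning. The sketch below uses Python's standard `hashlib` and `json` modules; the function names and record fields are hypothetical choices for illustration.

```python
import hashlib
import json


def prompt_fingerprint(prompt):
    """Short, stable identifier for an exact prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def audit_record(row_id, prompt, decision, reasoning):
    """Serialize one decision with the fingerprint of the prompt behind it."""
    return json.dumps(
        {
            "row_id": row_id,
            "prompt_sha256_prefix": prompt_fingerprint(prompt),
            "decision": decision,
            "reasoning": reasoning,
        },
        ensure_ascii=False,
    )


line = audit_record(17, "Include if the study measures academic outcomes...",
                    "INCLUDE", "GPA is reported for undergraduates.")
print(line)
```

If the prompt is edited between runs, the fingerprint changes, so decisions made under different rule versions stay distinguishable.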

Case Study: Interlibrary Loan Requests

The Task

A library had 164 interlibrary loan requests for Spanish-language materials that staff could not easily locate. ISBNs were often missing, titles were partial, and several editions might exist for a single work. No single search strategy would cover all the records, so each row needed its own approach.

The Decision Rules

Availability checking followed a priority cascade:

Tier 1: Check for Open Access (free, immediate)
Tier 2: Check local digital holdings (fastest delivery)
Tier 3: Check local physical holdings
Tier 4: Check partner libraries in METRO consortium
Tier 5: Check CUNY system
Tier 6: Check SUNY system
Tier 7: Check broader US holdings
Tier 8: Check international holdings

The governing rule was "stop as soon as you find availability," which prevented needless international searches when a book was already available at a nearby CUNY library.
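The stop-early cascade can be sketched as an ordered loop that returns at the first hit. This is an illustration rather than the script used in the case study: the `check` callback is a hypothetical placeholder for whatever lookup each tier requires, and the log format loosely imitates the T1/T2 notation in the sample reasoning.

```python
TIERS = [
    (1, "Open Access"),
    (2, "Local digital holdings"),
    (3, "Local physical holdings"),
    (4, "METRO consortium partners"),
    (5, "CUNY system"),
    (6, "SUNY system"),
    (7, "Broader US holdings"),
    (8, "International holdings"),
]


def first_available_tier(check, tiers=TIERS):
    """Walk the tiers in order and stop at the first one where check() is true."""
    log = []
    for number, label in tiers:
        found = check(number)  # hypothetical lookup for this tier
        log.append(f"T{number}-{'FOUND' if found else 'NOT FOUND'}")
        if found:
            return number, label, log  # later tiers are never queried
    return None, None, log


# Usage: pretend the item is held only by a tier 4 partner library.
number, label, log = first_available_tier(lambda tier: tier == 4)
print(number, label, log)
```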

Input: 13 columns

Transaction Date
Transaction Number
Language
Loan Title
Loan Author
Loan Place
Loan Publisher
Loan Date
Loan Edition
ISSN
Accept Alternate Edition
Library Use Only
Cited In

Output: 34 columns

All original columns, plus:

SUNY_Holdings
LC_Classification
Dewey_Classification
Subjects
Best_Availability_Tier
Tier_Number
Tier_Details
OA_Status
OA_URL
GC_Digital / GC_Physical
Local_Partners_Holdings
CUNY_Holdings
IDS_Holdings
US_Holdings_Count
International_Holdings_Count
WorldCat_OCLC
WorldCat_Format
Agent_Reasoning
Processing_Date

Sample Reasoning Output

For each row, the model documented its decision process in the Agent_Reasoning column:

When no ISBN was available:

"SEARCH: No ISBN/DOI provided. Used title phrase search ti:"Accesorios de la moda", found 2 records. Selected OCLC 773434443 (PrintBook, exact title match with full subtitle). Verified author 'Max von Boehn' matches input 'Boehn,' and publisher 'Salvat' from Barcelona matches input. TIERS: T1-No DOI provided, skipped OA check. T2-ZGM (the requesting library's WorldCat code) digital NOT FOUND. T3-ZGM physical NOT FOUND. T4-FOUND 1 institution: UVV (Fashion Institute of Technology Library). STOPPED at T4 per cascade rules."

When choosing between similar records:

"SEARCH: Cleaned title 'Aire frío' (dropped trailing slash). Used search query ti:"Aire frío" Piñera, found 2 records. Selected OCLC 1078932491 (PrintBook, 'Aire Frío de Virgilio Piñera' published 1990). Other record was a critical essay about the play. It did not match the requested work."

Case Study: Screening Abstracts for a Literature Review

The Task

A researcher had exported 312 article records from a database search on social media use and academic performance among college students. Many records were only tangentially related, so each abstract needed to be screened against inclusion criteria, classified, and tagged by sub-topic before the review could proceed.

The Decision Rules

Inclusion and exclusion criteria were defined up front:

Include if the study examines a relationship between social media use and academic outcomes (GPA, test scores, course performance, study habits) among college or university students
Exclude if: K-12 population only, no academic performance measure, purely about social media design/marketing, non-empirical (opinion pieces, editorials)
Tag sub-topics: distraction/time-displacement, collaborative learning, information sharing, mental health mediator, self-regulation
Flag for human review if the abstract is ambiguous or the study spans multiple populations
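Criteria like these can also be checked mechanically after the model answers, so that a malformed row is caught before it reaches the output file. The sketch below is one possible check, assuming each result arrives as a dictionary keyed by the output column names with `Sub_Topics` as a list; that shape is an assumption for illustration.

```python
ALLOWED_DECISIONS = {"INCLUDE", "EXCLUDE"}
ALLOWED_SUB_TOPICS = {
    "distraction/time-displacement",
    "collaborative learning",
    "information sharing",
    "mental health mediator",
    "self-regulation",
}


def validate_screening(result):
    """Return a list of problems found in one screening result (empty if none)."""
    problems = []
    decision = result.get("Decision")
    if decision not in ALLOWED_DECISIONS:
        problems.append(f"unknown decision: {decision!r}")
    if decision == "EXCLUDE" and not result.get("Exclusion_Reason"):
        problems.append("EXCLUDE without an Exclusion_Reason")
    unknown = set(result.get("Sub_Topics", [])) - ALLOWED_SUB_TOPICS
    if unknown:
        problems.append(f"unknown sub-topics: {sorted(unknown)}")
    if not result.get("Reasoning"):
        problems.append("missing Reasoning note")
    return problems


good = {"Decision": "INCLUDE", "Sub_Topics": ["self-regulation"],
        "Reasoning": "Survey of undergraduates with GPA as the outcome."}
bad = {"Decision": "EXCLUDE", "Sub_Topics": ["procrastination"], "Reasoning": ""}
print(validate_screening(good))
print(validate_screening(bad))
```

Rows that fail validation can be retried or routed to human review alongside the flagged edge cases.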

Input: 5 columns

Title
Authors
Year
Journal
Abstract

Output: 11 columns

All original columns, plus:

Decision
Exclusion_Reason
Sub_Topics
Population
Methodology
Reasoning

Sample Reasoning Output

A few representative rows from the reasoning column:

A clear include:

"INCLUDE. Abstract describes a survey of 847 undergraduates measuring daily Instagram/TikTok usage and correlating with self-reported GPA and assignment completion rates. Population: college students (confirmed). Academic measure: GPA and assignment completion (confirmed). Sub-topics: distraction/time-displacement (abstract mentions 'time spent on social media during study sessions'), self-regulation ('students who set app timers reported higher GPAs'). Methodology: cross-sectional survey. Confidence: HIGH."

A clear exclude:

"EXCLUDE. Reason: No academic performance measure. Abstract discusses social media's effect on body image and self-esteem among college women. While the population is correct (college students), the outcomes measured center on psychological well-being. Academic performance is outside the study's measures. The word 'performance' appears once but refers to 'social performance' (self-presentation). It does not refer to academic outcomes. Confidence: HIGH."

An edge case flagged for review:

"INCLUDE (FLAG FOR REVIEW). Abstract describes a study of 'students aged 16-22,' which spans both high school and college populations, and results are not reported separately by age group. The academic measure (exam scores) qualifies. I'm including it because college-age students are part of the sample, but flagging because the mixed-age results may not be cleanly separable. Sub-topics: distraction/time-displacement. Methodology: longitudinal (2 semesters). Confidence: MEDIUM."

Other Use Cases

The same pattern works for any task where each row needs interpretation. Metadata enrichment is one example. You look items up across databases when identifiers are missing, choose the best match, and document the search strategy for each record. Qualitative coding works similarly. When you categorize interview transcripts or open-ended survey responses against a codebook (a structured set of categories with definitions and rules for applying them), each response needs its own justification. The reasoning column is the audit trail. The same approach also works for document classification and for entity resolution, which means deciding whether two near-matching names or identifiers refer to the same person, organization, or work.

When to Use This Pattern

This pattern only makes sense when each row needs interpretation. For tasks that fit cleanly into rules, a formula or keyword filter run through an ordinary script will usually be faster and cheaper. For anything in between, run a small test batch first. Review the reasoning output, check whether the judgments hold up, and decide whether to do the full run.
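For comparison, here is what the ordinary-script alternative might look like: a keyword filter with no model involved. The keyword list and the two sample abstracts are illustrative assumptions, loosely modeled on the screening case study.

```python
def keyword_filter(abstract, keywords=("gpa", "test scores", "performance")):
    """Return True if any keyword appears in the abstract (case-insensitive)."""
    text = abstract.lower()
    return any(keyword in text for keyword in keywords)


clear_case = "Survey of undergraduates correlating daily app usage with self-reported GPA."
tricky_case = "Social media, body image, and social performance among college women."

print(keyword_filter(clear_case))
print(keyword_filter(tricky_case))
```

Both abstracts pass the filter, although the second uses "performance" in the sense of self-presentation, as in the exclude example from the screening case study. When a test batch turns up false positives of that kind, the task probably needs row-level judgment. When it does not, the filter alone may be enough.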

The Working with Tabular Data chapter covers broader strategies for loading, cleaning, and analyzing spreadsheets and CSVs, including cases where the work is more about transformation than judgment.