How to process datasets one record at a time when each row needs judgment.
Suppose you have 300 article abstracts and need to decide, one by one, which ones belong in a literature review. Or suppose you have a spreadsheet of interlibrary loan requests with missing identifiers, partial titles, and conflicting editions, so each record calls for a different search strategy. In both cases the difficulty is at the row level. Each record needs its own judgment, and no single rule covers them all.
A short script can send the rows through one at a time, carrying your criteria along and writing each decision back out as it goes. If a task can be captured as a clean if/then rule, an ordinary script will usually do the job faster. Use this pattern when each row needs a small judgment call and you want that call recorded as the script runs.
In practice, the workflow is a small script that loops through the rows and sends each one, along with your rules, to a language model (the underlying AI that interprets text and produces responses). That is all the word "script" means in this chapter: a short program, written for you by the coding tool, that reads one row, sends it with your instructions, saves the answer, and moves on. You do not need to hand-write that loop yourself. What you do need is a clear description of the judgment each row requires, the rules that govern it, and the columns the output should contain. Give the tool that, and it can build the script and record the reasoning as it runs.
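For readers curious about what such a script looks like, here is a minimal sketch of the loop. It is an illustration, not the script a coding tool would necessarily produce: `process_rows`, `fake_model`, and the `Decision` column are hypothetical names, and `fake_model` stands in for a real model call so the example runs without any account or network access.

```python
import csv
import io

def process_rows(reader, writer, instructions, ask_model):
    """Send each row to the model with the instructions and write the answer back."""
    count = 0
    for row in reader:
        answer = ask_model(instructions, row)      # one call per row
        row["Decision"] = answer["decision"]
        row["Agent_Reasoning"] = answer["reasoning"]
        writer.writerow(row)                       # save the result before moving on
        count += 1
    return count

def fake_model(instructions, row):
    """Hypothetical stand-in for a real model call; returns the same shape of answer."""
    has_abstract = bool(row.get("abstract", "").strip())
    return {
        "decision": "REVIEW" if has_abstract else "SKIP",
        "reasoning": "Abstract present." if has_abstract else "No abstract to screen.",
    }

# In-memory CSV text keeps the sketch self-contained; a real run would open files.
source = io.StringIO("title,abstract\nPaper A,Some text\nPaper B,\n")
output = io.StringIO()
reader = csv.DictReader(source)
writer = csv.DictWriter(output, fieldnames=reader.fieldnames + ["Decision", "Agent_Reasoning"])
writer.writeheader()
processed = process_rows(reader, writer, "Screen each abstract.", fake_model)
print(output.getvalue())
```

Writing each row as soon as its answer arrives means an interrupted run still leaves behind every decision made up to that point.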
Describe your task in language close to the work itself. Someone unfamiliar with the project should be able to pick up your prompt and apply it consistently. Name the criteria, specify the output columns, and say what counts as an edge case worth flagging.
"I have a CSV (comma-separated values) file of interlibrary loan requests. For each row, look up the item in WorldCat, choose the best search strategy based on the fields available, check which library consortia hold it, and write the result plus a short reasoning note back to the output file. Process the rows one at a time."
"I have a CSV of article records exported from a database search. Each row has a title, authors, year, journal, and abstract. Screen each abstract against my inclusion and exclusion criteria, decide whether to include or exclude it, tag the included ones by sub-topic, and write a short reasoning note for each decision. Process the rows one at a time."
The more specific you are about your input columns, the reasoning each row requires, and the shape of the output you want (new columns, separate file, particular format), the fewer rounds of revision you will need.
For large row-by-row jobs, a faster general-purpose model is usually the right starting point. Each row is a separate call, and hundreds of calls can add up in cost or consume a significant share of a subscription's daily capacity. If the reasoning turns out to be unusually subtle, you can always test a small batch with a more capable model and compare.
The main work is defining how the model should reason about each row. What counts as a match? What matters most? What should happen when evidence conflicts, and what needs to be written down for review?
Start with basic rules and run a small test batch of 5-10 rows. Read the reasoning column and look for places where the model's answer is vague or hedging. Those are the spots where the rules need more detail.
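Reading the reasoning column by eye is the real check, but a rough first pass can point you to the rows worth reading first. The sketch below assumes a hypothetical list of hedging phrases and a hypothetical minimum note length; both are placeholders to adjust, and the sample notes are invented for illustration.

```python
# Assumed hedging phrases; extend or replace these for your own material.
HEDGE_WORDS = ("possibly", "might", "unclear", "appears to", "not sure", "may be")

def take_test_batch(rows, size=5):
    """Return the first `size` rows as a small trial batch."""
    return rows[:size]

def flag_vague_reasoning(rows, reasoning_column="Agent_Reasoning", min_words=8):
    """Return (row index, hedging phrases found) for notes that hedge or are very short."""
    flagged = []
    for index, row in enumerate(rows):
        text = row.get(reasoning_column, "").lower()
        hits = [phrase for phrase in HEDGE_WORDS if phrase in text]
        if hits or len(text.split()) < min_words:
            flagged.append((index, hits))
    return flagged

# Invented notes, used only to exercise the function.
batch = take_test_batch([
    {"Agent_Reasoning": "INCLUDE. Survey of undergraduates correlating daily usage with GPA. Confidence: HIGH."},
    {"Agent_Reasoning": "INCLUDE. Population is unclear and might cover high school students."},
    {"Agent_Reasoning": "EXCLUDE."},
])
flagged = flag_vague_reasoning(batch)
print(flagged)  # [(1, ['might', 'unclear']), (2, [])]
```

A flagged row is only a prompt to look closer. The second note above hedges because the rules did not say how to treat mixed populations, which is the kind of gap the test batch is meant to expose.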
If you save both the final version of your prompt and the reasoning output, you have enough to reproduce or dispute any individual decision later. Criteria that feel clear when you write them often need revision once you see how the model applies them. With a paper trail, you can trace where and why a decision went wrong.
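One way to keep that paper trail is to store a short fingerprint of the prompt next to every decision, so a later reader can tell which wording produced which row. This is a sketch under stated assumptions: the field names, the 12-character prefix, and the sample prompts are all hypothetical choices, not a required format.

```python
import hashlib
import json

def prompt_fingerprint(prompt_text):
    """Short, stable identifier for the exact prompt wording."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]

def audit_record(row_id, prompt_text, decision, reasoning):
    """Bundle one decision with the fingerprint of the prompt that produced it."""
    return {
        "row_id": row_id,
        "prompt_sha256_prefix": prompt_fingerprint(prompt_text),
        "decision": decision,
        "reasoning": reasoning,
    }

# Invented prompt versions, for illustration only.
prompt_v1 = "Include studies of college students that report an academic measure."
prompt_v2 = prompt_v1 + " Flag mixed-age samples for review."

record = audit_record(17, prompt_v2, "INCLUDE (FLAG FOR REVIEW)", "Sample spans ages 16-22.")
line = json.dumps(record, ensure_ascii=False)  # one line per decision, easy to append to a log file
print(line)
```

Saving the full prompt text in its own file under the same fingerprint completes the link between a decision and the criteria in force when it was made.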
A library had 164 interlibrary loan requests for Spanish-language materials that staff could not easily locate. ISBNs were often missing, titles were partial, and several editions might exist for a single work. No single search strategy would cover all the records, so each row needed its own approach.
Availability checking followed a priority cascade:
The governing rule was to stop as soon as availability was found, which prevented needless international searches when a book was already available at a nearby CUNY library.
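The stop-at-first-hit rule can be sketched as an ordered list of checks. The tier labels and check functions here are placeholders, not the library's actual tiers: each function simply returns a fixed outcome so the control flow is visible, and the fifth tier is a hypothetical stand-in for a wider search that should never run once an earlier tier succeeds.

```python
def run_cascade(record, tiers):
    """Try each tier in order and stop at the first one that reports availability."""
    log = []
    for name, check in tiers:
        outcome = check(record)
        log.append(f"{name}-{outcome}")
        if outcome == "FOUND":
            log.append(f"STOPPED at {name} per cascade rules")
            return name, log
    return None, log  # no tier found the item

# Placeholder checks with fixed outcomes (hypothetical, for illustration only).
def open_access_check(record):
    return "NOT FOUND" if record.get("doi") else "SKIPPED"

def home_digital_check(record):
    return "NOT FOUND"

def home_physical_check(record):
    return "NOT FOUND"

def nearby_library_check(record):
    return "FOUND"

def wider_search_check(record):
    raise AssertionError("should not run once an earlier tier has found the item")

tiers = [
    ("T1", open_access_check),
    ("T2", home_digital_check),
    ("T3", home_physical_check),
    ("T4", nearby_library_check),
    ("T5", wider_search_check),
]
found_at, log = run_cascade({"title": "Accesorios de la moda", "doi": ""}, tiers)
print(found_at)  # T4
print(log)
```

Keeping the tiers in a list makes the priority order explicit and easy to reorder, and the log doubles as raw material for the reasoning note.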
For each row, the model documented its decision process in the Agent_Reasoning column:
"SEARCH: No ISBN/DOI provided. Used title phrase search ti:"Accesorios de la moda", found 2 records. Selected OCLC 773434443 (PrintBook, exact title match with full subtitle). Verified author 'Max von Boehn' matches input 'Boehn,' and publisher 'Salvat' from Barcelona matches input. TIERS: T1-No DOI provided, skipped OA check. T2-ZGM (the requesting library's WorldCat code) digital NOT FOUND. T3-ZGM physical NOT FOUND. T4-FOUND 1 institution: UVV (Fashion Institute of Technology Library). STOPPED at T4 per cascade rules."
"SEARCH: Cleaned title 'Aire frío' (dropped trailing slash). Used search query ti:"Aire frío" Piñera, found 2 records. Selected OCLC 1078932491 (PrintBook, 'Aire Frío de Virgilio Piñera' published 1990). Other record was a critical essay about the play. It did not match the requested work."
A researcher had exported 312 article records from a database search on social media use and academic performance among college students. Many records were only tangentially related, so each abstract needed to be screened against inclusion criteria, classified, and tagged by sub-topic before the review could proceed.
Inclusion and exclusion criteria were defined up front:
A few representative rows from the reasoning column:
"INCLUDE. Abstract describes a survey of 847 undergraduates measuring daily Instagram/TikTok usage and correlating with self-reported GPA and assignment completion rates. Population: college students (confirmed). Academic measure: GPA and assignment completion (confirmed). Sub-topics: distraction/time-displacement (abstract mentions 'time spent on social media during study sessions'), self-regulation ('students who set app timers reported higher GPAs'). Methodology: cross-sectional survey. Confidence: HIGH."
"EXCLUDE. Reason: No academic performance measure. Abstract discusses social media's effect on body image and self-esteem among college women. While the population is correct (college students), the outcomes measured center on psychological well-being. Academic performance is outside the study's measures. The word 'performance' appears once but refers to 'social performance' (self-presentation). It does not refer to academic outcomes. Confidence: HIGH."
"INCLUDE (FLAG FOR REVIEW). Abstract describes a study of 'students aged 16-22,' which spans both high school and college populations, and results are not reported separately by age group. The academic measure (exam scores) qualifies. I'm including it because college-age students are part of the sample, but flagging because the mixed-age results may not be cleanly separable. Sub-topics: distraction/time-displacement. Methodology: longitudinal (2 semesters). Confidence: MEDIUM."
The same pattern works for any task where each row needs interpretation. Metadata enrichment is one example. You look items up across databases when identifiers are missing, choose the best match, and document the search strategy for each record. Qualitative coding works similarly. When you categorize interview transcripts or open-ended survey responses against a codebook (a structured set of categories with definitions and rules for applying them), each response needs its own justification. The reasoning column is the audit trail. The same approach also works for document classification and for entity resolution, which means deciding whether two near-matching names or identifiers refer to the same person, organization, or work.
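Because the reasoning column doubles as the audit trail, it can help to pull the verdict, any review flag, and the confidence level out of each note into their own columns for sorting and filtering. The sketch below assumes notes shaped like the screening examples in this chapter: a leading INCLUDE or EXCLUDE, an optional parenthetical flag, and a "Confidence:" label. The LOW level and the UNPARSED fallback are assumptions added for completeness, and the sample notes are shortened.

```python
import re

# Leading verdict, optional "(FLAG ...)" parenthetical, then a period.
NOTE_PATTERN = re.compile(
    r"^(?P<decision>INCLUDE|EXCLUDE)"
    r"(?:\s*\((?P<flag>[^)]*)\))?"
    r"\."
)
# LOW is assumed; the examples in this chapter show HIGH and MEDIUM.
CONFIDENCE_PATTERN = re.compile(r"Confidence:\s*(?P<level>HIGH|MEDIUM|LOW)")

def parse_note(note):
    """Split one reasoning note into decision, flag, and confidence fields."""
    head = NOTE_PATTERN.match(note.strip())
    conf = CONFIDENCE_PATTERN.search(note)
    return {
        "decision": head.group("decision") if head else "UNPARSED",
        "flag": (head.group("flag") or "") if head else "",
        "confidence": conf.group("level") if conf else "",
    }

flagged_note = parse_note(
    "INCLUDE (FLAG FOR REVIEW). Abstract describes a study of 'students aged 16-22'. Confidence: MEDIUM."
)
excluded_note = parse_note("EXCLUDE. Reason: No academic performance measure. Confidence: HIGH.")
odd_note = parse_note("Maybe include?")
print(flagged_note)
print(excluded_note)
print(odd_note)
```

Notes that come back as UNPARSED are worth reading by hand, since they usually mean the model drifted from the requested output format.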
This pattern only makes sense when each row needs interpretation. For tasks that fit cleanly into rules, a formula or keyword filter run through an ordinary script will almost always be faster and cheaper. For anything in between, run a small test batch first. Review the reasoning output, check whether the judgments hold up, and decide whether to do the full run.
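For contrast, here is what a task that fits cleanly into rules looks like in an ordinary script. The cutoff year and the "no abstract" rule are hypothetical criteria chosen for illustration. Rows that no rule settles are set aside, and those are the ones a judgment-based pass would handle.

```python
def rule_decision(row, min_year=2015):
    """Return a decision when a clean rule applies, or None when judgment is needed."""
    if not row.get("abstract", "").strip():
        return "EXCLUDE: no abstract"
    if int(row["year"]) < min_year:  # min_year is an assumed cutoff, not from the case study
        return "EXCLUDE: before cutoff year"
    return None

# Invented rows, used only to exercise the rules.
rows = [
    {"title": "A", "year": "2012", "abstract": "Old study."},
    {"title": "B", "year": "2020", "abstract": ""},
    {"title": "C", "year": "2021", "abstract": "Survey of undergraduates."},
]

decided, needs_judgment = [], []
for row in rows:
    decision = rule_decision(row)
    if decision is None:
        needs_judgment.append(row["title"])
    else:
        decided.append((row["title"], decision))

print(decided)          # [('A', 'EXCLUDE: before cutoff year'), ('B', 'EXCLUDE: no abstract')]
print(needs_judgment)   # ['C']
```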
The Working with Tabular Data chapter covers broader strategies for loading, cleaning, and analyzing spreadsheets and CSVs, including cases where the work is more about transformation than judgment.