How to use Google Sheets “Smart Fill” and AI formulas

I recently inherited a beast of a spreadsheet—over five thousand rows of customer data, pulled from three different legacy systems. The issue? No consistency, naturally. Customer names were sometimes “Doe, John,” other times “John Doe (Acme Corp),” and often just a string like “J. Doe – Senior Manager.” Shipping addresses were lumped into single cells, dates formatted in a dozen different ways. It was a nightmare for filtering and reporting.

My first thought was to bang out a complex `REGEXEXTRACT` or `TEXTSPLIT` formula for each column, but the patterns were too varied, and I knew I’d spend more time debugging regular expressions than actually cleaning the data. That’s when I decided to lean on some of the newer, more interactive features in Google Sheets: Smart Fill and, for a specific tricky bit, an AI formula.

Using Smart Fill for Pattern Recognition

Smart Fill is surprisingly effective for extracting or transforming data when a clear, albeit complex, pattern exists across a column. It saves you from crafting intricate formulas by simply observing your manual input and replicating it.

How I Used Smart Fill: Step-by-Step

  1. First, I opened the messy spreadsheet in Google Sheets. I picked a column where I needed to extract the actual customer name from a longer string like “John Doe (Acme Corp).”

  2. I inserted a new column immediately to the right of my source data. Let’s say my original data was in column A.

  3. In the new column (column B), starting with the first row of data, I manually typed out the desired output. For example, if A2 was “John Doe (Acme Corp),” I typed “John Doe” into B2. If A3 was “Jane Smith – Product Lead,” I typed “Jane Smith” into B3. I did this for about three to five rows to give Sheets a good set of examples.

  4. After entering my examples, I selected those cells I just filled (B2:B5 in my example).

  5. Then, I navigated to the menu: Data > Smart Fill > Fill.

  6. Sheets thought for a second, then populated the rest of column B based on the pattern it detected from my examples. It was remarkably accurate for 90% of the data.

I repeated this process for other columns—extracting just the city from a full address string, or standardizing date formats by showing it “DD-MM-YYYY” for a few rows. It’s not magic, but it’s a heck of a lot faster than writing custom regex for every variation.

Leveraging AI Formulas (like GENERATE_TEXT)

For a particular column of “customer notes,” which contained free-form text ranging from short phrases to small paragraphs, I needed to categorize them. This is where a more advanced AI capability came into play, specifically `GENERATE_TEXT`. Now, I’m usually wary of “AI” for production systems, but for this one-off categorization, it fit the bill.

My Approach to AI Formulas

Using `GENERATE_TEXT` requires having Google Workspace Labs features enabled. For me, that meant clicking through a prompt to activate it when I first tried to use an AI function. Your Google Workspace admin might need to enable it for your organization.

  1. I started with my column of messy customer notes (let’s say column C).

  2. In an adjacent empty column (column D), I wrote my prompt. For instance, in D2, I typed:

    =GENERATE_TEXT(C2, "Categorize this customer note into one of these categories: 'Billing', 'Technical Support', 'Feature Request', 'General Inquiry'. Only provide the category name.")

  3. I then dragged that formula down for the first 50 rows. I intentionally started small, given my previous experience with these kinds of functions.

  4. I reviewed the output carefully. If the categorization wasn’t quite right, I’d adjust the prompt slightly, making it more specific. For example, I might add: “If unsure, default to ‘General Inquiry’.”

  5. Once satisfied, I’d apply it in batches of a few hundred rows, periodically checking the results.

This allowed me to quickly sort thousands of unstructured notes into actionable categories, something that would have taken days to do manually or required a much more complex, fragile rules-based system.

Common mistakes

Over the years, and especially with these newer tools, I’ve seen a few recurring pitfalls. Here are some to watch out for:

  • Not enough examples for Smart Fill: People often try to use Smart Fill with just one manually entered example. While it sometimes works, it’s far more reliable with two, three, or even five diverse examples. This helps Sheets understand the pattern even if it has minor variations.

  • Inconsistent examples: If your manually entered examples for Smart Fill don’t follow a consistent internal logic, Sheets gets confused. If you’re trying to extract a first name, make sure all your examples *only* contain the first name, not sometimes the first name and other times a nickname.

  • Over-reliance on AI formulas for critical data: Remember, AI outputs can be non-deterministic and can “hallucinate.” For mission-critical data, always prefer deterministic formulas (`VLOOKUP`, `INDEX/MATCH`, `TEXTSPLIT`, `REGEXEXTRACT`) or manual review. I only use AI formulas for tasks where a minor inaccuracy is acceptable or where I’m performing a thorough human review afterward.

  • My own mistake with `GENERATE_TEXT`: The first time I played with `GENERATE_TEXT`, I treated it like any other formula. I typed it in, then immediately dragged it down for all 5,000 rows. It sat there, “Loading…” for an eternity, then eventually returned a cryptic “Error” or just timed out for most cells. I learned quickly that these AI functions make API calls, which have rate limits and latency. They don’t process instantly for large datasets like a simple sum. Now, I apply them in small batches (50-100 rows), let them complete, review, and then move on to the next batch. It’s slower, but it actually *works*.

  • Ignoring security and privacy: Be mindful of what data you’re feeding into any cloud-based AI service. For non-sensitive data or internal summaries, it’s generally fine within enterprise Workspace environments, but don’t just dump sensitive customer PII or proprietary secrets into a public AI model without understanding the implications.

  • Internet dependency: AI formulas need an active internet connection, whereas most standard Sheets formulas will function offline with the document cached.

These tools, when used thoughtfully and with an understanding of their limitations, can save a tremendous amount of manual data cleanup time.