
Large-scale tracking studies, from NPS and post-purchase trackers to brand perception surveys, can gather thousands or even millions of responses over time. At their best, they become the recipe books for customer understanding: what customers keep coming back to, what leaves a bad taste, and which patterns the business should act on.
At enterprise scale, that recipe book gets hard to maintain. Every wave adds new text comments, markets, languages, and stakeholder questions. The topic collection that worked so well just a few months ago can start hiding what matters today: a new service issue buried under a legacy topic, a regional pattern flattened by global reporting, a niche complaint that never becomes visible until it has already reached the board deck.
"Corporate organisations love stable trackers, yet this remains one of the biggest challenges in the field of CX insights. To capture the right topics and be responsive to customers' challenges, we need to continuously improve our approach without creating too much fluctuation across the organisation."
Digital Product Leader Global CX Platform @ IKEA
NPS, CSAT, post-purchase scores, and brand tracking metrics all show movement. They tell you if something improved, dropped, or stayed flat. For CX and brand health monitoring, that movement is the finished plate. The open-ended feedback shows what happened behind the scenes.
A score might show that satisfaction dropped after a flight, a purchase, a store visit, or a cruise. The text comments can show if customers were frustrated by delays, unclear communication, staff behavior, product availability, cleanliness, food quality, or something much more specific.
Scores show that something changed. Open-ended responses show why. They capture insights outside the scope of quantitative questions, helping teams spot emerging topics, sharper pain points, and customer language that fixed rating scales can miss.
Lufthansa is a strong example of this. Their feedback analysis combines NPS with open-ended questions across ground operations, in-flight service, and brand perception. With Caplena, the Lufthansa team analyzes over 600,000 pieces of feedback annually across more than 600 detailed topics. These insights feed into an internal Insights Hub used by more than 1,000 employees.
At that scale, broad topics only get you so far. A topic like “Punctuality” can show that delays are affecting the customer experience. But the action often sits one layer deeper: were customers frustrated by the delay itself, or because they did not feel informed while they waited? That second version gives the team a clearer problem to solve.
Large trackers have to stay comparable across survey cycles while still reflecting how customers talk about their experience now. That’s where the recipe book needs careful editing.
As the tracker matures, the recipe book starts needing careful upkeep. New issues appear, some topics become too broad, and others need merging, splitting, renaming, or clearer descriptions. Add market-specific language and more precise stakeholder questions, and every change has to improve the menu without making yesterday’s reporting impossible to explain.
IKEA shows what this looks like at scale.
The company analyzes over 3 million feedback comments per month across 40+ languages, with 16,000 employees using Caplena's capabilities worldwide. Before Caplena, IKEA used a lexicon-based method that was labor-intensive and inconsistent across languages. A lot of feedback also landed in a broad “General” category, which hid specific issues such as food quality or wait times.
AIDA Cruises faced a similar challenge.
The team moved from sample-based analysis to analyzing over 1.7 million guest comments with confidence, using Caplena alongside Qualtrics. Open-ended NPS follow-up questions help them explain score changes, identify improvement areas, and answer operational questions from onboard services, entertainment, and food and beverage teams.
Customer feedback changes constantly. Large trackers need a safe place to test the recipe before changing the menu everyone uses.
The cleanest way to manage large trackers is to separate the live reporting environment from the experimentation space. In Caplena, that means working with a production project and a development project connected through a learning relationship.
This is the setup CX teams like IKEA's have built.
"The topics that matter to customers today may be different tomorrow. To address this, we introduced a test environment for learning and optimisation before launching changes into production. This allows us to monitor quality and assess the impact on reporting before making updates available to all users."
Digital Product Leader Global CX Platform @ IKEA
| | Production project | Development project |
| --- | --- | --- |
| Kitchen role | Live kitchen | Test kitchen |
| What it holds | Full tracker: all rows, survey cycles, reports, alerts, and ongoing analysis | Representative sample, for example 20,000 rows |
| What it’s for | Stable, stakeholder-facing reporting | Testing topic changes before they affect live reporting |
| What happens there | Teams monitor trends, compare markets, and share insights across the business | Teams test new topics, merge or split topics, adjust topic descriptions, and fine-tune AI topic assignment |
| Rule of thumb | Keep it stable | Use it to experiment safely |
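
One way to draw the "representative sample" from the table is proportional stratified sampling, so each market and language keeps roughly its share of the data. The sketch below is a minimal illustration in plain Python, not Caplena functionality: the row fields (`market`, `language`, `text`) and the helper `stratified_sample` are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, sample_size, seed=42):
    """Draw a sample whose strata proportions roughly mirror the full data."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    total = len(rows)
    sample = []
    for group in strata.values():
        # Proportional allocation, but keep at least one row per stratum
        # so small markets stay visible in the development project.
        quota = max(1, round(sample_size * len(group) / total))
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample

# Hypothetical tracker rows; (market, language) defines the strata.
rows = (
    [{"market": "DE", "language": "de", "text": f"comment {i}"} for i in range(700)]
    + [{"market": "FR", "language": "fr", "text": f"comment {i}"} for i in range(250)]
    + [{"market": "SE", "language": "sv", "text": f"comment {i}"} for i in range(50)]
)
dev_sample = stratified_sample(
    rows, key=lambda r: (r["market"], r["language"]), sample_size=100
)
```

Because quotas are rounded per stratum, the final sample size can differ slightly from `sample_size` on real data. For a tracker, survey cycle or journey stage could be added to the stratification key in the same way.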
The learning relationship makes the setup work. Production surfaces real changes in the tracker. Development gives the team space to explore those changes, test them against the existing topic collection, and validate them. Once the update is ready, the learning relationship lets the team push one clean version back into production without rebuilding the analysis from scratch.
"For large-scale CX trackers, your development project is the cookbook: a small, representative sample where you test new topics, merges, splits, and descriptions. Once the recipe works, you apply one validated update to the full production project. It’s faster, more efficient, and gives you a clean audit trail."
Co-Founder & Co-CEO
The kitchen roles are useful here. The human analyst is the chef, orchestrating the analysis and deciding what makes it onto the menu. Insight Agent acts like the sous-chef, helping investigate the ingredients, meaning the data, and the recipes, meaning the topics. LLM topic generation is closer to the line cook, preparing candidate topics before they are served to production.
AI helps the kitchen move faster. But humans still own the recipe.
Once the setup is in place, new patterns get a clear path from discovery to validation to production. That path is important because large trackers feed many decisions. A small topic change can affect trend lines, reports, alerts, and ultimately stakeholder trust.
This workflow keeps experimentation active without turning the live tracker into a test kitchen.
Treat the production project as your one source of truth. It holds the full dataset, recurring survey cycles, stakeholder-facing reports, and live alerts. This is where teams monitor performance, compare markets, track topics over time, and share insights across the business.
Production should reflect validated analysis. New ideas are welcome, of course, but they need to go through the right process before they affect live reporting.
As new survey cycles arrive, alerts and reports help teams identify emerging issues. This could be a topic crossing a threshold, a sudden change in sentiment, or a recurring theme becoming more common in one market, journey, or customer segment.
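To make "a topic crossing a threshold" concrete, here is a small sketch of how such a signal could be detected between two survey cycles. It is an illustration only, not how Caplena's alerts are implemented: the function names, the 5% share threshold, and the 1.5x growth factor are all assumed values.

```python
def topic_share(topic_counts, total_responses):
    """Share of responses mentioning each topic in one survey cycle."""
    return {topic: n / total_responses for topic, n in topic_counts.items()}

def flag_emerging_topics(previous, current, min_share=0.05, min_lift=1.5):
    """Return topics whose share reaches min_share and grew by at least min_lift."""
    flagged = []
    for topic, share in current.items():
        before = previous.get(topic, 0.0)
        crossed = share >= min_share
        # A topic that did not exist in the previous cycle counts as growth.
        grew = before == 0.0 or share / before >= min_lift
        if crossed and grew:
            flagged.append(topic)
    return sorted(flagged)

# Hypothetical counts for two cycles of 1,000 responses each.
prev = topic_share({"Punctuality": 120, "Delay communication": 20, "Food quality": 90}, 1000)
curr = topic_share({"Punctuality": 125, "Delay communication": 70, "Food quality": 85}, 1000)
emerging = flag_emerging_topics(prev, curr)
print(emerging)  # ['Delay communication']
```

In this made-up example, "Punctuality" stays large but stable, so only "Delay communication" is flagged. The same check could be run per market or customer segment to catch regional patterns that global numbers flatten.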
These early signals matter because tracker changes often start quietly. A local service issue, a new complaint type, or a shift in customer expectations may already be visible in the open ends before the overall score moves.
Before adding a new topic, analysts should check if the pattern is real, specific, and worth tracking. Insight Agent and LLM topic generation help here. Analysts can ask follow-up questions, inspect niche patterns, and see whether a new issue deserves a permanent place in the tracker.
Exploration helps the team move from broad topics into the smaller moments that explain the experience. It helps analysts decide whether a new issue belongs in the recipe book, or whether it is just a one-cycle special. Not every interesting signal needs a permanent place on the menu.
"We have around 600 codes, but we can’t capture every nitty-gritty topic, especially when we’re constantly testing new things across airlines. With Insight Agent, we can quickly explore niche questions, like whether delayed passengers felt informed while they waited."
Principal Customer Insights Analyst
Once a new pattern looks promising, move it into the development project. This is where the team checks whether the new topic still works within the existing topic collection.
Does it overlap with another topic?
Is it too narrow? Is it too broad?
Does it work across languages?
Does it help a stakeholder make a better decision?
Will it still make sense three survey cycles from now?
The development project gives analysts space to test without creating noise for everyone using the live reports.
In the development project, the team reviews topic assignments, corrects edge cases, adjusts topic labels, improves topic descriptions, and merges or splits topics where needed. This is the human-in-the-loop quality control stage.
AI Quality Score helps the team understand whether the AI is assigning topics well enough to go live. Topic generation can suggest overlapping, rare, or emerging topics, but the team decides what belongs in the final structure. The update should only move to production once the categorization has been battle-tested.
That quality check becomes especially important when the tracker supports many teams, markets, and operational decisions. FlixBus analyzes more than 800,000 open-ended post-ride survey responses per year across 45 countries, using the results alongside NPS to guide decisions across operations, country management, network planning, and leadership. At that scale, topic assignment needs to be precise enough to support real improvements, from punctuality to delay communication.
That’s the sweet spot: AI helps manage the scale, but the insights team keeps control over the recipe that guides business decisions.
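As a rough illustration of a go-live check, the sketch below compares AI topic assignments with human-reviewed assignments on the development sample and computes a per-topic F1 score. This is not Caplena's AI Quality Score, whose calculation is not described here. It is a stand-in metric, and the function names and the 0.85 threshold are assumptions.

```python
def topic_f1(ai_labels, human_labels, topic):
    """F1 for one topic, comparing AI assignments with human-reviewed ones."""
    pairs = list(zip(ai_labels, human_labels))
    tp = sum(1 for ai, hu in pairs if topic in ai and topic in hu)
    fp = sum(1 for ai, hu in pairs if topic in ai and topic not in hu)
    fn = sum(1 for ai, hu in pairs if topic not in ai and topic in hu)
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def ready_for_production(ai_labels, human_labels, topics, threshold=0.85):
    """Map each topic to True if its F1 meets the (assumed) threshold."""
    return {t: topic_f1(ai_labels, human_labels, t) >= threshold for t in topics}

# Hypothetical assignments: one set of topics per comment.
ai = [{"Punctuality"}, {"Delay communication"},
      {"Punctuality", "Delay communication"}, {"Punctuality"}]
human = [{"Punctuality"}, {"Delay communication"},
         {"Delay communication"}, {"Punctuality"}]

gate = ready_for_production(ai, human, ["Punctuality", "Delay communication"])
```

Here "Punctuality" is over-assigned once (F1 of about 0.8), so it would stay in the development project for another round of description tuning, while "Delay communication" passes.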
Once the topic change has been tested and refined, push one consolidated update into production. The live tracker receives deliberate improvements the team can explain now and in the future, rather than many small edits that are hard to trace.
This is crucial for auditability. The development project shows how the recipe was tested. The production project shows which version made it onto the menu. So when someone asks, “Why did this topic change in Q3?”, the answer is clear. No mystery ingredient. No digging through old prep notes. No “I think Martin changed the recipe before he went on holiday.”
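One lightweight way to keep that answer at hand is a release log that records each consolidated update and the reason for every topic change. The structure below is a hypothetical sketch kept outside any tool. The class names, fields, version label, and example entries are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class TopicChange:
    action: str   # e.g. "add", "merge", "split", "rename", "redescribe"
    topic: str
    reason: str

@dataclass
class TopicRelease:
    version: str
    released_on: date
    changes: list = field(default_factory=list)

def why_did_topic_change(releases, topic):
    """List every recorded change touching a topic, oldest release first."""
    return [
        (r.version, c.action, c.reason)
        for r in sorted(releases, key=lambda r: r.released_on)
        for c in r.changes
        if c.topic == topic
    ]

# Hypothetical release history.
releases = [
    TopicRelease("2024-Q3", date(2024, 9, 1), [
        TopicChange("split", "Punctuality",
                    "Separated delay communication from the delay itself"),
        TopicChange("add", "Delay communication",
                    "Recurring theme validated on the development sample"),
    ]),
]
history = why_did_topic_change(releases, "Punctuality")
```

Each production update then maps to exactly one `TopicRelease`, which mirrors the idea of pushing one consolidated change rather than many small edits.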
A production-plus-development workflow gives large trackers stable reporting and active learning. Analysts can keep improving the topic collection without putting stakeholder-facing reports at risk.
That makes tracker governance a business issue, not just an analysis setup. When new customer signals stay visible, teams can act faster, prioritize better, and keep CX decisions grounded in what customers actually say. And the data supports this business case, too: Forrester found that customer-focused companies report 41% faster revenue growth, 49% faster profit growth, and 51% better customer retention than others.
The workflow also keeps experimentation cost-efficient. Many AI solutions are credit- or token-based, so testing every change on the full tracker can get expensive fast. A representative sample lets teams experiment quickly, battle-test the update, and apply only the proven version to the full dataset.
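A back-of-the-envelope calculation shows why. The numbers below are assumptions for illustration: a tracker of 1,000,000 rows, ten rounds of topic experiments, and a flat price of one credit per analyzed row. Real pricing models vary. Only the 20,000-row sample size comes from the table above.

```python
def experiment_cost(rows_per_run, runs, credits_per_row=1.0):
    """Total credits for re-running topic assignment `runs` times."""
    return rows_per_run * runs * credits_per_row

full_tracker_rows = 1_000_000  # hypothetical tracker size
dev_sample_rows = 20_000       # sample size used as an example in the table
iterations = 10                # hypothetical number of experiment rounds

# Option A: every experiment runs on the full production tracker.
on_production = experiment_cost(full_tracker_rows, iterations)

# Option B: experiments run on the sample, then one validated update
# is applied to the full tracker.
on_dev = (experiment_cost(dev_sample_rows, iterations)
          + experiment_cost(full_tracker_rows, 1))

print(on_production, on_dev)  # 10000000.0 1200000.0
```

Under these assumptions, the sample-first route uses roughly an eighth of the credits, and the gap widens with every extra round of experimentation.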
That is how large CX and brand trackers stay useful: production keeps reporting stable, development gives analysts room to refine, and the learning relationship moves validated changes back into the live tracker. The result is a tracker the business can trust, and one that keeps up with what customers are really saying. The best teams keep service running while the recipe gets better. That’s how large trackers keep serving fresh insights without burning down the kitchen.
Do you want to see how this would work for your tracker setup?
Book a call to request a complimentary proof of concept, or explore the Caplena Product Tour.