Guest reviews are the richest source of operational intelligence most hospitality properties collect — and one of the least efficiently used. The standard approach to review analytics, whether manual reading or simple sentiment scoring, systematically loses the most actionable signal: not whether guests were satisfied, but which specific aspects of the experience fell short and why. This paper from the University of the Aegean proposes a framework that addresses that gap directly, using a combination of BERT and large language models to move from raw review text to structured, aspect-level recommendations.
The architecture works in two layers. BERT handles the initial classification pass: fast, consistent, and effective for high-volume processing of structured categories. But the researchers identify a key limitation in BERT that any operator relying on automated review scoring should understand: supervised models trained on star ratings tend to inherit those ratings' ambiguities. A four-star review that embeds a substantive accessibility complaint in its text will often be classified as Positive by BERT, because the model leans too heavily on the rating signal it was trained on. The LLM layer, Gemini in this framework, corrects for this by applying generative reasoning to the full review context, identifying the negative sentiment thread even within an overall positive rating. The practical consequence is that dual-model systems catch a category of complaint (the qualified positive, the buried criticism) that single-model approaches routinely miss.
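To make the two-layer idea concrete, here is a minimal Python sketch of the cross-check it implies: flag any review whose first-pass label is Positive but whose second-pass aspect reading contains a negative. This is an illustration, not the authors' implementation. The `ReviewResult` structure, the label strings, and the sample data are all assumptions, and the model calls themselves are left out; the sketch starts from their outputs.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewResult:
    """Hypothetical container for the outputs of both passes on one review."""
    review_id: str
    first_pass_label: str  # overall label from the classifier pass (assumed label set)
    aspect_sentiments: dict[str, str] = field(default_factory=dict)  # aspect -> label from the LLM pass


def buried_criticisms(results):
    """Return (review_id, negative_aspects) for reviews labelled Positive overall
    that still carry at least one negative aspect in the second pass."""
    flagged = []
    for r in results:
        negatives = sorted(
            aspect for aspect, label in r.aspect_sentiments.items() if label == "Negative"
        )
        if r.first_pass_label == "Positive" and negatives:
            flagged.append((r.review_id, negatives))
    return flagged


# Made-up sample outputs, for illustration only.
sample = [
    ReviewResult("r1", "Positive", {"service": "Positive"}),
    ReviewResult("r2", "Positive", {"cultural experience": "Positive", "accessibility": "Negative"}),
    ReviewResult("r3", "Negative", {"pricing": "Negative"}),
]

flagged = buried_criticisms(sample)
```

In this sample only `r2` is flagged: the kind of qualified positive that a rating-driven classifier alone would pass over.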
The aspect-based component is where the framework delivers its most direct operational value. Rather than outputting a sentiment score, the system extracts domain-specific dimensions: accommodation quality, service responsiveness, cultural experience, pricing perception, accessibility. Each is scored independently, and the LLM layer generates plain-language recommendations keyed to the specific aspects showing negative sentiment. A hotel or attraction manager receives not just “guests are less satisfied this month” but “accessibility is the primary driver of negative reviews in the past 90 days, with repeated mentions of signage inadequacy and physical access to the upper floors.”
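The step from per-review aspect labels to a statement like "accessibility is the primary driver" is, at its simplest, a counting exercise. The sketch below is one hedged way to do it in Python. The `(aspect, sentiment)` record format, the function names, and the sample records are assumptions made for illustration; the paper's own scoring may differ.

```python
from collections import Counter


def negative_share_by_aspect(records):
    """For each aspect, the fraction of its mentions that are negative.

    `records` is an iterable of (aspect, sentiment) pairs (assumed format).
    """
    totals = Counter()
    negatives = Counter()
    for aspect, sentiment in records:
        totals[aspect] += 1
        if sentiment == "Negative":
            negatives[aspect] += 1
    return {aspect: negatives[aspect] / totals[aspect] for aspect in totals}


def primary_driver(records):
    """The aspect with the most negative mentions, or None if there are none."""
    negatives = Counter(aspect for aspect, sentiment in records if sentiment == "Negative")
    if not negatives:
        return None
    return negatives.most_common(1)[0][0]


# Made-up records, for illustration only.
records = [
    ("accessibility", "Negative"),
    ("accessibility", "Negative"),
    ("accessibility", "Positive"),
    ("pricing", "Negative"),
    ("service", "Positive"),
    ("cultural experience", "Positive"),
]

driver = primary_driver(records)
shares = negative_share_by_aspect(records)
```

The raw count and the share answer different questions: here accessibility has the most negative mentions, while pricing has the highest negative share on a single mention. A report built on this would likely want both, along with the mention volume behind each figure.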
The case study applies the framework to TripAdvisor reviews of the Archaeological Site of Mystras, a UNESCO World Heritage Site in Greece, making this one of the few sentiment analysis papers in tourism to demonstrate end-to-end results on a real destination dataset. The visualization dashboard the researchers built surfaces patterns across time, aspect categories, and sentiment severity — the kind of interface a revenue or operations manager could use in a weekly review meeting without requiring data science expertise.
For hotel operators, the framework's key design decisions translate directly: a fast classifier for volume, an LLM pass for context, and output organized by aspect rather than by overall score. The prompt sensitivity limitation the authors acknowledge, that LLM outputs can vary based on how questions are framed, is a real constraint for teams without technical staff to manage and tune prompts. But the direction is right: aspect-based sentiment analysis with LLM-generated recommendations is substantially more operationally useful than net sentiment scores, and the gap between what leading properties can extract from reviews and what average properties do with the same data is growing. Properties that move toward structured, aspect-level review intelligence now are building a feedback loop that compounds (better-identified problems, faster fixes, improved scores, more reviews, better data), while those still relying on star averages are working from a much noisier signal.
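One low-cost way to keep an eye on prompt sensitivity is to run the same review through a few differently worded prompts and measure how often the labels agree. The Python sketch below assumes those labels have already been collected. The function names, the 0.8 threshold, and the sample labels are illustrative assumptions, not something the paper prescribes.

```python
from collections import Counter


def agreement_rate(labels):
    """Share of prompt variants that returned the most common label."""
    if not labels:
        raise ValueError("need at least one label")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def unstable_reviews(labels_by_review, threshold=0.8):
    """Review ids whose labels agree less often than `threshold` (assumed cutoff)."""
    return sorted(
        review_id
        for review_id, labels in labels_by_review.items()
        if agreement_rate(labels) < threshold
    )


# Made-up labels from three hypothetical prompt phrasings per review.
labels_by_review = {
    "r1": ["Negative", "Negative", "Negative"],
    "r2": ["Positive", "Negative", "Neutral"],
    "r3": ["Positive", "Positive", "Negative"],
}

unstable = unstable_reviews(labels_by_review)
```

Reviews that land on the unstable list are candidates for manual reading, which keeps the human effort focused on the cases where the model's answer depends on how the question was asked.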