Mutation testing is a robust technique for assessing the adequacy of test suites. However, the challenge of equivalent mutants persists, hindering its widespread adoption in practical settings. These mutants, while syntactically distinct, are functionally identical to the original program, leading to inflated computational costs and inaccurate mutation scores. Traditional detection approaches often overlook “contextual equivalence,” where a mutant behaves identically to the original only under the constraints and domain-specific boundaries imposed by the rest of the program. To address this, we introduce GEM-LLM, a hybrid framework that leverages large language models for semantic invariant inference and SMT solvers for formal equivalence verification. GEM-LLM employs inter-procedural slicing to provide large language models with detailed calling-context information, enabling the identification of domain-specific constraints that traditional tools miss. Our evaluation on 10 projects from the Defects4J benchmark demonstrates that GEM-LLM classifies 25%–30% of previously overlooked surviving mutants as equivalent, achieving a precision of 98%. A detailed breakdown reveals superior performance on relational and boundary operators, with detection rates of up to 34.2%. By shifting from local to global analysis, GEM-LLM offers a scalable and formally rigorous solution to the equivalent mutant problem, enhancing the accuracy and reliability of mutation testing in large-scale enterprise software systems.
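To make the notion of contextual equivalence concrete, the sketch below shows a hypothetical relational-operator mutant (`i < n` mutated to `i != n`) that differs from the original in general but agrees with it under an invariant such as `0 <= i <= n`. In the framework described above, an LLM would infer such an invariant and an SMT solver would discharge the equivalence query; here, a bounded exhaustive check stands in for the solver call, and all names are illustrative rather than part of GEM-LLM itself.

```python
def original(i, n):
    return i < n          # original loop guard

def mutant(i, n):
    return i != n         # relational-operator mutant of the guard

def equivalent_under(constraint, bound=50):
    """Bounded check: do the guards agree on every state satisfying the constraint?

    A stand-in for an SMT query asking whether a state exists where
    `constraint` holds but `original` and `mutant` disagree.
    """
    return all(
        original(i, n) == mutant(i, n)
        for i in range(-bound, bound)
        for n in range(-bound, bound)
        if constraint(i, n)
    )

# Unconstrained, the state i = n + 1 distinguishes the two guards ...
print(equivalent_under(lambda i, n: True))         # False
# ... but under the inferred invariant 0 <= i <= n they coincide,
# making the mutant contextually equivalent:
print(equivalent_under(lambda i, n: 0 <= i <= n))  # True
```

The same query maps directly onto an SMT solver: assert the invariant together with `original != mutant` and check satisfiability; an `unsat` result certifies contextual equivalence without any bound on the variable domains.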