Converting spoken clinical dictation into structured electronic health record (EHR) entries is difficult in low-resource languages. The main obstacles are phonetic complexity and frequent code-switching with English medical terminology, both of which severely degrade the performance of automatic speech recognition (ASR) models. This paper presents a practical hybrid framework for structured EHR entry that combines ASR, Persian-specific normalization of code-switched and orthographically ambiguous terms, metadata-constrained field mapping, and uncertainty-triggered human review. The normalization module uses large language model (LLM) prompting and specialized bilingual glossaries to normalize code-switched terms and disambiguate Persian-specific homophones. Field mapping is embedding-based and guided by form metadata and existing clinical annotations, with selective human-in-the-loop validation for high-stakes fields. Evaluation within the BioArc EHR system shows substantial improvements in transcription quality and structured data accuracy. Experimental results and granular ablation studies confirm that explicitly addressing code-switching and homophone variability is critical for improving field detection rates and supporting data integrity. Although the framework is a robust assistive tool that achieves high field detection accuracy in controlled evaluations, a hybrid workflow remains essential to handle residual error rates (approximately 5–10% in complex forms) and to support the precision required for medical documentation.