Data Cleaning & Automating Search Relevancy
AI/ML
This project focused on transforming fragmented data and automating data pipelines to improve search relevancy evaluations for a Retrieval and Relevancy team. It culminated in an analysis workflow that uses Large Language Models to derive user insights from personas representing the different customer types.

Problem & Solution
The Challenge
The team worked with messy, fragmented data spread across numerous queries, and data retrieval was manual and repetitive. Without a unified "golden dataset", analysis and decision-making for search relevancy were consistently slowed down.
My Solution
My co-intern and I cleaned, segmented, and enriched the existing data with key performance indicators (KPIs) such as clicks, revenue, and conversions. A key component was automating the data pipeline with Apache Airflow, so that the unified "golden dataset" is refreshed consistently and reproducibly.
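As a rough illustration of the clean-and-enrich step described above, here is a minimal sketch in plain Python. The function names, the field names (query, segment, clicks, revenue, conversions), and the normalization rule are assumptions made for this example, not the team's actual schema or code. In an Airflow setup, a function like build_golden_dataset could serve as the callable of a scheduled task. No Airflow API is used here.

```python
from collections import defaultdict


def normalize_query(raw: str) -> str:
    """Lowercase and collapse whitespace so variants of a query share one key."""
    return " ".join(raw.lower().split())


def build_golden_dataset(query_rows, kpi_rows):
    """Merge cleaned queries with summed KPIs into one deduplicated list.

    Both inputs are lists of dicts. The field names are hypothetical.
    """
    # Sum KPIs per normalized query, since raw logs may contain variants
    # of the same query ("Red Shoes", "red  shoes", ...).
    totals = defaultdict(lambda: {"clicks": 0, "revenue": 0.0, "conversions": 0})
    for row in kpi_rows:
        key = normalize_query(row["query"])
        totals[key]["clicks"] += int(row.get("clicks", 0))
        totals[key]["revenue"] += float(row.get("revenue", 0.0))
        totals[key]["conversions"] += int(row.get("conversions", 0))

    golden = {}
    for row in query_rows:
        key = normalize_query(row["query"])
        if not key:
            continue  # drop empty queries as part of cleaning
        segment = row.get("segment", "unknown")
        # Later duplicates overwrite earlier ones, so each query appears once.
        golden[key] = {"query": key, "segment": segment, **totals[key]}

    # Sort for a stable, reproducible output on every refresh.
    return sorted(golden.values(), key=lambda r: r["query"])
```

Sorting the output and keying on the normalized query keeps each refresh deterministic, which is the property a reproducible golden dataset needs.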
Project Info
Duration: 3 Months
Role: Software Engineer
Technologies Used
Key Features
Automated Data Pipeline Management
Integrated KPI Metrics
Unified Golden Dataset Creation
Streamlined Data Transformation
LLM-Powered Persona Analysis
What I Learned
Hands-on Data Pipeline Automation
Data Integration & Transformation Techniques
Adaptability & Problem-Solving in Dynamic Environments
Prompt Engineering for LLM-Based Evaluations
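To make the persona-based prompt engineering more concrete, here is a minimal sketch of how an evaluation prompt might be assembled. The function name, the persona fields, the prompt wording, and the 0 to 3 rating scale are all assumptions for illustration. The sketch only builds the prompt text and does not call any LLM.

```python
def build_persona_prompt(persona: str, description: str, query: str, results: list[str]) -> str:
    """Build a relevance-evaluation prompt from one persona's point of view.

    All wording and the 0-3 scale are hypothetical choices for this sketch.
    """
    # Number the results so the model can refer to each one unambiguously.
    numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(results, start=1))
    return (
        f"You are evaluating search results as this customer persona: {persona}.\n"
        f"Persona description: {description}\n\n"
        f"Search query: {query}\n"
        f"Results:\n{numbered}\n\n"
        "For each result, rate relevance from 0 (irrelevant) to 3 (highly relevant) "
        "for this persona and give a one-sentence reason."
    )


# Example usage with made-up values.
prompt = build_persona_prompt(
    persona="Budget shopper",
    description="Compares prices and prefers discounted items.",
    query="running shoes",
    results=["Premium trail runners", "Discount running shoes"],
)
```

Keeping the persona, query, and results as separate parameters makes it easy to evaluate the same result set once per customer type.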