Data Cleaning & Automating Search Relevancy

AI/ML

This project transformed fragmented data and automated data pipelines to improve search relevancy evaluations for a Retrieval and Relevancy team. It culminated in an analysis solution that uses Large Language Models to derive user insights from personas representing the different customer types.

Problem & Solution

The Challenge

The team faced messy, fragmented data spread across numerous queries, along with manual, repetitive data retrieval processes. Without a unified "golden dataset", analysis and decision-making for search relevancy were consistently slowed down.

My Solution

My co-intern and I implemented a solution to clean, segment, and enrich existing data with key performance indicators (KPIs) such as clicks, revenue, and conversions. A key component was automating the data pipeline with Apache Airflow, which ensured a consistent and reproducible refresh of the unified "golden dataset".
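As a rough illustration of the clean-and-enrich step described above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (`clean_queries`, `aggregate_kpis`, `build_golden_dataset`), the field names, and the sample rows are all hypothetical, and the normalization rule (lowercasing and collapsing whitespace) is an assumption. In an orchestrator such as Apache Airflow, each function could plausibly run as its own task on a schedule.

```python
from collections import defaultdict


def normalize(query):
    """Lowercase a query and collapse runs of whitespace (assumed cleaning rule)."""
    return " ".join(query.lower().split())


def clean_queries(raw_queries):
    """Normalize queries, dropping blanks and duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_queries:
        query = normalize(raw)
        if query and query not in seen:
            seen.add(query)
            cleaned.append(query)
    return cleaned


def aggregate_kpis(events):
    """Sum clicks, conversions, and revenue per normalized query."""
    totals = defaultdict(lambda: {"clicks": 0, "conversions": 0, "revenue": 0.0})
    for event in events:
        row = totals[normalize(event["query"])]
        row["clicks"] += event.get("clicks", 0)
        row["conversions"] += event.get("conversions", 0)
        row["revenue"] += event.get("revenue", 0.0)
    return dict(totals)


def build_golden_dataset(raw_queries, events):
    """Join cleaned queries with their KPI totals; queries with no events get zeros."""
    kpis = aggregate_kpis(events)
    empty = {"clicks": 0, "conversions": 0, "revenue": 0.0}
    return [{"query": q, **kpis.get(q, empty)} for q in clean_queries(raw_queries)]


# Hypothetical sample data, for illustration only.
raw_queries = ["  Running Shoes ", "running shoes", "", "Laptop  Stand"]
events = [
    {"query": "Running shoes", "clicks": 3, "conversions": 1, "revenue": 59.99},
    {"query": "running  shoes", "clicks": 2, "conversions": 0, "revenue": 0.0},
]
golden = build_golden_dataset(raw_queries, events)
for row in golden:
    print(row)
```

Keeping each step as a small pure function means the same inputs always produce the same dataset, which is the property a scheduled refresh relies on.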

Project Info

Duration: 3 Months

Role: Software Engineer

Technologies Used

Key Features

  • Automated Data Pipeline Management

  • Integrated KPI Metrics

  • Unified Golden Dataset Creation

  • Streamlined Data Transformation

  • LLM-Powered Persona Analysis
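To give a sense of what the LLM-powered persona analysis listed above might involve, the sketch below assembles a persona-based relevance prompt. It is only an illustration under stated assumptions: the function name `build_persona_prompt`, the persona text, the product titles, and the 0 to 3 rating scale are all hypothetical, and no LLM is called here. The resulting string would be sent to whichever model the team uses.

```python
def build_persona_prompt(persona, query, results):
    """Build a relevance-evaluation prompt for one persona, query, and result list."""
    numbered = [f"{rank}. {title}" for rank, title in enumerate(results, start=1)]
    parts = [
        f"You are evaluating search results as this customer persona: {persona}",
        f"Search query: {query}",
        "Results:",
        *numbered,
        # The 0-3 scale is an assumed convention, not taken from the project.
        "Rate each result's relevance from 0 (irrelevant) to 3 (highly relevant)",
        "and briefly explain each rating from the persona's point of view.",
    ]
    return "\n".join(parts)


# Hypothetical persona and results, for illustration only.
prompt = build_persona_prompt(
    persona="Budget-conscious first-time buyer",
    query="running shoes",
    results=["Trail runner X", "Leather dress shoe", "Entry-level road runner"],
)
print(prompt)
```

Building the prompt from structured inputs makes it straightforward to run the same query and results against several personas and compare the ratings.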

What I Learned

  • Hands-on Data Pipeline Automation

  • Data Integration & Transformation Techniques

  • Adaptability & Problem-solving in dynamic environments

  • Prompt engineering for LLM-based evaluations