About me

As a Data-Centric AI Engineer, I architect and ship end-to-end data pipelines and machine learning systems that power scalable, production-grade AI applications. I design robust ETL workflows using Spark, Dask, and Apache Airflow; build high-throughput ingestion layers on AWS EMR, Redshift, Snowflake and Databricks; and optimize data stores with Python multithreading, file-splitting and S3 external stages. Leveraging infrastructure as code (Terraform, CloudFormation), containerization (Docker/Kubernetes) and CI/CD (GitHub Actions, GitLab CI), I ensure reliable, maintainable data platforms that serve real-time analytics and batch ML workloads.
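The multithreading-plus-file-splitting upload pattern mentioned above can be sketched roughly as follows. This is a toy illustration, not production code: `upload_chunk` is a hypothetical stand-in for a real S3 external-stage client call (e.g. a boto3 upload), and here it only gzip-compresses the chunk.

```python
import concurrent.futures
import gzip

def split_bytes(data: bytes, chunk_size: int) -> list:
    """Split a payload into fixed-size chunks so uploads can run in parallel."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def upload_chunk(chunk: bytes, part_no: int) -> int:
    """Hypothetical stand-in for an S3 external-stage upload.
    Compresses the chunk and returns the compressed size."""
    return len(gzip.compress(chunk))

def parallel_upload(data: bytes, chunk_size: int = 1 << 20, workers: int = 8) -> list:
    """Compress and ship chunks concurrently; threads suit this I/O-bound work."""
    chunks = split_bytes(data, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_chunk, chunks, range(len(chunks))))
```

Splitting before upload lets many threads push compressed parts at once, which is where the speedup over a single serial upload comes from.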

As a Full-Stack AI Engineer, I translate complex ML models into seamless user experiences—developing interactive front-ends in React.js and Tailwind CSS, and implementing REST/GraphQL APIs with FastAPI, Node.js/Express or Spring Boot. I integrate vector databases (FAISS, Pinecone) and RAG workflows via LangChain to build intelligent document-retrieval interfaces, and orchestrate multi-agent LLM pipelines with Streamlit and LangFlow. My end-to-end approach spans data modeling and feature stores through inference services and scalable deployment on Kubernetes, backed by observability (Prometheus, Grafana, ELK) and automated retraining loops.

With a foundation in both software engineering and MLOps, I thrive at the intersection of data engineering and AI product development—turning raw data into high-impact insights, and research prototypes into user-facing applications that drive measurable business value.

What I currently vibe with

  • Applied Machine Learning

    Currently working on applied machine learning projects involving particle physics data.

  • Full Stack and Data Engineering

    Keeping up with the latest trends in data engineering and web development and building scalable applications.

  • Conquering mountain peaks

    Whenever I get a chance, I go on a hike to conquer a mountain peak and capture the beauty of nature.

  • Photography

    Outside work, I spend my time capturing nature and life in high-resolution photographs.

Organizations that I've worked with

Resume

Education

  1. Delhi Technological University

    Pursued a bachelor's degree in Environmental Engineering with a minor in Computer Science Engineering at Delhi Technological University, one of India's most prestigious institutions, established in 1941.

Experience

  1. Data Engineer, Upwork

    March 2025 – Present

    Built a flood prediction product using geospatial (point-in-polygon) operations with Spark, Java, Geotrellis, and AWS EMR Step Functions, boosting CPU utilization by 32%. Developed a scalable address interpolation algorithm with Python Dask and AWS Redshift, increasing address coverage by 44% through an automated two-pointer approach. Designed a centralized data repository and ingestion framework for 40 data products into Snowflake and Databricks using AWS Step Functions. Optimized Snowflake data uploads with Python multithreading, file splitting, and compression via AWS S3 external stages, enhancing speed by 74% and eliminating legacy cloud EMR costs.
    Tech stack: Spark, Java, Geotrellis, AWS EMR, Python Dask, AWS Redshift, Snowflake, Databricks, AWS
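A toy sketch of the two-pointer idea behind the address-interpolation work above: known addresses act as sorted anchors along a street segment, and one pointer advances monotonically with the sorted query positions. The anchor layout, the linear-interpolation rule, and the per-segment framing are simplifying assumptions, not the production Dask pipeline.

```python
def interpolate_addresses(anchors, queries):
    """Two-pointer linear interpolation of house numbers along a street.

    anchors: sorted [(position_m, house_number), ...] of known addresses.
    queries: sorted positions (metres along the segment) needing a number.
    Returns one interpolated house number per query.
    """
    out = []
    i = 0  # left pointer into anchors; only ever moves forward
    for q in queries:
        # Advance until anchors[i]..anchors[i+1] brackets the query position.
        while i + 1 < len(anchors) - 1 and anchors[i + 1][0] <= q:
            i += 1
        (p0, n0), (p1, n1) = anchors[i], anchors[i + 1]
        t = (q - p0) / (p1 - p0) if p1 != p0 else 0.0
        out.append(round(n0 + t * (n1 - n0)))
    return out
```

Because both inputs are sorted, the anchor pointer never rewinds, so a whole segment is processed in a single linear pass rather than a binary search per query.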

  2. Machine Learning Researcher, University of Alabama

    April 2025 – Present

    Designing and implementing a scalable sparse autoencoder pipeline for classifying and reconstructing particle collision events using minimally processed detector image data. Supervisor: Dr. Sergei V. Gleyzer.
    Tech stack: Python, PyTorch, NumPy, MLflow, HPC, SLURM
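The sparse-autoencoder objective behind this pipeline can be illustrated with a toy NumPy forward pass: reconstruction error plus an L1 penalty that pushes hidden activations toward zero. The real system uses PyTorch on detector images; the shapes and L1 weight here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sparse_ae_loss(x, W_enc, W_dec, l1=1e-3):
    """One forward pass of a (untied) sparse autoencoder.

    Returns reconstruction MSE plus an L1 sparsity penalty on the
    hidden activations, and the activations themselves.
    """
    h = relu(x @ W_enc)   # encode: (batch, hidden)
    x_hat = h @ W_dec     # decode: (batch, features)
    mse = np.mean((x - x_hat) ** 2)
    sparsity = l1 * np.mean(np.abs(h))
    return mse + sparsity, h

# Toy shapes: 8 input features through an overcomplete 16-unit hidden layer.
x = rng.normal(size=(4, 8))
W_enc = rng.normal(scale=0.1, size=(8, 16))
W_dec = rng.normal(scale=0.1, size=(16, 8))
loss, h = sparse_ae_loss(x, W_enc, W_dec)
```

The L1 term is what makes the code sparse: most hidden units stay at zero for any given event, so the few that fire act as interpretable features of the collision.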

  3. Open-Source Software Engineer, Probabl.ai

    April 2025 – July 2025

    Contributing to Probabl.ai’s “skore” project to enhance data-science testing and visualization capabilities.
    Tech stack: Python, Pandas, NumPy, scikit-learn, pytest

  4. Data Analytics & DevOps Engineer, Nordblock

    September 2024 – February 2025

    Deployed monitoring infrastructure with Prometheus, Grafana, and InfluxDB for carbon-neutral bitcoin mining operations across Nordic regions, implementing LSTM autoencoder anomaly detection for power consumption patterns, reducing equipment downtime by 25%. Built ELK stack pipeline ingesting logs from AWS S3 into ElasticSearch with Kibana visualization for energy-to-heat conversion analysis and rapid troubleshooting. Orchestrated scalable analytics workflows using Apache Spark Structured Streaming to process Kafka data streams, with Airflow managing ETL job scheduling and automated model retraining pipelines via Kubernetes.
    Tech stack: Python, Linux, Docker, Kubernetes, Prometheus, Grafana, InfluxDB, ELK Stack, Apache Spark, Kafka, Airflow, AWS
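The flagging step of an LSTM-autoencoder setup like the one above can be sketched as reconstruction-error thresholding: the LSTM model itself is assumed here, and the mean + 3σ rule over a known-normal baseline is one common, illustrative choice rather than the deployed configuration.

```python
import statistics

def anomaly_flags(baseline_errors, new_errors, k=3.0):
    """Flag readings whose reconstruction error exceeds mean + k*stdev
    of a baseline (errors the autoencoder produced on known-normal
    power-consumption windows)."""
    mu = statistics.fmean(baseline_errors)
    sigma = statistics.stdev(baseline_errors)
    threshold = mu + k * sigma
    return [e > threshold for e in new_errors]
```

An autoencoder trained only on normal consumption reconstructs normal windows well and anomalous ones poorly, so a simple threshold on the error separates the two without labeled failure data.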

  5. Technical Student, CERN (European Organization for Nuclear Research)

    June 2023 – July 2024

    Architected and maintained systems for a 6K-user global CMS experiment collaboration, optimizing publication, personnel, and institute management for reliability and scalability. Delivered on-demand user support and enhanced the iCMS-web ecosystem to streamline global workflows. Developed iCMS-teams, a web service to register new collaborators within the CMS experiment. Contributed to iCMS-statistics, providing configurable analytics on the CMS collaboration via tables, plots, and exportable data. Built microservices and web apps using Python (Django, Flask API), Java (Spring Boot), and Vue.js. Automated periodic database synchronization with CRON jobs using Bash and Python scripts. Optimized resource allocation for EOS storage and implemented CI/CD pipelines with GitLab and OpenShift for seamless deployments. Witnessed server migrations from CentOS 7 to AlmaLinux 9 after CentOS 7 reached end-of-life.
    Tech stack: Python, Django, Flask, Java (Spring Boot), Javascript, Typescript, Vue.js, Databases (PostgreSQL, MariaDB, OracleDB), Linux, Docker, Bash, Shell scripting, Git, GitLab CI/CD, Helm, OpenShift

  6. Data Engineering Intern, Roni Analytics

    February 2023 – May 2023

    Contributed as part of the co-founding team of a startup, developing a crypto analysis tool incorporating real-time metrics from Ethereum-based layer 1 blockchains. Designed and optimized Spark SQL queries on Hadoop HDFS systems to process large-scale blockchain datasets for efficient analysis and backtesting workflows. Worked on system monitoring and backend integration, leveraging Docker and Kubernetes.
    Tech stack: Python, Docker, Kubernetes, Apache Spark, Hadoop HDFS

  7. Software Engineering Intern, CERN-HSF (Google Summer of Code)

    May 2022 – August 2022

    Developed and published an Etherpad plugin via NPM to enable collaborative file-sharing across the CS3 Science Mesh platform using Golang REST APIs.
    Tech stack: JavaScript, Node.js, Golang, Docker

  8. Software Engineering Intern, bcoin (Summer of Bitcoin)

    May 2022 – August 2022

    Adapted Bitcoin Core's compact block relay (BIP152) in the bcoin library; implemented end-to-end tests to raise coverage by 40%.
    Tech stack: JavaScript, Node.js, C++, Mocha

  9. Software Engineering Intern, Public Lab (Google Summer of Code)

    May 2021 – August 2021

    Updated a spectrometry data-analysis library for cross-browser WebRTC camera switching; added mapping integrations and boosted test coverage by 30%.
    Tech stack: JavaScript, Ruby on Rails, WebRTC, Leaflet.js, Cypress

  10. Software Engineering Intern, moja Global

    June 2021 – September 2021

    Prototyped a Vue.js dashboard interfacing with Flask APIs; containerized via Docker and managed infra with Terraform on GCP.
    Tech stack: Vue.js, Python, Flask, Docker, Terraform, GCP, AWS, Azure

My skills

  • Web & API Development
  • DevOps & Cloud
  • Machine Learning
  • Data Engineering

Portfolio

Blog

Contact

Contact Form (Please avoid spam)