Hello, I'm Harry Nguyen

AI/Data Engineer

Specializing in Machine Learning, NLP, and Microsoft Fabric

About Me

Harry Nguyen

Hello! I'm Quan (Harry) Nguyen. I graduated from the University of Information Technology - VNUHCM with a strong academic background in Data Science and Information Technology. My research focuses on visual question answering (VQA), particularly on understanding Vietnamese text in images. I've contributed to creating benchmark datasets and developing state-of-the-art models in this domain.

In addition to my AI expertise, I have solid experience as a Data Engineer, designing and implementing scalable data pipelines, building data warehousing solutions, and working with various big data technologies. I'm passionate about creating end-to-end solutions that bridge the gap between data infrastructure and AI applications.

B.Sc. in Data Science, University of Information Technology - VNUHCM
AI/Data Engineer
Vietnam

My Skills

AI Engineer

Computer Vision
Natural Language Processing
Transformers
Deep Learning

Data Engineer

Python
Data Pipeline Development
Data Warehousing
MLOps

Data Engineering Expertise

With a strong foundation in data engineering, I design and implement robust data pipelines and infrastructure to support AI/ML applications. My data engineering experience spans:

Data Pipeline & Orchestration

  • ETL/ELT Process Development: Experience designing and implementing end-to-end data pipelines for various use cases
  • Workflow Orchestration: Airflow, Prefect, and custom workflow management solutions
  • Real-time Data Processing: Building stream processing systems with tools such as Kafka and Spark Streaming
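Orchestrators such as Airflow and Prefect model a pipeline as a directed acyclic graph (DAG) of tasks, each of which runs only after its upstream tasks finish. The sketch below illustrates just that core idea with Python's standard-library `graphlib`; the `run_dag` helper and the `extract`/`transform`/`load` tasks are hypothetical, and real orchestrators add scheduling, retries, and parallelism that this omits.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# A task receives the results of the tasks that ran before it.
Task = Callable[[dict[str, Any]], Any]


def run_dag(tasks: dict[str, Task], upstream: dict[str, set[str]]) -> dict[str, Any]:
    """Run every task once, after all of its upstream tasks have finished."""
    results: dict[str, Any] = {}
    # static_order() yields nodes so that predecessors always come first,
    # and raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(upstream).static_order():
        results[name] = tasks[name](results)
    return results


def extract(results):
    # Hypothetical raw input containing one malformed record.
    return ["12.5", "8.0", "n/a"]


def transform(results):
    clean = []
    for raw in results["extract"]:
        try:
            clean.append(float(raw))
        except ValueError:
            continue  # drop records that cannot be parsed
    return clean


def load(results):
    # Stand-in for writing to a warehouse: just return an aggregate.
    return sum(results["transform"])


TASKS = {"extract": extract, "transform": transform, "load": load}
UPSTREAM = {"transform": {"extract"}, "load": {"transform"}}
```

Calling `run_dag(TASKS, UPSTREAM)` executes the three tasks in dependency order and returns each task's output keyed by name.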

Technologies & Infrastructure

  • Cloud Platforms: AWS (S3, Redshift, EMR), Azure (Blob Storage, Data Lake, Data Factory, Databricks)
  • Data Processing: Apache Spark, Hadoop ecosystem
  • Data Storage: PostgreSQL, MongoDB, Data Lake architectures, Lakehouse
  • Data Warehousing: Dimensional modeling, fact and dimension tables design
  • Version Control & CI/CD: Applied to data pipelines and infrastructure
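Dimensional modeling organizes a warehouse into fact tables, which record measurable events, and dimension tables, which hold the descriptive context those events are analyzed by. The sketch below builds a tiny star schema in an in-memory SQLite database; the taxi-trip fact table, the date and zone dimensions, and all column names are hypothetical illustrations, not the schema of any project listed here.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal star schema: one fact table referencing two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20240115
            full_date TEXT NOT NULL,
            day_of_week TEXT NOT NULL
        );
        CREATE TABLE dim_zone (
            zone_key INTEGER PRIMARY KEY,
            borough TEXT NOT NULL
        );
        CREATE TABLE fact_trip (
            trip_id INTEGER PRIMARY KEY,
            date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
            pickup_zone_key INTEGER NOT NULL REFERENCES dim_zone(zone_key),
            trip_distance REAL NOT NULL,    -- measure
            fare_amount REAL NOT NULL       -- measure
        );
        """
    )


def fare_by_borough(conn: sqlite3.Connection) -> dict[str, float]:
    """Aggregate a fact measure, grouped by a dimension attribute."""
    rows = conn.execute(
        """
        SELECT z.borough, SUM(f.fare_amount)
        FROM fact_trip AS f
        JOIN dim_zone AS z ON f.pickup_zone_key = z.zone_key
        GROUP BY z.borough
        ORDER BY z.borough
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    conn.execute("INSERT INTO dim_date VALUES (20240115, '2024-01-15', 'Monday')")
    conn.executemany("INSERT INTO dim_zone VALUES (?, ?)", [(1, "Manhattan"), (2, "Queens")])
    conn.executemany(
        "INSERT INTO fact_trip VALUES (?, ?, ?, ?, ?)",
        [(1, 20240115, 1, 2.5, 12.0), (2, 20240115, 1, 1.0, 8.5), (3, 20240115, 2, 10.2, 35.0)],
    )
    print(fare_by_borough(conn))  # {'Manhattan': 20.5, 'Queens': 35.0}
```

Keeping measures in the narrow fact table and descriptive attributes in the dimensions lets analytical queries stay as simple joins plus a `GROUP BY`.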

Highlighted Data Engineering Projects

MLOps - NYC Taxi Fare Prediction

An end-to-end MLOps implementation for New York taxi fare prediction, featuring:

  • Complete ETL pipeline for preprocessing taxi trip data
  • Data quality validation and monitoring
  • CI/CD pipeline for ML model deployment
  • Model monitoring and automated retraining process
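A data quality validation step like the one listed above can be sketched with simple rule-based checks. This minimal example assumes a hypothetical `TaxiTrip` record and illustrative rules (including an assumed passenger range of 1 to 6); the project itself may use different fields, rules, or a dedicated validation library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaxiTrip:
    # Hypothetical schema for a single taxi trip record.
    pickup: datetime
    dropoff: datetime
    trip_distance: float  # miles
    passenger_count: int
    fare_amount: float    # USD


def validate_trip(trip: TaxiTrip) -> list[str]:
    """Return the list of rule violations; an empty list means the record passes."""
    errors = []
    if trip.dropoff <= trip.pickup:
        errors.append("dropoff must be after pickup")
    if trip.trip_distance <= 0:
        errors.append("trip_distance must be positive")
    if not 1 <= trip.passenger_count <= 6:  # assumed valid range
        errors.append("passenger_count out of range")
    if trip.fare_amount < 0:
        errors.append("fare_amount must be non-negative")
    return errors


def split_valid_invalid(trips):
    """Partition records so invalid rows can be quarantined and counted."""
    valid, invalid = [], []
    for trip in trips:
        problems = validate_trip(trip)
        if problems:
            invalid.append((trip, problems))
        else:
            valid.append(trip)
    return valid, invalid
```

Tracking the share of rows that end up in `invalid` across batches gives a simple monitoring signal: a sudden jump usually points to an upstream data change worth investigating before retraining.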

Real-time Sentiment Analysis Pipeline

A real-time data processing pipeline for sentiment analysis of fast food brands:

  • Real-time data collection and processing
  • Scalable architecture for handling high-volume streaming data
  • Integration with NLP models for sentiment analysis
  • Dashboard for real-time monitoring of brand sentiment
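One common way to feed a brand-sentiment dashboard from a stream is tumbling-window aggregation: events are bucketed into fixed, non-overlapping time windows and their scores averaged per brand. The sketch below assumes each event already carries a score in [-1, 1] from an upstream NLP model; the `ScoredEvent` shape, window size, and brand names are hypothetical, and a production pipeline would typically run this logic inside a stream processor such as Spark Streaming rather than in plain Python.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredEvent:
    # Hypothetical event: a social post already scored by a sentiment model.
    timestamp: float  # seconds since epoch
    brand: str
    score: float      # assumed range [-1.0, 1.0]


def tumbling_window_sentiment(events, window_seconds=60):
    """Average sentiment per (window_start, brand) over fixed, non-overlapping windows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        # Align each event to the start of the window that contains it.
        window_start = int(event.timestamp // window_seconds) * window_seconds
        key = (window_start, event.brand)
        totals[key] += event.score
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Each `(window_start, brand)` entry maps directly to one point on a per-brand time series in a monitoring dashboard.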

AI Engineering Expertise

As an AI Engineer, I specialize in developing cutting-edge artificial intelligence solutions with a focus on computer vision and natural language processing. My expertise includes:

Deep Learning & Computer Vision

  • Visual Question Answering: Expertise in multimodal models that read text in images and answer questions about it
  • Image Recognition: Experience with CNN architectures and transformer-based models for visual tasks
  • Object Detection: Implementation of YOLO, Faster R-CNN, and other detection architectures
  • OCR Technologies: Specialized in text detection and recognition for multilingual documents

Natural Language Processing

  • Transformer Models: Experience with BERT, RoBERTa, and custom transformer architectures
  • Sentiment Analysis: Building real-time systems for monitoring brand sentiment and customer feedback
  • Multilingual NLP: Creating solutions for Vietnamese and other languages beyond English
  • Question Answering: Implementing advanced QA systems for text and multimodal inputs

AI Development Stack

  • Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, ONNX
  • Deployment: Model serving with Azure ML, Flask/FastAPI, and AWS SageMaker
  • Model Optimization: Quantization, pruning, and knowledge distillation techniques
  • Experimentation: Experiment tracking with MLflow, Weights & Biases
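To make the quantization item concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization on a plain list of weights. It only illustrates the arithmetic; the function names are hypothetical, and framework tooling in PyTorch, TensorFlow, or ONNX Runtime operates on tensors and offers more schemes (per-channel, asymmetric, calibration-based).

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: each w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() uses round-half-to-even; the clamp guards against float overshoot.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [scale * v for v in q]
```

With this scheme each reconstructed weight lies within `scale / 2` of the original, while storage drops from 32 bits to 8 bits per weight plus one shared float for `scale`.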

Research & Development

  • Research Implementation: Converting academic papers into production-ready AI systems
  • Dataset Creation: Building high-quality datasets for machine learning tasks
  • Model Evaluation: Comprehensive evaluation metrics and testing methodologies
  • Publications: Contributing to academic research in AI/ML fields

Highlighted AI Projects

VisionReader

A multimodal transformer-based model for Vietnamese visual question answering:

  • Custom architecture combining vision and language transformers
  • State-of-the-art performance on Vietnamese text in images
  • Multilingual support and cross-lingual knowledge transfer
  • Optimized for deployment in resource-constrained environments

ViTextVQA Dataset

The first large-scale dataset for Vietnamese visual question answering:

  • Over 10,000 images with Vietnamese text
  • 30,000+ question-answer pairs covering diverse domains
  • Rigorous quality control and annotation guidelines
  • Benchmark evaluations for various model architectures
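Benchmark evaluations for text-based VQA commonly report answer-level metrics such as exact match (EM) and token-level F1 between a predicted and a reference answer. The sketch below shows both in plain Python, assuming only lowercase and whitespace normalization; the metrics and normalization rules actually used for ViTextVQA may differ. Because Vietnamese writes syllables separated by spaces, whitespace tokens here are syllables rather than words.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase, then split on whitespace.
    # str.lower() keeps Vietnamese diacritics intact (e.g. "Phở" -> "phở").
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Token F1 gives partial credit when a model returns a longer or shorter span than the reference, which EM alone would score as a complete miss.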

Featured Projects

ViTextVQA-Dataset

The first high-quality, large-scale Vietnamese dataset specializing in understanding text within images.

Tags: Computer Vision, NLP, VQA

VisionReader

Visual question answering with transformers in Vietnamese, with support for other languages.

Tags: Transformers, VQA, Multilingual

MLOPs-NY-taxi-fare-prediction

An end-to-end MLOps implementation for predicting taxi fares in New York City.

Tags: MLOps, Prediction, Data Pipeline

Real-time Sentiment Analysis

A real-time sentiment analysis system for fast food brand monitoring and customer feedback analysis.

Tags: NLP, Sentiment Analysis, Real-time

Publications

2025

ViOCRVQA: novel benchmark dataset and VisionReader for visual question answering by understanding Vietnamese text in images

HQ Pham, TKB Nguyen, Q Van Nguyen, DQ Tran, NH Nguyen

Multimedia Systems 31 (2), 106, 2025

2024

ViTextVQA: A large-scale visual question answering dataset for evaluating Vietnamese text comprehension in images

Q Van Nguyen, DQ Tran, HQ Pham, TKB Nguyen, NH Nguyen

arXiv preprint arXiv:2404.10652, 2024

2024

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Q Van Nguyen, HQ Pham, DQ Tran, TKB Nguyen, NH Nguyen-Dang

arXiv preprint arXiv:2405.17002, 2024

2023

MAT: Effective Link Prediction via Mutual Attention Transformer

Q Van Nguyen, HQ Pham, DQ Tran, TKB Nguyen, NH Nguyen

2023 IEEE 10th International Conference on Data Science and Advanced Analytics

Get In Touch