Portfolio Manuscript - arXiv Style Preprint
Anas Dorbani

Abstract. I am a PhD student at Polytechnique Montreal specializing in the intersection of AI and data systems. My research focuses on multimodal data integration, tabular understanding, and enhancing database systems with large language models. I am passionate about building the next generation of intelligent data systems.
Keywords: Data & AI Systems; Multimodal Data Integration; Tabular Understanding.
1.Publications
[1] Factorized and Vectorized Execution: Optimizing Analytical and Semantic Queries over Relations
[2] Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB
2.Education


PhD in Computer Engineering

B.Sc. in Computer Science
3.Research and Industry Experience

Oracle Labs
Automated schema generation for Oracle's Financial Crimes & Compliance systems to improve data processing. Fine-tuned 7B-parameter models to optimize schemas and handle abbreviated column names. Built a framework to evaluate schema generation and data integration accuracy. Improved metadata consistency from 0.4 to 0.6, making the data easier to interpret. Optimized parsing of 7B model outputs to improve downstream data flow and results.

Oracle Labs
Improved machine learning explainability for the AutoMLx project by optimizing the LFI/GFI explainers, cutting their processing time by 80% and improving inference speed. Reduced memory usage from 20 GB to 4 GB, lowering operational costs for explanation services. Achieved 83% code coverage to support the reliability and maintainability of explainability features. Collaborated with cross-functional teams to deliver scalable, high-performance ML explainability solutions within AutoMLx.

National University of Rabat
Engineered a deep learning model to predict RFID pricing by scraping specifications and market data. Deployed the solution on GCP using Docker for scalable performance and built a Django web application to streamline data collection and real-time model testing.
4.Teaching

INF3710: Files and Databases
Introduction to files and databases: needs analysis via the entity-relationship model; relational model and relational algebra; SQL DDL/DML and embedded SQL; concurrency control and transaction management; relational schema design (functional dependencies and normal forms); storage models and file structures; indexing and hashing.
- Fall 2025 - TA for Prof. Amine Mhedhbi
- Winter 2026 - TA for Dr. Franjieh El Khoury
5.Awards and Grants
VLDB Travel Grant
Funding for students, researchers, and faculty to attend the VLDB 2025 conference in London, covering travel, lodging, and registration to promote participation in database research.
6.Selected Systems
FFX(August 2025 - Present)
Fast Factorized eXecution engine for join-heavy analytical and semantic queries. Built in C++ with factorized intermediates and vectorized execution to optimize performance for modern data workloads.
Flock(September 2024 - Present)

DBMS extension integrating LLMs and RAG into OLAP systems. Developed FlockMTL end to end, from infrastructure design to implementation and optimization. Designed custom map and reduce functions to integrate advanced workflows into relational database systems. Implemented dynamic batching over tuples to improve query execution efficiency.
OpenHands(March 2024 - August 2024)

Platform for software development agents. As a core maintainer, I helped reimplement the SWE agent and fix its benchmark to improve performance and reliability. I also assisted with issue resolution and reviewed pull requests to maintain project quality.
SecureStream(Feb 2024)
A network security project that employs machine learning and real-time traffic monitoring to detect anomalies in network data. Powered by the CSE-CIC-IDS2018 dataset and cicflowmeter, it enables swift identification of potential threats, enhancing overall network security.