Arun Vignesh Malarkkan

Ph.D. Candidate in Computer Science | AI Researcher
Researching the reliability and generalization of LLM reasoning using causal inference and mechanistic interpretability, with a focus on invariance discovery, spurious correlation mitigation, and robust evaluation. Currently seeking Applied Scientist / ML Researcher opportunities to translate research advances into scalable AI solutions.

News & Updates

Mar 2026
Paper: "CAFE" and "Distribution Shift Aware Neural Tabular Learning" submitted to ICML 2026.
Mar 2026
Paper: "FinRule-Bench", "Rethinking Data Augmentation", and "Evolving Demonstration Optimization" submitted to KDD 2026.
Dec 2025
Keynote: Delivered the keynote talk "Beyond Correlation: Can Causal Reasoning Enhance Modern AI Models?" at IEEE MoSICom 2025, BITS Pilani, UAE.
Dec 2025
Award: Received the IEEE ICDM Scholarship at IEEE ICDM 2025 (IEEE International Conference on Data Mining).
Nov 2025
Paper: "DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming" accepted at IEEE ICDM 2025.
Nov 2025
Paper: "Causal Graph Profiling via Structural Divergence" accepted at IEEE BigData 2025.
Jun 2025
Research: Started as a Doctoral Researcher at Dow Chemical under a National Academy of Engineering (NAE) Frontiers of Engineering grant.
Oct 2024
Paper: "Multi-view Causal Graph Fusion Based Anomaly Detection" published at ACM CIKM 2024.

About Me

Translating complex data into reliable decisions with causally informed machine learning.

I am a Ph.D. candidate in Computer Science at Arizona State University, working in the Knowledge Discovery and Data Mining (KDD) Lab under Dr. Yanjie Fu. My research focuses on improving the reliability and generalization of large language model (LLM) reasoning using causal inference and mechanistic interpretability, with an emphasis on invariant mechanism discovery, spurious correlation mitigation, and robust evaluation.

My work investigates whether LLMs encode causally invariant reasoning processes or rely on spurious correlations, combining interventional analysis, causal mediation, and mechanistic interpretability to understand and improve model behavior under distribution shifts.

Prior to my Ph.D., I completed my Master’s in Computer Science at Arizona State University and worked in industry as a Senior Software Engineer at Fidelity Investments and a Software Development Engineer at Amazon Alexa AI. My experience spans large-scale machine learning systems, probabilistic modeling, and production-grade decision pipelines.

I am currently a Doctoral Researcher at Dow Chemical under the National Academy of Engineering (NAE) Frontiers of Engineering program, developing causality-aware, neurosymbolic multi-agent reinforcement learning frameworks for AI-driven materials discovery and simulation.

Outside of research, I enjoy photography and teaching programming, particularly working with students interested in creative problem-solving and building systems from first principles.

Publications: 11
Citations: 43
h-index: 4
i10-index: 1

Research Areas

Building AI systems that reason reliably — not just accurately.

LLM Reliability & Mechanistic Interpretability

Investigating whether LLMs encode causally invariant reasoning mechanisms or rely on spurious correlations. Developing causal mediation frameworks over intermediate tokens, using activation patching and counterfactual inputs to identify the sparse subset of tokens that act as stable causal mediators.

Causal Inference & Representation Learning

Designing causal graph learning frameworks for invariant modeling under distribution shift. Work spans causal disentanglement for privacy-preserving reprogramming, structural divergence for anomaly detection in cyber-physical systems, and incremental causal learning for streaming environments.

Causally-Guided Multi-Agent RL & Data-Centric AI

Building neurosymbolic, causally aware multi-agent RL frameworks for automated feature engineering and materials discovery. Integrating causal structure with sequential decision-making to improve robustness under distribution shift in industrial simulation environments.

Technical Expertise

Core Research
LLM Reasoning & Reliability, Causal Inference, Mechanistic Interpretability, Causal Representation Learning, Invariance Discovery, Post-training & Fine-tuning, RAG, Data-Centric AI
ML & Systems
PyTorch, TensorFlow, Hugging Face Transformers, DeepSpeed, Distributed Training, GPU/CUDA, vLLM
Interpretability Tools
TransformerLens, Activation Patching, Linear Probing, PyTorch Hooks, Representation Analysis
LLM & Generative AI
PEFT / LoRA / QLoRA, LangChain, LlamaIndex, OpenAI API, Vertex AI, Weights & Biases
Infrastructure & Data
AWS (SageMaker, RDS, DynamoDB), Apache Spark, MySQL, Docker, pandas / NumPy, scikit-learn, AutoGluon
Programming
Python, Java, TypeScript

Professional Experience

Industry experience spanning AI research, ML systems, and software engineering.

Dow Chemical
Doctoral Researcher
June 2025 — Present

Research collaboration under a National Academy of Engineering (NAE) Frontiers of Engineering grant. Developing causality-aware multi-agent reinforcement learning frameworks for AI-driven materials science simulations and advancing computational models for materials discovery and optimization.

Fidelity Investments
Senior Software Engineer
January 2023 — June 2023

Tech lead for end-to-end retirement-goal financial projections. Orchestrated a Monte Carlo simulation framework for projecting retirement savings against goals. Developed ensemble outlier-detection tools to identify the factors affecting projected savings goals.

Amazon Alexa AI
Software Development Engineer
June 2021 — January 2023

Implemented an end-to-end voice user interface (VUI) purchase flow for Skills requiring parental consent, enabling HIPAA compliance. Developed a Purchase Likelihood Score model powering the Alexa Purchase Recommender system, which increased voice Skill purchases by 8%. Integrated a localization module and designed voice-enabled promotional discounts for Alexa Skills.

ASU Decision Center
Data Scientist / Software Developer
June 2020 — May 2021

Developed outlier prediction models for financial plans, school populations, and infrastructure. Designed end-to-end ETL pipelines and AWS Lambda functions for data ingestion and analysis. Implemented prediction models to optimize school resources and analyze the factors affecting graduation rates.

Publications

Recent contributions to Causal ML, LLM Reasoning, and Data-centric AI research.

Published
ACM CIKM 2024
Multi-view Causal Graph Fusion Based Anomaly Detection in Cyber-Physical Infrastructures
Arun Vignesh Malarkkan, D. Wang, Y. Fu
Proceedings of the 33rd ACM CIKM, pp. 4760-4767, 2024
Accepted
IEEE ICDM 2025
DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming
Arun Vignesh Malarkkan, H. Bai, A. Kaushik, Y. Fu
Proceedings of the 25th IEEE International Conference on Data Mining, 2025
Accepted
IEEE BigData 2025
Causal Graph Profiling via Structural Divergence for Robust Anomaly Detection in Cyber-Physical Systems
Arun Vignesh Malarkkan, H. Bai, D. Wang, Y. Fu
Proceedings of the 13th IEEE International Conference on Big Data, 2025
Accepted
IEEE TBD 2025
Incremental Causal Graph Learning for Online Cyberattack Detection in Cyber-Physical Infrastructures
Arun Vignesh Malarkkan, D. Wang, H. Bai, Y. Fu
IEEE Transactions on Big Data, 2025
Under Review — ICML 2026
CAFE: Causally-Guided Automated Feature Engineering with Multi-Agent Reinforcement Learning
Arun Vignesh Malarkkan, W. Ying, Y. Fu
arXiv preprint. Under Review — ICML 2026
A Survey on Data-centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective
W. Ying, C. Wei, N. Gong, X. Wang, H. Bai, Arun Vignesh Malarkkan, S. Dong, D. Wang, D. Zhang, Y. Fu
arXiv preprint arXiv:2502.08828, 2025
Under Review — KDD 2026
FinRule-Bench: A Benchmark for Joint Reasoning of LLMs over Financial Tables and Principles
Arun Vignesh Malarkkan et al.
arXiv preprint. Under Review — KDD 2026
Under Review — KDD 2026
Rethinking Data Augmentation under Covariate Shift: Invariant-Guided Diffusion and Prototype Reweighting
H. Cao, X. Wang, Arun Vignesh Malarkkan, K. Liu, D. Wang, Y. Fu
Under Review — KDD 2026
Under Review — KDD 2026
Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation
X. Wang, K. Liu, Arun Vignesh Malarkkan, Y. Fu
arXiv preprint. Under Review — KDD 2026
Under Review — ICML 2026
Distribution Shift Aware Neural Tabular Learning
W. Ying et al.
arXiv preprint. Under Review — ICML 2026
Rethinking Spatio-temporal Anomaly Detection: A Vision for Causality-driven Cybersecurity
Arun Vignesh Malarkkan, H. Bai, X. Wang, A. Kaushik, D. Wang, Y. Fu
arXiv preprint arXiv:2507.08177, 2025
View on Google Scholar

Honors & Awards

2025
IEEE ICDM Scholarship
IEEE International Conference on Data Mining, 2025
2025 – Present
NAE Frontiers of Engineering Grant
National Academy of Engineering, Dow Chemical Doctoral Research Collaboration
2018
Academic Scholarship
Arizona State University, 2018
2017 – 2018
Best Undergraduate Researcher Award
Arizona State University

Academic Service

Program Committee
  • AAAI 2026
  • IEEE ICDM 2025
  • IEEE BigData 2024 — Undergraduate & High School Research Symposium
  • IEEE ICSCAN 2024
Reviewer
  • KDD
  • ICLR
  • NeurIPS
  • ICDM
  • IEEE BigData
  • Urban AI
  • ACM TKDD (Journal)
Talks & Chairing
  • Keynote Speaker — IEEE MoSICom 2025, BITS Pilani UAE
  • Session Chair — IEEE MoSICom 2025
  • Session Chair — IEEE ICDM 2025 Undergraduate & HS Research Symposium
Teaching & Mentorship
  • Senior TA / Lab Instructor — OOP, Data Mining, Semantic Web Mining (ASU)
  • Project Manager — 30 Capstone Project Teams (ASU)
  • Research Mentor — Undergraduate & Graduate Researchers (Jan 2024–Present)
  • IEEE Phoenix Chapter Student Volunteer & Organizer

Get in Touch

Open to research collaborations, internship opportunities, and academic discussions.

Let's Connect

Whether you're interested in research collaboration, have questions about my work, or want to discuss opportunities — my inbox is always open.

Seeking Opportunities

Actively seeking Applied Scientist and ML Research internship opportunities to apply my expertise in LLM reasoning reliability, causal ML, and mechanistic interpretability. Open to industry research labs and academic collaborations.