2022-2023 Capstone Projects

GENETIC VARIABILITY AND DISEASE RESISTANCE ANALYSIS OF GRAPEVINE FROM ARMENIA

The paper studies the population structure and genetic variability of Armenian cultivated and wild grapevines. Armenian grapevine genetics is shown to differ from that of the worldwide population. The study also considers powdery mildew, as some Armenian wild samples have been shown to be resistant to the disease. By conducting a genome-wide association study (GWAS) on the wild Armenian samples, the study reveals new genetic markers that are significantly associated with powdery mildew resistance.

Student: Emma Hovhannisyan

Supervisors: Dr. Hans Binder, Maria Nikoghosyan

PHYSICS INFORMED SPATIOTEMPORAL DEEP LEARNING

This capstone thesis presents a comprehensive investigation of physics-informed spatiotemporal deep learning, a novel approach for solving nonlinear partial differential equations (PDEs) with the aid of deep learning techniques. By leveraging the principles of physics-based deep learning, this research aims to provide accurate data-driven solutions to complex PDE problems and to support data-driven discovery of the underlying equations. In particular, we discuss Burgers' equation and the complex Ginzburg-Landau equation in the context of physics-informed neural networks.
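For orientation, the viscous Burgers' equation and the residual-based loss commonly used in physics-informed neural networks can be written as follows. This is the standard textbook form and not necessarily the exact formulation used in the thesis; the viscosity \nu, the network output u_\theta, and the sample counts N_u and N_f are illustrative symbols that do not appear in the abstract.

```latex
\begin{align}
  u_t + u\,u_x &= \nu\,u_{xx}, \\
  f_\theta(x,t) &= \partial_t u_\theta + u_\theta\,\partial_x u_\theta - \nu\,\partial_{xx} u_\theta, \\
  \mathcal{L}(\theta) &= \frac{1}{N_u}\sum_{i=1}^{N_u} \bigl|u_\theta(x_i,t_i) - u_i\bigr|^2
                       + \frac{1}{N_f}\sum_{j=1}^{N_f} \bigl|f_\theta(x_j,t_j)\bigr|^2 .
\end{align}
```

The first sum fits the network to the available data (initial and boundary values), while the second penalizes violations of the PDE at collocation points.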

Keywords: physics-informed neural networks, spatiotemporal deep learning, nonlinear partial differential equations, physics-based deep learning

Students: Aleksandr Mkrtchyan, Jon Hakobyan

Supervisor: Aleksandr Hayrapetyan

REAL-TIME PARKING OCCUPANCY DETECTION

Finding vacant parking spaces in urban areas has become a problem. Drivers spend a lot of time searching for a vacant space, which is stressful and results in traffic congestion and increased fuel consumption. To solve this problem, we developed a deep learning-based system that can accurately detect and classify parking spaces as occupied or unoccupied in real time, using live video feeds from cameras installed in the parking lot. The proposed system consists of two components: object detection using YOLO [1] and parking space occupancy detection using IoU (intersection over union). To train and test our object detection model, we also created and labeled our own dataset, which consists of images taken from the recordings of the cameras installed in a parking lot. Testing results showed that our system works with high accuracy and can be applied in real-life situations.
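The IoU-based occupancy check mentioned above can be sketched as follows. This is a minimal illustration, not the project's implementation: the box layout (x_min, y_min, x_max, y_max), the function names, and the 0.5 threshold are all assumptions.

```python
from typing import Iterable, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_occupied(space: Box, detections: Iterable[Box], threshold: float = 0.5) -> bool:
    """A space counts as occupied if any detected vehicle overlaps it enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return any(iou(space, d) >= threshold for d in detections)
```

Here `space` would be a manually annotated parking slot and `detections` the vehicle boxes returned by the detector for the current frame.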

Student: Laura Barseghyan

Supervisor: Elen Vardanyan

DISEASE CLASSIFICATION ON ENCRYPTED RETINA SCAN IMAGES USING NEURAL NETWORK

As more sensitive medical data is being utilized for machine learning applications, ensuring the privacy and security of such data is becoming increasingly important. This paper proposes a privacy-preserving approach for disease detection on retinal scan images using homomorphic encryption. We explore the use of convolutional neural networks and vision transformer models for this task and highlight the need for custom approximation functions for certain operations such as ReLU, GELU, and Softmax. Our experiments show promising results in terms of accuracy and privacy preservation and demonstrate the feasibility of using homomorphic encryption for medical image analysis. We propose a vision transformer architecture that reaches 83% accuracy on the testing dataset and obtain an approximation function for the layer normalization operation, bringing us one step closer to fully encrypted inference. Overall, our approach is a promising step toward protecting sensitive medical data while enabling the advancement of machine learning in healthcare.
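Homomorphic encryption schemes typically support only additions and multiplications on ciphertexts, which is why non-polynomial activations need polynomial stand-ins. The sketch below fits a low-degree polynomial to ReLU by least squares. It illustrates the general idea only; the degree, the interval [-5, 5], and the function name are assumptions, and the abstract does not state which approximation method the authors used.

```python
import numpy as np


def fit_relu_polynomial(degree: int = 4, bound: float = 5.0, samples: int = 1001) -> np.ndarray:
    """Least-squares polynomial fit to ReLU on [-bound, bound].

    Returns coefficients ordered from highest to lowest degree,
    as expected by np.polyval.
    """
    x = np.linspace(-bound, bound, samples)
    y = np.maximum(x, 0.0)  # exact ReLU values to fit against
    return np.polyfit(x, y, degree)


coeffs = fit_relu_polynomial()

# Measure the worst-case error of the approximation on the fitted interval.
xs = np.linspace(-5.0, 5.0, 201)
max_err = float(np.max(np.abs(np.polyval(coeffs, xs) - np.maximum(xs, 0.0))))
```

The approximation is only meaningful inside the fitted interval, so inputs to the activation would need to be kept within that range, for example through normalization.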

Students: Anri Abrahamyan, Elina Israyelyan, Hovhannes Manushyan

Supervisor: Varduhi Yeghiazaryan

COMPARING SUBJECTIVE RANDOMNESS WITH POISSON DISTRIBUTION

Past research has demonstrated that humans are, in fact, able to perceive and respond to randomness in the environment. However, it is also well known that humans' understanding of random events is biased: they tend to see regularities in truly random data. This tendency is referred to as the “cluster illusion”. This study focuses on two-dimensional point patterns and analyzes the statistical properties of the distribution of subjective randomness by comparing them to the properties of the Poisson distribution. To do so, past research studies are reviewed and 2D dot patterns are generated using different algorithms from past behavioral experiments. The possibility of modeling subjective randomness with the Image Pyramid, as well as its limitations, is discussed.
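As a reference point for "truly random" 2D dot patterns, a homogeneous Poisson point process can be simulated by drawing the number of points from a Poisson distribution and placing them uniformly. This is a generic sketch and not one of the algorithms reviewed in the study; the function name and parameters are assumptions.

```python
import numpy as np


def poisson_point_pattern(intensity: float, width: float = 1.0,
                          height: float = 1.0, seed=None) -> np.ndarray:
    """Sample a homogeneous Poisson point pattern in a width x height window.

    intensity is the expected number of points per unit area.
    Returns an array of shape (n, 2) with one (x, y) row per point.
    """
    rng = np.random.default_rng(seed)
    # The point count is Poisson distributed with mean intensity * area.
    n = rng.poisson(intensity * width * height)
    # Given the count, points are independent and uniform over the window.
    xs = rng.uniform(0.0, width, size=n)
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack((xs, ys))
```

Patterns produced this way often contain visible clumps and gaps, which is the kind of regularity that observers tend to judge as non-random.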

Student: Valentina Mkhitaryan

Supervisor: Dr. Tadamasa Sawada

VEHICLE LICENSE PLATE RECOGNITION IN ARMENIA

This capstone project presents an Automatic Number Plate Recognition (ANPR) system designed to detect and recognize Armenian license plates. The system employs an object detection algorithm to detect license plates, applies skew and rotation adjustment to correct the orientation, and uses optical character recognition to recognize the characters on the plate. The project achieves an accuracy of 90% on plate detection and 95% on character recognition on a manually collected dataset consisting of high-quality photos and videos. Overall, this project presents a promising solution for ANPR systems.

Student: Diana Sargsyan

Supervisor: Elen Vardanyan

SMALLER3D: SMALLER MODELS FOR 3D SEMANTIC SEGMENTATION USING MINKOWSKI ENGINE AND KNOWLEDGE DISTILLATION METHODS

Various optimization techniques exist in the realm of 3D, including point cloud-based approaches that use meshes, textures, and voxels to optimize how 3D data is stored and processed. These techniques employ methods such as feed-forward networks, 3D convolutions, graph neural networks, transformers, and sparse tensors. However, 3D is one of the most computationally expensive fields, and these methods have yet to achieve their full potential due to their large capacity, complexity, and computation limits. This paper proposes the application of knowledge distillation techniques, especially for sparse tensors in 3D deep learning, to reduce model sizes while maintaining performance. We analyze and propose different loss functions, including standard methods and combinations of various losses, to approximate the performance of state-of-the-art sparse convolutional neural networks. Our experiments are done on the standard ScanNet V2 dataset, and we achieved around a 2.6% mIoU difference with a 4 times smaller model and around 8% with a 16 times smaller model on the latest state-of-the-art spatio-temporal ConvNet-based models. Our source code is available at: https://github.com/madanela/smaller3d
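To make the idea of a distillation loss concrete, the sketch below implements the widely used temperature-scaled KL divergence between teacher and student predictions. It is a generic illustration in NumPy, not the loss functions proposed in the paper or the code in the linked repository; the function names and the default temperature are assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax over the last axis with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """Mean KL(teacher || student) on temperature-softened class distributions.

    The result is scaled by temperature**2, a common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

In a segmentation setting each row would hold the per-class logits of one point or voxel, and this term would usually be combined with the ordinary supervised loss on the ground-truth labels.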


Student: Alen Adamyan

Supervisor: Erik Harutyunyan

ANALYSIS & PREDICTION OF REAL ESTATE HOUSE PRICING FOR YEREVAN, ARMENIA

This study aims to predict real estate prices in Yerevan using three regression models trained and tested in ArcGIS: Generalized Linear Regression (GLR), Geographically Weighted Regression (GWR), and Forest-based Classification and Regression (FBCR). We aim to answer the following research question:

What is the best spatial data science technique for predicting real estate prices in Yerevan, Armenia, and how can it be visualized and validated to inform decision-making for property buyers, sellers, and urban planners?

The project involves exploring various techniques and tools in ArcGIS Pro and Python to analyze real estate data in Yerevan. The first stage of the project includes using the interquartile range (IQR) for outlier detection, analyzing the correlation matrix of the variables, and assessing spatial autocorrelation with Global Moran's I in order to understand our data better. The spatiotemporal distribution of house sales is visualized using the 3D tools in ArcGIS to identify patterns and trends in the data. The analysis also includes spatiotemporal data science techniques, such as hotspot analysis and Space-Time-Cube patterns for sale data, and an evaluation of model performance based on standardized residuals, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²).
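The IQR outlier rule and the three evaluation metrics named above can be sketched in a few lines of NumPy. This is an illustration under assumptions: the conventional multiplier k = 1.5 and the function names are not taken from the project, which carried out the analysis in ArcGIS Pro and Python.

```python
import numpy as np


def iqr_mask(values, k: float = 1.5) -> np.ndarray:
    """Boolean mask that is True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


def regression_metrics(y_true, y_pred) -> dict:
    """Return MAE, RMSE, and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    ss_res = float(np.sum(errors ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

Lower MAE and RMSE and a higher R² indicate a better fit, which is the basis on which the three models are compared.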

The results showed that the FBCR model outperformed the GLR and GWR models in terms of predictive accuracy, as demonstrated by the lowest MAE and RMSE and the highest R². In order to have a pipeline that properly answers our research question, we used data on houses that had not yet been sold to examine how well each of these models generalizes to new data. We also tried to optimize the quality of the models by analyzing the spatially varying relationships between our independent variables, constructing a baseline approach to analyze the residuals, and, from there, choosing the most significant variables to include in each model. We conclude by confirming that the predictions of the FBCR and GWR models come much closer to the actual price distribution, as both models are far better at capturing the spatial heterogeneity of the data.

Students: Davit Nazlukhanyan, Awadis Shikoyan

Supervisor: Pakrad Balabanian

MACHINE LEARNING-POWERED ACCIDENT PREDICTION FOR THE AUTOMOTIVE INSURANCE INDUSTRY

Predicting car accidents is a critical issue for the insurance industry. The rise of machine learning (ML) has provided companies with new tools to analyze data and predict potential accidents more accurately. This study applies different ML models to analyze the data from one of the Armenian insurance companies. The dataset provides car accident records and details such as car specifications. The ML algorithms used in this study include Logistic Regression, Random Forest, XGBoost, and Artificial Neural Networks (ANN). The results were evaluated using various performance metrics such as accuracy, precision, recall, and F1 score. The study showed that XGBoost and ANN outperform the other models and can help to improve the accuracy of risk assessments for policy offerings.
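The four evaluation metrics listed above follow directly from the confusion-matrix counts of a binary classifier. The sketch below assumes labels encoded as 1 (accident) and 0 (no accident); the encoding and the function name are illustrative and not taken from the study.

```python
def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Accident data is usually imbalanced, so precision, recall, and F1 tend to be more informative than accuracy alone when comparing models.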

Student: Dawid Arakelyan

Supervisor: Aram Butavyan

QUEUE PREDICTION FOR MULTI-BRANCH COMPANIES IN THE SERVICE INDUSTRY

The service sector is growing rapidly in Armenia, and many companies with multiple branches are dealing with the problem of queue management. This work proposes a model that, with specific changes, can be applied to an arbitrary product-service company. In the paper, the Coffee House Company is taken as an example. Coffee House Company is Armenia’s most significant and fastest-growing takeaway coffee shop network. Coffee House has 29 branches in Yerevan and the regions, and its menu consists of more than 230 types of drinks. By developing four models and predicting future sales, the business can decide the number of baristas in advance to reduce and manage queues. It is demonstrated that additional information sources can improve the predictions’ accuracy. This statement will be discussed in the following sections.

Student: Narine Isakhanyan

Supervisor: Aram Butavyan

DEVELOPING A DATA MODEL FOR ARMENIAN REAL ESTATE MARKET USING BUSINESS INTELLIGENCE TOOLS

With the rapid development of technology and the growing amount of data, one can claim that the digital revolution has spread its wings everywhere, including the financial sector and, in this case, the real estate market. The real estate market has economic significance in every country. This project will connect the real estate industry with data science to generate insights for the market. This can be done in various ways, including creating Business Intelligence tools and making predictions with Time Series Forecasting and Machine Learning algorithms. However, this project will focus on the Data Engineering and Business Intelligence parts and use the data generated daily to create a data model. The data model will be an end-to-end pipeline with Business Intelligence tools to analyze Armenia’s real estate market, with a strong focus on apartments in Yerevan. It will give its users the opportunity to access real estate data more insightfully, with better speed, readability, and reliability. The user segment of the project will include buyers, real estate agencies, investors, government agencies, and anyone interested in gaining insights from the Armenian real estate market more comprehensively and efficiently.

Student: Hripsime Voskanyan

Supervisor: Arman Asryan

DEEP CONVOLUTIONAL NEURAL NETWORK-BASED ECG ANALYZER

This article introduces a deep-learning-based instrument designed to identify cardiac arrhythmias. Electrocardiography (ECG) is a prevalent, non-invasive diagnostic technique used to evaluate the electrical function of the heart. Conducting an ECG test entails placing electrodes on a patient’s chest, arms, and legs, thereby capturing the electrical impulses produced by the heart. These impulses are subsequently amplified, filtered, and depicted as a graphical waveform illustrating the heart’s electrical patterns. Our primary goal is to incorporate this tool within a medical application as an assisting tool that will help physicians in patients’ continuous care for chronic disease prevention and management. Utilizing such advanced instruments can facilitate the identification of heart irregularities even in the absence of cardiologists. The methodology implemented in this research project involves an existing approach operating on unfiltered ECG signals. The chosen approach demonstrates exceptional performance compared to other existing methods. This study thereby contributes to the fields of heart disease diagnosis and management, ECG signal analysis, and deep learning applications in detecting arrhythmias and other cardiovascular diseases.

Student: Mane Hakobyan

Supervisor: Arthur Ghulyan

EXPLORING THE ARMENIAN POLITICAL LANDSCAPE WITH DATA SCIENCE

Now more than ever, it is important to stay up to date with the latest political events and, most importantly, to be able to analyze them and draw conclusions. This paper aims to illustrate how to use data science tools when dealing with complex political situations. To make the analysis more informative and less biased, I will scrape the tweets of analysts who are not on the same political side. Some of the analyses will use the 2020 Nagorno-Karabakh war as a divider to showcase how much engagement and sentiment changed after the war. I will also train an LSTM model on the data to obtain more accurate sentiment scores. This paper aims to put the information we are getting from politicians into ordered containers, using classic data analysis techniques combined with sentiment and polarity analysis. It aims to analyze the research questions that I have defined in section 1.3.

Student: Yeva Tshngryan

Supervisor: Natali Gzraryan