Pleural Disease Assessment and Fatty Liver Estimation Using Sub-Sampled Ultrasound Channel Data (Thesis)

Pleural effusion, the accumulation of excess fluid in the pleural cavity (the space between the lungs and the chest wall), is challenging to diagnose, especially in scenarios with limited patient data. On ultrasound, this fluid appears as a dark, anechoic area between the lung and chest wall, indicating the presence of free-floating liquid. In cases of significant effusion, the lung tissue may appear compressed by the pressure the fluid exerts. Additionally, the pleural layers, which normally remain closely apposed, may become visibly separated by the accumulated fluid.
Traditional image-based approaches, which rely on beamforming to reconstruct ultrasound images before classification, often lack the precision needed for accurate disease detection. This limitation is particularly problematic for assessing pleural effusion and estimating liver fat levels.
This thesis explores three approaches to improving diagnosis in under-sampled conditions. First, task-based beamforming leverages Generative Adversarial Networks (GANs) to enhance image reconstruction, making it more suitable for medical diagnosis. The other two approaches are non-image-based methods, which bypass image reconstruction altogether. Instead, they work directly on raw ultrasound channel data, extracting meaningful features and applying advanced machine learning models for disease detection. These methods aim to improve diagnostic accuracy and efficiency, even with sparse data.
Ultrasound Channel Data
Ultrasound channel data consists of raw signals collected from individual transducer elements as sound waves interact with tissues and return as echoes. These signals serve as the foundation for ultrasound image reconstruction using beamforming algorithms. The data is typically structured as a 3D matrix, where:
- "Samples" represent the time series of received signals,
- "Frames" correspond to individual ultrasound pulse events, and
- "Channels" contain the information from each transducer element.
This rich dataset captures essential acoustic properties of tissues, enabling advanced processing techniques beyond traditional image-based diagnostics.
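To make this layout concrete, here is a minimal sketch (Python/NumPy) of a channel-data tensor and a naive delay-and-sum beamformer applied to one frame of it. The tensor sizes, element pitch, sampling rate, and plane-wave delay model are illustrative assumptions, not the acquisition parameters used in this work.

```python
import numpy as np

# Hypothetical channel-data tensor: (samples, frames, channels).
# All sizes and acquisition parameters below are illustrative only.
n_samples, n_frames, n_channels = 2048, 100, 128
rf = np.random.randn(n_samples, n_frames, n_channels)  # stand-in for raw RF data

fs = 40e6        # sampling rate [Hz] (assumed)
c = 1540.0       # speed of sound in tissue [m/s]
pitch = 0.3e-3   # element pitch [m] (assumed)
x_elem = (np.arange(n_channels) - n_channels / 2) * pitch  # element x-positions

def das_pixel(frame, x_px, z_px):
    """Delay-and-sum value for one pixel (x_px, z_px) of one frame."""
    # Receive delay from pixel to each element, plus a plane-wave
    # transmit delay of z/c (a common simplification).
    t = z_px / c + np.sqrt(z_px**2 + (x_px - x_elem) ** 2) / c
    idx = np.clip(np.round(t * fs).astype(int), 0, n_samples - 1)
    return frame[idx, np.arange(n_channels)].sum()

frame0 = rf[:, 0, :]                    # one pulse event: (samples, channels)
depths = np.linspace(5e-3, 40e-3, 256)  # beamform a single scan line at x = 0
line = np.array([das_pixel(frame0, 0.0, z) for z in depths])
print(line.shape)  # (256,) beamformed axial line
```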


Motivations

This project aims to explore new approaches to medical ultrasound analysis by either enhancing image quality and model explainability or directly analyzing raw ultrasound channel data without visual reconstruction. Rather than focusing solely on improving pleural effusion diagnosis—already well-handled with current imaging techniques—the goal is to investigate whether alternative data processing methods can improve visualization and classification performance. Ultimately, the project seeks to enhance diagnostic accuracy in ultrasound imaging, potentially reducing reliance on X-rays, which are more expensive and pose health risks due to radiation exposure.
Project
This project is a proof of concept for detecting pleural effusion using advanced signal processing and machine learning techniques. Traditional diagnostic methods rely on reconstructed ultrasound images processed with beamforming algorithms, such as Delay and Sum (DAS). However, these methods are computationally expensive, require high sampling rates, and often lead to information loss. Additionally, image-based models struggle with limited datasets, leading to overfitting and reduced generalizability.
This project explores three complementary methodologies for detecting pleural effusion using ultrasound data, aiming to move beyond conventional image-based diagnostic models:
- Task-Based Beamforming for Image Enhancement: The first approach focuses on improving image reconstruction using task-based beamforming. Unlike standard methods such as Delay and Sum (DAS), which generate generic ultrasound images, this method reconstructs images specifically optimized for disease detection. The goal is to create more informative ultrasound images tailored to pleural effusion diagnosis.
- Feature-Based Machine Learning Approach: The second approach extracts meaningful features from ultrasound channel data rather than relying solely on images. By leveraging statistical and signal-based characteristics, this method applies machine learning techniques to predict the presence of pleural effusion, reducing the dependency on fully reconstructed images.
- Direct Raw Data Processing with Transformer Models: The final and most advanced approach bypasses image reconstruction entirely by feeding raw ultrasound channel data directly into deep learning models, particularly transformer-based architectures. This method seeks to fully exploit the rich, high-dimensional information present in raw signals, aiming for faster, more accurate, and data-efficient disease detection.

Data

For this study, we collected ultrasound channel data from nine patients—eight with pleural effusion and one healthy individual—using the Verasonics ultrasound system. This system provides raw, high-resolution data by capturing signals from an array of transducer elements, each acting as both a transmitter and receiver. The acquired data, stored in a three-dimensional matrix format (Samples, Frames, Channels), offers a detailed representation of ultrasound wave interactions with body tissues, enabling advanced analysis beyond conventional imaging methods.
CNN Approach for Classification
To investigate the potential of ultrasound data for identifying pleural effusion, we trained a Convolutional Neural Network (CNN) using images generated from Minimum Variance (MV) beamforming. The goal was to analyze the latent space of these images to determine whether they contained distinguishable patterns between healthy and diseased patients.
First, we applied t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the latent space in a two-dimensional space. However, no clear separation between the two classes was observed, suggesting that MV beamformed images alone might not provide sufficient discriminatory features.
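For reference, a minimal sketch of this kind of latent-space check using scikit-learn's t-SNE is shown below; the latent matrix and labels are placeholders standing in for the features actually extracted from the trained CNN.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder latent features: one 512-d vector per MV-beamformed image.
latents = np.random.randn(400, 512)
labels = np.random.randint(0, 2, size=400)  # 0 = normal, 1 = pleural effusion

# Project the latent space to 2D for visual inspection of class separation.
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(latents)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="coolwarm", s=8)
plt.title("t-SNE of CNN latent space (illustrative)")
plt.show()
```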
To further assess their diagnostic potential, we trained machine learning models directly on the images. A dataset was constructed and augmented using techniques such as flipping, inversion, normalization, and Gaussian noise addition. We trained two models:
- Fine-tuned ResNet50 – A pre-trained ResNet50 model was adapted for our classification task, with its final layers modified for binary classification. Transfer learning was applied by freezing the earlier layers while fine-tuning the last few.
- Custom CNN Architectures – We implemented a CNN to reduce input dimensions before passing them to ResNet50, and also developed a vanilla CNN from scratch, consisting of three convolutional layers, pooling layers, fully connected layers, and a softmax output.
These models were trained to classify the presence or absence of pleural effusion, exploring different architectures to optimize performance.
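A minimal transfer-learning sketch in PyTorch/torchvision is given below; which layers are frozen, the head size, and the optimizer settings are illustrative assumptions rather than the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet50 backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze all backbone parameters ...
for p in model.parameters():
    p.requires_grad = False

# ... then unfreeze the last residual stage for fine-tuning.
for p in model.layer4.parameters():
    p.requires_grad = True

# Replace the classification head for binary output (normal vs. pleural effusion).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)  # placeholder batch of beamformed images
loss = criterion(model(x), torch.tensor([0, 1, 0, 1]))
loss.backward()
optimizer.step()
```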

The conventional approach of training a model on images generated through MV beamforming yielded unsatisfactory results. Despite achieving a high training accuracy of 95%, test set accuracy varied significantly between 50% and 65%, indicating poor generalization. This suggests strong overfitting, likely due to dataset limitations, as the frames originate from a small number of patients, reducing inter-patient variability. Additionally, the computational cost of the pipeline is a major drawback. The MV beamforming step adds significant processing time without delivering images that sufficiently enhance classification performance. These findings suggest that crucial diagnostic information may be lost during beamforming, limiting the model’s ability to differentiate between pleural effusion and healthy cases effectively. Given these limitations, exploring task-based beamforming, where the model directly processes channel data to generate ultrasound images, could be a promising alternative. This approach has the potential to both improve classification accuracy and reduce computational complexity.

Task-Based Beamforming and Generative Adversarial Networks (GANs)
Task-based beamforming is a deep learning-driven approach to ultrasound imaging that directly processes raw channel data, optimizing image generation for a specific diagnostic task rather than purely for visual realism. Unlike conventional beamforming techniques like MV or DAS, this method leverages GANs to produce images tailored for classification. The generator converts raw channel data into diagnostically relevant images, while the discriminator classifies these images as "normal" or "pleural effusion," providing feedback that helps refine the generator’s output. This adversarial training process ensures that the generated images not only appear realistic but also enhance diagnostic accuracy. By learning spatial hierarchies in the data through convolutional and upsampling layers, the model effectively extracts and reconstructs relevant features, making it a promising alternative for ultrasound-based disease detection.

The task-based beamforming approach directly optimizes images for clinical tasks, focusing on enhancing diagnostically relevant features, such as distinguishing between "normal" and "pleural effusion" cases. Unlike traditional methods that prioritize general image quality, this approach is tailored for diagnostic value, offering more useful images for clinical decision-making. Additionally, the GAN-based model improves imaging efficiency by generating high-quality, task-specific images, reducing reconstruction time and enabling real-time imaging. Its adversarial training process continually refines the model, improving both image quality and diagnostic accuracy, while providing AI explainability by highlighting key features emphasized in the generated images.
Generator
The GAN framework for task-based beamforming optimizes ultrasound imaging for diagnostic tasks, like detecting pleural effusion. The generator, based on a U-Net architecture, converts raw ultrasound channel data into diagnostically useful images. It processes the data through downsampling and upsampling layers, capturing critical features needed for distinguishing between normal and pleural effusion conditions. The generator's network includes convolutional layers, batch normalization, and activation functions, with skip connections to retain high-resolution features. The final output layer generates images in the expected ultrasound intensity range. The generator is trained to optimize both image reconstruction and task-specific classification, ensuring the generated images are realistic and informative for diagnostic purposes.
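The following is a compressed, illustrative U-Net-style generator in PyTorch; the depth, channel counts, and the treatment of channel data as a single-channel 2D map are assumptions made to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Conv + batch norm + ReLU, the basic unit of each U-Net stage."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class UNetGenerator(nn.Module):
    """Toy U-Net: channel data treated as a 2D map -> beamformed-style image."""
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, base), block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = block(base * 2, base)
        self.out = nn.Conv2d(base, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.out(d1))  # map to intensity range [0, 1]

img = UNetGenerator()(torch.randn(2, 1, 128, 128))
print(img.shape)  # torch.Size([2, 1, 128, 128])
```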

Discriminator
The discriminator in this model acts as a binary classifier, distinguishing between normal and pleural effusion conditions in both real and generated images. Built with a CNN architecture, it captures spatial hierarchies in ultrasound images. The network uses convolutional layers to learn hierarchical features and is followed by fully connected layers to output class probabilities for normal or pleural effusion. The discriminator is trained using both adversarial and classification losses, which help it learn to classify real images correctly and identify misclassified generated images. This dynamic between the generator and discriminator improves the quality of the images and enhances the model’s diagnostic performance.
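A toy version of this discriminator and its combined adversarial-plus-classification loss might look as follows; the two-head design and all layer sizes are assumptions made to keep the sketch short.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Toy CNN with a real-vs-fake head and a normal-vs-effusion head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(128, 1)  # real vs. generated
        self.cls_head = nn.Linear(128, 1)  # normal vs. pleural effusion

    def forward(self, x):
        h = self.features(x)
        return self.adv_head(h), self.cls_head(h)

bce = nn.BCEWithLogitsLoss()
disc = Discriminator()
real_img = torch.rand(4, 1, 128, 128)   # placeholder beamformed images
fake_img = torch.rand(4, 1, 128, 128)   # placeholder generator outputs
cls_label = torch.tensor([[0.], [1.], [0.], [1.]])  # 1 = pleural effusion

adv_r, cls_r = disc(real_img)
adv_f, _ = disc(fake_img.detach())
d_loss = (bce(adv_r, torch.ones_like(adv_r))     # real images -> "real"
          + bce(adv_f, torch.zeros_like(adv_f))  # generated -> "fake"
          + bce(cls_r, cls_label))               # diagnostic classification
d_loss.backward()
```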

Results
The task-based beamforming model was trained on GPUs at the Weizmann Institute, with a training time of 12 hours. Extensive data augmentation techniques were applied to enhance dataset diversity and robustness. After 12 epochs, the loss function converged, indicating effective learning. The model achieved an accuracy of 0.69, recall of 0.70, and an F1 score of 0.68 on the test set, outperforming the MV beamforming approach, which had an accuracy of 0.61, recall of 0.64, and F1 score of 0.59. The task-based model highlighted key diagnostic features, particularly in pleural effusion cases. However, it sometimes produced noisier images for normal data, affecting its overall accuracy and F1 score. Despite initial hopes, the model's focus on explainability led to a significant loss in diagnostic precision, making it less reliable for clinical use. The trade-off between explainability and performance resulted in inconsistent results, diminishing its potential for real-world applications.

Comparison with Fatty Liver Estimation Project
The fatty liver project, conducted in collaboration with NYU, involved ultrasound channel data from approximately 350 patients. The goal was to predict whether a patient had fatty liver disease, based on the brightness (echogenicity) of the liver seen in ultrasound images. The task-based beamforming approach used in this project was similar to that of the pleural effusion project. The model performed much better for fatty liver estimation, achieving an accuracy of 0.84, recall of 0.87, and F1 score of 0.79. The generator's images for fatty liver better emphasized features needed for disease classification, compared to the MV beamformed images.
One reason for the improved performance was the larger dataset, which provided more variability and helped the model generalize better. Another was fatty liver's diffuse appearance in the images, which offered more diverse information for the model to learn from. The pleural effusion model's performance, by contrast, was limited by the small size of its dataset; a larger dataset could improve both image generation and disease classification for pleural effusion.


Feature-Based Approach Using Raw US Channel Data
This approach focuses on assessing pleural effusion directly from ultrasound channel data, bypassing traditional image reconstruction to avoid information loss. The aim is to preserve all of the raw signal content, which may improve the accuracy and reliability of disease detection.
Feature Extraction involves calculating several characteristics from the channel data, such as signal amplitude, phase, instantaneous frequency, energy, entropy, and others. These features provide key information about tissue structures, which can be used for classification without image reconstruction.
The Nakagami distribution is then applied to the backscattered signal envelope to analyze the differences between pleural effusion and healthy tissue. This distribution is well suited to modeling ultrasound backscatter because its parameters reflect the scattering properties of tissue, which change in the presence of pleural effusion. Analyzing these variations therefore helps detect disease-related changes.
The Nakagami parameters are estimated using maximum likelihood estimation from the collected data, and patterns are identified by comparing the distribution of these parameters between normal and pleural effusion cases. This method leverages the full potential of raw ultrasound data, enhancing disease assessment without relying on image reconstruction.
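As a minimal sketch of this step, SciPy's nakagami distribution can be fit by maximum likelihood to the envelope of one frame's signals. The RF data below is a placeholder, and fixing the location parameter at zero is a standard convention for envelope data.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import nakagami

# Placeholder RF data for one frame: (samples, channels).
rf = np.random.randn(2048, 128)

# Envelope of the backscattered signal via the Hilbert transform.
envelope = np.abs(hilbert(rf, axis=0)).ravel()

# Maximum-likelihood fit of the Nakagami shape parameter m,
# with the location fixed at 0; `scale` relates to the Omega parameter.
m, loc, scale = nakagami.fit(envelope, floc=0)
print(f"Nakagami m = {m:.3f}, scale = {scale:.3f}")
```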

A set of 18 features is extracted from each ultrasound frame, such as amplitude, phase, spectral centroid, peak frequency, and Nakagami distribution parameters. These features are used to classify the data into "Pleural" (disease present) or "Normal" (no disease). The classification task uses Binary Cross-Entropy as the loss function to evaluate the model's performance.
Three machine learning models—Random Forest, Support Vector Machine (SVM), and Fully Connected Neural Network (FCNN)—are tested to find the best model for accuracy and recall. Feature-based models offer computational efficiency compared to image-based models, which require more resources. This method is especially advantageous in clinical settings, where quick and resource-efficient diagnosis is crucial. Additionally, in cases of undersampling, feature-based approaches may outperform image-based models that tend to overfit small datasets. Overall, this approach shows potential in improving the accuracy and speed of diagnosing pleural effusion and other medical conditions.
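A minimal scikit-learn sketch of this feature-based pipeline is shown below; the feature matrix is a placeholder and the hyperparameters are illustrative defaults (the FCNN is omitted for brevity).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix: one 18-dimensional feature vector per frame.
X = np.random.randn(2000, 18)
y = np.random.randint(0, 2, size=2000)  # 1 = "Pleural", 0 = "Normal"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=200)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name, accuracy_score(y_te, pred), recall_score(y_te, pred))
```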

Feature Analysis, Correlation, and Clustering
Feature Analysis: The distributions of features like Signal Energy, Skewness, Signal Entropy, and Envelope Mean were found to differ significantly between normal and pleural effusion frames. For example, pleural effusion frames typically showed higher Signal Energy due to stronger acoustic reflections, while Normal frames exhibited higher Skewness values, indicating more uniform tissue reflections. The Kurtosis values were higher for normal frames, suggesting a more peaked echo distribution, whereas pleural effusion frames had a flatter, more uniform distribution. However, features like Instantaneous Frequency and Spectral Centroid showed little difference between the two conditions, likely because the ultrasound data predominantly represents liver tissue in both cases.
Clustering: To further assess feature effectiveness, K-means clustering was applied. After dimensionality reduction using PCA, the clustering visualization revealed two distinct clusters corresponding to pleural effusion and normal cases. The clustering results showed a high purity of 0.89, suggesting the selected features can effectively differentiate between the two groups, making them promising for classification tasks.
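A sketch of this clustering check, assuming scikit-learn and a majority-class definition of purity, might look as follows; the feature matrix and labels are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.randn(2000, 18)            # placeholder 18-feature vectors
y = np.random.randint(0, 2, size=2000)   # true labels, used only to score purity

X2 = PCA(n_components=2).fit_transform(X)  # reduce for clustering/visualization
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

# Purity: each cluster is credited with its majority class.
purity = sum(max(np.sum((clusters == k) & (y == c)) for c in (0, 1))
             for k in (0, 1)) / len(y)
print(f"cluster purity = {purity:.2f}")
```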


Results
Three models were tested on the feature dataset: Support Vector Machine (SVM), Random Forest, and Fully Connected Neural Network (FCNN). Each model was trained to classify ultrasound frames as either "Pleural" (indicating pleural effusion) or "Normal."
The training process involved tuning different parameters for each model. For the Random Forest, the number of decision trees was varied, and accuracy was monitored on both training and test sets. The SVM’s performance was assessed based on training epochs, while the FCNN’s learning progress was tracked through loss function evolution.
The models were evaluated using a test set of 1,000 frames (55% "Pleural" and 45% "Normal"), ensuring no overlap with the training set to avoid bias. All three models performed well on this set, demonstrating both high classification effectiveness and computational efficiency. Training time was minimal: Random Forest and SVM took under a minute, and the FCNN about three minutes.
In addition, the feature importance analysis for the Random Forest model, using the Gini index, revealed that "Signal Entropy" was the most important feature.

Direct Raw US Channel Data Based Approach
This section explores using raw ultrasound channel data for pleural effusion assessment, bypassing traditional image reconstruction and feature extraction. The approach leverages neural networks to autonomously identify patterns in the raw data, potentially capturing nuanced information lost in image formation. Advantages include preserving detailed data and improving performance in undersampled scenarios, where traditional models may struggle. However, challenges include the interpretability of deep neural networks and the risk of biased learning if the training data is not representative.
Recurrent Neural Network (RNN) with Convolutional Neural Network (CNN)
The first model explored combines a Recurrent Neural Network (RNN) with a Convolutional Neural Network (CNN) to process raw ultrasound channel data, structured as a 3D tensor (samples, events, channels). The RNN captures temporal dependencies along the "samples" dimension, while the CNN learns spatial features across the "events" and "channels" dimensions. A Long Short-Term Memory (LSTM) architecture is used for the RNN, and the CNN includes convolutional layers with pooling to reduce dimensionality. After feature extraction, a fully connected layer with a sigmoid activation function classifies the frames as pleural effusion or normal. Binary Cross-Entropy Loss is used for training, optimizing the model's performance.
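A compact PyTorch sketch of this hybrid design is given below; applying the CNN per sample index before the LSTM, as well as all layer sizes, are simplifying assumptions rather than the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class RnnCnnClassifier(nn.Module):
    """Toy hybrid: a small CNN extracts spatial features over the
    (events, channels) plane at each sample index, then an LSTM models
    the sequence along the samples axis."""
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.lstm = nn.LSTM(input_size=16 * 4 * 4, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, samples, events, channels)
        b, s, e, c = x.shape
        h = self.cnn(x.reshape(b * s, 1, e, c)).reshape(b, s, -1)  # spatial features
        _, (h_n, _) = self.lstm(h)        # temporal dependencies along "samples"
        return self.head(h_n[-1])         # probability of pleural effusion

prob = RnnCnnClassifier()(torch.randn(2, 64, 32, 128))
print(prob.shape)  # torch.Size([2, 1])
```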

Transformer-Based Model for Ultrasound Channel Data
The second model uses a Transformer-based architecture, designed to handle long-range dependencies in sequential data. Given the large size of ultrasound channel data, the input is divided into smaller patches, which are embedded into a higher-dimensional space. The model uses six encoder layers, each with eight attention heads, to capture both local and global patterns through self-attention. The output from the last encoder layer is passed through a fully connected layer with a sigmoid activation for classification. The model is trained using Binary Cross-Entropy Loss. The Transformer model is computationally efficient, handling complex dependencies and making it suitable for real-time ultrasound data analysis.
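The sketch below illustrates this design in PyTorch with six encoder layers and eight attention heads, as described; the patching scheme, embedding size, and mean-pooling head are assumptions, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class PatchTransformer(nn.Module):
    """Toy Transformer over flattened channel-data patches (sizes are assumptions)."""
    def __init__(self, patch_dim=256, d_model=128, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)  # patch -> embedding
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, patches):                 # patches: (batch, n_patches, patch_dim)
        z = self.encoder(self.embed(patches))   # self-attention across all patches
        return self.head(z.mean(dim=1))         # pool over patches, then classify

# A frame (samples, events, channels) flattened into fixed-size patches (placeholder).
patches = torch.randn(2, 64, 256)               # (batch, n_patches, patch_dim)
print(PatchTransformer()(patches).shape)        # torch.Size([2, 1])
```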

Training and Results
Both models were trained using raw ultrasound (US) channel data as input. The training set consisted of frames from eight patients, while the test set came from a patient not included in the training set, ensuring an unbiased evaluation. Training was conducted on a GPU due to the computational demands of the high-dimensional data and complex models. The RNN + CNN model took approximately 15 hours to train, while the Transformer model took 19 hours. Both models were trained for 50 epochs, with a learning rate of 0.001 and a batch size of 16. Loss functions for both models showed convergence, indicating effective learning and parameter optimization.
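A skeletal training loop reflecting the stated configuration (50 epochs, learning rate 0.001, batch size 16, Binary Cross-Entropy loss) might look as follows; the model and data loader are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder model: any classifier mapping a raw frame to a probability.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 32 * 128, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

# Stand-in loader: batches of 16 frames shaped (samples, events, channels).
loader = [(torch.randn(16, 64, 32, 128), torch.rand(16, 1).round()) for _ in range(4)]

for epoch in range(50):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```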


After training, both models were tested on 1,000 frames from an unseen patient. The RNN + CNN model achieved an accuracy of 0.82, a recall of 0.83, and an F1 score of 0.81, indicating good performance in classifying pleural effusion cases. In contrast, the Transformer model had lower performance metrics, with an accuracy of 0.74, recall of 0.73, and an F1 score of 0.74. The Transformer’s lower performance may be due to its patch-based processing, which might treat irrelevant parts of the data similarly to relevant parts, failing to capture essential patterns. Despite slightly lower performance compared to feature-based methods, both models outperformed image-based approaches, with the RNN + CNN model showing significant promise for direct raw ultrasound data analysis.


Conclusion

Limitations

Future Work
Future research could focus on several key areas to improve the diagnostic accuracy and generalizability of ultrasound channel data models. One promising direction is multimodal approaches that combine ultrasound channel data with other imaging modalities like traditional ultrasound or CT scans. This fusion could enhance diagnostic accuracy by leveraging both detailed structural information from conventional images and raw acoustic signals from channel data, particularly in complex cases.
Another area of interest is developing segmentation techniques to focus on relevant regions within the channel data. Inspired by deep learning segmentation methods, a segmentation model could pinpoint disease-relevant areas, such as those near the pleural space for pleural effusion, enabling more targeted analysis and potentially improving sensitivity, specificity, and computational efficiency.
Additionally, advanced feature extraction methods using unsupervised or semi-supervised learning could uncover new patterns within the channel data, enhancing model adaptability across different datasets and conditions. Methods like those in [17], which use deep learning for ultrasound feature extraction, could contribute to more generalizable models.
Expanding the dataset to include more patients with varying disease severity, scanned with different equipment, would improve model robustness and generalizability. A larger, more diverse dataset could reveal whether model performance improves with more training examples or if the methods used are already optimal under current conditions. Incorporating data from studies like [5] could further test the model’s performance on outliers, offering insights into its ability to handle diverse patient populations and equipment variations.