MaxSea

Machine Learning Specialist • April 2024 - September 2024 (6 months)
Internship • Bidart, Nouvelle-Aquitaine, France

Neural network (YOLOX) training for real-time ship detection and tracking.

1. Data gathering

Improving YOLOX performance started with building a boat detection dataset. I gathered more than 110,000 images containing about 290,000 ships.

While exploring the dataset with FiftyOne, I found duplicate and near-duplicate images. I computed image embeddings and built a similarity filter on top of them to remove these images and reduce the risk of overfitting.
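As an illustration only, here is a minimal sketch of an embedding-based similarity filter. It is not the code used during the internship: the greedy strategy, the `filter_near_duplicates` name, the 0.95 threshold and the toy vectors are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_near_duplicates(embeddings, threshold=0.95):
    """Return the indices of images to keep (hypothetical helper).

    Greedy O(n^2) pass: an image is dropped when its embedding is at
    least `threshold`-similar to an image that was already kept.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy embeddings: images 0 and 1 are near-duplicates, image 2 is distinct.
embeddings = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [0.0, 1.0, 0.0]]
kept_indices = filter_near_duplicates(embeddings)
print(kept_indices)
```

On a dataset of this size, a pairwise loop like this would be too slow; an approximate nearest-neighbour index would be the usual replacement, with the same keep-or-drop logic.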

2. Visualization and statistics

Embeddings also let me dig deeper into the dataset: I detected clusters of unwanted objects such as submarines. I computed statistics on detection size, image dimensions, and the number of objects per boat class (to avoid unbalanced categories).
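The kind of statistics mentioned above can be sketched in a few lines. The annotation tuples, class names and helper names below are hypothetical and only show the idea; they are not taken from the real dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical annotations: (class_name, box_width_px, box_height_px).
annotations = [
    ("sailboat", 40, 60),
    ("sailboat", 50, 70),
    ("cargo", 200, 80),
    ("fishing", 30, 20),
]

def class_counts(annotations):
    """Number of annotated objects per class."""
    return Counter(label for label, _, _ in annotations)

def mean_box_area(annotations):
    """Average bounding-box area in square pixels."""
    return mean(w * h for _, w, h in annotations)

def imbalance_ratio(counts):
    """Count of the most frequent class divided by the least frequent one."""
    return max(counts.values()) / min(counts.values())

counts = class_counts(annotations)
print(counts, mean_box_area(annotations), imbalance_ratio(counts))
```

A high imbalance ratio is a signal to collect more images of the rare classes or to rebalance before training.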

3. Clustering and annotation

To allow the model to classify boats, I labelled the dataset by boat class. Clustering the embeddings let me sort the data quickly.
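A minimal sketch of cluster-assisted labelling, assuming plain k-means on the embeddings (the clustering algorithm actually used is not stated here). Points, cluster names and the deterministic initialisation are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means returning one cluster index per point.

    Centroids start from the first k points so the result is
    deterministic (a simplification for this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Toy 2-D embeddings forming two obvious groups.
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
assignments = kmeans(points, k=2)

# One manual label per cluster is then propagated to all its members.
cluster_names = {0: "sailboat", 1: "cargo"}  # hypothetical labels
labels = [cluster_names[a] for a in assignments]
print(labels)
```

The point of the approach is that a person reviews one cluster at a time instead of one image at a time, then corrects the few outliers by hand.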

4. Training

To iterate quickly over experiments, I automated the training pipeline. It covered data filters, YOLOX parameters, logging, and backup of training results.

Everything was automated so the team could launch a full training based on a single JSON file and bash script.
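To give an idea of what a single-file launch can look like, here is a hedged sketch. The JSON schema, file paths and helper names are hypothetical. The command-line flags follow the public YOLOX `tools/train.py` script as I recall it (`-f`, `-expn`, `-d`, `-b`, `-c`) and should be checked against the installed version.

```python
import json

# Hypothetical config; field names and paths are illustrative only.
CONFIG_TEXT = """
{
  "experiment_name": "boats_yolox_s",
  "exp_file": "exps/custom/boats_yolox_s.py",
  "batch_size": 16,
  "devices": 1,
  "pretrained_ckpt": "weights/yolox_s.pth",
  "filters": {"min_box_area": 100, "max_similarity": 0.95}
}
"""

REQUIRED_KEYS = {"experiment_name", "exp_file", "batch_size", "devices"}

def load_config(text):
    """Parse the JSON config and fail early if a required key is missing."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config

def build_train_command(config):
    """Turn the config into an argument list for the training script."""
    cmd = [
        "python", "tools/train.py",
        "-f", config["exp_file"],
        "-expn", config["experiment_name"],
        "-d", str(config["devices"]),
        "-b", str(config["batch_size"]),
    ]
    if "pretrained_ckpt" in config:
        cmd += ["-c", config["pretrained_ckpt"]]
    return cmd

config = load_config(CONFIG_TEXT)
command = build_train_command(config)
print(" ".join(command))
```

Keeping every parameter in one file also makes each run reproducible: archiving the JSON next to the logs is enough to know how a model was trained.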

5. Quantization

To allow YOLOX to run in real time, I used OpenVINO to quantize every model and optimize it for Intel CPUs, which improved per-frame inference by more than 45%.
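A per-frame gain of this kind can be measured with a small benchmark such as the sketch below. It assumes the gain is expressed as time saved relative to the baseline, which is one possible reading; the helper names and the timings in the example are illustrative, not the internship's measurements.

```python
import time

def mean_frame_time(infer, frames, warmup=2):
    """Average wall-clock seconds per frame for a callable `infer`."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up runs are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    return (time.perf_counter() - start) / len(frames)

def relative_improvement(baseline_s, optimized_s):
    """Fraction of per-frame time saved relative to the baseline."""
    return (baseline_s - optimized_s) / baseline_s

# Stand-in for a real model call: any callable taking one frame works.
frames = [[0.0] * 100 for _ in range(10)]
elapsed = mean_frame_time(lambda frame: sum(frame), frames)

# Illustrative numbers only: 40 ms per frame before, 20 ms after.
gain = relative_improvement(0.040, 0.020)
print(f"{gain:.0%}")
```

In practice `infer` would wrap the original model for the baseline and the quantized model for the comparison, both run on the same frames and the same CPU.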

6. Performance and tests

To create a solid knowledge base for my successor, I logged every training run and wrote clear reports of my work. The next data scientist could pick up the project right after me without needing much time to learn the processes.

📄 This internship is described in my thesis (mémoire).

Here are the results of one of the models I trained. I also trained smaller models that are less resource-heavy.

Detection
Embeddings
Selection