02 June, 2026

Skin Cancer, fractal image processing, and Prolog on a MacBook Air M5

I know that LLMs are already doing very good work in identifying whether skin lesions are malignant. There remains a gap, though: uncertainty. It is also difficult to trust LLMs with this sort of analysis, because how they reach their conclusions is opaque; they are 'black boxes'.

I've been trying a different approach to tackle these two problems: a fractal analysis, which is both a different method and a 'white box' approach, where it is possible to see exactly which feature of an image leads to a diagnosis.

I started working on this a few years ago, during an ISIC dermatology challenge, but I used a naive approach to the fractal nature of lesions: the usual box-counting method, which unfortunately reduces the image to a set of black and white pixels and measures only the fractal dimension of those pixels.
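For reference, the conventional box-counting estimate can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the lesion has already been thresholded into a boolean mask; the function name and the choice of power-of-two box sizes are my own, not taken from any particular library.

```python
import numpy as np


def box_counting_dimension(mask: np.ndarray) -> float:
    """Estimate the box-counting (Minkowski) dimension of a 2-D boolean mask.

    For box sizes s = 2, 4, 8, ..., count the boxes N(s) that contain at least
    one True pixel, then fit log N(s) = -D * log(s) + c and return D.
    """
    mask = np.asarray(mask, dtype=bool)
    # Crop to the largest power-of-two square so every box size tiles exactly.
    n = 2 ** int(np.log2(min(mask.shape)))
    if n < 8:
        raise ValueError("mask is too small for at least two box sizes")
    mask = mask[:n, :n]
    if not mask.any():
        raise ValueError("mask is empty after cropping")

    sizes = 2 ** np.arange(1, int(np.log2(n)))  # 2, 4, ..., n // 2
    counts = []
    for s in sizes:
        # View the image as an (n/s) x (n/s) grid of s x s boxes.
        boxes = mask.reshape(n // s, s, n // s, s)
        counts.append(np.count_nonzero(boxes.any(axis=(1, 3))))

    # Slope of log N(s) against log s is -D.
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return float(-slope)
```

A filled square gives a dimension of 2 and a single straight line gives 1. Everything about colour and texture inside the lesion is discarded before the count begins, which is the limitation described above.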

I have since developed a new fractal analysis that examines the image much more closely, including its colour planes and the morphology across the lesion, edges included. It produces a 3 × 8 matrix (a rank-2 tensor) describing the fractal character of the image.

I then treat each matrix as a point in a 24-dimensional space and run a k-means cluster analysis, using Mahalanobis distance as the distance measure, to identify the clusters the images fall into.
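Assuming a single pooled covariance matrix shared by all clusters (the post doesn't say which variant is used), Mahalanobis k-means reduces to ordinary k-means after a whitening transform. A minimal NumPy sketch follows, with hypothetical function names, random initialisation rather than k-means++, and none of the Metal acceleration:

```python
import numpy as np


def whiten(X: np.ndarray) -> np.ndarray:
    """Transform rows of X so that Euclidean distance between transformed rows
    equals Mahalanobis distance under the pooled covariance of X."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # cov = L @ L.T (Cholesky); z = L^{-1} (x - mu) then has identity covariance,
    # and ||z1 - z2||^2 = (x1 - x2)^T cov^{-1} (x1 - x2).
    L = np.linalg.cholesky(cov)
    return np.linalg.solve(L, (X - mu).T).T


def mahalanobis_kmeans(features: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means in whitened space; `features` has shape (n, 3, 8).

    Returns (labels, centroids), with centroids expressed in whitened coordinates.
    """
    X = features.reshape(len(features), -1)  # flatten each 3 x 8 matrix to 24 values
    Z = whiten(X)
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distances via ||z||^2 - 2 z.c + ||c||^2, avoiding an (n, k, d) array.
        d2 = (Z**2).sum(axis=1)[:, None] - 2 * Z @ centroids.T + (centroids**2).sum(axis=1)[None, :]
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return labels, centroids
```

For 52 clusters this would be called as `mahalanobis_kmeans(features, k=52)`. In practice several restarts with different seeds, keeping the lowest within-cluster sum of squares, guard against poor local minima. Giving each cluster its own covariance matrix would make the method closer to a Gaussian mixture model, which is a different algorithm.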

I've found that this yields 52 discernible clusters, and, satisfyingly, within these clusters there is a clear signal separating the benign and malignant images: the kurtosis of their distribution, which is extremely high for the malignant lesions.
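Kurtosis is the fourth central moment divided by the square of the variance; high values mean heavy tails and a sharp peak. The post doesn't specify which quantity's distribution is measured, so the sketch below assumes, purely for illustration, one scalar value per image (for example its distance to the cluster centroid) and compares benign and malignant kurtosis cluster by cluster. Function names are hypothetical.

```python
import numpy as np


def excess_kurtosis(x) -> float:
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)
    m4 = np.mean(d**4)
    if m2 == 0:
        return float("nan")  # undefined for a constant sample
    return float(m4 / m2**2 - 3.0)


def kurtosis_by_cluster(values, labels, malignant, min_count=4):
    """Map each cluster id to (benign kurtosis, malignant kurtosis).

    `values` holds one scalar per image, `labels` the cluster of each image and
    `malignant` a boolean ground-truth flag. Groups smaller than `min_count`
    give NaN rather than an unreliable estimate.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    malignant = np.asarray(malignant, dtype=bool)
    result = {}
    for c in np.unique(labels):
        in_cluster = labels == c
        groups = (values[in_cluster & ~malignant], values[in_cluster & malignant])
        result[int(c)] = tuple(
            excess_kurtosis(g) if len(g) >= min_count else float("nan") for g in groups
        )
    return result
```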

There are still some regions where benign and malignant images overlap closely. Because these are by now very specific types of image, we can use other characteristics, such as the roughness of the image, to disambiguate the two.
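'Roughness' can be quantified in several ways. One common choice, in the spirit of the areal surface parameters Sa and Sq from surface metrology, treats the greyscale intensity as a height map and measures its mean absolute and root-mean-square deviation from the mean level. Whether this matches the roughness measure used here is an assumption; the function below is illustrative.

```python
import numpy as np


def intensity_roughness(gray: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Return (Sa, Sq) of greyscale intensities inside the lesion mask.

    Sa = mean |z - mean(z)| and Sq = sqrt(mean((z - mean(z))**2)), where z are
    the intensities of the pixels selected by `mask`. Only a flat mean level is
    removed; no form or waviness filtering is applied.
    """
    z = np.asarray(gray, dtype=float)[np.asarray(mask, dtype=bool)]
    if z.size == 0:
        raise ValueError("mask selects no pixels")
    dev = z - z.mean()
    return float(np.mean(np.abs(dev))), float(np.sqrt(np.mean(dev**2)))
```

Within an overlapping cluster, a rule could then compare Sq against a cluster-specific threshold learned from the labelled images.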

This leaves us with the 52 clusters and many rules for identifying each type of lesion within each cluster. Instead of an extremely messy and fragile Python program full of if statements, this is the perfect job for Prolog, where each rule is stated declaratively as a fact, and identifying a new lesion simply becomes a query against those rules. I've long wanted to find a practical application for Prolog, and this is a very satisfying one.
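The contrast with if statements is visible even outside Prolog. As a hedged illustration in Python (the cluster numbers, feature names and thresholds below are invented, not taken from the real rule base), each rule becomes a row of data, and classifying a lesion is one generic query over those rows rather than a hand-written branch per cluster:

```python
import operator
from typing import NamedTuple


class Rule(NamedTuple):
    cluster: int
    feature: str
    op: str
    threshold: float
    diagnosis: str


OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Hypothetical rule base: every value here is made up for illustration.
RULES = [
    Rule(7, "kurtosis", ">", 5.0, "malignant"),
    Rule(7, "kurtosis", "<=", 5.0, "benign"),
    Rule(12, "roughness", ">=", 0.3, "malignant"),
    Rule(12, "roughness", "<", 0.3, "benign"),
]


def classify(cluster, features, rules=RULES):
    """Return the diagnosis of every rule for this cluster whose condition holds."""
    return [
        r.diagnosis
        for r in rules
        if r.cluster == cluster
        and r.feature in features
        and OPS[r.op](features[r.feature], r.threshold)
    ]
```

Adding a rule means adding a row, not editing control flow. In Prolog the same rows would become clauses of, say, a hypothetical `diagnosis/2` predicate, and ProbLog additionally lets each clause carry a probability.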

There are over half a million images in the ISIC database, and it has taken 12 days to analyse them, at a rate of 1.92 seconds per image. This has been possible on an Apple MacBook Air M5, with all the matrix calculations programmed on Apple Metal. The M5 has been remarkable: it runs this load on its 10 CPU cores and 10 GPU cores, with Apple Metal 4 support, while still responding normally to everyday work and running only slightly warm. The load average has mostly been about 9, with four Python scripts running in parallel.
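The throughput figures agree with each other if 1.92 seconds is read as the effective time per image across all four parallel scripts together (my reading; the post doesn't state it explicitly):

```latex
\frac{12\ \text{days} \times 86\,400\ \text{s/day}}{1.92\ \text{s/image}}
  = \frac{1\,036\,800\ \text{s}}{1.92\ \text{s/image}}
  = 540\,000\ \text{images}
```

That matches "over half a million". If instead each of the four scripts took 1.92 s per image, the same 12 days would cover roughly 2.2 million images.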

The heavy lifting is nearly over (as I write this, it is down to the final 1,000 images), so it will soon be time to run a full validation of the Prolog database (actually ProbLog, a probabilistic extension of Prolog suited to statistical work) against the images, to find the sensitivity and specificity within the ISIC images and to check the predicted diagnostic types against the actual diagnoses.
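The two headline validation figures have standard definitions: sensitivity is the fraction of malignant lesions flagged as malignant, and specificity is the fraction of benign lesions correctly left unflagged. A minimal Python sketch, assuming the ground truth and the ProbLog predictions have already been exported as parallel lists (the export step and the function names are hypothetical):

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).

    Both inputs are sequences of booleans, True meaning malignant. Raises
    ZeroDivisionError if either class is absent from y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def confusion_counts(actual_types, predicted_types):
    """Count (actual, predicted) diagnostic-type pairs, e.g. ('nevus', 'melanoma')."""
    return Counter(zip(actual_types, predicted_types))
```

The off-diagonal entries of `confusion_counts` show which diagnostic types are being confused with which, complementing the single benign/malignant split.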
