More on the Faces dataset

The file is a CSV, with each row holding an age, a gender, an ethnicity code, and the pixel values for one face.

I haven't looked at what the format of the pixels is (I downloaded the dataset a while ago and can't remember its description), whether it's greyscale or RGB. I also don't know what the ethnicity numbers indicate, but that doesn't matter; only if I get high accuracy with a model will it become relevant.
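A quick way to settle the pixel question is to just peek at the file. Here is a minimal sketch, assuming the file is named age_gender.csv and the pixels sit space-separated in the last column (both of which are assumptions on my part):

```haskell
-- Peek at the csv: print the header and count the pixel values in
-- the first row. 2304 values would suggest 48x48 greyscale; three
-- times that would suggest RGB.
main :: IO ()
main = do
  contents <- readFile "age_gender.csv"   -- assumed filename
  case lines contents of
    (header : row : _) -> do
      putStrLn ("columns: " ++ header)
      let pixels = words (last (splitOn ',' row))
      putStrLn ("pixel count: " ++ show (length pixels))
    _ -> putStrLn "file has no data rows"

-- naive comma splitter (fine here, since the pixel column has no commas)
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest
```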

I will be using a 'type-oriented' style of coding to tackle this (see my previous post on the html/xml generation library to see what I mean). This is important for the following reason:

In the past, I have attempted to do feature engineering in Prolog. It felt like the most 'natural' way to code, because Prolog is such a great query language: it gives you unlimited flexibility to query into a graph (and a graph can represent anything). However, I would end up getting 'lost', because its modularity is not so good. Once predicates (Prolog's name for functions) started piling up, good documentation would have been needed to not become overwhelmed.

This is why I have switched to coding this in Haskell. I don't like writing documentation. I can instead rely on the types as the documentation, and, more importantly, let the computer nudge me into doing the right things (the power of the Hindley-Milner type system).
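To illustrate what I mean by types as documentation, here is a small sketch (the field names and the gender encoding are illustrative, not the dataset's actual schema):

```haskell
-- Each csv field gets its own type, so the compiler, not a comment,
-- records what goes where and refuses to let them mix.
newtype Age       = Age Int         deriving Show
newtype Ethnicity = Ethnicity Int   deriving Show  -- meaning of the code still unknown
data    Gender    = Male | Female   deriving Show
newtype Pixels    = Pixels [Double] deriving Show

data Face = Face
  { faceAge       :: Age
  , faceEthnicity :: Ethnicity
  , faceGender    :: Gender
  , facePixels    :: Pixels
  } deriving Show

-- Passing an Ethnicity where an Age is expected is now a type error:
-- the compiler's 'nudge' doing the documentation's job.
meanPixel :: Face -> Double
meanPixel (Face _ _ _ (Pixels ps)) = sum ps / fromIntegral (length ps)
```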

I have a number of ideas I'd like to explore in this project, with noise introduction and classification being the most exciting (recently a paper from OpenAI made me enthusiastic about this: https://arxiv.org/abs/2105.05233, Diffusion Models Beat GANs on Image Synthesis). I had been on this line of thinking a few years ago (2015), after coming across Ian Goodfellow's amazing GAN work. GANs seemed so tricky, though (the Nash equilibrium stuff intuitively seemed scary to my straightforward-optimization mind), and I read online comments from a lot of people who never managed to get them to work. I then came up with the idea of noise classification a few years later, wherein the data points are damaged (noise introduced) at various levels (let's say 1-99%) and a classifier is trained for each of those levels. This, I had surmised, would approximate the gradual progression of GAN training (from randomness generator to perfect generator) without the tricky generator-discriminator dance.
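Here is a rough sketch of the damage step, assuming pixels normalized to [0,1] and uniform replacement noise (proper diffusion models gradually add Gaussian noise; this is just the crude per-level version I had in mind):

```haskell
import System.Random (StdGen, mkStdGen, randoms, split)

type Pixel = Double

-- Replace roughly `pct` percent of the pixels with uniform noise.
damage :: StdGen -> Int -> [Pixel] -> [Pixel]
damage gen pct ps = zipWith3 pick coins noise ps
  where
    (g1, g2) = split gen
    coins = randoms g1 :: [Double]   -- one draw in [0,1) per pixel
    noise = randoms g2 :: [Pixel]    -- replacement values in [0,1)
    pick c n p = if c * 100 < fromIntegral pct then n else p

-- One damaged copy per noise level 1..99, labelled with its level;
-- a classifier would then be trained per level.
noiseLadder :: [Pixel] -> [(Int, [Pixel])]
noiseLadder ps = [ (lvl, damage (mkStdGen lvl) lvl ps) | lvl <- [1 .. 99] ]
```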

I am also pontificating on the thought that maybe the issue that prompted Ian Goodfellow to develop GANs - adversarial examples - might be solved without any of this (diffusion/noise or GANs), by simply having an immense number of classifiers. That is, having 1,000s and 1,000s of targets (or whatever number X it might require) makes it improbable that an adversarial pixel is able to 'sneak' in.
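In code, the thought is just majority voting over many independently trained classifiers; the open question is whether their errors are independent enough. A sketch:

```haskell
import Data.List (group, maximumBy, sort)
import Data.Ord  (comparing)

-- A classifier maps an image's pixels to a class label.
type Classifier = [Double] -> Int

-- Majority vote: to flip the output, an adversarial perturbation
-- would have to fool most of the classifiers at once.
vote :: [Classifier] -> [Double] -> Int
vote clfs img = mostCommon (map ($ img) clfs)
  where
    mostCommon = head . maximumBy (comparing length) . group . sort
```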

I am choosing to do this all without neural nets, since:

1. They require a lot of compute, which I don't have.
2. They are at this point fairly straightforward (import tensorflow/pytorch and press play), and so are not so interesting as a coding challenge.
3. I've been exposed to neural net talk for so long (I was into it before Hinton/LeCun got their fame from New York Times reporting) that I'm a victim of 'familiarity breeds contempt'.
4. Everyone's on neural nets, and little old me won't make a dent in anything.
5. I have quite a few ideas about feature engineering I want to experiment with (graph isomorphisms, high-degree polynomials, and various other ideas that I don't have a name for yet, so can't describe succinctly).
6. I like Vowpal Wabbit - it's fast, it was my foray into machine learning (in the late 2000s) so I have an 'attachment' to it, and I like to do machine learning experiments a few seconds at a time (not hours, like neural nets); a sketch of feeding it the faces data follows below.
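For concreteness, here is how a row of the faces data could be turned into Vowpal Wabbit's plain-text input format (label, a bar, then feature:value pairs); using gender as the label is an arbitrary choice, just for illustration:

```haskell
-- One csv row becomes one VW line like "2 | p0:0.13 p1:0.55 ...".
-- With lines like these in faces.vw, training is a one-liner:
--   vw --oaa 2 faces.vw -f model.vw
toVWLine :: Int -> [Double] -> String
toVWLine label pixels =
  show label ++ " | " ++
  unwords [ "p" ++ show i ++ ":" ++ show v
          | (i, v) <- zip [0 :: Int ..] pixels
          , v /= 0 ]   -- VW treats absent features as zero
```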

Two more recent things I've seen have made me excited about machine learning and eager to get to work on this project.

https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-19/ Google replaces BERT's attention with Fourier transforms. This was exciting because I had said 10 years ago that 'AI' would have something to do with matrix factorization. I published a blog post back then, How To Create Human Level AI, which got some traction, but also a few detractors who were not so kind about matrix factorization possibly being the solution to AI. https://www.blogger.com/blog/post/edit/preview/629591928921587567/4921118490548513476

Well - this recent paper makes me feel a little vindicated. Using a Fourier transform to drop high-frequency components and using matrix factorization to drop small singular values boil down to the same thing: separating the signal from the noise. Of course, Google's paper is likely not just about this, but I do feel a slight vindication nevertheless.
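To make the analogy concrete, here is a toy low-rank 'denoiser' via truncated SVD, using the hmatrix package (this illustrates the analogy only, not what Google's paper does):

```haskell
import Numeric.LinearAlgebra

-- Keep only the k largest singular values: the low-energy components
-- thrown away play the role of the high frequencies dropped after a
-- Fourier transform.
denoise :: Int -> Matrix Double -> Matrix Double
denoise k m = u' <> diag s' <> tr v'
  where
    (u, s, v) = thinSVD m      -- m == u <> diag s <> tr v
    u'        = takeColumns k u
    s'        = subVector 0 k s
    v'        = takeColumns k v
```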

A second recent source of excitement was Francois Chollet talking on Machine Learning Street Talk. He was talking about alternatives to neural nets (which, as I said earlier, I don't work on for the reasons mentioned). I had already been thinking about how graph theory could be applied to feature engineering, and I got some feeling of confidence that others are also discussing these kinds of thoughts.

THE END

