Could you share the implementation?

#1
by stanpony - opened

Hey @Corianas. Your model seems cool. I would love it if you could share the implementation, as I am working on a character-level tokenized model too. Unfortunately I am not having success, so any help is appreciated :)

Hi, sorry, I just noticed this message as I was on a trip.
What would you like to know?

I found that having the model use a shift key to create capitals really helped it understand letters properly: being uppercase is just a state, not something that changes the meaning of what is being written. (Except for things like CEO, where it knew those were always the capitalized versions of the letters.)

That's why all my character-level models have the bit of code that turns text like "Hi" into "↨hi".
I also pre-process out all samples containing Unicode (after trying to convert them to plain letters).
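
Roughly, that pre-processing looks something like this (a simplified sketch, not the exact code; the ↨ marker is the one from the example above):

```python
SHIFT = "↨"  # marker placed before a letter that was "typed with shift held"

def encode_caps(text: str) -> str:
    """Lowercase everything, inserting a shift marker before each capital letter."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(SHIFT + ch.lower())
        else:
            out.append(ch)
    return "".join(out)

def decode_caps(text: str) -> str:
    """Reverse the encoding: a shift marker capitalizes the following character."""
    out, shift = [], False
    for ch in text:
        if ch == SHIFT:
            shift = True
            continue
        out.append(ch.upper() if shift else ch)
        shift = False
    return "".join(out)

print(encode_caps("Hi, the CEO said Hello."))                   # ↨hi, the ↨c↨e↨o said ↨hello.
print(decode_caps(encode_caps("Hi, the CEO said Hello.")))      # round-trips back to the original
```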
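
The Unicode clean-up step is along these lines (again a simplified sketch; the 0.99 keep-threshold is just illustrative, not the value I actually use):

```python
import unicodedata

def to_plain_ascii(text: str):
    """Try to fold accented characters down to plain ASCII letters;
    return None if too much of the sample can't be represented."""
    # NFKD splits characters like 'é' into 'e' plus a combining accent,
    # and the ASCII encode/decode step drops anything left over.
    folded = unicodedata.normalize("NFKD", text)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    # If too many characters were lost, treat the sample as un-salvageable.
    if len(ascii_only) < 0.99 * len(text):
        return None
    return ascii_only

samples = ["Café au lait", "日本語のテキスト", "plain text"]
kept = [s for s in map(to_plain_ascii, samples) if s is not None]
print(kept)  # ['Cafe au lait', 'plain text']
```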

I am currently planning a run at 64k context length, as I feel roughly 32 pages of real text would make it a genuinely useful tool, especially after improving my stories dataset with better samples and so on.

Also: thank you so much. This is the AI project I keep coming back to, and I love that others are getting joy from it.

Very cool, thanks. Wait, wow, you want to train at 64k context? That's huge. How do you train it? Is it because you run in 16-bit? This would actually be really interesting for me to know, because my task ideally requires long contexts.

I got the character-level tokenizer to work, by the way, which is great. I trained on TinyStories at 512 context length, and it generates English very well. I trained it on a 5M-parameter nanoGPT-style architecture.
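
The tokenizer itself is nothing fancy; it's something along these lines (a simplified sketch in the style of nanoGPT's char-level data prep, with placeholder file names):

```python
# Build the vocab from the sorted set of characters seen in the training text,
# then map each character to an integer id.
import numpy as np

with open("tinystories_train.txt", "r", encoding="utf-8") as f:  # placeholder path
    data = f.read()

chars = sorted(set(data))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s: str):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[int(i)] for i in ids)

ids = np.array(encode(data), dtype=np.uint16)
ids.tofile("train.bin")  # nanoGPT-style binary token file
print(f"vocab size: {len(chars)}, tokens: {len(ids):,}")
```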

Interestingly, I didn't face this uppercase issue. I did normalize the dataset and removed all rows that had weird Unicode (it was just easier than trying to save those few thousand rows for training).

Yeah, I used Karpathy's nanoGPT for this, and his Llama one as well for a similar model, but then I came across https://github.com/KellerJordan/modded-nanogpt, which I replicated a few months ago. I realized it was using a native context window of 64k, not because it is 16-bit, just... newer techniques that are more memory-efficient.

Since then I have set about trying to get my dataset big enough for it (not in sample lengths yet, though those are on my list for a future iteration), as I feel 64k is the size where things become actually useful: doing it one letter at a time isn't space-efficient, so it will need that room.

And yes, it worked fine with the uppercase. I just... went one step further because I liked the vibe more, to be honest.

I also hope to extend it to other keys like Alt/Ctrl afterwards in post-training, to use old text apps and such, and I hoped that baking in the idea that these are modifiers would make that easier.

Yeah, for sure, cool stuff. What's your goal with this project? Massive context is definitely needed for character-level models. I'll check out modded-nanogpt. Just followed you on X btw. Would love to keep updated on your char-level project.

My goal? Honestly, I always found character-level models to be very nifty, and as time has gone on and contexts have gotten bigger, that mitigates their biggest flaw: needing lots of context to get anything done.

I aim to make my own personal AI, and sometimes... I can be a little free-form with words/spellings to make a point, which got me wondering about my old project about an AI that... had to grow its own vocab in its mind. So I recently came back to this project again, having been following the gpt-nano speedruns :-p

Currently doing tests on my 4060 16 GB card, as I want something I can fine-tune/continue later at home; the goal really is to have something that can be a home-grown AI, I guess.
