Thursday, September 08, 2011

notMNIST dataset

I've taken some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts. Here are some examples of the letter "A".

Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case: logistic regression on top of a stacked auto-encoder with fine-tuning gets about 89% accuracy, whereas the same approach got 98% on MNIST.

The dataset consists of a small hand-cleaned part, about 19k instances, and a large uncleaned part, 500k instances. The two parts have label error rates of approximately 0.5% and 6.5%, respectively. I estimated these by looking through glyphs and counting how often my guess of the letter didn't match its Unicode value in the font file.

The Matlab version of the dataset (.mat file) can be accessed as follows:
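% Load the small part; the code below uses the variables images and labels from it.
load('notMNIST_small.mat')
% Show the first five glyphs, each in a figure titled with its label.
for i=1:5
    figure('Name',num2str(labels(i))),imshow(images(:,:,i)/255)
end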
The zipped version is just a set of PNG images grouped by class. You can turn the zipped version of the dataset into the Matlab version as follows:
tar -xzf notMNIST_large.tar.gz
python matlab_convert.py notMNIST_large notMNIST_large.mat
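Before converting, it can be handy to check what the archive actually contains. The following is a minimal sketch, not the logic of matlab_convert.py: it assumes the archive extracts to one sub-directory per class named by its letter (A to J), and the function names and the 0-9 integer label encoding are hypothetical choices made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumption: one sub-directory per class, named by its letter.
CLASSES = "ABCDEFGHIJ"


def index_dataset(root):
    """Return (png_path, label) pairs found under root, ordered by class."""
    samples = []
    for label, letter in enumerate(CLASSES):
        class_dir = Path(root) / letter
        if not class_dir.is_dir():
            continue  # tolerate a missing class directory
        # Only *.png files are picked up; anything else is ignored.
        for png in sorted(class_dir.glob("*.png")):
            samples.append((png, label))
    return samples


def class_counts(samples):
    """Count how many images were found for each integer label."""
    return Counter(label for _, label in samples)
```

Calling class_counts(index_dataset("notMNIST_large")) after extraction would show how evenly the images are spread across the ten classes.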
Approaching 0.5% error rate on notMNIST_small would be very impressive. If you run your algorithm on this dataset, please let me know your results.
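To put the label error rates in perspective, here is a small sketch of the arithmetic. The instance counts and rates are the approximate figures quoted in the post; the function names are hypothetical. One plausible reading of the 0.5% target is that, with roughly that share of labels wrong, a classifier scored against those labels is presumably penalized on about that many glyphs even when it predicts the true letter.

```python
# Approximate figures quoted in the post: (instances, label error rate).
PARTS = {
    "notMNIST_small": (19_000, 0.005),
    "notMNIST_large": (500_000, 0.065),
}


def expected_mislabeled(n_instances, error_rate):
    """Expected number of wrongly labeled glyphs at the given error rate."""
    return round(n_instances * error_rate)


def label_error_rate(guesses, font_labels):
    """Fraction of glyphs where a reviewer's guess differs from the font's label."""
    if not guesses or len(guesses) != len(font_labels):
        raise ValueError("need two non-empty sequences of equal length")
    mismatches = sum(g != f for g, f in zip(guesses, font_labels))
    return mismatches / len(guesses)


for name, (n, rate) in PARTS.items():
    print(name, expected_mislabeled(n, rate))
# notMNIST_small 95
# notMNIST_large 32500
```

The second function mirrors the counting procedure described in the post: compare a manual guess for each glyph with the label taken from the font file and report the share that disagree.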
