- http://commondatastorage.googleapis.com/books/icdar2007/README.txt
- http://commondatastorage.googleapis.com/books/icdar2007/[filename]
- [filename] goes from Volume_0000.zip to Volume_0999.zip
To download it, e.g. with Python on a Linux box:
    import os

    # Download all 1000 volumes, Volume_0000.zip through Volume_0999.zip.
    for i in range(1000):
        fname = 'Volume_%04d.zip' % i
        # -L follows redirects; -o saves the download under the same file name.
        cmd = 'curl -L http://commondatastorage.googleapis.com/books/icdar2007/%s -o %s' % (fname, fname)
        os.system(cmd)

The goal of this dataset is to facilitate research into image post-processing and accurate OCR for scanned books.
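The download loop above shells out to curl. If curl is not available, or if a run may be interrupted and restarted, a pure-Python variant could look roughly like the sketch below. It is only an illustration: the helper names `volume_names` and `download_volumes`, the `dest_dir` parameter, and the `.part` temporary suffix are assumptions made for this example, not part of the dataset's instructions. It uses only the standard library.

```python
import os
import urllib.request

# Base URL taken from the list above.
BASE_URL = "http://commondatastorage.googleapis.com/books/icdar2007/"


def volume_names(start=0, stop=1000):
    """Return archive names Volume_XXXX.zip for indices start..stop-1 (hypothetical helper)."""
    return ["Volume_%04d.zip" % i for i in range(start, stop)]


def download_volumes(dest_dir=".", start=0, stop=1000):
    """Fetch each volume into dest_dir, skipping archives that are already there."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in volume_names(start, stop):
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # finished in an earlier run
        # Download to a temporary name first so that an interrupted
        # transfer is not mistaken for a complete archive later on.
        tmp = target + ".part"
        urllib.request.urlretrieve(BASE_URL + name, tmp)
        os.replace(tmp, target)
```

For a quick trial, calling `download_volumes("icdar2007", 0, 2)` would fetch only the first two archives into an `icdar2007` directory; calling it with the defaults walks through all 1000.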
