WeOCR Project
Since Feb. 2005 / Last update: Sep. 15, 2019
End of Services
Thank you very much for using the WeOCR services
for more than twelve years.
Because many good online OCR systems are available
today, I have decided to close our services.
The servers will be shut down gradually,
although some will remain running for demonstration
purposes only.
When I started developing an "Online OCR service"
in 2004, there were only a couple of experimental
ones, and they had been almost abandoned.
Who at that time imagined that an online
OCR service would be practical enough
despite the privacy concerns?
Who would provide free OCR services when
OCR engines cost so much?
We can see the consequences today.
There's a high demand.
Some high-performance OCR engines are available
for free.
I don't know whether the WeOCR project made any
contribution to, or had any impact on, the world
of web service development.
But it is obvious that the services here are
no longer attractive.
So, it's time.
Thanks again to everyone who tried our services
and programs,
developed their own service, or found some
inspiration (if any) for new online services.
Aug. 25, 2017 Hideaki Goto
IMPORTANT!
Don't run bots (automated programs) against the servers
regularly. The server performance is limited, and
other users suffer from the high load!
If you need regular processing, use a local OCR engine
instead; it is generally much faster and more reliable.
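As a rough illustration of local processing, the sketch below runs an OCR
command over every image in a directory. It assumes GNU Ocrad (one of the
engines used on the WeOCR servers) is installed as "ocrad", that the input
files are PNM images, and that the engine takes the image path as its last
argument and writes the recognized text to standard output. The function
names and the timeout value are hypothetical choices for this example.

```python
import subprocess
from pathlib import Path


def run_local_ocr(image_path, engine_cmd=("ocrad",), timeout_s=60):
    """Run a local OCR command on one image and return the recognized text.

    Assumption: the engine accepts the image path as its last argument
    and prints the recognized text to standard output.
    """
    result = subprocess.run(
        [*engine_cmd, str(image_path)],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # give up on a job that runs too long
        check=True,         # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def ocr_directory(directory, pattern="*.pnm", engine_cmd=("ocrad",)):
    """Return {file name: recognized text} for every matching image."""
    return {
        path.name: run_local_ocr(path, engine_cmd)
        for path in sorted(Path(directory).glob(pattern))
    }
```

Another engine can be substituted by passing a different engine_cmd tuple,
provided it follows the same calling convention.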
Introduction
WeOCR is a platform for Web-enabled OCR
(Optical Character Reader/Recognition) systems
that enables people to use character recognition over networks.
A WeOCR server receives document images from users,
recognizes the text in the images, and returns the recognition results to the users.
WeOCR does not have its own character recognition engine.
Instead, it is intended to accommodate various character recognition engines.
WeOCR provides a simplified user interface
so that more people can benefit from OCR easily.
Although some people may worry about the privacy of their documents,
we think there are still many applications of
OCR in which privacy does not matter.
We hope WeOCR will expand the range of OCR applications further.
Objectives
- Design the architecture of WeOCR.
- Develop a toolkit that enables OCR developers and researchers
to build their own Web-based OCR sites easily.
- Encourage people to develop OCR engines for various languages
and to open them to the public,
either as a Web service or as Free Software.
- Make some useful tools and libraries for Web-based OCR systems.
Features
WeOCR-toolkit has the following features.
- Receive a document image from each client computer,
pass the image to the back-end OCR engine,
generate HTML data from the result data,
and send the data back to the client.
- Uncompress the incoming image file if required.
- Limit the size of the input data to protect the server
from huge data.
- Examine the integrity of image file headers.
- Convert the input image into a common image format (PNM).
- Limit the number of jobs to prevent the server from
processing too many documents at once
and to maintain acceptable server response.
- Terminate the OCR engine after a specified time has passed,
if the engine continues running (in vain) due to
unexpected input data or bugs in the engine.
- Support a server search function using spec files in XML.
Documentation
License
The license is the
Apache License, Version 2.0.
(An MIT-X derivative applies to weocr-toolkit-0.12 and older.)
You are not required to open the source code of your
OCR engine to the public.
TODO
- Deploy more WeOCR servers. (ASAP)
- Advertisement! (ASAP)
- Encourage researchers/developers to provide their own WeOCR services. (ASAP)
- Find open source OCR engines for various languages. (midterm)
- Improve the UI. (midterm)
- Write documentation. (midterm)
- Develop an OCR engine for Japanese. (ASAP)
- ... etc.
Recent changes
- [Jun 15, 2012]
- WeOCR-toolkit ver.0.14 has been released.
- [Apr 26, 2009]
- WeOCR-toolkit ver.0.13 has been released.
- [Sep 26, 2008]
-
- [Sep 9, 2008]
-
- [Aug 19, 2008]
- WeOCR-toolkit ver.0.12 has been released.
- [May 12, 2007]
- WeOCR-toolkit ver.0.11 has been released.
- [Apr 8, 2007]
-
- [Feb 12, 2007]
- The server search CGI can now produce server lists in XML.
This should be useful for various web applications using WeOCR.
Pass the parameter "fmt=xml" to the CGI.
- [Aug 18, 2006]
- An automatic spec collector is now up and running.
Once your server is registered in the server list,
your spec file will be examined periodically (twice a day)
and used for updating the list.
- [Jun 26, 2006]
- WeOCR-toolkit ver.0.10 has been released.
- [Jun 9, 2006]
- Hebrew OCR (hocr) has been added.
- [Jun 7, 2006]
-
- [Feb 26, 2006]
- WeOCR-toolkit ver.0.10beta has been released.
- [Feb 19, 2006]
- The OCR engine at ocr1/e1 has been updated to ocrad-0.14.
- [Jan 22, 2006]
- WeOCR-toolkit has been released (at last).
- [Jan 18, 2006]
- The project has been renamed, since the previous name, ocrweb, was
already widely used in another community.
- [Nov 6, 2005]
- A new server with GOCR has been released.
- [Oct 14, 2005]
- The OCR engine at ocr1 has been updated to ocrad-0.13.
- A filter for adaptive thresholding has been added.
- [Oct 7, 2005]
-
- [Sep 22, 2005]
- JPEG (JFIF) support has been added.
- [Aug 28, 2005]
- Some modifications to the internal code. (No new features.)
- [Jun 10, 2005]
- The OCR engine used at ocr1 has been updated to ocrad-0.12,
which runs much faster.
- ocr1 now accepts gzipped image files as well as raw files.
Comments
Send feature requests, questions, bug reports, or other comments.
Note that, as a rule, no reply will be sent.
Answers to some common questions may appear on the website.
keywords:
Optical Character Recognition, WeOCR, OCR Web, OCRWeb, Web OCR, WebOCR,
online OCR, free OCR
© 2005-2019 Hideaki Goto
www.imglab.org