The BERT-based model handles misspellings and synonyms really well by association, BUT it does overcompensate a lot. In an age where domain squatters have taken all the properly spelled names, startups have to rely on synonyms. Then Google penalizes them. Yet a few months ago Google declared it would penalize squatters. This approach is hurting many companies (so I hear in my circles) and Google isn’t listening :(
There is too much fake marketing around companies adopting ML and AI today. We call this the "man behind the curtain" (Wizard of Oz). The rule of thumb is: if it takes minutes or even hours to OCR a document and extract the data, then it's human labor.
Lots of companies like Expensify, Bills.com, ReceiptHog and others use MTurk, or services like MTurk, to extract data from financial documents. The accuracy is still not 100% guaranteed. The categorization is also usually off, since the person categorizing your receipts doesn't have a history of your previous purchases and how they should be categorized in your business. This also means that if you are doing anything with PII, or you are a healthcare company, watch out. These companies are NOT HIPAA compliant. They leak PII data. It takes one social hack to steal someone's identity.
How do I know all this? Because we (https://iqboxy.com, YC W17) built a 100% automated solution for bookkeeping and expense management. You scan a receipt/bill/invoice and you get results in a few seconds. We have been offered those human labor services for "automated data extraction" many times, but we believe that's not how this problem should be solved in 2018.
@dbirulia I see you have a "free plan" (the "paid plan" is currently too much for me for the expected value; in my country it's ~10 coffees, not 2), but it has "Advanced OCR" grayed out. Does that mean it has no OCR at all (it just stores image scans?), or rather some kind of "primitive OCR" (whatever that means)?
Hey @akavel, the difference is that the system will not build ML models for your account and will not learn from your edits and categorizations. Give it a try; maybe it’s enough for your case.
While you're likely correct, there's really no way of telling if they are compliant or not without a published 3rd party audit (ideally several). If a company puts the proper policies and controls in place, and then proves implementation and adherence to a 3rd party, then they are technically compliant (be that HIPAA, PCI, etc.). It is possible to define and implement data access policies for offshore workers. It's extremely hard to prove adherence, but it's possible.
I doubt you could sign a BAA with offshore workers who don't have to comply with such US standards. Furthermore, this space will get shaken up in 2018 in Europe when the EU General Data Protection Regulation (GDPR) goes into effect.
Re 3rd party audit -- yes, a pen test by a 3rd party and a BAA should be the standard for healthcare companies dealing with service providers. If Expensify has any healthcare companies using their service, they are either too small to employ such due diligence or Expensify is headed towards a disaster, aka Equifax #2.
Either way, tech companies should take privacy more seriously.
Saw that you were super active in this thread, so I googled your username. It looks like you're their competitor and you're acting like you're an unbiased/concerned person. Pretty dishonest - I'm sure it's great, but you should disclose that you're shilling for your company.
Not once did I "shill" for my company here. And yes, I am concerned about this and the people affected. Should I not be? Calling me dishonest is just poor form, mate.
I don't see how a 100% automated solution is going to detect errors. Unless you have humans reviewing and verifying at least a sample of the results, you can't train. So do you expect your customers to review everything they scan line by line to verify accuracy?
Great app! There are a lot of productivity apps out there, but most of them are overloaded with useless functionality. This one seems to be very simple and solves the major needs.
Let us know once the freemium version is out.
That definitely sucks... but being employed, unemotional and doing the same stuff day to day sucks even more... it kills you from the inside.
Hope everything goes well with your company now!
The new company (a startup that's further along, funded, and doing great work) is a lot less stressful. I basically view it like a paid vacation where I can focus on engineering and tech instead of the howling vortex of crazy that is business and sales. :)
There are a lot of great startups that are willing to bring brilliant engineers here to Silicon Valley. But these days startups are not looking for "PHP developers"; they are looking for smart, energetic developers who are not focussed on just one language. I don't think I know any developer who is coding in only one specific language right now. The most important thing is your experience and domain knowledge.
Remember, a language is just a tool, and you should be ready to learn a new tool within a few days or weeks.
Also, I would recommend contributing to open source projects, or starting your own open source project on GitHub, for example.
Indicate your knowledge and interests. Participate in discussions and connect to new people with similar interests.
For example, if you are the smartest engineer with great experience in OCR (optical character recognition), you contribute a lot to open source projects that build frameworks for OCR, like OpenCV, and you have a blog where you share some thoughts on OCR and write some tutorials, I'm sure that at some point you will be contacted by a Silicon Valley startup that needs you.