posts
Welcome! Check out my posts below.
-
June 19, 2021.
Hierarchical spreadsheets and you
tags: pl, hci
The idea of viewing your possessions like a video game inventory is really amusing:
- 1x laptop
- 1x charging cable
- 1x face mask
- 50x alcohol wipes
- 1x large cup of jasmine green pearl milk tea of power <+7>
Long story short, I had a “wait but actually though” moment in a flash of shower thought brilliance.
It’s not like I lacked motivation for it either. As a minimalist, I’ve always wanted to do this thing where (1) you take inventory of all your things, (2) you cross out the things you don’t need, (3) you get rid of those things. The idea is to intentionally get rid of things, like that hairbrush you never ever ever ever ever use, but keep anyways “just in case”.
Now here was the hard part:
Q: Am I really going to be writing hundreds of items in a bulleted list?
A: sure why not
Q: Some things are in boxes, should I write down the things inside boxes?
A: yes, just indent the things inside boxes
Q: What about boxes in boxes? On a shelf, in a closet?
A: yea, wait fuck
There was no way I was writing a deeply nested bulleted list of hundreds of items – I’ve worked with enough big JSON/YAML behemoths to know that you lose sight of the nesting about 2 levels in. A spreadsheet is only slightly better, but again if you do nesting you either have to have huge cells or multiple spreadsheets. That’s where the second flash of shower thought brilliance kicked in – what if I do this in TreeSheets?
TreeSheets
Some spreadsheet software will let you have all kinds of things in a cell – formulae, images, sparklines, stock quotes. TreeSheets (written by the wonderful Wouter van Oortmerssen) lets you have spreadsheets in cells. I don’t know how old or new this concept is, but it’s really cool, except for the outdated-looking GUI, which is built on wxWidgets.
For me, it just means my nested lists don’t suck, and I can use a second column in each list to add “delete” tags to my things.
(an image should go here but I'm too lazy rn)
After all was said and done, I had a huge spreadsheet of all my things. TreeSheets has a feature where you can copy your whole spreadsheet and immediately paste it somewhere and it comes out as a nested list. The output was about 500 lines long – turns out I have like 500 things. Which mentally feels like a big number, but according to my friend, “it’s one of those things that seem like a lot only if you count it, like calories.”
I had been using TreeSheets to serve as an organizational tool. By the end of the week-long journey of cataloguing all my things, we witness our third flash of brilliance: spreadsheets have formulas.
The plan was obvious. Set up a programmatic dashboard in TreeSheets using lookup formulas to locate items at will, or at least list all the items I tagged with “delete”. Kind of like a shitty query language! But TreeSheets doesn’t really support that – formulas are super limited, there aren’t a lot of functions, and formula cells are more like Jupyter cells in that you can individually run them or hit Run All.
(Also, I could just use ctrl-F in TreeSheets to locate things. I need to stop overengineering.)
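Outside of TreeSheets, the “list everything tagged delete” query is just a recursive walk over the nested sheets. Here’s a minimal Python sketch. It assumes the inventory has been modeled as nested cells: the Cell class and the sample items are hypothetical and are not TreeSheets’ file format. Each cell has a name, an optional tag from the second column, and an optional nested sheet.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    """One inventory entry (hypothetical model); `children` plays the role of a nested sheet."""
    name: str
    tag: str = ""  # the second column, e.g. "delete"
    children: list[Cell] = field(default_factory=list)


def walk(cells, path=()):
    """Depth-first walk yielding (enclosing_containers, cell) for every cell at every depth."""
    for cell in cells:
        yield path, cell
        yield from walk(cell.children, path + (cell.name,))


def tagged(cells, tag):
    """'List all the items I tagged with delete': full location of each matching item."""
    return [" > ".join(path + (cell.name,)) for path, cell in walk(cells) if cell.tag == tag]


def locate(cells, name):
    """'Where is my X?': location of every item with exactly this name."""
    return [" > ".join(path + (cell.name,)) for path, cell in walk(cells) if cell.name == name]


inventory = [
    Cell("closet", children=[
        Cell("shelf", children=[
            Cell("box", children=[
                Cell("hairbrush", tag="delete"),
                Cell("charging cable"),
            ]),
        ]),
    ]),
    Cell("laptop"),
    Cell("face mask"),
]

print(tagged(inventory, "delete"))          # ['closet > shelf > box > hairbrush']
print(locate(inventory, "charging cable"))  # ['closet > shelf > box > charging cable']
```

The path-joined strings stand in for nested indices like F1A1A1: the container names become the address.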
Anyways, formulas seem kind of hard in hierarchical spreadsheets. As for who’s solving this particular pickle, the only other hierarchical spreadsheet software I know is:
Inflex
Inflex (currently in beta as of Jun 2021) has the same idea – cells can contain spreadsheets. My impression of Inflex’s ambition is something like “take the most popular declarative language in the world – spreadsheets – and make it really really damn functional”.
To this end, they’ve built support for lists and records as special spreadsheets. I guess the idea is that you can map() and filter() them and it’ll come out the other side as another list or record cell, and that’s kind of pretty. There’s also functional debugging support (static typechecking, holes _), which is also kind of pretty.
Pretty, but not super polished yet UX-wise – sure, you can paste {a:2,b:[1,2,3]} as text and it’ll be represented as a spreadsheet. But right now, you can’t use cell references (e.g. map(x:x+5, the cell to the left)), so you have to copy the original data into the formula to use it. I honestly would use this if the UX were at the level of TreeSheets, so yeah, let’s check back in a year or something.
Future thoughts
Honestly, I don’t think any hierarchical spreadsheets support good formulaic lookups, which kind of sucks. I mean, I definitely don’t want to work with nested indices like F1A1A1:F1A1A50. Indexing in spreadsheets should be a behind-the-scenes detail, stuff the user shouldn’t have to deal with at all. I just want to select a range and run lookup or whatever!
I’d love to think about hierarchical spreadsheets of the future, but I’m tired now so I’ll end this post here bye
tagged: pl, hci Hierarchical spreadsheets and you (permalink) (tweet)
-
June 25, 2020.
Thoughts on readable data processing
tags: pl
TLDR: dynamic views as a composable zero-cost abstraction. Or: like SQL but easy to read. (gosh, I should get into writing paper titles)
the big idea
Late last night I thought of an idea for a data processing language. Consider the following problem:
"just a second" -> "JUST a second"
"bone apple tea" -> "BONE apple tea"
"capitalize the first word" -> "CAPITALIZE the first word"
In most languages (like Python or regex), this would require one or more of the following:
- find where to stop capitalizing at (via a loop or via string matching)
- work with indices up to that index
- use a loop to capitalize each letter
and the resulting performance is dependent on how you wrote that.
What if instead of doing any of that, we can naively operate on a specific view of the string, namely the first word:
# run toUpper() on each char of the first word
toUpper(input.words.first.chars);
and have it behave exactly as if we wrote a tight loop in C? like
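// input: a mutable, NUL-terminated char array; isspace() and toupper() come from <ctype.h>
for (size_t i = 0; input[i] != '\0'; i++) {
    if (!isspace((unsigned char)input[i])) {
        input[i] = toupper((unsigned char)input[i]);
    } else break;
}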
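To get a feel for what “operating on a view” means before any compiler is involved, here’s a rough Python sketch. It treats a view as a lazy stream of indices that stops early, just like the loop’s else break. It only shows the semantics; it is not the zero-cost compilation this post is after. The names first_word_indices and upper_first_word are hypothetical.

```python
def first_word_indices(chars):
    """A lazy 'view' of the first word: yield indices until the first whitespace.

    Like the C loop's `else break`, nothing after the first word is ever examined.
    """
    for i, ch in enumerate(chars):
        if ch.isspace():
            return
        yield i


def upper_first_word(text):
    """Apply upper() through the view, roughly toUpper(input.words.first.chars)."""
    chars = list(text)  # Python strings are immutable, so work on a mutable copy
    for i in first_word_indices(chars):
        chars[i] = chars[i].upper()
    return "".join(chars)


for s in ["just a second", "bone apple tea", "capitalize the first word"]:
    print(upper_first_word(s))
```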
I doubt this is a new idea, since my thoughts here are very inspired by the idea of dynamic views and by the J language’s composability. It’s like SQL, except queries are performant AND readable.
fleshing out the concept
To simplify things, let’s work with flat, nonempty integer arrays. So the interesting problems include things like taking k-maximum, or finding the cumulative sum.
1 6 2 3 4.maximum(2) == 4 6
1 6 2 3 4.cumulativeSum == 1 7 9 12 16
How would we express these using the idea of ‘operating on a specified view’? I would imagine something like
input.maximum(k) := input.sorted.last(k)
input.cumulativeSum := input.prefixes.sum
Takeaways:
- maximum: applying something like .sorted.last(k) should not actually perform a sort, but instead find the k maximum elements (e.g. via a min heap when k > 3).
- cumulativeSum: clearly I’m cheating a bit, since I said we’re working with flat arrays, but taking prefixes of a flat array sounds like a jagged 2D array. I’m thinking that prefixes is actually some special intermediate object that expects a folding operation (sum) to be performed on the prefixes, and then uses that folding operation to produce the cumulative sum. It’s directly inspired by J’s \ operator. This reminds me of Clojure’s transducers…
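Here’s what those two takeaways might compile down to, sketched in Python. A size-k min-heap stands in for .sorted.last(k). itertools.accumulate stands in for prefixes plus a fold, so the jagged array of prefixes is never built. The helper names maximum and prefixes_fold are hypothetical, and this is only one possible translation.

```python
import heapq
import operator
from itertools import accumulate


def maximum(xs, k):
    """k largest elements in ascending order, like xs.sorted.last(k), without sorting xs.

    A size-k min-heap keeps the k largest seen so far: O(n log k) instead of O(n log n).
    """
    if k <= 0:
        return []
    heap = []
    for x in xs:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:  # beats the smallest of the current top k
            heapq.heapreplace(heap, x)
    return sorted(heap)  # only the k survivors get sorted


def prefixes_fold(xs, fold=operator.add):
    """Like xs.prefixes.<fold>: each prefix's result reuses the previous one."""
    # With fold=operator.add this is the cumulative sum, i.e. J's +/\
    return list(accumulate(xs, fold))


print(maximum([1, 6, 2, 3, 4], 2))     # [4, 6]
print(prefixes_fold([1, 6, 2, 3, 4]))  # [1, 7, 9, 12, 16]
```

The standard library’s heapq.nlargest(k, xs) does the same heap trick in one call, but returns the largest element first.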
zero-cost property
How do we best-effort ensure that this dynamic view abstraction is zero-cost? Meaning that everything we write using dynamic views should be equivalent (performance-wise) to writing a tight loop. Make indirection invisible after compilation.
For starters, every dynamic view should be managed at compile-time. Under the hood, the compiler should translate input.first.add(2) to input[0] += 2 in C. How?
I’m thinking that each hardcoded view, like .first, should (at compile-time) change the modified data to be the first element. This sounds obvious, but becomes non-trivial when we get things like .sorted, where the modified data is now the sorted version of the array. .sorted requires some indirection under the hood. Taking another page from J, we could keep around a permutation vector as indices, so 10 30 40 20 would have the permutation vector 0 3 1 2, meaning “smallest element is at index 0, second smallest is at index 3, …” The compiler handles the rest.
This ad-hoc method needs to be implemented for every dynamic view, and maybe there should be optimizations for common compositions like .sorted.last.
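Here’s a runtime sketch of the permutation-vector indirection for .sorted. Reads and writes go through the index vector, so the underlying array is never copied or reordered. It still sorts the indices, so it shows the indirection rather than the zero-cost compilation. The names grade and SortedView are hypothetical; grade mirrors what J’s monadic /: (grade up) returns.

```python
def grade(xs):
    """Permutation vector that sorts xs, like J's monadic /: (grade up)."""
    return sorted(range(len(xs)), key=xs.__getitem__)


class SortedView:
    """A sorted view whose reads and writes go through the permutation vector.

    Assumption: the permutation is computed once, so a write that changes the
    ordering leaves the view stale.
    """

    def __init__(self, data):
        self.data = data
        self.perm = grade(data)

    def __len__(self):
        return len(self.perm)

    def __getitem__(self, i):
        return self.data[self.perm[i]]

    def __setitem__(self, i, value):
        self.data[self.perm[i]] = value

    def last(self, k):
        return [self[i] for i in range(len(self) - k, len(self))]


data = [10, 30, 40, 20]
view = SortedView(data)
print(view.perm)     # [0, 3, 1, 2]
print(view.last(2))  # [30, 40]
view[0] += 5         # "add 5 to the smallest element" writes through to data[0]
print(data)          # [15, 30, 40, 20]
```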
composition property
This approach to dynamic views is cool, but it has one flaw: composition is non-trivial. There are actually two parts to this:
- Language-level composition: how do we make it easy to work with a view of a view of a view?
- Compiler-level composition: how to make that work perfectly under the hood?
Language-level composition
My favorite example here would be taking the integer mean of an integer array. This requires summing an array, finding its length, and integer dividing these two numbers.
How would you express that? In J, it would look like
(+/ % #) input
meaning (sum divide length) input. So J has composition down – it has a neat way to compose operators (the fork combinator).
How do we get dynamic views to be just as powerful? For reference, a tight loop in another language would look like:
int total = 0;
for (int i = 0; i < input.length; i++) {
    total += input[i];
}
mean = total / input.length;
Would it just be input.mean := input.sum.divide(input.length)? It sounds messy. This needs some thinking.
compiler-level composition
I mentioned earlier that common compositions of primitives, like .sorted.last, should be identified and replaced with a more performant version. Idempotent functions applied twice, like .sorted.sorted, should translate to .sorted. Involutions applied twice, like .reverse.reverse, should translate to a no-op. These are hardcoded solutions corresponding to Special Combinations in J.
But how do we make things like this work in general? Take toUpper(input.words.first.chars) from the first example, using strings. How the hell do we go from that to…?
for (size_t i = 0; input[i] != '\0'; i++) {
    if (!isspace((unsigned char)input[i])) {
        input[i] = toupper((unsigned char)input[i]);
    } else break;
}
My current thought for this is to make every view apply a layer of indirection, which will get resolved at compile time. So .sorted generates an array of permutation indices, which the compiler can traverse later to get the actual sorted list representation. (This approach is very amenable to extension, since all that’s needed to support new view operations is to define the indices.)
In our toUpper example, .words could be a list of indices that point to the start of each word. .first gets the first index of this list. .chars gets the list of indices for each letter. The compiler will have all this information available. The challenge is, how do we skip computing the unused information, like indices for things that are not the first word? I need to do more research.
(side note: In J we would turn notIsSpace "abc 123" into a predicate array 1 1 1 0 1 1 1, which can be used by other operations to split into words, for example. In our example, I’d imagine we’d throw out everything after seeing the first 0. How do we avoid computing the remaining 1s? Where does the early break come in? This needs research.)
practical data representations
I really want this dynamic view idea for more than just integer arrays. I’m thinking strings obviously, but also associative arrays, custom ADTs, json, bit fields, dataframes, file handles.
Well, one can’t possibly write them all, but what if there was an API that allowed writing this stuff at a low level? Like having a low level way to specify views as lists of indices. The easiest solution would be specifying operations that the compiler can use in the compile step, though this makes compilation dangerous (might allow arbitrary code execution?).
closing thoughts
This was just a thought experiment for a small data-processing language that compiles down to a fast language like C. If it turns out to be a solvable problem, then I wouldn’t feel comfortable about the implications. For one, if these operations are easy to specify while being zero cost, then there’s no point in learning algorithms for e.g. finding k-maximum, since you would just use an equivalent formulation like .sorted.last(k) at zero cost!
If we know all the fast algorithms, and this syntax saves you from handwriting speed-saving constructions (like constructing min heaps by hand) and doesn’t require importing a library, then why ever learn algorithms outside of academia?
If most data-processing is expressible via manipulation of dynamic views, then why ever learn other forms of declarative programming with high level concepts like filters and folds?
Thoughts like this keep a practical computer science educator up at night.
addendum: approximate algorithms
One thing this abstraction does not allow is approximate algorithms, which may be much much faster than the best exact algorithm both time- and space-wise, at the cost of not guaranteeing correctness.
Maybe approximate algorithms could be a library? Adding things like approxSort? Maybe we do need libraries after all.
addendum: time-efficiency vs space-efficiency
All this time I kind of ignored space as a problem. In the interest of optimizing for time, I silently assumed constructing auxiliary data structures would not be a big issue. Which is not true. Allocations are not free, and ignoring allocation might make the time-efficient algorithms slower.
This will definitely have to be a user setting. Maybe optionally annotate files or LOC or individual functions to run with certain efficiency constraints, so the compiler picks the more space-efficient translation over the time-efficient translation. Maybe have different levels of efficiency annotations. Will need to think about this too.
tagged: pl Thoughts on readable data processing (permalink) (tweet)
-
May 27, 2020.
How to not consent to cookies
tags: satire, 100days
You know, I really appreciate the innovative feel of all those “This page uses cookies” banners, all featuring a huge button “Understood” or “Agree”. I marvel at my wealth of user choice when I see the lack of a “Disagree” button.
I don’t want to burden these wonderful sites, you know? Clicking that Agree button fires quite a lot of tracking requests to… whatever hundred trackers that were birthed into existence this week. And isn’t that a huge load on the network?
Not to mention that by having cookies, I’m basically forcing the site to cross-reference and identify me for ad targeting purposes, which must be an utterly exhausting task for some database server somewhere. Ever open a phone book?
I bet it would be good for “User Retention” too, if 40% of my very limited screen space wasn’t blocked by a banner. I’ve learned a thing or two in my web design courses, and I think blocking half the screen for people who don’t hit Agree counts as bad UX or something. Isn’t good UX something sites care about?
Gosh, it would be such a good thing to do for the web if we could just have Disagree buttons on these cookie banners.
The real Disagree button was the friends we made along the way
One day a tech code programmer friend linked me to the Kill Sticky Headers bookmarklet by Alisdair McDiarmid. A bookmarklet is like a bookmark on your browser, except clicking it runs code or opens some data URI rather than taking you to a bookmarked website. This bookmarklet basically runs this code:
[...document.querySelectorAll("body *")]
  .filter(elem => getComputedStyle(elem).position.match(/fixed|sticky/))
  .forEach(elem => elem.parentNode.removeChild(elem));
meaning “outright delete every element that’s displayed with position: fixed or position: sticky”.
Apparently this includes
- banners that follow you as you scroll down
- sidebars that follow you as you scroll down
- social media sharing buttons that follow you as you scroll down
- a lot of modals, like “Subscribe to my newsletter today” popups
- cookie consent banners that follow you as you scroll down
outright delete… cookie consent banners…
Ah, so the Disagree button was in the bookmarks bar all along!
I’m an emacs user at heart
Jokes aside, the above is a perfectly workable solution for disagreeing with cookie banners. You get your screen space without having to click Agree and trigger some JavaScript that does who-knows-what.
It’s actually not what I use today, because I no longer use a bookmarks bar. Nowadays I use Tampermonkey to bind ctrl alt s (s for sticky) to run that code above:
window.addEventListener("load", () => {
  window.addEventListener("keydown", e => {
    if (e.ctrlKey && e.altKey && e.keyCode == 83) {
      [...document.querySelectorAll("body *")]
        .filter(elem => getComputedStyle(elem).position.match(/fixed|sticky/))
        .forEach(elem => elem.parentNode.removeChild(elem));
    }
  });
});
and it became second nature to tap ctrl alt s to clear up some screen real estate.
This is my Disagree button. This is how to not consent to cookies. Think of the network load you’re saving, and the phone book cookie database servers you’re liberating, and that precious “site quality” you’re optimizing. Save the web, folks.
sidenote: The only reason I don’t have this run by default is because it sometimes hides useful things. Like site navigation bars, or various floaty hamburger menus. Some sites use sticky elements to great effect. (example)
Now if you were really cool, you could extend ctrl alt s to add the current domain to some blacklist, and delete sticky elements by default on blacklisted domains. Because really cool programmers spend some fraction of their time doing instead of reading blog posts all day. If only I was a really cool programmer…
I’m writing this as part of #100DaysToOffload. You can directly check out other participating blogs or take part yourself.
tagged: satire, 100days How to not consent to cookies (permalink) (tweet)
-
May 24, 2020.
Why even own a Raspberry Pi?
tags: linux, 100days
When I first got my Raspberry Pi, I was convinced it would be my personal smart home device. I was thinking of all the neat projects I’d do with the hardware bits. You know, the GPIO pins and the USB ports. Not to mention the onboard wifi card, which would save me from having unsightly wires taped around my apartment.
It’d be just like an Arduino but also Linux.
Software is the worst
Arduino is really easy because everything’s set up for you. You don’t need to worry about versions or support because you’re pretty sure that your Arduino supports Arduino code.
Not so with Raspberry Pi and Linux distros. As soon as you’re worrying about a Linux machine, you have a million moving parts that all depend on each other.
To get to the point, I spent weeks trying to figure out how to interface with the GPIO pins. There are a handful of GPIO Python libraries. I think I tried a C library as well. None of them worked. I’m pretty sure it was a firmware problem, or something involving kernel extensions.
At that point, I gave up trying.
Having a personal Linux server is pretty great
So instead of a smart home toy, my personal Pi is just a general-purpose Linux server. I have a Syncthing relay running on it, as well as various game servers. It’s also a music library (which plays via Bluetooth) and a general-purpose database server. At some point it was running a Discord bot and a Phoenix site. If I wanted to, I could run a Pi-hole, or seed torrents, or do whatever you people do with a general-purpose headless Linux box.
Most importantly, it’s a nice playground for various Linux toying and misdeeds. A lot of my often-used commands.md notes were a direct result of Pi hacking.
It’s pretty great, even if only because I don’t need to rent any more Amazon EC2 boxes (for whenever I need a 24/7 Linux box).
Is it worth it?
No. Yes. My expectations for the Raspberry Pi weren’t exactly met, but I’ve gotten enough use and learning out of it to justify the $35 price tag.
I’m writing this as part of #100DaysToOffload. You can directly check out other participating blogs or take part yourself.
tagged: linux, 100days Why even own a Raspberry Pi? (permalink) (tweet)
-
May 20, 2020.
Karma-agnosticism
tags: 100days
Why do you post anything online?
To add your perspective? To tell someone they’re wrong? To meme?
To make someone happy?
To make yourself happy?
To gain karma?
Let’s talk about the last one.
Karma as motivator
I’ve always been iffy about this whole karma thing on link aggregator sites. (Reddit, Hacker News, lobste.rs)
The only personal value I’ve seen in it is feedback. Not good feedback, obviously, but probably enough feedback to judge how many eyeballs liked what they saw in your words.
But for me, karma became a motivator for posting itself – “I want to post in order to increase my internet points,” and I found myself optimizing for popularity. What better way to do that, than to present a divisive opinion, or dive into the nuances, and point out why OP is wrong! All so more people can say “ah yes very good point, OP is wrong!”. No, this doesn’t promote interesting discussion.
Just delete karma
Since I don’t really use karma as positive feedback, I decided to opt out, which meant removing all traces of karma from my perception. Sometime in January I wrote some CSS to remove all instances of popularity counts on Reddit, Hacker News, and Lobsters.
The result was pretty disarming at first.
As a result, I’ve spent a good while now being blissfully ignorant of the existence of karma counts. My findings so far:
- When someone is wrong on the internet, it makes more sense to DM them, rather than go the harmful route and call them out in a comment reply to get internet points. I admit to doing this in the past. Most people would appreciate a DM over having to publicly defend themselves, anyways.
- It is much easier to ask questions, since you won’t care about inevitably being downvoted by the big brains. You know, the people who can’t believe you don’t know (thing), just because public disbelief gets you internet points.
- Unpopular opinions (not the “unpopular opinions” that are actually popular) are easier to write.
- There’s a lot more motivation to praise the post or praise other comments or pose innocent, non-divisive opinions.
Life without karma
I don’t know if I am in general nicer on the internet from this, but it couldn’t have hurt.
When I read comments I’m looking for opinions to praise rather than opinions to take apart.
I’ll ask where the misconceptions come from rather than immediately pointing out the wrong things in a post and getting more internet points via people agreeing.
Posting is less about getting internet points by making provoking conversation points. It’s more about being understanding of how my posts impact the people I respond to.
This extends to more than just karma, but I’m going to call this “karma-agnosticism”.
"Think of calliagnosia as a kind of assisted maturity. It lets you do what you should: ignore the surface, so you can look deeper." - from the short story "Liking What You See", by Ted Chiang
I’m writing this as part of #100DaysToOffload. You can directly check out other participating blogs or take part yourself.
tagged: 100days Karma-agnosticism (permalink) (tweet)
-
May 19, 2020.
Dear ad networks
tags: rant, 100days
Dear ad networks,
Please, stop recommending me a hundred brands of lactase.
No, I’m not worried about qualifying for a mortgage,
and no, I don’t want more colorful, soul-sucking mobile games.
I’m not interested in things I’ve already got.
It’s almost like you’re training your network on the past.
Have you heard of newsletter ads?
Well, I subscribe to some newsletters
solely for their curated ads.
Like ads for apps I may find useful.
Ads for books I might enjoy reading.
They introduce me to things I haven’t already got.
You could say they explore the world around my interests.
That’s the one thing training on the past can’t do.
Cheers,
Marv
I’m writing this as part of #100DaysToOffload. You can directly check out other participating blogs or take part yourself.
tagged: rant, 100days Dear ad networks (permalink) (tweet)
-
April 30, 2020.
J is my calculator
tags: pl
I used to use node (JavaScript) as my CLI calculator. (I know, I know.) After realizing how silly I was for using such an inexpressive language, I tried to learn calc and bc. The next few I went through were:
- python
- ghci (Haskell)
- R
- octave (MATLAB equivalent)
- and finally, J
I don’t have a good reason to settle on J. Maybe I’ll change it soon. But here’s my review of J, collected slowly over a few months of usage:
Pros:
- Hassle-free rationals: write 1%2r127+3r160 instead of 1/((2/127)+(3/160)), and get 20320r701 as a result.
- No need to repeat an operator. Write */12 52 65 13 42 rather than 12*52*65*13*42.
- Built-in polynomial evaluation at multiple points. Write _4 2 1 1 p. 1 2 3 to evaluate x³+x²+2x-4 at x={1,2,3}.
- The idiom for convolution (polynomial multiplication) is +//. @: (*/) (take the outer product */ and sum +/ the antidiagonals /.).
- Any user-defined function is an operator just like +.
- Hooks and forks make some expressions incredibly easy to write. Examples:
  - (+/%#) 2.252 2.284 evaluates to the mean of the list: 2.268
  - 2.268 (-,+) 0.016 evaluates to the interval 2.252 2.284
  - (f,g,h) x evaluates multiple functions on the same data x (say, mean median range)
  - recent example for me: (13.342-7.666)%%:+/18.140 14.378(*:@[*%)578 379 evaluates
Cons:
- You can only share with other J users (not like you’re going to share one-off calculations, though)
- Need to get used to how every operator associates right without exception
- Have to understand how J evaluates things in order to refactor expressions
- Configuration is not obvious: e.g. (9!:11)4 sets displayed floating point precision to 4.
- Jagged or non-homogeneous arrays require you to learn the messy art of value boxing/unboxing
- Working in hex (or any base) is a hassle if you want hex output. You can input hex with 16bdeadbeef, for example, but all output is in decimal (unless you pass it through the hex-printing (4+#@$) }. 2&(3!:3) or use the printf addon).
- It’s hard to find the right operator for the job if you don’t know it by heart. It’s just like in mathematics. Languages like J suffer from lack of discoverability.
-
March 29, 2020.
OCR at your fingertips (part 3)
tags: tutorials, mac
In the real world, you’ll encounter text that isn’t very machine-readable. You’ll ask OCR to extract text, and it will fail miserably.
Using tesseract out of the box like
... | tesseract stdin stdout | ...
got me very okay results, but certainly not the near-perfect results I showed off in part 1. There are three different solutions to this, and they can be combined. Briefly, they are: preprocessing, postprocessing, and tesseract config. (Skip to the bottom for the resulting command.)
Preprocessing
The best step you can take towards improving OCR accuracy is not giving it a noisy image in the first place.
One preprocessing “trick” is just having more detailed (read: bigger) images. Apparently tesseract data is trained on 600 dpi (highly detailed) images, while Mac Retina displays hover at around 200-300 dpi. In practical terms: if the OCR fails, just zoom in and take a bigger (and therefore less pixelated-looking) screenshot.
Otherwise, there are a huge number of tools for OCR image preprocessing. My favorites involve ImageMagick, specifically the textcleaner script by Fred Weinhaus. My preprocessing consists of the following line:
... | textcleaner -g -e stretch -f 75 -o 10 -t 50 -s 1 - png:- | ...
I never remember what these flags are (though they are all explained well in the link above). What matters is that these flags have the following result:
The left image has a little bit of noisiness to it due to the white-outlined text and fuzzy font. OCR has less trouble with the right image, which is sharpened and black/white. The right image gets scanned without error.
So doing some preprocessing (before scanning the image with OCR) makes a huge difference in the result.
Postprocessing
Even with preprocessing, the output of OCR can be noisy, out of order, bizarrely spaced, etc.
One error that’s pretty common, in my experience, is strange spacing. I fix this by removing leading and trailing whitespace on each line, which just takes inserting the following sed command into the pipeline from the last post:
... | sed -Ee 's/^[[:space:]]+|[[:space:]]+$//g' | pbcopy
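If the postprocessing ever moves out of the shell, the same fix is a multiline regex in Python. Here’s a minimal sketch; strip_lines is a hypothetical name. Newlines are excluded from the whitespace class so that each line is trimmed on its own, which is how sed applies the expression one line at a time.

```python
import re

# [^\S\n] means "any whitespace except newline"; MULTILINE makes ^ and $ match at line edges.
EDGE_SPACE = re.compile(r"^[^\S\n]+|[^\S\n]+$", re.MULTILINE)


def strip_lines(text):
    """Remove leading and trailing whitespace on every line of OCR output."""
    return EDGE_SPACE.sub("", text)


print(repr(strip_lines("  Lorem ipsum  \n\tdolor sit amet \n")))  # 'Lorem ipsum\ndolor sit amet\n'
```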
But most errors do not have a simple programmable fix like this. They often warrant manual corrections. Misspellings, random breaks in words, text out of order, smart quotes when you don’t want them, etc.
Because of this, I often postprocess the OCR text output manually. You can see some examples of this manual cleanup in part 1.
Changing the flags to tesseract
By default, tesseract means tesseract -l eng --psm 3 --oem 3 --dpi 300, which affects:
- -l: trained dataset(s) to use (e.g. -l eng+fra); see part 2 for more on language datasets
- --psm: what kind of text to expect from the image; see tesseract --help-psm
- --oem: underlying engine for tesseract; see tesseract --help-oem
- --dpi: density in dpi of the input image
Typically I add --psm 6 --dpi 226, because mode 6 means “Assume a single uniform block of text”, and because my MacBook display has a density of 226 dpi.
All together now
Using all of the above techniques for enhancing OCR results, you might arrive at something like this (assuming you’ve installed imagemagick and textcleaner):
export PATH="/usr/local/bin/:$PATH"
pngpaste - \
  | textcleaner -g -e stretch -f 75 -o 10 -t 50 -s 1 - png:- \
  | tesseract --psm 6 --dpi 226 stdin stdout \
  | sed -Ee 's/^[[:space:]]+|[[:space:]]+$//g' \
  | pbcopy
osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
This is what I’d bind to my cmd ctrl v shortcut, as described back in part 2.
tagged: tutorials, mac OCR at your fingertips (part 3) (permalink) (tweet)
-
March 28, 2020.
OCR at your fingertips (part 2)
tags: tutorials, mac
Here’s a five-minute tutorial on how to bind OCR capabilities to a shortcut on Mac. It’s really that simple!
High-level overview
The key insight is that those tasks become trivial if only I could take a screenshot, and magically paste it as text. Let’s develop this magic. The plan is:
- You hit ctrl shift cmd 4 to save a portion of the screen in the pasteboard
- You hit a special paste shortcut that we create: ctrl cmd v
- Image in pasteboard is sent through OCR
- Resulting text is pasted into the frontmost app
which all turn out to be doable using the following open-source packages:
Installing dependencies
With Homebrew:
brew install pngpaste tesseract
or manually install pngpaste and tesseract. Ensure that they are installed by running:
pngpaste -v
and
tesseract --version
New toys
Try it out! Hit ctrl shift cmd 4 and grab a screenshot of some text. Then in a terminal, type
pngpaste - | tesseract stdin stdout | pbcopy
This uses pngpaste to send the image to our OCR tool, tesseract. The result is sent to pbcopy, which places the resulting scanned text into our pasteboard, ready to be pasted. Try selecting this paragraph, running the above, and pasting the result into any text field.
Hopefully the results are acceptable! Hint: The next article will deal with refining our OCR results.
If nothing happened, try running pngpaste - | tesseract stdin stdout in a terminal, with an image in your pasteboard. Most likely, you need to set up Tesseract with language data.
The rise of automation
To truly get OCR to our fingertips, we’d like to run pngpaste - | tesseract stdin stdout | pbcopy at the touch of a button.
Open up Automator.app (preinstalled on all Macs) and create a Service. Then drag in “Run Shell Script” from the left, and enter what we had above:
export PATH="/usr/local/bin/:$PATH"
pngpaste - | tesseract stdin stdout | pbcopy
(I prepended /usr/local/bin/ to PATH, since that is where Homebrew installs pngpaste and tesseract for me.)
Add this to the end if you want to automatically paste the result afterwards!
osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
Important: At the top, select “Service receives no input in any application.” We use pngpaste for input, so Automator would otherwise complain about input.
Here’s what you should end up with:
Save and give this service a name, like “Run OCR”.
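One caveat with the one-liner: if the pasteboard doesn’t hold an image, pngpaste has nothing to hand to tesseract, yet pbcopy still runs and will likely replace whatever you had copied with empty text. Below is a sketch of a slightly more defensive script body you could use in “Run Shell Script” instead. It’s a variation of my own, not a requirement. The function name ocr_to_pasteboard is hypothetical, and /opt/homebrew/bin is added because that’s where Homebrew installs on Apple Silicon Macs.

```bash
#!/usr/bin/env bash
# A more defensive "Run Shell Script" body (a sketch, not the original setup).
# Homebrew installs to /usr/local/bin on Intel Macs and /opt/homebrew/bin on Apple Silicon.
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

# Hypothetical helper: OCR the image in the pasteboard and copy the text back.
# Returns 1 (leaving the pasteboard alone) if any stage fails or no text is found.
ocr_to_pasteboard() {
  local text
  # pipefail is scoped to this subshell, so a pngpaste failure fails the whole pipeline.
  text=$(set -o pipefail; pngpaste - 2>/dev/null | tesseract stdin stdout 2>/dev/null) || return 1
  # Treat whitespace-only output as "nothing recognized".
  [[ -n "${text//[[:space:]]/}" ]] || return 1
  printf '%s' "$text" | pbcopy
}

if ocr_to_pasteboard; then
  # Optional auto-paste into the frontmost app; delete these lines if you don't want it.
  osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
fi
```

The design choice here is to capture the text in a variable first and only call pbcopy once something was actually recognized, so a failed scan leaves your pasteboard untouched.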
Binding to a shortcut
After saving this service with a name (say, Run OCR), open up System Preferences > Keyboard > Shortcuts > Services, and scroll all the way down to find Run OCR. All that’s left to do is click on Run OCR to bind a shortcut. I use ctrl cmd v.
Usage: Try taking a screenshot of this paragraph with ctrl shift cmd 4, and use your shortcut in a text field, as if you were pasting text. (That’s why I chose ctrl cmd v: it’s almost like pasting.)
Feeling powerful yet?
Different languages
Something I do with OCR is translate comics that aren’t in English. For tesseract, this means you’d need to download language data to recognize languages that are not English. You can find this language data here. If you want tesseract to recognize Korean as well as English, for example, download kor.traineddata and move it into the $TESSDATA_PREFIX directory. Then change the tesseract command like so:
tesseract -l eng+kor stdin stdout
You can make the list as long as you want, like eng+chi_sim+chi_tra+jpn+kor. Be warned that recognition gets noticeably slower with more than three languages, at least in my experience.
Wrap-up
With very little code, we’ve bound OCR to a hotkey. In the next post we’ll explore ways to get more accurate results with OCR.
-
March 26, 2020.
OCR at your fingertips (part 1)
tags: tutorials, mac
What is OCR? Why should I care?
If you’re a student like me, taking online classes, then OCR is a lifesaver.
OCR (Optical Character Recognition) is a fancy way of saying “turn a picture of text, into normal text that you can copy and paste.” Sounds useful? It is! It makes notetaking from live slideshow lectures a breeze.
I find it really helpful to have OCR just as an everyday tool. Here are some situations where OCR saves me a lot of typing:
- Grabbing text from videos, slideshows, streams, and video lectures
- Grabbing text from a game application
- Translating images
- Copying one column from a web table
- Copying a numbered list including its numbers
- Selecting code from tutorials without selecting prefixes, prompts, or line numbers
- Copying text I can’t select (for example, text on a badly designed button)
So even when I don’t strictly need it, OCR saves me time and cognitive load.
Today, OCR technology is mostly used to scan documents in bulk, like turning books into ebooks, so most OCR apps require quite a bit of setup and work. But with the automation capabilities on Mac, it’s quick and easy to bring OCR to your fingertips, which you can see above! See the next post for a (very) short tutorial on how to set this up.
-
February 7, 2020.
A cynical take on grades
tags: fiction, edu, rant
You’re studying hard the day before the midterm, though of course you’re not studying for tomorrow’s midterm.
After all, challenging yourself with learning matrix transformations and eigenvectors while taking LinAlg would be utterly stupid. Challenge means possibly making mistakes which always means an imperfect grade. It makes much more sense to learn all of LinAlg before taking LinAlg. The material’s all there – past recorded lectures and slides, or MIT OCW, or simply attending LinAlg office hours (even though you aren’t enrolled). Then when the time comes, LinAlg would pose absolutely no challenge. Easy A+. You wouldn’t have to learn anything during the course, much less attend it. No, you have bigger things to think about, such as next term’s courses.
So of course you aren’t studying for tomorrow’s midterm. You’ve already studied for it – last term.
Your friend tells you you’re wasting your money going to uni if you’re not benefiting from uni. You say in response that you’re not paying for an education, you’re paying for a degree.