omnicognate 18 hours ago

The language here is Mojo, which the article seems to assume you know and doesn't say enough for you to deduce until halfway through, after multiple code examples. I don't know how you're supposed to know this, as even the blog it's on is mostly about Vale. From the intro I was expecting it to be about C++.

  • totalperspectiv 17 hours ago

    The author works for Modular. He shared the write-up on the Mojo Discord. I think Mojo users were the intended audience.

Xcelerate 19 hours ago

> Eliminate redundant matrix operations (like two transposes next to each other)

In 2016, I was trying to construct orthogonal irreducible matrix representations (“irreps”) of various groups. The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower-dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.

I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile it a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. Was really quite cool. The only downside was that you couldn’t generate very high-dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
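
The same trick transplanted to plain Python (illustrative only; the original used Julia's @generated functions, and the names here are mine): generate specialized source once per dimension n, compile it once, and cache it so every later call is just a dictionary lookup.

  # Illustrative sketch, not the original code: build an "unrolled"
  # constructor per dimension n, compile it once, and cache it.
  _builders = {}

  def irrep_builder(n):
      if n not in _builders:
          lines = ["def build(params):",
                   f"    m = [[0.0] * {n} for _ in range({n})]"]
          for i in range(n):
              # one explicit statement per element stands in for the
              # recursively derived entries of the real construction
              lines.append(f"    m[{i}][{i}] = params[{i}]")
          lines.append("    return m")
          ns = {}
          exec("\n".join(lines), ns)  # the one-time compilation cost
          _builders[n] = ns["build"]
      return _builders[n]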

abeppu 11 hours ago

To tie this specific example to a larger framework: in Scala land, Tiark Rompf's Lightweight Modular Staging (LMS) system handled this class of metaprogramming elegantly, and the 'modular' part included support for multiple compilation targets. The idea was that one could incrementally define/extend DSLs that produce an IR, optimizations on that IR, and code generation for chunks of DSLs. Distinctions about stage are straightforward type-signature changes. The worked example in this post is very similar to one of the tutorials for that system: https://scala-lms.github.io/tutorials/regex.html
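
The core staging move, rendered as a toy in Python (LMS itself does this in Scala, where a Rep[T] type marks future-stage values; the names below are illustrative, not LMS's API):

  # A "staged" value wraps generated code; plain values exist now.
  class Rep:
      def __init__(self, code):
          self.code = code
      def __mul__(self, other):
          return Rep(f"({self.code} * {other.code})")

  def power(x, n):
      # n is a present-stage int, x is a future-stage Rep, so the loop
      # runs at generation time and the result is unrolled code.
      acc = Rep("1.0")
      for _ in range(n):
          acc = acc * x
      return acc

  print(power(Rep("x"), 3).code)  # (((1.0 * x) * x) * x)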

Unfortunately, so far as I can tell:

- LMS has not been updated for years and never moved to Scala 3. https://github.com/TiarkRompf/virtualization-lms-core

- LMS was written to also use "scala-virtualized", which is in a similar situation

There's a small project to attempt to support it with virtualization implemented in Scala 3 macros, but it's missing some components: https://github.com/metareflection/scala3-lms?tab=readme-ov-f...

I'd love to see this fully working again.

aappleby 10 hours ago

I did something similar to this with C++ templates - it's Parsing Expression Grammar based, so not full regex, but enough for a lot of tasks:

  using sign      = Atoms<'+', '-'>;
  using digit     = Range<'0', '9'>;
  using onenine   = Range<'1', '9'>;
  using digits    = Some<digit>;
  using integer   = Seq<Opt<Atom<'-'>>, Oneof<Seq<onenine, digits>, digit>>;
  using fraction  = Seq<Atom<'.'>, digits>;
  using exponent  = Seq<Atoms<'e', 'E'>, Opt<sign>, digits>;
  using number    = Seq<integer, Opt<fraction>, Opt<exponent>>;
and I've confirmed that it does all get inlined and optimized at -O3.

JSON parser example here - https://github.com/aappleby/matcheroni/blob/main/examples/js...

lisper 10 hours ago

> can compilers really execute general code at compile-time?

Cue the smug Lisp weenies laughing quietly in the background.

cadamsdotcom 8 hours ago

Seems like an optimization that could be applied quite generally - as the author mentions at the end, there are lots of places this could be used.

The problem with applying this technique generally is the amount of code generated. But what if you could optimize that too? Perhaps share the common parts of the AST between the copies of the code that are generated, and overlay the changes with some data structure.

taeric 17 hours ago

Gave me a smile to see the shout-out to LISP in there.

Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?
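
The build-time table pattern I remember is a small script run before compilation; a hypothetical example (the file name and table choice are mine, not from the article):

  # Precompute a CRC-32 lookup table and emit it as a C header,
  # so the compiled program never computes it at runtime.
  def crc32_entry(i):
      c = i
      for _ in range(8):
          c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
      return c

  with open("crc32_table.h", "w") as f:
      f.write("static const unsigned long crc32_table[256] = {\n")
      f.write(",\n".join("    0x%08X" % crc32_entry(i) for i in range(256)))
      f.write("\n};\n")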

fragmede 18 hours ago

Depending on the userbase of the site, simply checking for @gmail.com at the end, I'd bet, would result in a quick win, as well as restricting the username's alphabet to allowed Gmail characters.

The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra CPU time, but speed > CPU cycle count for this benchmark.
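
Something like a suffix check plus a restricted local-part scan, sketched in Python (the allowed character set here is my assumption, not Google's actual rules):

  import re

  # Hypothetical fast path: addresses ending in @gmail.com get a cheap
  # local-part check; everything else falls back to the full validator.
  GMAIL_LOCAL = re.compile(r"[A-Za-z0-9.+]{1,64}")

  def validate(addr, full_validator):
      if addr.endswith("@gmail.com"):
          return GMAIL_LOCAL.fullmatch(addr[:-len("@gmail.com")]) is not None
      return full_validator(addr)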

  • gus_massa 17 hours ago

    [Rehashing an old comment]

    In the math department, we had a Moodle for the first-year students at my university in Argentina.

    When we started, like 15 years ago, the emails of the students and TAs were evenly split: 30% Gmail, 30% Yahoo!, 30% Hotmail and 10% others (very approximate numbers).

    Now the students have like 80% Gmail, 10% Live/Outlook/Hotmail and 10% others/Yahoo. Some of the TAs are much older, so perhaps "only" 50% use Gmail.

    The difference is huge. I blame the mandatory Gmail account on cell phones.

    So, checking only @gmail.com is too strict, but a fast first check for @gmail.com followed by the complete regex may improve the speed a lot in the real world.

    • bee_rider 14 hours ago

      Maybe I am old, but I like to keep as much communication as possible going through the university email. It just feels more official somehow.

  • embedding-shape 17 hours ago

    Tell us you used to work at Google, without telling us.

    "simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.

    • fragmede 12 hours ago

      Tell us you don't actually work with any Google engineers... blah blah blah

      The trope is "At Google we..." and then casually mentioning "violating" the CAP theorem with Spanner or something.

      It is simple, and I really do hope any first-year CS student could extract a substring from a string. Have LLMs so atrophied our programming ability that extracting a substring is considered evidence of a superior programmer?

spectraldrift 15 hours ago

Having never heard of Mojo before, I found this article fascinating. It provides a great example of how a toy regex parser works and an excellent explanation of why vanilla regex tends to be slow. It also presents a novel solution: compiling the regex into regular code, which can then be optimized by the compiler.
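
To make the "regex into regular code" idea concrete, here is a minimal hand-written Python illustration of what a compiler could emit for a fixed pattern like ab*c (my own sketch, not the article's Mojo code):

  # The pattern a b* c as straight-line code: no pattern interpretation,
  # no backtracking machinery, just the loop the pattern implies.
  def match_ab_star_c(s):
      i = 0
      if i >= len(s) or s[i] != 'a':
          return False
      i += 1
      while i < len(s) and s[i] == 'b':  # b* becomes a plain loop
          i += 1
      return i + 1 == len(s) and s[i] == 'c'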

  • convolvatron 10 hours ago

    this is literally how 'lex' works. the one written in 1987 by Vern Paxson.

    • jlokier 9 hours ago

      The original is 'lex', written in 1975 by Mike Lesk and Eric Schmidt.

      Yes, that Eric Schmidt, later CEO of Google.

      1987 was the clone, 'flex' :-)

      It did "compiling the regex into regular code, which can then be optimized by the compiler" before the C programming language as we know it was created. I think 'lex' was compiling regex to C before the C language even had 'struct' types, 'printf' or 'malloc'.