<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Eshed Schacham</title>
    <description>Ramblings on code and life</description>
    <link>https://ashdnazg.github.io/</link>
    <atom:link href="https://ashdnazg.github.io/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sat, 08 Nov 2025 18:43:52 +0000</pubDate>
    <lastBuildDate>Sat, 08 Nov 2025 18:43:52 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    <item>
          <title>Abusing the Harvard Architecture in Nand2Tetris Assembly</title>
          <description>&lt;p&gt;Every now and then, when one writes assembly code by hand, one notices opportunities for optimisation.&lt;/p&gt;

&lt;p&gt;Some are concise and elegant; it’s usually a good idea to use those.&lt;/p&gt;

&lt;p&gt;Some are complicated or convoluted; you might use them if the benefit they bring is significant.&lt;/p&gt;

&lt;p&gt;Then there are those that are outright ludicrous. You’d never use these in any production code, but if you think of one, you’re definitely going to jokingly suggest it to your coworkers just to see their horrified faces. Or maybe write about them in a blog post.&lt;/p&gt;

</description>
          <pubDate>Sat, 08 Nov 2025 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/25/Abusing-Harvard-Architecture</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/25/Abusing-Harvard-Architecture</guid>
          
          
        </item><item>
          <title>16-bit Fast Inverse Square Root</title>
          <description>&lt;p&gt;If you’re not familiar with the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fast_inverse_square_root&quot;&gt;Fast Inverse Square Root&lt;/a&gt; algorithm, I thoroughly recommend reading about it before continuing. Not necessarily because the details are important, but because it’s such a mind-blowingly impressive hack.&lt;/p&gt;

&lt;p&gt;If you are already familiar with it, you might remember that it’s based on a magical constant (0x5F3759DF&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;), carefully chosen to minimise the algorithm’s relative error.&lt;/p&gt;

&lt;p&gt;This constant was chosen by smart people who reasoned about how the algorithm works and derived formulas for picking the best constant for the job. Using the same process, they also derived the constant for double precision (64-bit) floating point numbers - 0x5FE6EB50C7B537A9. If you’re into that kind of thing, you can read all about it in Matthew Robertson’s &lt;a href=&quot;https://mrober.io/papers/rsqrt.pdf&quot;&gt;A Brief History of InvSqrt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Nowadays, with 16-bit floating point numbers being all the rage, whether &lt;a href=&quot;https://en.wikipedia.org/wiki/Half-precision_floating-point_format&quot;&gt;half-precision&lt;/a&gt; or &lt;a href=&quot;https://en.wikipedia.org/wiki/Bfloat16_floating-point_format&quot;&gt;bfloat16&lt;/a&gt;, I wondered if anybody had derived the magic number for these formats. A short search produced no results, so I figured I might just do it myself.&lt;/p&gt;

&lt;p&gt;Unfortunately, this late at night I’m incapable of following all the sophisticated math done by the aforementioned smart people, but I do have one advantage - there are only 2&lt;sup&gt;16&lt;/sup&gt; 16-bit floating point numbers&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and only 2&lt;sup&gt;16&lt;/sup&gt; possible magic numbers, totalling just over 4 billion combinations. So let’s take inspiration from one of my favourite articles - &lt;a href=&quot;https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all&quot;&gt;There are Only Four Billion Floats–So Test Them All!&lt;/a&gt; - and simply test them all!&lt;/p&gt;
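
&lt;p&gt;The size of that search space is easy to check. What follows is a minimal sketch that assumes nothing beyond the standard library: it multiplies out the combinations and counts the bit patterns that encode normal positive numbers in each format. The helper name &lt;code&gt;count_normal_positive&lt;/code&gt; and its two parameters are made up for this illustration and are not part of the search program.&lt;/p&gt;

```rust
// Counts the 16-bit patterns that encode normal, positive values in a format
// with the given exponent and mantissa widths (hypothetical helper).
fn count_normal_positive(exponent_bits: u32, mantissa_bits: u32) -> usize {
    let mantissa_size = 2u32.pow(mantissa_bits);
    let exponent_size = 2u32.pow(exponent_bits);
    // With the sign bit clear, the positive patterns are 0..2^(exponent + mantissa bits).
    (0..exponent_size * mantissa_size)
        // Dividing by the mantissa size leaves only the exponent field.
        .map(|bits| bits / mantissa_size)
        // Exponent 0 holds zero and subnormals.
        .filter(|exponent| *exponent != 0)
        // The all-ones exponent holds infinity and NaN.
        .filter(|exponent| *exponent != exponent_size - 1)
        .count()
}

fn main() {
    let patterns = u64::from(u16::MAX) + 1;
    println!("magic/input combinations: {}", patterns * patterns);
    println!("normal positive half values: {}", count_normal_positive(5, 10));
    println!("normal positive bfloat16 values: {}", count_normal_positive(8, 7));
}
```

&lt;p&gt;The widths passed in &lt;code&gt;main&lt;/code&gt; are 5 exponent and 10 mantissa bits for half, and 8 and 7 for bfloat16.&lt;/p&gt;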

&lt;p&gt;With the &lt;a href=&quot;https://crates.io/crates/half&quot;&gt;half&lt;/a&gt; library giving me a pleasant interface in Rust for 16-bit floats, and the algorithm already existing, all that is left to do is to hack together two nested loops:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;half&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;num_traits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Switch which line is commented out to choose between half and bfloat16&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;half&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// type f16 = half::bf16;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isqrt_half&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THREE_HALVES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_f32_const&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_f32_const&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.to_bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.wrapping_sub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;THREE_HALVES&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_err&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ZERO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;u16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.is_normal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ZERO&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;expected&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;f16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_f64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.to_f64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.recip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isqrt_half&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;err&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;expected&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.abs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;expected&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;max_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;magic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_err&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;.iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;.enumerate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;.filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.is_normal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;.min_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.partial_cmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.unwrap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;.unwrap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Best magic: {constant:#06X}, Relative error: {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Less than a minute later, we get the results: for the half-precision format, the best constant is &lt;strong&gt;0x59B7&lt;/strong&gt; and for the bfloat16 format, the constant is &lt;strong&gt;0x5F35&lt;/strong&gt;&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Let’s compare the new constants with the ones for 32- and 64-bit floats:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Format&lt;/th&gt;
      &lt;th&gt;Constant&lt;/th&gt;
      &lt;th&gt;Maximal Relative Error&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;half&lt;/td&gt;
      &lt;td&gt;0x59B7&lt;/td&gt;
      &lt;td&gt;0.0028362274&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;bfloat16&lt;/td&gt;
      &lt;td&gt;0x5F35&lt;/td&gt;
      &lt;td&gt;0.008483887&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;single (32-bit)&lt;/td&gt;
      &lt;td&gt;0x5F375A86&lt;/td&gt;
      &lt;td&gt;0.0017512378&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;double (64-bit)&lt;/td&gt;
      &lt;td&gt;0x5FE6EB50C7B537A9&lt;/td&gt;
      &lt;td&gt;0.0017511837&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We see that the accuracy of the algorithm using the half format is much better than using bfloat16, likely due to the latter’s lower precision&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. But even with bfloat16, the maximal error is lower than 1%, a testament to the robustness of the algorithm.&lt;/p&gt;
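
&lt;p&gt;For comparison, the 32-bit row can be spot-checked with the standard library alone. This is only a sketch: it assumes a strided sample instead of all the floats, so the value it reports is a lower bound on the true maximal error. The names &lt;code&gt;fast_inv_sqrt&lt;/code&gt; and &lt;code&gt;sampled_max_error&lt;/code&gt; and the stride of 4099 are arbitrary choices made for the illustration.&lt;/p&gt;

```rust
// One Newton-Raphson iteration of the classic algorithm for f32,
// with the magic constant passed in as a parameter.
fn fast_inv_sqrt(x: f32, magic: u32) -> f32 {
    let guess = f32::from_bits(magic.wrapping_sub(x.to_bits() >> 1));
    guess * (1.5 - 0.5 * x * guess * guess)
}

// Largest relative error over a strided sample of the normal positive f32 values.
fn sampled_max_error(magic: u32, stride: usize) -> f64 {
    // Bit patterns 0x0080_0000..=0x7F7F_FFFF are exactly the normal positive f32s.
    (0x0080_0000u32..=0x7F7F_FFFF)
        .step_by(stride)
        .map(f32::from_bits)
        .map(|x| {
            // The reference value is computed in f64, as in the 16-bit search.
            let expected = 1.0 / f64::from(x).sqrt();
            let actual = f64::from(fast_inv_sqrt(x, magic));
            ((actual - expected) / expected).abs()
        })
        .fold(0.0, f64::max)
}

fn main() {
    // The stride is an arbitrary value that keeps the run short.
    println!("sampled maximal relative error: {}", sampled_max_error(0x5F37_5A86, 4099));
}
```

&lt;p&gt;A stride of 1 would turn this into the exhaustive test, at the cost of a much longer run.&lt;/p&gt;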

&lt;p&gt;Is this useful for anything? Probably not, since the algorithm itself is more or less &lt;a href=&quot;https://en.wikipedia.org/wiki/Fast_inverse_square_root#Obsolescence&quot;&gt;obsolete&lt;/a&gt;. Nevertheless, these magical constants deserve to be put on the internet, so here they are.&lt;/p&gt;

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Later the constant 0x5F375A86 was shown to produce better results. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;We actually have even fewer than 2&lt;sup&gt;16&lt;/sup&gt;, since the algorithm is only defined for normal positive numbers, of which there are 30720 half-precision floats and 32512 bfloat16s. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;When skipping the Newton-Raphson step, the constant for half is 0x59BB and the constant for bfloat16 is 0x5F37. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;bfloat16 has 7 bits of mantissa, while half has 10. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
          <pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/25/16-bit-Fast-Inverse-Square-Root</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/25/16-bit-Fast-Inverse-Square-Root</guid>
          
          
        </item><item>
          <title>Solving Super Six</title>
          <description>&lt;p&gt;Super Six (a.k.a. Rio and plenty of other names) is not a good game: it’s repetitive, heavily luck dependent, and repetitive.
Nevertheless, it has two major qualities - it’s short and simple. These qualities translate into a much more interesting one - it’s solvable for two players!&lt;/p&gt;

&lt;p&gt;While the game itself is a mediocre subject for a blog post, the maths behind solving it are not only interesting in themselves, but they can also be used to solve other two-player games.&lt;/p&gt;

</description>
          <pubDate>Fri, 02 Aug 2024 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/24/Solving-Super-Six</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/24/Solving-Super-Six</guid>
          
          
        </item><item>
          <title>Encoding tic-tac-toe in 13 bits</title>
          <description>&lt;p&gt;The other day I happened upon a post by &lt;a href=&quot;https://cbarrick.dev/&quot;&gt;Chris Barrick&lt;/a&gt; detailing how to &lt;a href=&quot;https://cbarrick.dev/posts/2024/02/19/tic-tac-toe&quot;&gt;encode a tic-tac-toe game state in 15 bits&lt;/a&gt;, itself a response to an earlier post by &lt;a href=&quot;https://github.com/blyxyas&quot;&gt;Alejandra González&lt;/a&gt; detailing how to &lt;a href=&quot;https://blog.goose.love/posts/tictactoe/&quot;&gt;encode a tic-tac-toe state in 18 bits&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In their posts, Chris and Alejandra mention 10 bits as the absolute minimum, since there are only 765 possible game states.
This number, however, takes mirroring and rotations into account&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, while Chris’ and Alejandra’s solutions do not. If you count mirrored and rotated boards as separate states, the number of states &lt;a href=&quot;https://stackoverflow.com/a/25358690&quot;&gt;jumps to 5477&lt;/a&gt;, which gives us a theoretical minimum of \(\left\lceil{\log_2{5477}}\right\rceil=\left\lceil{12.419...}\right\rceil=13\) bits.&lt;/p&gt;
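
&lt;p&gt;The two lower bounds can be reproduced with integer arithmetic only. This is a minimal sketch; the function name &lt;code&gt;min_bits&lt;/code&gt; is hypothetical, and it assumes at least one state.&lt;/p&gt;

```rust
// Smallest number of bits that can give each of `states` states its own code,
// i.e. the ceiling of log2(states). Assumes `states` is at least 1.
fn min_bits(states: u32) -> u32 {
    // The largest code needed is states - 1, so count the bits it occupies.
    u32::BITS - (states - 1).leading_zeros()
}

fn main() {
    println!("765 states:  {} bits", min_bits(765));
    println!("5477 states: {} bits", min_bits(5477));
}
```

&lt;p&gt;Counting the bits of &lt;code&gt;states - 1&lt;/code&gt; avoids floating point rounding when the number of states is an exact power of two.&lt;/p&gt;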

&lt;p&gt;For physicists, the difference between 13 and 15 might look negligible, but hey, consuming 15% more space is no joke.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The 765 figure considers the following two boards (and a few others) as the same state: &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
          <pubDate>Mon, 26 Feb 2024 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/24/Encoding-tic-tac-toe-in-13-bits</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/24/Encoding-tic-tac-toe-in-13-bits</guid>
          
          
        </item><item>
          <title>Finding Binary &amp; Decimal Palindromes</title>
          <description>&lt;p&gt;I recently had my 33rd birthday, and since I’m a nerd, I also checked the binary representation of that number, which is 100001.&lt;/p&gt;

&lt;p&gt;OMG, I thought to myself, 33 is a palindrome in both bases! I immediately checked when my next palindromtastic birthday would be:&lt;/p&gt;

&lt;p&gt;99&lt;/p&gt;

&lt;p&gt;That’s a tad over my life expectancy, but arguably reachable. After that, the numbers get big quite fast. Exponentially fast. The &lt;a href=&quot;https://oeis.org/&quot;&gt;On-Line Encyclopedia of Integer Sequences&lt;/a&gt; has &lt;a href=&quot;https://oeis.org/A007632&quot;&gt;an entry&lt;/a&gt; for the sequence of these numbers, with a list of the &lt;a href=&quot;https://oeis.org/A007632/b007632.txt&quot;&gt;147 known elements&lt;/a&gt; (at the time of writing this post). The largest known number is 9335388324586156026843333486206516854238835339 with 46 decimal and 153 binary digits. That’s quite large indeed.&lt;/p&gt;
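
&lt;p&gt;For birthday-sized numbers, a brute-force check is enough. This is a minimal sketch with a hypothetical helper, &lt;code&gt;is_palindrome&lt;/code&gt;, that reverses the digits of a number in a given base. It only works for values that fit in a &lt;code&gt;u64&lt;/code&gt;, so it is nowhere near fast or wide enough for the large entries in the sequence.&lt;/p&gt;

```rust
// Returns true when the digits of `n` in the given base read the same both ways.
// Meant for small inputs: the reversed value must also fit in a u64.
fn is_palindrome(n: u64, base: u64) -> bool {
    let mut reversed = 0;
    let mut rest = n;
    while rest > 0 {
        // Move the lowest digit of `rest` onto the end of `reversed`.
        reversed = reversed * base + rest % base;
        rest /= base;
    }
    reversed == n
}

fn main() {
    // Two-digit ages and 100, keeping only palindromes in both bases.
    (10..=100u64)
        .filter(|n| is_palindrome(*n, 2))
        .filter(|n| is_palindrome(*n, 10))
        .for_each(|n| println!("{n} = {n:b} in binary"));
}
```

&lt;p&gt;In that range only 33 and 99 pass both filters.&lt;/p&gt;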

&lt;p&gt;Life is short, let’s spend what little we have on finding even bigger palindromes!&lt;/p&gt;

</description>
          <pubDate>Sun, 15 May 2022 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/22/Finding-Really-Big-Palindromes</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/22/Finding-Really-Big-Palindromes</guid>
          
          
        </item><item>
          <title>Finding Evidence for GPL Violation with Ghidra and Friends</title>
          <description>&lt;p&gt;Let’s say you’ve worked a few years on an open source project released under the &lt;a href=&quot;https://www.gnu.org/licenses/quick-guide-gplv3.html&quot;&gt;GNU General Public License (GPL)&lt;/a&gt;
and let’s say you’ve spied an application that is eerily similar to your project but whose owner claims no infringement of your rights.&lt;br /&gt;
How would you find evidence for such infringement or lack thereof?&lt;br /&gt;
Obviously, I don’t know the answer to this question as it will vary depending on your case.
I can, however, tell you what &lt;em&gt;I&lt;/em&gt; did when faced with this situation, which I hope you will find educational or even entertaining.
This post is somewhere between a write-up and a tutorial. Hopefully it is interesting as both.&lt;/p&gt;

&lt;p&gt;Hang on to your trousers, we’re going in!&lt;/p&gt;

</description>
          <pubDate>Sun, 14 Jul 2019 00:00:00 +0000</pubDate>
          <link>https://ashdnazg.github.io/articles/19/Finding-Evidence-for-GPL-Violation</link>
          <guid isPermaLink="true">https://ashdnazg.github.io/articles/19/Finding-Evidence-for-GPL-Violation</guid>
          
          
        </item>
  </channel>
</rss>
